Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Published: 2024-07-13 Pages: 1-11
Container-title: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers

Author:

Shi Xiaoyu¹, Huang Zhaoyang¹, Wang Fu-Yun¹, Bian Weikang¹, Li Dasong¹, Zhang Yi², Zhang Manyuan¹, Cheung Ka Chun³, See Simon³, Qin Hongwei², Dai Jifeng⁴, Li Hongsheng¹

Affiliation:

1. The Chinese University of Hong Kong, Hong Kong

2. SenseTime, Hong Kong

3. NVIDIA, Hong Kong

4. Tsinghua University, China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3641519.3657497

References: 82 articles.

1. Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, and Jiang Bian. 2024. UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. arXiv preprint arXiv:2402.13185 (2024).

2. Max Bain, Arsha Nagrani, Gül Varol, and Andrew Zisserman. 2021. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. In IEEE International Conference on Computer Vision.

3. Michael J Black and Padmanabhan Anandan. 1993. A framework for the robust estimation of optical flow. In 1993 (4th) International Conference on Computer Vision. IEEE, 231–236.

4. Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. 2023a. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023).

5. Understanding Object Dynamics for Interactive Image-to-Video Synthesis

Cited by 16 articles.

1. LLM Integration in Extended Reality: A Comprehensive Review of Current Trends, Challenges, and Future Perspectives;Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems;2025-04-25

2. LaMD: Latent Motion Diffusion for Image-Conditional Video Generation;International Journal of Computer Vision;2025-03-03

3. Neural-Network-Enhanced Metalens Camera for High-Definition, Dynamic Imaging in the Long-Wave Infrared Spectrum;ACS Photonics;2025-01-03

4. Adaptive Multi-modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss;Lecture Notes in Computer Science;2025

5. MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model;Lecture Notes in Computer Science;2024-12-06