From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published 20 days ago • 29
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published 12 days ago • 65
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published Feb 14 • 55
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published Jan 21 • 47
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Paper • 2501.12375 • Published Jan 21 • 22
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published Jan 14 • 60
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis Paper • 2412.15214 • Published Dec 19, 2024 • 15
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024 • 79
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs Paper • 2410.05265 • Published Oct 7, 2024 • 33
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 63
InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior Paper • 2404.11613 • Published Apr 17, 2024 • 11
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3, 2024 • 32
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions Paper • 2401.01827 • Published Jan 3, 2024 • 18
Breathing New Life into 3D Assets with Generative Repainting Paper • 2309.08523 • Published Sep 15, 2023 • 5