First Frame Is the Place to Go for Video Content Customization Paper • 2511.15700 • Published 6 days ago • 50
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space Paper • 2511.10555 • Published 12 days ago • 56
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published 13 days ago • 64
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models Paper • 2511.10629 • Published 12 days ago • 111
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published 12 days ago • 84
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Paper • 2511.08633 • Published 16 days ago • 52
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published 16 days ago • 117
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published Sep 13, 2024 • 35
MotionStream: Real-Time Video Generation with Interactive Motion Controls Paper • 2511.01266 • Published 22 days ago • 26
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 95
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5 • 120