LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation Paper • 2508.03485 • Published Aug 5 • 2
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published 18 days ago • 29
Video-As-Prompt Collection The model zoo for "Video-As-Prompt: Unified Semantic Control for Video Generation" • 3 items • Updated 14 days ago • 11
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published 17 days ago • 44
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10 • 127
AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper • 2508.20088 • Published Aug 27 • 20
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs Paper • 2508.14896 • Published Aug 20 • 22
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published Aug 14 • 52
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts Paper • 2507.20939 • Published Jul 28 • 56
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Paper • 2506.16141 • Published Jun 19 • 27
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Paper • 2506.16054 • Published Jun 19 • 60
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 48
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published Jun 9 • 18
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation Paper • 2505.05422 • Published May 8 • 8