EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14 • 36
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Paper • 2506.09350 • Published Jun 11 • 48
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published Dec 15, 2024 • 13
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published Dec 16, 2024 • 23
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published Dec 12, 2024 • 21
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published Jun 17, 2024 • 22
VisCoT Collection Visual CoT: Unleashing Chain-of-Thought Reasoning in the Multi-Modal Language Model • 5 items • Updated Jun 13, 2024 • 5