Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Paper • 2511.12609 • Published 3 days ago • 90
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published 13 days ago • 191
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published 12 days ago • 50
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published Oct 15 • 70
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published Sep 30 • 531
Set Block Decoding is a Language Model Inference Accelerator Paper • 2509.04185 • Published Sep 4 • 52
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels Paper • 2507.21809 • Published Jul 29 • 132
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19 • 88
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model Paper • 2506.13642 • Published Jun 16 • 26
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 109
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published Oct 14, 2024 • 29
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Paper • 2505.07263 • Published May 12 • 30
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch Paper • 2505.03733 • Published May 6 • 17
Learning Dynamics in Continual Pre-Training for Large Language Models Paper • 2505.07796 • Published May 12 • 19
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 82