Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published 6 days ago • 39
Efficient Attention Mechanisms for Large Language Models: A Survey Paper • 2507.19595 • Published Jul 25 • 6
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Paper • 2507.06607 • Published Jul 9 • 10
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 55
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 48