Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published about 1 month ago • 118
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14 • 142
D2F LLaDA Instruct 8B Space • Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing Paper • 2508.09192 • Published Aug 8 • 30
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks Paper • 2506.00411 • Published May 31 • 31
Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions Paper • 2505.19949 • Published May 26 • 16
Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition Paper • 2505.19788 • Published May 26 • 13
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection Paper • 2410.14731 • Published Oct 16, 2024
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1 • 66
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 193
Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation Paper • 2304.02841 • Published Apr 6, 2023