The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms Paper • 2511.04217 • Published 3 days ago • 9
LiveTradeBench: Seeking Real-World Alpha with Large Language Models Paper • 2511.03628 • Published 3 days ago • 9
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published 2 days ago • 145
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published 4 days ago • 95
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix By codelion • 6 days ago • 30
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published 9 days ago • 114
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published 9 days ago • 99
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning Paper • 2510.23473 • Published 12 days ago • 83
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published 10 days ago • 40
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 12 days ago • 172
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published 23 days ago • 45
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published 18 days ago • 110
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published 12 days ago • 118