Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony Paper • 2510.11345 • Published Oct 13 • 15
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 104
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 26
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models Paper • 2410.11710 • Published Oct 15, 2024 • 20