ACG: Action Coherence Guidance for Flow-based VLA models Paper • 2510.22201 • Published 15 days ago • 36
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 48
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 247
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models Paper • 2410.17578 • Published Oct 23, 2024 • 1
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Paper • 2412.10424 • Published Dec 10, 2024 • 2
Running 3.45k 3.45k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B • 11 items • Updated Jul 7 • 119
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 48