LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 38
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs Paper • 2505.15804 • Published May 21 • 10