LaSeR Collection: Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" • 5 items • Updated 24 days ago