The Path Not Taken: RLVR Provably Learns Off the Principals • Paper • 2511.08567 • Published 7 days ago • 25
AEPO • Collection • The official datasets and model checkpoints of AEPO • 4 items • Updated 29 days ago • 3
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning • Paper • 2509.20712 • Published Sep 25 • 19
Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts • Paper • 2502.12928 • Published Feb 18 • 1
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization • Paper • 2508.07629 • Published Aug 11 • 41
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts • Paper • 2410.16077 • Published Oct 21, 2024 • 1
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts • Paper • 2407.09816 • Published Jul 13, 2024 • 1