ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published Jun 23 • 29
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Paper • 2506.18349 • Published Jun 23 • 13
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23 • 56