AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading • Paper • 2510.14264 • Published Oct 16
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models • Paper • 2505.16265 • Published May 22
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space • Paper • 2505.15778 • Published May 21