PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data • Paper • 2508.15180 • Published Aug 21
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs • Paper • 2505.19457 • Published May 26
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning • Paper • 2411.03314 • Published Nov 5, 2024
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning • Paper • 2502.19634 • Published Feb 26