- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 274
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 262
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 237
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 258
Collections
Collections including paper arXiv:2508.04026
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
  Paper • 2507.22025 • Published • 4
- InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
  Paper • 2508.05731 • Published • 25
- VeriGUI: Verifiable Long-Chain GUI Dataset
  Paper • 2508.04026 • Published • 158
- SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
  Paper • 2508.04700 • Published • 52

- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 262
- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
  Paper • 2506.06395 • Published • 131
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
  Paper • 2506.05176 • Published • 74
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 274

- MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
  Paper • 2405.07526 • Published • 21
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 17
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 16
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 31

- MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
  Paper • 2507.05720 • Published • 2
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding
  Paper • 2507.15846 • Published • 132
- VeriGUI: Verifiable Long-Chain GUI Dataset
  Paper • 2508.04026 • Published • 158
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
  Paper • 2508.10833 • Published • 43

- DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training
  Paper • 2504.17565 • Published • 2
- AI-MO/NuminaMath-1.5
  Viewer • Updated • 896k • 3.65k • 161
- PrimeIntellect/synthetic-code-understanding
  Viewer • Updated • 60.6k • 192 • 17
- Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
  Paper • 2507.07095 • Published • 54