Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arXiv:2508.04026

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 237
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 258

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Paper • 2507.22025 • Published Jul 29 • 4
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7 • 25
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6 • 52

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 131
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 74
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 16
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

Paper • 2507.05720 • Published Jul 8 • 2
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 132
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158
UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14 • 43

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158

DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training

Paper • 2504.17565 • Published Apr 24 • 2
AI-MO/NuminaMath-1.5

Viewer • Updated Feb 10 • 896k • 3.65k • 161
PrimeIntellect/synthetic-code-understanding

Viewer • Updated Feb 15 • 60.6k • 192 • 17
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Paper • 2507.07095 • Published Jul 9 • 54

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 237
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 258

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

Paper • 2507.05720 • Published Jul 8 • 2
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 132
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158
UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14 • 43

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Paper • 2507.22025 • Published Jul 29 • 4
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7 • 25
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6 • 52

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 131
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 74
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274

DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training

Paper • 2504.17565 • Published Apr 24 • 2
AI-MO/NuminaMath-1.5

Viewer • Updated Feb 10 • 896k • 3.65k • 161
PrimeIntellect/synthetic-code-understanding

Viewer • Updated Feb 15 • 60.6k • 192 • 17
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Paper • 2507.07095 • Published Jul 9 • 54

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 16
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs