ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models • Paper • 2510.06014 • Published Oct 7 • 9
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows • Paper • 2510.24411 • Published 12 days ago • 70
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence • Paper • 2510.23538 • Published 12 days ago • 95
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data • Paper • 2509.15221 • Published Sep 18 • 109
OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth? • Paper • 2507.19132 • Published Jul 25
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning • Paper • 2508.20096 • Published Aug 27 • 36
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback • Paper • 2507.22080 • Published Jul 25 • 9
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows • Paper • 2505.19897 • Published May 26 • 104
Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis • Paper • 2407.12857 • Published Jul 9, 2024 • 1
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning • Paper • 2504.08672 • Published Apr 11 • 55
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization • Paper • 2504.10127 • Published Apr 14 • 17
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era • Paper • 2503.12329 • Published Mar 16 • 27
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis • Paper • 2412.19723 • Published Dec 27, 2024 • 87
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents • Paper • 2410.23218 • Published Oct 30, 2024 • 49
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant • Paper • 2410.18603 • Published Oct 24, 2024 • 32
TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills • Paper • 2306.07285 • Published May 23, 2023 • 2
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models • Paper • 2405.12939 • Published May 21, 2024 • 1
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models • Paper • 2406.11736 • Published Jun 17, 2024 • 6
KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language Models • Paper • 2402.02801 • Published Feb 5, 2024 • 1