-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 30 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 33 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
Collections
Discover the best community collections!
Collections including paper arXiv:2412.04454
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Paper • 2506.03143 • Published • 53 -
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Paper • 2505.12370 • Published -
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
Paper • 2505.12493 • Published
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 90 -
OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 25 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
zai-org/cogagent-9b-20241220
Image-Text-to-Text • 14B • Updated • 871 • 53
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 90 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Paper • 2412.09605 • Published • 30
-
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Paper • 2106.13914 • Published • 1 -
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Paper • 2506.15196 • Published • 3 -
Ascend HiFloat8 Format for Deep Learning
Paper • 2409.16626 • Published • 1 -
Recipes for Pre-training LLMs with MXFP8
Paper • 2506.08027 • Published • 1
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 29 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 2
-
xlangai/Aguvis-7B-720P
8B • Updated • 52 • 9 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5 -
cckevinn/SeeClick
Text Generation • 10B • Updated • 115 • 18
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Paper • 2410.05243 • Published • 19 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 30 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 33 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
-
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Paper • 2106.13914 • Published • 1 -
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Paper • 2506.15196 • Published • 3 -
Ascend HiFloat8 Format for Deep Learning
Paper • 2409.16626 • Published • 1 -
Recipes for Pre-training LLMs with MXFP8
Paper • 2506.08027 • Published • 1
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Paper • 2506.03143 • Published • 53 -
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Paper • 2505.12370 • Published -
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
Paper • 2505.12493 • Published
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 29 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 2
-
xlangai/Aguvis-7B-720P
8B • Updated • 52 • 9 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5 -
cckevinn/SeeClick
Text Generation • 10B • Updated • 115 • 18
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 90 -
OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 25 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
zai-org/cogagent-9b-20241220
Image-Text-to-Text • 14B • Updated • 871 • 53
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 90 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Paper • 2412.09605 • Published • 30
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71 -
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Paper • 2410.05243 • Published • 19 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 65