Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Paper • 2412.01268 • Published Dec 2, 2024 • 1
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20, 2025 • 52
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning Paper • 2508.04416 • Published Aug 6, 2025 • 1
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams Paper • 2506.23825 • Published Jun 30, 2025
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams Paper • 2406.08085 • Published Jun 12, 2024 • 17