Rui Sun PRO

ThreeSR

https://threesr.github.io/

AI & ML interests

Vision and Language Multimodal Learning, CV, NLP, LLM

Recent Activity

upvoted a paper about 12 hours ago

TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published 9 days ago • 96

upvoted a paper 3 days ago

OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

Paper • 2511.13655 • Published 4 days ago • 9

upvoted 2 papers 4 days ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published 9 days ago • 172

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

Paper • 2511.09057 • Published 9 days ago • 67

upvoted 7 papers 14 days ago

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published 22 days ago • 32

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published 22 days ago • 108

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 22 days ago • 103

upvoted a paper about 1 month ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6 • 112

upvoted 2 papers about 2 months ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 96

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 54

upvoted 2 papers 3 months ago

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 158

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Paper • 2508.10975 • Published Aug 14 • 60

upvoted 2 papers 4 months ago

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Paper • 2507.07202 • Published Jul 9 • 24

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

upvoted 2 papers 5 months ago

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 122

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Paper • 2506.15677 • Published Jun 18 • 23
