Zekun Qi

qizekun

https://qizekun.github.io/

AI & ML interests

Embodied Intelligence, Large Language Model, 3D Computer Vision

Recent Activity

liked a Space 11 days ago

yyfz233/Pi3

authored a paper 29 days ago

Reasoning in Space via Grounding in the World

liked a model about 1 month ago

Qwen/Qwen3-VL-4B-Instruct

upvoted a collection about 1 month ago

GS-Reasoner

Collections of paper "Reasoning in Space via Grounding in the World" • 6 items • Updated 30 days ago • 2

upvoted a paper about 1 month ago

Reasoning in Space via Grounding in the World

Paper • 2510.13800 • Published Oct 15 • 14

upvoted 3 papers 3 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 205

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Paper • 2508.08240 • Published Aug 11 • 45

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

Paper • 2508.10711 • Published Aug 14 • 142

upvoted 3 papers 4 months ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Paper • 2507.13344 • Published Jul 17 • 56

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7 • 74

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Paper • 2507.04447 • Published Jul 6 • 44

upvoted 3 papers 6 months ago

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Paper • 2506.03135 • Published Jun 3 • 39

ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

Paper • 2506.01853 • Published Jun 2 • 32

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97

upvoted 2 papers 7 months ago

Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published Apr 24 • 92

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7 • 110

upvoted a collection 8 months ago

DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (https://arxiv.org/abs/2309.11499) • 6 items • Updated Mar 22, 2024 • 3

upvoted a paper 8 months ago

Unleashing Vecset Diffusion Model for Fast Shape Generation

Paper • 2503.16302 • Published Mar 20 • 43

upvoted an article 8 months ago

Proximal Policy Optimization (PPO)

Article • Aug 5, 2022 • 66

upvoted a paper 8 months ago

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16 • 35

upvoted a collection 9 months ago

SoFar

Collections of NeurIPS 2025 Spotlight paper: "SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation" • 5 items • Updated Sep 24 • 3

upvoted a paper 9 months ago

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Paper • 2502.13143 • Published Feb 18 • 31

upvoted a collection over 1 year ago

ShapeLLM

Model collections of ECCV 2024 paper: "ShapeLLM: Universal 3D Object Understanding for Embodied Interaction". • 8 items • Updated Jul 16, 2024 • 5