Stoney Kang
sikang99
AI & ML interests
Remote Control based on Vision
Recent Activity
Upvoted a paper about 6 hours ago: SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding
Upvoted a paper about 14 hours ago: Diffusion Language Models are Super Data Learners
Upvoted a paper 1 day ago: NVIDIA Nemotron Nano V2 VL
Diffusion Models
- Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
  Paper • 2507.02608 • Published • 21
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
  Paper • 2503.10631 • Published
- Mobile Video Diffusion
  Paper • 2412.07583 • Published • 20
- ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models
  Paper • 2508.18271 • Published • 8
Diffusion Model
Vision Processing
VLA Models
Vision Language Models for Robotics
- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 141
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 11
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1
3D Generation
- Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
  Paper • 2506.16504 • Published • 26
- Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
  Paper • 2506.15442 • Published • 12
- Dens3R: A Foundation Model for 3D Geometry Prediction
  Paper • 2507.16290 • Published • 8
- GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors
  Paper • 2508.09667 • Published • 6
SLAM
3DGS NeRF
VLM, MLLM
Reinforcement Learning
Simulation
AI Agents
Video Generation