

Stoney Kang

sikang99


AI & ML interests

Remote Control based on Vision

Recent Activity



upvoted a paper about 6 hours ago

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

Paper • 2511.04668 • Published 2 days ago • 3

upvoted a paper about 13 hours ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published 3 days ago • 92

upvoted a paper 1 day ago

NVIDIA Nemotron Nano V2 VL

Paper • 2511.03929 • Published 3 days ago • 13

upvoted 10 papers 3 days ago

RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies

Paper • 2510.17950 • Published 19 days ago • 4

Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Paper • 2510.25616 • Published 10 days ago • 88

The Collaboration Gap

Paper • 2511.02687 • Published 4 days ago • 19

iFlyBot-VLA Technical Report

Paper • 2511.01914 • Published 7 days ago • 4

TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

Paper • 2511.02832 • Published 4 days ago • 8

Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process

Paper • 2511.01718 • Published 5 days ago • 6

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Paper • 2511.01266 • Published 5 days ago • 20

NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

Paper • 2510.26909 • Published 9 days ago • 13

World Simulation with Video Foundation Models for Physical AI

Paper • 2511.00062 • Published 11 days ago • 35

PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Paper • 2510.26236 • Published 9 days ago • 27

upvoted 2 papers 5 days ago

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

Paper • 2510.27606 • Published 8 days ago • 27

Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model

Paper • 2510.27607 • Published 8 days ago • 8

upvoted 3 papers 6 days ago

π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Paper • 2510.25889 • Published 10 days ago • 58

Continuous Autoregressive Language Models

Paper • 2510.27688 • Published 8 days ago • 59

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Paper • 2510.25602 • Published 10 days ago • 63

upvoted 2 papers 7 days ago

POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published 11 days ago • 2

CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

Paper • 2510.22282 • Published 14 days ago • 2