Ming Chen

ChenMing-thu14

AI & ML interests

3D Human Pose Estimation

Recent Activity

upvoted a paper 8 days ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published 13 days ago • 34

upvoted a paper 16 days ago

The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published 20 days ago • 113

upvoted a paper 19 days ago

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 20 days ago • 103

upvoted a paper 22 days ago

LongCat-Video Technical Report

Paper • 2510.22200 • Published 25 days ago • 24

upvoted a paper 26 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published 27 days ago • 18

upvoted 3 papers 27 days ago

authored a paper 28 days ago

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Paper • 2509.09595 • Published Sep 11 • 48

upvoted 2 papers 29 days ago

Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

Paper • 2510.16258 • Published Oct 17 • 7

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published 30 days ago • 65

upvoted 8 papers about 1 month ago

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Paper • 2510.12276 • Published Oct 14 • 143

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 87

Latent Diffusion Model without Variational Autoencoder

Paper • 2510.15301 • Published Oct 17 • 48

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Paper • 2510.14976 • Published Oct 16 • 3

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16 • 65

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

Paper • 2510.12789 • Published Oct 14 • 17

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

Paper • 2510.05684 • Published Oct 7 • 139

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published Oct 9 • 70

upvoted a paper about 2 months ago

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Paper • 2509.25161 • Published Sep 29 • 23
