Donghao Zhou

donghao-zhou

https://correr-zhou.github.io

Correr-Zhou

AI & ML interests

Generative AI

Organizations

None yet

upvoted 2 papers 9 days ago

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published 9 days ago • 32

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 9 days ago • 102

upvoted a paper 13 days ago

Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 16 days ago • 44

upvoted a paper 24 days ago

Detect Anything via Next Point Prediction

Paper • 2510.12798 • Published 25 days ago • 44

upvoted 2 papers 25 days ago

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

Paper • 2510.12801 • Published 25 days ago • 13

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

Paper • 2510.12784 • Published 25 days ago • 20

upvoted 3 papers 27 days ago

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Paper • 2510.09507 • Published 29 days ago • 10

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published about 1 month ago • 70

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Paper • 2510.08555 • Published about 1 month ago • 62

upvoted 8 papers about 1 month ago

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Paper • 2510.01284 • Published Sep 30 • 31

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Paper • 2510.02283 • Published Oct 2 • 92

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29 • 44

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 181

LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Paper • 2509.22414 • Published Sep 26 • 21

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Paper • 2509.20414 • Published Sep 24 • 9

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 96

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24 • 76

upvoted a paper about 2 months ago

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Paper • 2509.18824 • Published Sep 23 • 22

upvoted a paper 2 months ago

Does DINOv3 Set a New Medical Vision Standard?

Paper • 2509.06467 • Published Sep 8 • 36

authored a paper 2 months ago

HERO: Hierarchical Extrapolation and Refresh for Efficient World Models

Paper • 2508.17588 • Published Aug 25 • 2