World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published 21 days ago
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Paper • 2511.01718 • Published 15 days ago
NaviTrace: Evaluating Embodied Navigation of Vision-Language Models Paper • 2510.26909 • Published 19 days ago
Do You Need Proprioceptive States in Visuomotor Policies? Paper • 2509.18644 • Published Sep 23
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Paper • 2509.16197 • Published Sep 19
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies Paper • 2509.04996 • Published Sep 5
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper • 2506.07961 • Published Jun 9
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11
FLOWER VLA Collection • Checkpoints for the FLOWER VLA policy, a small and versatile VLA for language-conditioned robot manipulation with fewer than 1B parameters • 10 items • Updated Sep 17
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Paper • 2503.09642 • Published Mar 12
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Paper • 2502.12148 • Published Feb 17