---
pipeline_tag: image-to-3d
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
library_name: pytorch
license: cc-by-nc-4.0
---
<div align="center">
<h1>Streaming 4D Visual Geometry Transformer</h1>
</div>
Dong Zhuo\*, [Wenzhao Zheng](https://wzzheng.net/)\*†, Jiahe Guo, Yuqi Wu, [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en&authuser=1), [Jiwen Lu](http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/)
\* Equal contribution. † Project leader.
[Paper](https://arxiv.org/abs/2507.11539) | [Project Page](https://wzzheng.net/StreamVGGT)
**StreamVGGT** is a causal transformer architecture for **real-time streaming 4D visual geometry perception**. It is compatible with LLM-targeted attention mechanisms (e.g., [FlashAttention](https://github.com/Dao-AILab/flash-attention)) and delivers both fast inference and high-quality 4D reconstruction.
## Overview
Offline models must reprocess the entire sequence and reconstruct the entire scene each time a new image arrives. Given a sequence of images, StreamVGGT instead employs temporal causal attention and leverages cached memory tokens to support efficient, incremental, on-the-fly reconstruction, enabling interactive, real-time online applications.
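The idea of temporal causal attention with a token cache can be illustrated with a toy sketch. This is not the StreamVGGT implementation: the class `StreamingCausalAttention`, the helper functions, and the use of one 2-D token per frame with no learned projections are all assumptions made for illustration. It only shows that attending over cached keys and values for each new frame gives the same result as recomputing causal attention over the whole sequence.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class StreamingCausalAttention:
    """Hypothetical toy layer: keeps keys/values of past frames in a cache."""

    def __init__(self):
        self.cached_keys = []
        self.cached_values = []

    def step(self, query, key, value):
        # Append the new frame to the cache, then attend over past + current.
        self.cached_keys.append(key)
        self.cached_values.append(value)
        return attend(query, self.cached_keys, self.cached_values)


def offline_causal_attention(queries, keys, values):
    """Reference: recompute causal attention over the full sequence."""
    return [attend(queries[t], keys[:t + 1], values[:t + 1])
            for t in range(len(queries))]


# Toy "frame tokens" (assumed: one token per frame, reused as q, k and v).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

layer = StreamingCausalAttention()
streamed = [layer.step(f, f, f) for f in frames]   # one frame at a time
offline = offline_causal_attention(frames, frames, frames)
```

In this sketch each `step` call touches only the new frame plus the cache, whereas the offline reference recomputes every earlier frame. The actual model operates on many tokens per frame with learned projections; see the GitHub repository for the real inference code.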
## Quick start
Please refer to our [GitHub repository](https://github.com/wzzheng/StreamVGGT).
## Citation
If you find this project helpful, please consider citing the following paper:
```
@article{streamVGGT,
title={Streaming 4D Visual Geometry Transformer},
author={Dong Zhuo and Wenzhao Zheng and Jiahe Guo and Yuqi Wu and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2507.11539},
year={2025}
}
```