fix README.md
README.md
@@ -9,18 +9,14 @@ pipeline_tag: image-to-3d
 <h1>Streaming 4D Visual Geometry Transformer</h1>
 </div>
 
-
+Dong Zhuo\*, [Wenzhao Zheng](https://wzzheng.net/)\*†, Jiahe Guo, Yuqi Wu, [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en&authuser=1), [Jiwen Lu](http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/)
 
-
-
->Dong Zhuo<sup>\*</sup>, [Wenzhao Zheng](https://wzzheng.net/)<sup>*</sup>$\dagger$, Jiahe Guo, Yuqi Wu, [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en&authuser=1), [Jiwen Lu](http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/)
-
-<sup>*</sup> Equal contribution. $\dagger$ Project leader.
+\* Equal contribution. † Project leader.
 
+[Paper](https://arxiv.org/abs/2507.11539) | [Project Page](https://wzzheng.net/StreamVGGT)
 
 **StreamVGGT**, a causal transformer architecture for **real-time streaming 4D visual geometry perception** compatiable with LLM-targeted attention mechanism (e.g., [FlashAttention](https://github.com/Dao-AILab/flash-attention)), delivers both fast inference and high-quality 4D reconstruction.
 
-
 ## Overview
 
 Given a sequence of images, unlike offline models that require reprocessing the entire sequence and reconstructing the entire scene upon receiving each new image, our StreamVGGT employs temporal