CoVT: Chain-of-Visual-Thought
Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought!
This CoVT checkpoint is aligned with 20 visual tokens in total: 8 segmentation tokens, 4 depth tokens, 4 DINO tokens, and 4 edge tokens.
These task-specific tokens are integrated into the model’s embedding space to enhance 2D awareness (segmentation), 3D awareness (depth), patch-level feature representations (DINO), and structural awareness (edge).
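The token budget above can be written down as a small configuration object. This is a minimal sketch, not the CoVT codebase's API: the class name `VisualTokenBudget`, its field names, and the assumption that the four groups occupy contiguous, consecutive index ranges in the order listed are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# Hypothetical config: names and layout are illustrative, not the official CoVT API.
@dataclass(frozen=True)
class VisualTokenBudget:
    segmentation: int = 8  # 2D awareness
    depth: int = 4         # 3D awareness
    dino: int = 4          # patch-level feature representations
    edge: int = 4          # structural awareness

    def total(self) -> int:
        """Total number of visual thought tokens across all groups."""
        return self.segmentation + self.depth + self.dino + self.edge

    def spans(self) -> dict:
        """Map each group to a contiguous index range.

        Assumption: groups are laid out back to back in the order
        segmentation, depth, DINO, edge.
        """
        spans, start = {}, 0
        for name in ("segmentation", "depth", "dino", "edge"):
            count = getattr(self, name)
            spans[name] = range(start, start + count)
            start += count
        return spans


budget = VisualTokenBudget()
print(budget.total())  # 20
for name, r in budget.spans().items():
    print(f"{name}: tokens {r.start}-{r.stop - 1}")
```

Under these assumptions the segmentation group spans indices 0 to 7, depth 8 to 11, DINO 12 to 15, and edge 16 to 19. A frozen dataclass keeps the budget immutable, so a checkpoint's token allocation cannot be changed accidentally after it is loaded.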