SenseNova-SI: Scaling Spatial Intelligence with Multimodal Foundation Models
[EASI Codebase] [EASI Leaderboard]
Overview
Despite remarkable progress, leading multimodal models still exhibit notable deficiencies in spatial intelligence: the ability to make metric estimations, understand spatial relationships, handle viewpoint changes, and integrate information across complex scenes. We take a scaling perspective: we construct and curate a large-scale, comprehensive collection of spatial intelligence data and, through continued training on powerful multimodal foundations, cultivate multi-faceted spatial understanding in the SenseNova-SI family of models. In the future, SenseNova-SI will be integrated with larger-scale in-house models.
Release Information
Currently, we build SenseNova-SI upon popular open-source foundation models to maximize compatibility with existing research pipelines. In this release, we present SenseNova-SI-InternVL3-2B and SenseNova-SI-InternVL3-8B, which achieve state-of-the-art performance among open-source models of comparable size across four recent spatial intelligence benchmarks: VSI, MMSI, MindCube, and ViewSpatial.
| Model | VSI | MMSI | MindCube-Tiny | ViewSpatial |
|---|---|---|---|---|
| Open-source Models (~2B) | | | | |
| InternVL3-2B | 32.98 | 26.50 | 37.50 | 32.56 |
| Qwen3-VL-2B-Instruct | 50.36 | 28.90 | 34.52 | 36.97 |
| MindCube-3B-RawQA-SFT | 17.24 | 1.70 | 51.73 | 24.14 |
| MindCube-3B-Aug-CGMap-FFR-Out-SFT | 29.60 | 29.10 | 41.06 | 30.90 |
| MindCube-3B-Plain-CGMap-FFR-Out-SFT | 29.93 | 30.40 | 39.90 | 31.20 |
| SpatialLadder-3B | 44.86 | 27.40 | 43.46 | 39.85 |
| SpatialMLLM-4B | 45.98 | 26.10 | 33.46 | 34.66 |
| SenseNova-SI-InternVL3-2B | 58.47 | 35.50 | 71.35 | 40.62 |
| Open-source Models (~8B) | | | | |
| InternVL3-8B | 42.14 | 28.00 | 41.54 | 38.66 |
| Qwen3-VL-8B-Instruct | 57.90 | 31.10 | 29.42 | 42.20 |
| BAGEL-7B | 30.90 | 33.10 | 34.71 | 41.32 |
| SpaceR-7B | 36.29 | 27.40 | 37.98 | 35.85 |
| ViLaSR-7B | 44.63 | 30.20 | 35.10 | 35.71 |
| SenseNova-SI-InternVL3-8B | 62.80 | 37.90 | 89.33 | 53.92 |
| Proprietary Models | | | | |
| Gemini-2.5-pro-2025-06 | 53.57 | 38.00 | 57.60 | 46.06 |
| Grok-4-2025-07-09 | 47.92 | 37.80 | 63.56 | 43.23 |
| GPT-5-2025-08-07 | 55.03 | 41.80 | 56.30 | 45.59 |
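As a rough aid for reading the table, the sketch below recomputes two things from the reported numbers: the per-benchmark gain of each SenseNova-SI model over its InternVL3 base, and whether each SenseNova-SI model has the top score within its open-source size group. The scores are copied from the table; the names `BENCHMARKS`, `GROUPS`, `gains`, and `leads_group` are illustrative helpers and not part of any released codebase.

```python
# Column order matches the table: VSI, MMSI, MindCube-Tiny, ViewSpatial.
BENCHMARKS = ("VSI", "MMSI", "MindCube-Tiny", "ViewSpatial")

# Scores copied verbatim from the table, grouped as in the table.
GROUPS = {
    "~2B": {
        "InternVL3-2B": (32.98, 26.50, 37.50, 32.56),
        "Qwen3-VL-2B-Instruct": (50.36, 28.90, 34.52, 36.97),
        "MindCube-3B-RawQA-SFT": (17.24, 1.70, 51.73, 24.14),
        "MindCube-3B-Aug-CGMap-FFR-Out-SFT": (29.60, 29.10, 41.06, 30.90),
        "MindCube-3B-Plain-CGMap-FFR-Out-SFT": (29.93, 30.40, 39.90, 31.20),
        "SpatialLadder-3B": (44.86, 27.40, 43.46, 39.85),
        "SpatialMLLM-4B": (45.98, 26.10, 33.46, 34.66),
        "SenseNova-SI-InternVL3-2B": (58.47, 35.50, 71.35, 40.62),
    },
    "~8B": {
        "InternVL3-8B": (42.14, 28.00, 41.54, 38.66),
        "Qwen3-VL-8B-Instruct": (57.90, 31.10, 29.42, 42.20),
        "BAGEL-7B": (30.90, 33.10, 34.71, 41.32),
        "SpaceR-7B": (36.29, 27.40, 37.98, 35.85),
        "ViLaSR-7B": (44.63, 30.20, 35.10, 35.71),
        "SenseNova-SI-InternVL3-8B": (62.80, 37.90, 89.33, 53.92),
    },
}


def gains(group, base, tuned):
    """Per-benchmark score difference (tuned minus base), rounded to 2 decimals."""
    rows = GROUPS[group]
    return {
        name: round(t - b, 2)
        for name, b, t in zip(BENCHMARKS, rows[base], rows[tuned])
    }


def leads_group(group, model):
    """True if `model` has the highest score on every benchmark in its group."""
    rows = GROUPS[group]
    return all(
        rows[model][i] == max(scores[i] for scores in rows.values())
        for i in range(len(BENCHMARKS))
    )


if __name__ == "__main__":
    print(gains("~2B", "InternVL3-2B", "SenseNova-SI-InternVL3-2B"))
    print(gains("~8B", "InternVL3-8B", "SenseNova-SI-InternVL3-8B"))
    print(leads_group("~2B", "SenseNova-SI-InternVL3-2B"))
    print(leads_group("~8B", "SenseNova-SI-InternVL3-8B"))
```

The comparison is restricted to the open-source groups because that is the scope of the state-of-the-art claim; the proprietary rows are left out of `GROUPS` on purpose.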
What's Next?
We will release the accompanying technical report shortly. Please stay tuned!
Model tree for sensenova/SenseNova-SI-InternVL3-8B
Base model: OpenGVLab/InternVL3-8B-Pretrained