---
license: apache-2.0
tags:
- Reinforcement Learning
- Vision-Language Reasoning
---
# Model Card for WeThink-Qwen2.5VL-7B

Repository: https://github.com/yangjie-cv/WeThink

Paper: https://arxiv.org/abs/2506.07905



## πŸ† Performance Highlights
**WeThink-Qwen2.5VL-7B** achieves:
- πŸ₯‡ **1st place** on [OpenCompass Multimodal Reasoning Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal-reasoning/?m=REALTIME)
- πŸ… **5th place** on [OpenCompass Multi-modal Academic Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME)  
*(As of May 30th, 2025)*


## πŸš€ Quick Start
### Inference

```bash
git clone https://github.com/yangjie-cv/WeThink
cd WeThink
python inference.py
```
💡 **Note**: A system prompt is required during inference.
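If you prefer to call the model directly from Hugging Face `transformers` instead of `inference.py`, the sketch below shows the standard Qwen2.5-VL chat flow with a system message included. The model id, image path, and system-prompt text are placeholders (assumptions); use the exact system prompt shipped in the WeThink repository's `inference.py`.

```python
# Minimal sketch: Qwen2.5-VL-style inference with a system prompt.
# SYSTEM_PROMPT and MODEL_ID are placeholders — take the real values
# from the WeThink repository (inference.py / model card).

SYSTEM_PROMPT = "<system prompt from WeThink's inference.py>"  # placeholder
MODEL_ID = "WeThink-Qwen2.5VL-7B"  # placeholder model id


def build_messages(image_path: str, question: str) -> list:
    """Assemble a Qwen2.5-VL chat that always begins with the system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        },
    ]


def main():
    # Heavy imports and model loading stay inside main() so the
    # message-building helper can be used without a GPU.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info  # ships with Qwen2.5-VL examples

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_messages("demo.jpg", "What is shown in this image?")
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens before decoding the answer.
    answer = processor.batch_decode(
        generated[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
    print(answer)


if __name__ == "__main__":
    main()
```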

### πŸ“Š Evaluation
We have integrated WeThink-Qwen2.5VL-7B into the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). Please follow its [Quickstart guide](https://github.com/open-compass/VLMEvalKit/blob/main/docs/en/Quickstart.md) to evaluate WeThink-Qwen2.5VL-7B on various benchmarks.


## Citation
```
@misc{yang2025wethink,
      title={WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning}, 
      author={Jie Yang and Feipeng Ma and Zitian Wang and Dacheng Yin and Kang Rong and Fengyun Rao and Ruimao Zhang},
      year={2025},
      eprint={2506.07905},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.07905}, 
}
```