
	---
	license: mit
	base_model:
	- Qwen/Qwen3-VL-32B-Instruct
	---


	## Model Summary

	`UnifiedReward-2.0-qwen3vl-32b` is the first unified reward model based on [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) for assessing both multimodal understanding and multimodal generation. It supports pairwise ranking and pointwise scoring, so it can be used for preference alignment of vision models.
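
As an illustration of how pointwise scores could feed preference alignment, here is a minimal sketch that turns per-candidate scores into (chosen, rejected) pairs. It does not call the model: the function name `build_preference_pairs`, the `min_margin` threshold, the candidate ids, and the score values are all hypothetical and are not taken from the UnifiedReward codebase.

```python
from itertools import combinations


def build_preference_pairs(scores, min_margin=0.5):
    """Build (chosen, rejected) id pairs from pointwise scores for one prompt.

    `scores` maps a candidate id to its reward score. Pairs whose score gap
    is below `min_margin` are skipped, since they carry little preference signal.
    Both the data layout and the margin are assumptions for illustration.
    """
    pairs = []
    for (id_a, s_a), (id_b, s_b) in combinations(scores.items(), 2):
        if abs(s_a - s_b) < min_margin:
            continue  # too close to call; drop the pair
        chosen, rejected = (id_a, id_b) if s_a > s_b else (id_b, id_a)
        pairs.append((chosen, rejected))
    return pairs


# Hypothetical pointwise scores for three generated images of the same prompt.
scores = {"img_0": 4.5, "img_1": 2.0, "img_2": 4.2}
pairs = build_preference_pairs(scores)
print(pairs)
```

Pairs built this way could serve as chosen/rejected data for a preference-optimization method. In the pairwise ranking mode, the model's ranking of two candidates would give the pair directly, without a score margin.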

	For further details, please refer to the following resources:
	- 📰 Paper: https://arxiv.org/pdf/2503.05236
	- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
	- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
	- 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
	- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)


	## 🏁 Compared with Current Reward Models

	| Reward Model | Method | Image Generation | Image Understanding | Video Generation | Video Understanding |
	| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
	| [PickScore](https://github.com/yuvalkirstain/PickScore) | Point | √ | | | |
	| [HPS](https://github.com/tgxs002/HPSv2) | Point | √ | | | |
	| [ImageReward](https://github.com/THUDM/ImageReward) | Point | √ | | | |
	| [LLaVA-Critic](https://huggingface.co/lmms-lab/llava-critic-7b) | Pair/Point | | √ | | |
	| [IXC-2.5-Reward](https://github.com/InternLM/InternLM-XComposer) | Pair/Point | | √ | | √ |
	| [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore) | Point | | | √ | |
	| [LiFT](https://github.com/CodeGoat24/LiFT) | Point | | | √ | |
	| [VisionReward](https://github.com/THUDM/VisionReward) | Point | √ | | √ | |
	| [VideoReward](https://github.com/KwaiVGI/VideoAlign) | Point | | | √ | |
	| UnifiedReward (Ours) | Pair/Point | √ | √ | √ | √ |


	## Citation

	```
	@article{unifiedreward,
	title={Unified reward model for multimodal understanding and generation},
	author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
	journal={arXiv preprint arXiv:2503.05236},
	year={2025}
	}
	```