---
license: mit
base_model:
- Qwen/Qwen3-VL-32B-Instruct
---

## Model Summary

`UnifiedReward-2.0-qwen3vl-32b` is the first unified reward model built on [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) for assessing both multimodal understanding and generation. It supports pairwise ranking as well as pointwise scoring and can be employed for vision model preference alignment; a minimal inference sketch follows the resource list below.

For further details, please refer to the following resources:
- 📰 Paper: https://arxiv.org/pdf/2503.05236
- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
- 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
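
The snippet below is a minimal pointwise-scoring sketch using 🤗 Transformers. The repository id, prompt wording, and rating scale are illustrative assumptions (the exact evaluation prompts are provided in the project repository); the model itself follows the standard Qwen3-VL chat interface.

```python
# Minimal sketch: pointwise scoring of a generated image.
# Assumptions: the repo id, prompt wording, and 1-5 scale below are
# illustrative; see the project repository for the prompts used in the paper.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "CodeGoat24/UnifiedReward-2.0-qwen3vl-32b"  # assumed repo id
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("generated.png")  # the image to be judged
question = (
    "You are given a text caption and a generated image. "
    "Rate how well the image matches the caption on a scale of 1 to 5.\n"
    "Caption: a red bicycle leaning against a brick wall"
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": question},
    ],
}]

# Standard Qwen3-VL chat-template path: tokenize the conversation,
# generate the judgment, and decode only the newly generated tokens.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```

For pairwise ranking, the same call applies with two images in the message content and a prompt asking which response or image is better.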

## 🏅 Compared with Current Reward Models

| Reward Model | Method | Image Generation | Image Understanding | Video Generation | Video Understanding |
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| [PickScore](https://github.com/yuvalkirstain/PickScore) | Point | √ | | | |
| [HPS](https://github.com/tgxs002/HPSv2) | Point | √ | | | |
| [ImageReward](https://github.com/THUDM/ImageReward) | Point | √ | | | |
| [LLaVA-Critic](https://huggingface.co/lmms-lab/llava-critic-7b) | Pair/Point | | √ | | |
| [IXC-2.5-Reward](https://github.com/InternLM/InternLM-XComposer) | Pair/Point | | √ | | √ |
| [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore) | Point | | | √ | |
| [LiFT](https://github.com/CodeGoat24/LiFT) | Point | | | √ | |
| [VisionReward](https://github.com/THUDM/VisionReward) | Point | √ | | √ | |
| [VideoReward](https://github.com/KwaiVGI/VideoAlign) | Point | | | √ | |
| UnifiedReward (Ours) | Pair/Point | √ | √ | √ | √ |

## Citation

```
@article{unifiedreward,
  title={Unified Reward Model for Multimodal Understanding and Generation},
  author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2503.05236},
  year={2025}
}
```