sienna223 committed
Commit 2c60715 · verified · 1 Parent(s): 99e6ae7

Update README.md

Files changed (1)
  1. README.md +209 -45

README.md CHANGED
@@ -13,52 +13,216 @@ model-index:
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # editscore_qwen3_8B_ins
-
- This model is a fine-tuned version of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) on the EditScore dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 1
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 128
- - total_eval_batch_size: 64
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3.0
-
- ### Training results
-
- ### Framework versions
-
- - PEFT 0.17.1
- - Transformers 4.57.1
- - Pytorch 2.6.0+cu124
- - Datasets 4.0.0
- - Tokenizers 0.22.1
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/VectorSpaceLab/EditScore/refs/heads/main/assets/logo.png" width="65%">
+ </p>
+
+ <p align="center">
+ <a href="https://vectorspacelab.github.io/EditScore"><img src="https://img.shields.io/badge/Project%20Page-EditScore-yellow" alt="project page"></a>
+ <a href="https://arxiv.org/abs/2509.23909"><img src="https://img.shields.io/badge/arXiv%20paper-2509.23909-b31b1b.svg" alt="arxiv"></a>
+ <a href="https://huggingface.co/collections/EditScore/editscore-68d8e27ee676981221db3cfe"><img src="https://img.shields.io/badge/EditScore-🤗-yellow" alt="model"></a>
+ <a href="https://huggingface.co/datasets/EditScore/EditReward-Bench"><img src="https://img.shields.io/badge/EditReward--Bench-🤗-yellow" alt="dataset"></a>
+ <a href="https://huggingface.co/datasets/EditScore/EditScore-Reward-Data"><img src="https://img.shields.io/badge/EditScore--Reward--Data-🤗-yellow" alt="dataset"></a>
+ <a href="https://huggingface.co/datasets/EditScore/EditScore-RL-Data"><img src="https://img.shields.io/badge/EditScore--RL--Data-🤗-yellow" alt="dataset"></a>
+ </p>
+
+ <h4 align="center">
+ <p>
+ <a href="#-news">News</a> |
+ <a href="#-quick-start">Quick Start</a> |
+ <a href="#-benchmark-your-image-editing-reward-model">Benchmark Usage</a> |
+ <a href="#%EF%B8%8F-citing-us">Citation</a>
+ </p>
+ </h4>
+
+ **EditScore** is a series of state-of-the-art open-source reward models (7B–72B) designed to evaluate and enhance instruction-guided image editing.
+
+ ## Highlights
+ - **State-of-the-Art Performance**: Effectively matches the performance of leading proprietary VLMs. With a self-ensembling strategy, **our largest model surpasses even GPT-5** on our comprehensive benchmark, **EditReward-Bench**.
+ - **A Reliable Evaluation Standard**: We introduce **EditReward-Bench**, the first public benchmark specifically designed for evaluating reward models in image editing, featuring 13 subtasks, 11 state-of-the-art editing models (*including proprietary models*), and expert human annotations.
+ - **Simple and Easy to Use**: Get an accurate quality score for your image edits with just a few lines of code.
+ - **Versatile Applications**: Ready to use as a best-in-class reranker to improve editing outputs, or as a high-fidelity reward signal for **stable and effective Reinforcement Learning (RL) fine-tuning**.
+
+ ## 🔥 News
+ - **2025-10-31**: We're thrilled to announce the **Qwen3-VL** variants of **EditScore**! 🚀 Powered by Qwen3-VL, the new 4B and 8B models achieve outstanding efficiency and performance: the 4B model already matches the original 32B version, while the 8B model delivers results comparable to the original 72B model. The models are now available on [Hugging Face](https://huggingface.co/EditScore/models); see the [Usage Example](#-usage-example) for how to use them. Detailed comparisons with the Qwen2.5-VL variants are in the [performance table](https://raw.githubusercontent.com/VectorSpaceLab/EditScore/refs/heads/main/assets/table_editscore_qwen3_vl.png).
+ - **2025-10-27**: Released [OmniGen2-EditScore7B-v1.1](https://huggingface.co/OmniGen2/OmniGen2-EditScore7B-v1.1), achieving a **7.01 (+0.73) GEdit score** within **700 steps** by incorporating the **reweighting strategy** from [TempFlow](https://arxiv.org/abs/2508.04324). Additionally, the **JSON repair process** has been enhanced using [json_repair](https://github.com/mangiucugna/json_repair), improving **EditScore's stability** under various conditions. Upgrade via `pip install -U editscore`.
+ - **2025-10-22**: **Introducing our Reinforcement Learning training framework!** We're excited to release our complete RL pipeline, the result of a massive effort to simplify fine-tuning for image editing models. Key features include:
+   - **Ready-to-Use RL Dataset**: Includes the complete dataset used in the EditScore project, along with clear usage guidelines and preparation scripts.
+   - **An Easy-to-Use Reward Model**: Seamlessly integrate **EditScore** as a reward signal.
+   - **A Scalable Reward Server**: Built with native multi-node support for high-throughput training.
+   - **Flexible Training Code**: Supports distributed training, variable image resolutions, and mixed tasks (t2i, edit, in-context generation) out of the box.
+
+   Dive into our comprehensive guide on [RL Fine-Tuning](examples/OmniGen2-RL#application-2-reinforcement-fine-tuning) to get started.
+ - **2025-10-16**: Training datasets [EditScore-Reward-Data](https://huggingface.co/datasets/EditScore/EditScore-Reward-Data) and [EditScore-RL-Data](https://huggingface.co/datasets/EditScore/EditScore-RL-Data) are available.
+ - **2025-10-15**: **EditScore** is now available on PyPI: install it easily with `pip install editscore`.
+ - **2025-10-15**: Best-of-N inference scripts for OmniGen2, Flux-dev-Kontext, and Qwen-Image-Edit are now available! See [this section](#apply-editscore-to-image-editing) for details.
+ - **2025-09-30**: We release **OmniGen2-EditScore7B**, unlocking online RL for image editing via high-fidelity EditScore. LoRA weights are available on [Hugging Face](https://huggingface.co/OmniGen2/OmniGen2-EditScore7B) and [ModelScope](https://www.modelscope.cn/models/OmniGen2/OmniGen2-EditScore7B).
+ - **2025-09-30**: We are excited to release **EditScore** and **EditReward-Bench**! Model weights and the benchmark dataset are now publicly available on Hugging Face ([Models Collection](https://huggingface.co/collections/EditScore/editscore-68d8e27ee676981221db3cfe), [Benchmark Dataset](https://huggingface.co/datasets/EditScore/EditReward-Bench)) and on ModelScope ([Models Collection](https://www.modelscope.cn/collections/EditScore-8b0d53aa945d4e), [Benchmark Dataset](https://www.modelscope.cn/datasets/EditScore/EditReward-Bench)).
+
+ ## 📖 Introduction
+ While Reinforcement Learning (RL) holds immense potential for instruction-guided image editing, its progress has been severely hindered by the absence of a high-fidelity, efficient reward signal.
+
+ To overcome this barrier, we provide a systematic, two-part solution:
+
+ - **A Rigorous Evaluation Standard**: We first introduce **EditReward-Bench**, a new public benchmark for the direct and reliable evaluation of reward models. It features 13 diverse subtasks and expert human annotations, establishing a gold standard for measuring reward signal quality.
+ - **A Powerful & Versatile Tool**: Guided by our benchmark, we developed the **EditScore** model series. Through meticulous data curation and an effective self-ensembling strategy, EditScore sets a new state of the art for open-source reward models, even surpassing the accuracy of leading proprietary VLMs.
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/VectorSpaceLab/EditScore/refs/heads/main/assets/table_reward_model_results.png" width="95%">
+ <br>
+ <em>Benchmark results on EditReward-Bench.</em>
+ </p>
+
+ We demonstrate the practical utility of EditScore through two key applications:
+
+ - **As a State-of-the-Art Reranker**: Use EditScore to perform Best-of-*N* selection and instantly improve the output quality of diverse editing models.
+ - **As a High-Fidelity Reward for RL**: Use EditScore as a robust reward signal to fine-tune models via RL, enabling stable training and unlocking significant performance gains where general-purpose VLMs fail.
+
+ This repository releases both the **EditScore** models and the **EditReward-Bench** dataset to facilitate future research in reward modeling, policy optimization, and AI-driven model improvement.
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/VectorSpaceLab/EditScore/refs/heads/main/assets/figure_edit_results.png" width="95%">
+ <br>
+ <em>EditScore as a superior reward signal for image editing.</em>
+ </p>
+
+ ## 📌 TODO
+ We are actively working on improving EditScore and expanding its capabilities. Here's what's next:
+
+ - [x] Qwen3-VL variants of EditScore.
+ - [x] Release training data for the reward model and online RL.
+ - [x] Release RL training code applying EditScore to OmniGen2.
+ - [x] Provide Best-of-N inference scripts for OmniGen2, Flux-dev-Kontext, and Qwen-Image-Edit.
+
+ ## 🚀 Quick Start
+
+ ### 🛠️ Environment Setup
+ We offer two ways to install EditScore. Choose the one that best fits your needs:
+ - **Method 1: Install from PyPI (Recommended for Users)**: if you want to use EditScore as a library in your own project.
+ - **Method 2: Install from Source (For Developers)**: if you plan to contribute to the code, modify it, or run the examples in this repository.
+
+ #### Prerequisites: Installing PyTorch
+ Both installation methods require PyTorch to be installed first, as its version depends on your system's CUDA setup.
+ ```bash
+ # (Optional) Create a clean Python environment
+ conda create -n editscore python=3.12
+ conda activate editscore
+
+ # Choose the command that matches your CUDA version.
+ # This example is for CUDA 12.6.
+ pip install torch==2.7.1 torchvision --extra-index-url https://download.pytorch.org/whl/cu126
+ ```
+
+ <details>
+ <summary>🌏 For users in Mainland China</summary>
+
+ ```bash
+ # Install PyTorch from a domestic mirror
+ pip install torch==2.7.1 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu126
+ ```
+ </details>
+
+ #### Method 1: Install from PyPI (Recommended for Users)
+ ```bash
+ pip install -U editscore
+ ```
+
+ #### Method 2: Install from Source (For Developers)
+ This method gives you a local, editable version of the project.
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/VectorSpaceLab/EditScore.git
+ cd EditScore
+ ```
+
+ 2. Install EditScore in editable mode:
+ ```bash
+ pip install -e .
+ ```
+
+ #### ✅ (Recommended) Install Optional High-Performance Dependencies
+ For the best performance, especially during inference, we highly recommend installing vLLM:
+ ```bash
+ pip install -U vllm
+ ```
+
+ ---
+
+ ### 🧪 Usage Example
+ Using EditScore is straightforward. The model will be downloaded automatically from the Hugging Face Hub on its first run.
+ ```python
+ from PIL import Image
+ from editscore import EditScore
+
+ # Load the EditScore model. It will be downloaded automatically.
+ # Replace with the specific model version you want to use.
+ model_path = "Qwen/Qwen3-VL-4B-Instruct"
+ lora_path = "EditScore/EditScore-Qwen3-VL-4B-Instruct"
+
+ scorer = EditScore(
+     backbone="qwen3vl",  # set to "qwen3vl_vllm" for faster inference
+     model_name_or_path=model_path,
+     lora_path=lora_path,
+     score_range=25,
+     num_pass=1,  # Increase for better performance via self-ensembling
+ )
+
+ # The Qwen2.5-VL variant is loaded the same way:
+ # model_path = "Qwen/Qwen2.5-VL-7B-Instruct"
+ # lora_path = "EditScore/EditScore-7B"
+ # scorer = EditScore(
+ #     backbone="qwen25vl",  # set to "qwen25vl_vllm" for faster inference
+ #     model_name_or_path=model_path,
+ #     lora_path=lora_path,
+ #     score_range=25,
+ #     num_pass=1,  # Increase for better performance via self-ensembling
+ # )
+
+ input_image = Image.open("example_images/input.png")
+ output_image = Image.open("example_images/output.png")
+ instruction = "Adjust the background to a glass wall."
+
+ # evaluate() returns a dictionary containing the final score and other details.
+ result = scorer.evaluate([input_image, output_image], instruction)
+ print(f"Edit Score: {result['overall']}")
+ ```
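+
+ If you need higher throughput, the same interface works with the vLLM backend and can be combined with self-ensembling. Below is a minimal sketch, assuming the optional vLLM dependency is installed and reusing only the parameters documented above; the choice of `num_pass=4` is an illustrative assumption, not a recommended setting:
+
+ ```python
+ from editscore import EditScore
+
+ # Same model/LoRA pair as above, served through vLLM for faster inference.
+ # num_pass > 1 enables self-ensembling (multiple scoring passes per edit).
+ fast_scorer = EditScore(
+     backbone="qwen3vl_vllm",
+     model_name_or_path="Qwen/Qwen3-VL-4B-Instruct",
+     lora_path="EditScore/EditScore-Qwen3-VL-4B-Instruct",
+     score_range=25,
+     num_pass=4,
+ )
+ ```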
+
+ ---
+
+ ## 📊 Benchmark Your Image-Editing Reward Model
+ #### Install benchmark dependencies
+ To use the benchmark example code, first install the required dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ We provide an evaluation script to benchmark reward models on **EditReward-Bench**. To evaluate your own custom reward model, simply create a scorer class with a similar interface (see the sketch below) and update the script.
+ ```bash
+ # This script evaluates the default EditScore model on the benchmark
+ bash evaluate.sh
+
+ # Or speed up inference with vLLM
+ bash evaluate_vllm.sh
+ ```
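+
+ The evaluation script only assumes a scorer object exposing the same `evaluate` interface as EditScore. Here is a minimal sketch of a custom scorer, where `MyScorer` and its constant score are hypothetical placeholders for your own model:
+
+ ```python
+ from typing import List
+ from PIL import Image
+
+ class MyScorer:
+     """Hypothetical custom scorer mirroring EditScore's evaluate() interface."""
+
+     def evaluate(self, images: List[Image.Image], instruction: str) -> dict:
+         # images holds the (input, output) pair, as in the Usage Example.
+         input_image, output_image = images
+         score = 0.0  # replace with your reward model's prediction
+         return {"overall": score}
+ ```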
+
+ ## Apply EditScore to Image Editing
+ We offer two example use cases for your exploration:
+ - **Best-of-N selection**: Use EditScore to automatically pick the most preferred image among multiple candidates (a minimal sketch follows this list).
+ - **Reinforcement fine-tuning**: Use EditScore as a reward model to guide RL-based optimization.
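+
+ As an illustration of Best-of-N selection, here is a minimal sketch that scores pre-generated candidate edits against the input and instruction, then keeps the highest-scoring one. The candidate file paths are hypothetical; the scorer parameters mirror the Usage Example above:
+
+ ```python
+ from PIL import Image
+ from editscore import EditScore
+
+ scorer = EditScore(
+     backbone="qwen3vl",
+     model_name_or_path="Qwen/Qwen3-VL-4B-Instruct",
+     lora_path="EditScore/EditScore-Qwen3-VL-4B-Instruct",
+     score_range=25,
+     num_pass=1,
+ )
+
+ input_image = Image.open("example_images/input.png")
+ instruction = "Adjust the background to a glass wall."
+
+ # Candidate edits produced by any editing model (hypothetical paths).
+ candidates = [Image.open(f"candidates/edit_{i}.png") for i in range(4)]
+
+ # Score each candidate and keep the best edit.
+ scores = [scorer.evaluate([input_image, c], instruction)["overall"] for c in candidates]
+ best = candidates[max(range(len(candidates)), key=lambda i: scores[i])]
+ best.save("best_edit.png")
+ ```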
+
+ For detailed instructions and examples, please refer to the [documentation](examples/OmniGen2-RL/README.md).
+
+ ## ❤️ Citing Us
+ If you find this repository or our work useful, please consider giving it a star ⭐ and a citation 🦖; it would be greatly appreciated:
+
+ ```bibtex
+ @article{luo2025editscore,
+   title={EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling},
+   author={Xin Luo and Jiahao Wang and Chenyuan Wu and Shitao Xiao and Xiyan Jiang and Defu Lian and Jiajun Zhang and Dong Liu and Zheng Liu},
+   journal={arXiv preprint arXiv:2509.23909},
+   year={2025}
+ }
+ ```