| Starling-LM-7B-beta π | 26.1 | (-2.6, 2.0) |

### 3.2 AlignBench-v1.1

> [!IMPORTANT]
> We replaced AlignBench's original judge model, `GPT-4-0613`, with the more powerful `GPT-4o-0513`. For a fair comparison, all results below are judged by `GPT-4o-0513`, so they may differ from AlignBench-v1.1 scores reported elsewhere.

| Model              | Score    |
| ------------------ | -------- |
| **Xwen-7B-Chat** π | **6.88** |
| Qwen2.5-7B-Chat π  | 6.56     |
### 3.3 MT-Bench

> [!IMPORTANT]
> We replaced MT-Bench's original judge model, `GPT-4`, with the more powerful `GPT-4o-0513`. For a fair comparison, all results below are judged by `GPT-4o-0513`, so they may differ from MT-Bench scores reported elsewhere.

| Model              | Score    |
| ------------------ | -------- |
| **Xwen-7B-Chat** π | **7.98** |
| Qwen2.5-7B-Chat π  | 7.71     |
## References