Slim Frikha
committed on
fix rounding typos
README.md
CHANGED
@@ -161,9 +161,9 @@ We report in the following table our internal pipeline benchmarks.
 </tr>
 <tr>
 <td>GPQA (0-shot)</td>
-<td><b>33.
-<td>31.9</td>
+<td><b>33.5</b></td>
 <td>32</td>
+<td>31.9</td>
 </tr>
 <tr>
 <td>GPQA (0-shot, COT)</td>
@@ -174,13 +174,13 @@ We report in the following table our internal pipeline benchmarks.
 <tr>
 <td>MUSR (0-shot)</td>
 <td>38.6</td>
-<td>
+<td>41</td>
 <td><b>46.4</b></td>
 </tr>
 <tr>
 <td>BBH (3-shot)</td>
 <td>48.6</td>
-<td><b>54.
+<td><b>54.1</b></td>
 <td>52.4</td>
 </tr>
 <tr>