Fix metrics table
README.md
CHANGED
|
@@ -114,116 +114,17 @@ We advise adding the `rope_scaling` configuration only when processing long cont
|
|
| 114 |
|
| 115 |
### Comparison of Qiskit models across benchmarks
|
| 116 |
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
border-radius: 12px;
|
| 125 |
-
overflow: hidden;
|
| 126 |
-
table-layout: auto;
|
| 127 |
-
box-sizing: border-box;
|
| 128 |
-
margin: 16px 0;
|
| 129 |
-
"
|
| 130 |
-
>
|
| 131 |
-
<thead>
|
| 132 |
-
<tr>
|
| 133 |
-
<th style="text-align:left; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 134 |
-
Model
|
| 135 |
-
</th>
|
| 136 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 137 |
-
QiskitHumanEval-Hard
|
| 138 |
-
</th>
|
| 139 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 140 |
-
QiskitHumanEval
|
| 141 |
-
</th>
|
| 142 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 143 |
-
HumanEval
|
| 144 |
-
</th>
|
| 145 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 146 |
-
ASDiv
|
| 147 |
-
</th>
|
| 148 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 149 |
-
MathQA
|
| 150 |
-
</th>
|
| 151 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 152 |
-
SciQ
|
| 153 |
-
</th>
|
| 154 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 155 |
-
MBPP
|
| 156 |
-
</th>
|
| 157 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 158 |
-
IFEval
|
| 159 |
-
</th>
|
| 160 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 161 |
-
CrowsPairs (English)
|
| 162 |
-
</th>
|
| 163 |
-
<th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
|
| 164 |
-
TruthfulQA (MC1 acc)
|
| 165 |
-
</th>
|
| 166 |
-
</tr>
|
| 167 |
-
</thead>
|
| 168 |
-
<tbody>
|
| 169 |
-
<tr style="background:#f7fafc;">
|
| 170 |
-
<td style="padding:12px 16px; font-weight:700; color:#07102a;">Qwen2.5-Coder-14B-Qiskit</td>
|
| 171 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">25.17</td>
|
| 172 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.01</td>
|
| 173 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">91.46</td>
|
| 174 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">4.21</td>
|
| 175 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">53.90</td>
|
| 176 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">97.00</td>
|
| 177 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.60</td>
|
| 178 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.64</td>
|
| 179 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">65.18</td>
|
| 180 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">37.82</td>
|
| 181 |
-
</tr>
|
| 182 |
-
<tr style="background:#ffffff;">
|
| 183 |
-
<td style="padding:12px 16px; color:#0f172a;">mistral-small-3.2-24b-qiskit</td>
|
| 184 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.53</td>
|
| 185 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.39</td>
|
| 186 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.49</td>
|
| 187 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.69</td>
|
| 188 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">53.40</td>
|
| 189 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">96.40</td>
|
| 190 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">63.40</td>
|
| 191 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">31.66</td>
|
| 192 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">67.56</td>
|
| 193 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">42.84</td>
|
| 194 |
-
</tr>
|
| 195 |
-
<tr style="background:#ffffff;">
|
| 196 |
-
<td style="padding:12px 16px; color:#0f172a;">granite-3.3-8b-qiskit</td>
|
| 197 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">14.57</td>
|
| 198 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">27.15</td>
|
| 199 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">62.80</td>
|
| 200 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">0.48</td>
|
| 201 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">38.66</td>
|
| 202 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">93.30</td>
|
| 203 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">52.40</td>
|
| 204 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">59.71</td>
|
| 205 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">59.75</td>
|
| 206 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">39.05</td>
|
| 207 |
-
</tr>
|
| 208 |
-
<tr style="background:#fbfdff;">
|
| 209 |
-
<td style="padding:12px 16px; color:#0f172a;">granite-3.2-8b-qiskit</td>
|
| 210 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">9.93</td>
|
| 211 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">24.50</td>
|
| 212 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">57.32</td>
|
| 213 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">0.09</td>
|
| 214 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">41.41</td>
|
| 215 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">96.30</td>
|
| 216 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">51.80</td>
|
| 217 |
-
<td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">60.79</td>
|
| 218 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">66.79</td>
|
| 219 |
-
<td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.51</td>
|
| 220 |
-
</tr>
|
| 221 |
-
</tbody>
|
| 222 |
-
</table>
|
| 223 |
-
|
| 224 |
*Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
|
| 225 |
|
| 226 |
|
| 227 |
## Training Data
|
| 228 |
|
| 229 |
- **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on <https://github.com>), and additional synthetic data generated at IBM Quantum. We exclude code that is older than 2023.
|
| 114 |
|
| 115 |
### Comparison of Qiskit models across benchmarks
|
| 116 |
|
| 117 |
+
| **Model** | **QiskitHumanEval-Hard** | **QiskitHumanEval** | **HumanEval** | **ASDiv** | **MathQA** | **SciQ** | **MBPP** | **IFEval** | **CrowsPairs (English)** | **TruthfulQA (MC1 acc)** |
|
| 118 |
+
|-----------|---------------------------|----------------------|---------------|-----------|------------|----------|----------|------------|---------------------------|---------------------------|
|
| 119 |
+
| **qwen2.5-coder-14b-qiskit** | **25.17** | **49.01** | **91.46** | 4.21 | **53.90** | **97.00** | **77.60** | 49.64 | 65.18 | 37.82 |
|
| 120 |
+
| mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | **20.69** | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | **42.84** |
|
| 121 |
+
| granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | **59.75** | 39.05 |
|
| 122 |
+
| granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | **60.79** | 66.79 | 40.51 |
|
| 123 |
+
|
| 124 |
*Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
|
| 125 |
|
| 126 |
|
| 127 |
+
|
| 128 |
## Training Data
|
| 129 |
|
| 130 |
- **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on <https://github.com>), and additional synthetic data generated at IBM Quantum. We exclude code that is older than 2023.
|