Update README.md
README.md
CHANGED
@@ -67,8 +67,8 @@ table th, table td {
 <tr>
 <td><b>🚀TinyR1-32B(Ours)</b></td>
 <td><b>32B</b></td>
-<td><b>
-<td><b>82.
+<td><b>90.9</b></td>
+<td><b>82.7</b></td>
 <td><b>69.4</b></td>
 <td><b>70.4</b></td>
 <td><b>89.2</b></td>
@@ -92,7 +92,7 @@ To address this challenge, we propose the **Control Token method**—introducing
 
 After **only 20K** high-quality fine-tuning samples and three rounds of SFT training, TinyR1-32B has achieved comprehensive improvements in reasoning, instruction-following, and safety, **surpassing Qwen3-32B** across core performance dimensions. In particular, its instruction-following and safety performance significantly **exceed the full-strength DeepSeek-R1-0528**. Specifically:
 
-- **Reasoning ability**: Achieves **
+- **Reasoning ability**: Achieves **94%** of DeepSeek-R1-0528’s performance across mathematics, science, and coding reasoning tasks.
 - **General alignment**: Achieves a score of **89.2** on IFEval, significantly higher than DeepSeek-R1-0528’s 80.9.
 - **Safety alignment**: Achieves a Constructive Safety score of **nearly 90**, far surpassing other large open-source models. TinyR1-32B not only ensures safety but also provides constructive and positive safety guidance, moving beyond the simple paradigm of refusal.
 