Safetensors
qwen2
Commit 224f91d (verified) by whlll · 1 parent: 8b8b2d8

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -67,8 +67,8 @@ table th, table td {
  <tr>
  <td><b>🚀TinyR1-32B(Ours)</b></td>
  <td><b>32B</b></td>
- <td><b>88.5</b></td>
- <td><b>82.9</b></td>
+ <td><b>90.9</b></td>
+ <td><b>82.7</b></td>
  <td><b>69.4</b></td>
  <td><b>70.4</b></td>
  <td><b>89.2</b></td>
@@ -92,7 +92,7 @@ To address this challenge, we propose the **Control Token method**—introducing
 
  After **only 20K** high-quality fine-tuning samples and three rounds of SFT training, TinyR1-32B has achieved comprehensive improvements in reasoning, instruction-following, and safety, **surpassing Qwen3-32B** across core performance dimensions. In particular, its instruction-following and safety performance significantly **exceed the full-strength DeepSeek-R1-0528**. Specifically:
 
- - **Reasoning ability**: Achieves **93%** of DeepSeek-R1-0528’s performance across mathematics, science, and coding reasoning tasks.
+ - **Reasoning ability**: Achieves **94%** of DeepSeek-R1-0528’s performance across mathematics, science, and coding reasoning tasks.
  - **General alignment**: Achieves a score of **89.2** on IFEval, significantly higher than DeepSeek-R1-0528’s 80.9.
  - **Safety alignment**: Achieves a Constructive Safety score of **nearly 90**, far surpassing other large open-source models. TinyR1-32B not only ensures safety but also provides constructive and positive safety guidance, moving beyond the simple paradigm of refusal.
 
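The first hunk of this commit touches only two cells of the TinyR1-32B table row (88.5 → 90.9 and 82.9 → 82.7). As a minimal sketch, assuming Python 3 and the standard-library `difflib` module, and with the row's HTML indentation dropped for simplicity, the same removed/added pair can be regenerated from the before and after rows. The names `old_rows`, `new_rows`, and `changed_cells` are illustrative only and are not part of the repository.

```python
import difflib

# README.md lines 67-74 before the commit (indentation dropped; illustrative copy).
old_rows = [
    "<tr>",
    "<td><b>🚀TinyR1-32B(Ours)</b></td>",
    "<td><b>32B</b></td>",
    "<td><b>88.5</b></td>",
    "<td><b>82.9</b></td>",
    "<td><b>69.4</b></td>",
    "<td><b>70.4</b></td>",
    "<td><b>89.2</b></td>",
]

# The same lines after the commit: only the fourth and fifth lines change.
new_rows = old_rows[:3] + ["<td><b>90.9</b></td>", "<td><b>82.7</b></td>"] + old_rows[5:]


def changed_cells(old, new):
    """Return (removed, added) content lines from a unified diff of two line lists."""
    removed, added = [], []
    for line in difflib.unified_diff(old, new, lineterm="", n=3):
        # Skip the '---' / '+++' file headers; '@@' hunk headers and
        # ' ' context lines match neither branch below.
        if line.startswith(("---", "+++")):
            continue
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return removed, added


removed, added = changed_cells(old_rows, new_rows)
print(removed)
print(added)
```

With three lines of context, `difflib` emits a single hunk spanning all eight lines (`@@ -1,8 +1,8 @@`), which corresponds to `@@ -67,8 +67,8 @@` once the row's offset in README.md is taken into account.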