qihoo360/TinyR1-32B

whlll committed on Sep 28

Commit 224f91d (verified) · 1 Parent(s): 8b8b2d8

Update README.md

Files changed (1)

README.md +3 -3

@@ -67,8 +67,8 @@ table th, table td {
     <tr>
       <td><b>🚀TinyR1-32B(Ours)</b></td>
       <td><b>32B</b></td>
-      <td><b>88.5</b></td>
-      <td><b>82.9</b></td>
+      <td><b>90.9</b></td>
+      <td><b>82.7</b></td>
       <td><b>69.4</b></td>
       <td><b>70.4</b></td>
       <td><b>89.2</b></td>
@@ -92,7 +92,7 @@ To address this challenge, we propose the **Control Token method**—introducing
 After **only 20K** high-quality fine-tuning samples and three rounds of SFT training, TinyR1-32B has achieved comprehensive improvements in reasoning, instruction-following, and safety, **surpassing Qwen3-32B** across core performance dimensions. In particular, its instruction-following and safety performance significantly **exceed the full-strength DeepSeek-R1-0528**. Specifically:
-- **Reasoning ability**: Achieves **93%** of DeepSeek-R1-0528’s performance across mathematics, science, and coding reasoning tasks.
+- **Reasoning ability**: Achieves **94%** of DeepSeek-R1-0528’s performance across mathematics, science, and coding reasoning tasks.
 - **General alignment**: Achieves a score of **89.2** on IFEval, significantly higher than DeepSeek-R1-0528’s 80.9.
 - **Safety alignment**: Achieves a Constructive Safety score of **nearly 90**, far surpassing other large open-source models. TinyR1-32B not only ensures safety but also provides constructive and positive safety guidance, moving beyond the simple paradigm of refusal.