Commit f7f6ff7 by JC-Chen (verified) · Parent: 1e93960

Update README.md

Files changed (1): README.md (+15, −4)
README.md CHANGED

@@ -10,8 +10,6 @@ tags:
 - competition
 license: apache-2.0
 pipeline_tag: text-generation
-base_model:
-- Qwen/Qwen3-235B-A22B-Thinking-2507
 ---
 
 <div align="center">
@@ -33,7 +31,7 @@ base_model:
 
 ## Model Description
 
-**P1-235B-A22B** is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning. Developed through reinforcement learning on physics competition data, this model demonstrates exceptional performance on complex physics problems and marks a historic achievement as the first open-source model to win gold at the International Physics Olympiad (IPhO 2025). **P1-235B-A22B** is obtained from *Qwen3-235B-A22B-Thinking-2507* through multi-stage reinforcement learning fine-tuning on a curated dataset of physics competition problems.
+**P1-235B-A22B** is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning. Built on *Qwen3-235B-A22B-Thinking-2507* and tuned through multi-stage reinforcement learning on curated physics competition data, P1-235B-A22B marks a historic achievement as the first open-source model to win gold at the International Physics Olympiad (IPhO 2025).
 
 ### Key Highlights
 
@@ -70,6 +68,19 @@ base_model:
 
 </div>
 
+### Generalization to STEM Tasks
+
+P1-235B-A22B demonstrates excellent general capabilities across various benchmarks. As shown below, P1-235B-A22B achieves better performance than its base model Qwen3-235B-A22B-Thinking-2507 on multiple tasks, further validating the strong generalization of P1 series models.
+
+<div align="center">
+
+| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
+|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
+| Qwen3-235B-A22B-Thinking-2507 (Base) | 94.6 | 94.2 | 81.7 | 79.4 | 17.5 | 76.2 | 80.3 |
+| **P1-235B-A22B** | **95.0** | **95.0** | **80.8** | **81.4** | **19.1** | **75.8** | **79.8** |
+
+</div>
+
 
 ## Usage
 
@@ -121,4 +132,4 @@ print(solution)
 }
 ```
 
-</div>
+</div>
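The claim in the added section is "better performance on multiple tasks", not all tasks. A quick sanity-check script (scores copied from the benchmark table in this commit; not part of the committed README) makes the per-benchmark deltas explicit:

```python
# Scores from the "Generalization to STEM Tasks" table added in this commit.
base = {"AIME24": 94.6, "AIME25": 94.2, "HMMT": 81.7, "GPQA": 79.4,
        "HLE": 17.5, "LiveCodeBench": 76.2, "LiveBench": 80.3}
p1 = {"AIME24": 95.0, "AIME25": 95.0, "HMMT": 80.8, "GPQA": 81.4,
      "HLE": 19.1, "LiveCodeBench": 75.8, "LiveBench": 79.8}

# Signed delta (P1 minus base) per benchmark, rounded to one decimal.
deltas = {task: round(p1[task] - base[task], 1) for task in base}

# Benchmarks where P1 strictly improves over the base model.
improved = [task for task, d in deltas.items() if d > 0]

print(deltas)
print(improved)  # AIME24, AIME25, GPQA, HLE
```

This shows gains on AIME24 (+0.4), AIME25 (+0.8), GPQA (+2.0), and HLE (+1.6), with slight regressions on HMMT, LiveCodeBench, and LiveBench — consistent with the hedged "multiple tasks" wording.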