Update README.md
README.md
CHANGED
@@ -10,8 +10,6 @@ tags:
 - competition
 license: apache-2.0
 pipeline_tag: text-generation
-base_model:
-- Qwen/Qwen3-235B-A22B-Thinking-2507
 ---
 
 <div align="center">
@@ -33,7 +31,7 @@ base_model:
 
 ## Model Description
 
-**P1-235B-A22B** is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning.
+**P1-235B-A22B** is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning. Built on *Qwen3-235B-A22B-Thinking-2507* and tuned through multi-stage reinforcement learning on curated physics competition data, P1-235B-A22B marks a historic achievement as the first open-source model to win gold at the International Physics Olympiad (IPhO 2025).
 
 ### Key Highlights
 
@@ -70,6 +68,19 @@ base_model:
 
 </div>
 
+### Generalization to STEM Tasks
+
+P1-235B-A22B demonstrates excellent general capabilities across various benchmarks. As shown below, P1-235B-A22B achieves better performance than its base model Qwen3-235B-A22B-Thinking-2507 on multiple tasks, further validating the strong generalization of P1 series models.
+
+<div align="center">
+
+| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
+|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
+| Qwen3-235B-A22B-Thinking-2507 (Base) | 94.6 | 94.2 | 81.7 | 79.4 | 17.5 | 76.2 | 80.3 |
+| **P1-235B-A22B** | **95.0** | **95.0** | **80.8** | **81.4** | **19.1** | **75.8** | **79.8** |
+
+</div>
+
 
 ## Usage
 
@@ -121,4 +132,4 @@ print(solution)
 }
 ```
 
-</div>
+</div>