---
language:
- en
- multilingual
tags:
- physics
- reinforcement-learning
- olympiad
- reasoning
- competition
license: apache-2.0
pipeline_tag: text-generation
---
P1: Mastering Physics Olympiads with Reinforcement Learning
🌐 P1 Project Page |
🏆 HiPhO Leaderboard
Achieving gold medal at the International Physics Olympiad (IPhO 2025)
## Model Description
**P1-235B-A22B** is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning. Built on *Qwen3-235B-A22B-Thinking-2507* and tuned through multi-stage reinforcement learning on curated physics competition data, P1-235B-A22B marks a historic achievement as the first open-source model to win gold at the International Physics Olympiad (IPhO 2025).
### Key Highlights
- 🏆 **IPhO 2025 Gold Medal**: First open-source model to achieve gold medal status (21.2/30 points)
- 🥇 **HiPhO Benchmark Leader**: 12 gold medals and 1 silver medal across 13 top international physics contests
- 🥇 **Overall Champion**: When paired with PhysicsMinions multi-agent system, achieves #1 ranking with 38.4 points, surpassing Gemini-2.5-Pro (37.7) and GPT-5 (37.4)
## Performance Benchmarks
### IPhO 2025 Results
| Model | Score | Medal | Rank |
|:-----:|:-----:|:-----:|:----:|
| **P1-235B-A22B + PhysicsMinions** | **23.2** | **🥇 Gold** | **1st** |
| Gemini-2.5-Pro | 22.2 | 🥇 Gold | 2nd |
| GPT-5 | 22.3 | 🥇 Gold | 3rdh |
| **P1-235B-A22B** | **21.2** | **🥇 Gold** | **4th** |
### HiPhO Comprehensive Results
| Category | P1-235B-A22B | P1-235B-A22B + PhysicsMinions | Gemini-2.5-Pro | GPT-5 |
|:--------:|:------------:|:-----------------------------:|:--------------:|:-----:|
| **Overall Score** | **35.9** | **38.4** 🏆 | 37.7 | 37.4 |
| Gold Medals (🥇) | 12 | 12 | 12 | 11 |
| Silver Medals (🥈) | 1 | 1 | 1 | 2 |
| Total Contests | 13 | 13 | 13 | 13 |
### Generalization to STEM Tasks
P1-235B-A22B demonstrates excellent general capabilities across various benchmarks. As shown below, P1-235B-A22B achieves better performance than its base model Qwen3-235B-A22B-Thinking-2507 on multiple tasks, further validating the strong generalization of P1 series models.
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-235B-A22B-Thinking-2507 (Base) | 94.6 | 94.2 | 81.7 | 79.4 | 17.5 | 76.2 | 80.3 |
| **P1-235B-A22B** | **95.0** | **95.0** | **80.8** | **81.4** | **19.1** | **75.8** | **79.8** |
## Usage
### Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model and tokenizer
model_name = "P1-235B-A22B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True
)
# Physics problem solving
prompt = """Solve this physics problem:
A block of mass m = 2.0 kg slides down a rough incline at angle θ = 30°
with coefficient of friction μ = 0.2. Calculate the acceleration of the block.
Provide a detailed solution with reasoning steps."""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_length=81920,
temperature=0.6,
top_p=0.9,
do_sample=True
)
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```
## Citation
```bibtex
@misc{p1-2025,
title={P1: Mastering Physics Olympiads with Reinforcement Learning},
author={P1 Team},
year={2025},
url={https://prime-rl.github.io/P1/}
}
```