---
language:
- en
- multilingual
tags:
- physics
- reinforcement-learning
- olympiad
- reasoning
- competition
license: apache-2.0
pipeline_tag: text-generation
---

# P1: Mastering Physics Olympiads with Reinforcement Learning

🌐 [P1 Project Page](https://prime-rl.github.io/P1/) | 🏆 HiPhO Leaderboard

Achieving a gold medal at the International Physics Olympiad (IPhO 2025)

## Model Description

**P1-235B-A22B** is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning. Built on *Qwen3-235B-A22B-Thinking-2507* and tuned through multi-stage reinforcement learning on curated physics competition data, P1-235B-A22B is the first open-source model to win gold at the International Physics Olympiad (IPhO 2025).

### Key Highlights

- 🏆 **IPhO 2025 Gold Medal**: First open-source model to reach gold-medal status (21.2/30 points)
- 🥇 **HiPhO Benchmark Leader**: 12 gold medals and 1 silver medal across 13 top international physics contests
- 🥇 **Overall Champion**: Paired with the PhysicsMinions multi-agent system, ranks #1 with 38.4 points, surpassing Gemini-2.5-Pro (37.7) and GPT-5 (37.4)

## Performance Benchmarks

### IPhO 2025 Results
| Model | Score | Medal | Rank |
|:-----:|:-----:|:-----:|:----:|
| **P1-235B-A22B + PhysicsMinions** | **23.2** | **🥇 Gold** | **1st** |
| Gemini-2.5-Pro | 22.2 | 🥇 Gold | 2nd |
| GPT-5 | 22.3 | 🥇 Gold | 3rd |
| **P1-235B-A22B** | **21.2** | **🥇 Gold** | **4th** |
### HiPhO Comprehensive Results
| Category | P1-235B-A22B | P1-235B-A22B + PhysicsMinions | Gemini-2.5-Pro | GPT-5 |
|:--------:|:------------:|:-----------------------------:|:--------------:|:-----:|
| **Overall Score** | **35.9** | **38.4** 🏆 | 37.7 | 37.4 |
| Gold Medals (🥇) | 12 | 12 | 12 | 11 |
| Silver Medals (🥈) | 1 | 1 | 1 | 2 |
| Total Contests | 13 | 13 | 13 | 13 |
### Generalization to STEM Tasks

P1-235B-A22B retains strong general capabilities across a range of benchmarks. As shown below, it outperforms its base model, Qwen3-235B-A22B-Thinking-2507, on several tasks (AIME24, AIME25, GPQA, HLE) while staying close on the rest, indicating that physics-focused training does not come at the cost of general reasoning.
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-235B-A22B-Thinking-2507 (Base) | 94.6 | 94.2 | **81.7** | 79.4 | 17.5 | **76.2** | **80.3** |
| **P1-235B-A22B** | **95.0** | **95.0** | 80.8 | **81.4** | **19.1** | 75.8 | 79.8 |
## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "P1-235B-A22B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Physics problem solving
prompt = """Solve this physics problem:
A block of mass m = 2.0 kg slides down a rough incline at angle θ = 30°
with coefficient of friction μ = 0.2. Calculate the acceleration of the block.
Provide a detailed solution with reasoning steps."""

# Format the request with the chat template, as is standard for
# Qwen3-Thinking-based chat models.
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=81920,  # generous budget for long chain-of-thought reasoning
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt
solution = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(solution)
```

A closed-form sanity check for this sample problem is given at the end of this card.

## Citation

```bibtex
@misc{p1-2025,
  title={P1: Mastering Physics Olympiads with Reinforcement Learning},
  author={P1 Team},
  year={2025},
  url={https://prime-rl.github.io/P1/}
}
```
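## Sanity Check for the Sample Problem

For the incline problem in the usage example, Newton's second law along the incline gives a = g(sin θ − μ cos θ). The snippet below is a minimal, model-independent check of that expected answer; the parameters come from the sample prompt above, and g = 9.8 m/s² is an assumed value:

```python
import math

# Parameters from the sample prompt: m = 2.0 kg (cancels out), θ = 30°, μ = 0.2
g = 9.8                   # m/s², standard gravity (assumed)
theta = math.radians(30)  # incline angle
mu = 0.2                  # coefficient of friction

# Acceleration of a block sliding down a rough incline: a = g (sin θ − μ cos θ)
a = g * (math.sin(theta) - mu * math.cos(theta))
print(f"Expected acceleration: {a:.2f} m/s^2")  # ≈ 3.20 m/s^2
```

A correct model solution should therefore arrive at roughly 3.2 m/s² for the acceleration.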