Update README.md
README.md
CHANGED
@@ -11,8 +11,6 @@ metrics:
 - accuracy
 ---
 
-### Evaluation
-**Evaluation is coming soon, stay tuned.**
 
 # ✨ Klear-Reasoner-8B
 We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
@@ -47,6 +45,8 @@ The model combines:
 
 ---
 
+### Evaluation
+**Evaluation is coming soon, stay tuned.**
 
 ## 📊 Benchmark Results (Pass@1)
 