Commit 9ba09c8 by Suu · verified · 1 parent: 4b85a24

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -11,8 +11,6 @@ metrics:
  - accuracy
  ---
 
- ### Evaluation
- **Evaluation is coming soon, stay tuned.**
 
  # ✨ Klear-Reasoner-8B
  We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. We investigate two key issues with current clipping mechanisms in RL: Clipping suppresses critical exploration signals and ignores suboptimal trajectories. To address these challenges, we propose **G**radient-**P**reserving clipping **P**olicy **O**ptimization (**GPPO**) that gently backpropagates gradients from clipped tokens.
@@ -47,6 +45,8 @@ The model combines:
 
  ---
 
+ ### Evaluation
+ **Evaluation is coming soon, stay tuned.**
 
  ## 📊 Benchmark Results (Pass@1)
 