Commit 6f8c140 (verified) · Parent(s): 410e2ba
Committed by sky-2002

Upload folder using huggingface_hub
Files changed (1): README.md (+8 -6)
README.md CHANGED
@@ -26,13 +26,15 @@ print(output["generated_text"])
 ```
 
 ## Evals
-Referring this [blog post](https://datawizz.ai/blog/grpo-fine-tuning-qwen-0-5b-vs-openai-o1-preview), used a similar evaluation method:
+Following this [blog post](https://datawizz.ai/blog/grpo-fine-tuning-qwen-0-5b-vs-openai-o1-preview), I used a similar evaluation method.
 
-| Model | Average ROUGE-L |
-|-------|-----------------|
-| Qwen-0.5B | 0.3313 |
-| SmolLM2-360M-GRPO-v0 | 0.1644 |
-| SmolLM2-360M-GRPO-v1 | 0.1672 |
+However, since an LLM judge is used in one of the reward functions, I tried different models as the judge and observed how the scores changed.
+
+| Model | Average ROUGE-L | LLM-Judge Model |
+|-------|-----------------|-----------------|
+| Qwen-0.5B finetuned | 0.3313 | Qwen-0.5B |
+| SmolLM2-360M-GRPO-v0 | 0.1644 | llama3.2:1B |
+| SmolLM2-360M-GRPO-v1 | 0.1672 | deepseek-r1:1.5b |
 
 
 ## Quick start
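
For reference, the metric reported in the table above can be computed roughly like this. The evaluation script itself is not part of this commit; the snippet below is a minimal sketch assuming the `rouge_score` package, with hypothetical `predictions` and `references` lists standing in for the actual model outputs and gold answers:

```python
# Minimal sketch (not from this repository): average ROUGE-L F1 over generations.
# Assumes the `rouge_score` package (pip install rouge-score).
# `predictions` and `references` are hypothetical placeholders for the
# model's generated answers and the gold answers.
from rouge_score import rouge_scorer

predictions = ["the cat sat on the mat", "paris is the capital of france"]
references = ["a cat sat on the mat", "the capital of france is paris"]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

# rouge_scorer expects (target, prediction); keep only the F1 score per pair.
per_example = [
    scorer.score(ref, pred)["rougeL"].fmeasure
    for ref, pred in zip(references, predictions)
]

average_rouge_l = sum(per_example) / len(per_example)
print(f"Average ROUGE-L: {average_rouge_l:.4f}")
```

The LLM-Judge column in the table records which model served as the judge inside the GRPO reward function during training; it does not enter the ROUGE-L computation itself.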