Upload folder using huggingface_hub
README.md CHANGED
@@ -26,13 +26,15 @@ print(output["generated_text"])
 ```
 
 ## Evals
-Referring this [blog post](https://datawizz.ai/blog/grpo-fine-tuning-qwen-0-5b-vs-openai-o1-preview), used a similar evaluation method
-
-
-
-
-
-
+Referring to this [blog post](https://datawizz.ai/blog/grpo-fine-tuning-qwen-0-5b-vs-openai-o1-preview), I used a similar evaluation method.
+
+However, since an LLM judge was used in one of the reward functions, I tried different models as judges and observed how the scores changed.
+
+| Model | Average ROUGE-L | LLM-Judge Model |
+|-------|-----------------|-----------------|
+| Qwen-0.5B finetuned | 0.3313 | Qwen-0.5B |
+| SmolLM2-360M-GRPO-v0 | 0.1644 | llama3.2:1B |
+| SmolLM2-360M-GRPO-v1 | 0.1672 | deepseek-r1:1.5b |
 
 ## Quick start
 
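For context, below is a minimal sketch of how an average ROUGE-L figure like those in the table can be computed with the `rouge_score` and `transformers` packages. The eval pairs, model id, and generation settings are illustrative placeholders, not this repo's actual eval script.

```python
# Hedged sketch: average ROUGE-L over a small eval set.
# The eval pairs, model id, and generation settings below are assumptions
# for illustration, not the exact script used to produce the table.
from rouge_score import rouge_scorer
from transformers import pipeline

# Hypothetical (prompt, reference) pairs standing in for the real eval set.
eval_pairs = [
    ("What is GRPO?",
     "Group Relative Policy Optimization, an RL method for fine-tuning LLMs."),
    ("Name one metric used here.",
     "ROUGE-L measures longest-common-subsequence overlap with a reference."),
]

# Placeholder model id; swap in the fine-tuned checkpoint being evaluated.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M-Instruct")
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

scores = []
for prompt, reference in eval_pairs:
    completion = generator(prompt, max_new_tokens=128,
                           return_full_text=False)[0]["generated_text"]
    # score(target, prediction) -> {"rougeL": Score(precision, recall, fmeasure)};
    # the table reports the F-measure averaged over the eval set.
    scores.append(scorer.score(reference, completion)["rougeL"].fmeasure)

print(f"Average ROUGE-L: {sum(scores) / len(scores):.4f}")
```

The LLM judge itself enters through one of the GRPO reward functions during training, which this sketch does not cover.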