salzubi401 committed on
Commit d153923 · verified · 1 Parent(s): a85a56a

Update README.md

Files changed (1)
  1. README.md +9 -7
README.md CHANGED
@@ -143,13 +143,15 @@ This means that our community owns the fingerprints that they can use to verify
  **Dobby-Unhinged-Llama-3.3-70B** retains the base performance of Llama-3.3-70B-Instruct across the evaluated tasks.
 
  We use lm-eval-harness to evaluate between performance on models:
- | Benchmark | Llama3.3-70B-Instruct| Hermes3-3.1-70B| Dobby-Unhinged-Llama-3.3-70B|
- |-------------------------------------------------|----------------------|----------------|--------------------|
- | IFEVAL (inst_level_strict/loss avg) | 0.9340 | 0.8153 | 0.8543 |
- | MMLU-pro | 0.5474 | 0.4737 | 0.5499 |
- | GPQA (average among diamond, extended and main) | 0.3838 | 0.4040 | 0.3939 |
- | MuSR | 0.4881 | 0.5094 | 0.5053 |
- | BBH (average across all tasks) | 0.7018 | 0.6797 | 0.7021 |
+
+ | Benchmark | Llama3.3-70B-Instruct | Dobby-Unhinged-Llama-3.3-70B |
+ |-------------------------------------------------|----------------------|--------------------|
+ | IFEVAL (inst_level_strict/loss avg) | 0.9340 | 0.8543 |
+ | MMLU-pro | 0.5474 | 0.5499 |
+ | GPQA (average among diamond, extended and main) | 0.3838 | 0.3939 |
+ | MuSR | 0.4881 | 0.5053 |
+ | BBH (average across all tasks) | 0.7018 | 0.7021 |
+
 
  ### Freedom Bench
 
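As a quick arithmetic check on the "retains the base performance" claim, the sketch below recomputes the per-benchmark difference between the two columns that remain in the updated table. The scores are copied by hand from the diff. `SCORES` and `score_deltas` are illustrative names only, not part of lm-eval-harness or of the repository.

```python
# Scores copied by hand from the updated README table:
# (Llama3.3-70B-Instruct, Dobby-Unhinged-Llama-3.3-70B)
SCORES = {
    "IFEVAL": (0.9340, 0.8543),
    "MMLU-pro": (0.5474, 0.5499),
    "GPQA": (0.3838, 0.3939),
    "MuSR": (0.4881, 0.5053),
    "BBH": (0.7018, 0.7021),
}


def score_deltas(scores):
    """Return {benchmark: dobby_score - base_score}, rounded to 4 decimal places."""
    return {name: round(dobby - base, 4) for name, (base, dobby) in scores.items()}


if __name__ == "__main__":
    # Print each delta with an explicit sign so regressions stand out.
    for name, delta in score_deltas(SCORES).items():
        print(f"{name:10s} {delta:+.4f}")
```

Read this way, four of the five deltas are small and non-negative, while IFEVAL is the one benchmark where the fine-tune scores lower than the base model (by about 0.08).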