Update README.md
**Dobby-Unhinged-Llama-3.3-70B** retains the base performance of Llama-3.3-70B-Instruct across the evaluated tasks.

We use lm-eval-harness to compare performance across the models:

| Benchmark                                        | Llama3.3-70B-Instruct | Dobby-Unhinged-Llama-3.3-70B |
|--------------------------------------------------|-----------------------|------------------------------|
| IFEVAL (inst_level_strict/loose avg)             | 0.9340                | 0.8543                       |
| MMLU-pro                                         | 0.5474                | 0.5499                       |
| GPQA (average among diamond, extended and main)  | 0.3838                | 0.3939                       |
| MuSR                                             | 0.4881                | 0.5053                       |
| BBH (average across all tasks)                   | 0.7018                | 0.7021                       |
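A run like the one above can be reproduced with the lm-eval-harness CLI. The sketch below is illustrative: the Hugging Face model path and the exact task names are assumptions (task names vary between harness versions, so check your installed task registry with `lm_eval --tasks list`).

```shell
# Illustrative lm-eval-harness invocation; model path and task names are
# assumptions and may need adjusting for your harness version.
lm_eval --model hf \
  --model_args pretrained=SentientAGI/Dobby-Unhinged-Llama-3.3-70B,dtype=bfloat16 \
  --tasks ifeval,mmlu_pro,gpqa,musr,bbh \
  --batch_size auto \
  --output_path results/
```

Running the same command with `pretrained=meta-llama/Llama-3.3-70B-Instruct` produces the baseline column for comparison.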
### Freedom Bench