A roughly 200M-parameter model (I think the parameter count in the graphic here is wrong, but the benchmark values are correct) with the token embedding and language-modelling head of Llama2-70b attached, plus linear transformations that map Llama2-70b's 8192-d space down to this model's 1024-d space.
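
The attachment described above can be pictured with a minimal PyTorch sketch. It assumes one bias-free projection on each side of the core model: down into 1024-d after the embedding, and back up to 8192-d before the head. `ProjectedLM` and its arguments are hypothetical names, the core is a stand-in for the ~200M-parameter stack, and this is not the actual code used to build the model. The usage runs at toy sizes, because the real 32,000 × 8,192 embedding and head are roughly 1 GB each in fp32.

```python
import torch
import torch.nn as nn


class ProjectedLM(nn.Module):
    """Hypothetical sketch: a small core model wrapped with a larger model's
    token embedding and LM head, bridged by linear projections.

    embed   -- borrowed token embedding, vocab -> d_big (8192 for Llama2-70b)
    lm_head -- borrowed LM head, d_big -> vocab
    core    -- stand-in for the small transformer stack, operating at d_small
    d_small -- width of the small core model (1024 here)
    """

    def __init__(self, embed: nn.Embedding, lm_head: nn.Linear,
                 core: nn.Module, d_small: int):
        super().__init__()
        d_big = embed.embedding_dim
        self.embed = embed
        self.down = nn.Linear(d_big, d_small, bias=False)  # d_big -> d_small
        self.core = core
        self.up = nn.Linear(d_small, d_big, bias=False)    # d_small -> d_big (assumed)
        self.lm_head = lm_head

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(input_ids)  # (batch, seq, d_big)
        h = self.down(h)           # (batch, seq, d_small)
        h = self.core(h)           # (batch, seq, d_small)
        h = self.up(h)             # (batch, seq, d_big)
        return self.lm_head(h)     # (batch, seq, vocab) logits


# Toy sizes; the real model would use vocab=32_000, d_big=8192, d_small=1024.
vocab, d_big, d_small = 100, 64, 16
model = ProjectedLM(
    embed=nn.Embedding(vocab, d_big),
    lm_head=nn.Linear(d_big, vocab, bias=False),
    core=nn.Identity(),  # placeholder for the real core model
    d_small=d_small,
)
logits = model(torch.randint(0, vocab, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])
```

Keeping the borrowed layers at their native 8192-d width and putting the projections around the core means the embedding and head weights can be loaded unchanged.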

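For a sense of scale, here is a back-of-envelope count of the parameters that come with the attachment. It assumes Llama 2's 32,000-token vocabulary, an embedding and head that are not tied to each other (as in Llama 2), bias-free projections, and one projection in each direction; `attached_param_count` is a hypothetical helper. The borrowed embedding and head alone come to about 524M parameters, well over twice the ~200M figure, so any reported total depends heavily on whether they are counted.

```python
def attached_param_count(vocab: int, d_big: int, d_small: int) -> dict:
    """Parameters added around the small core model (bias-free layers assumed)."""
    embed = vocab * d_big    # borrowed token embedding
    head = d_big * vocab     # borrowed LM head (separate weights, not tied)
    down = d_big * d_small   # projection into the core model's width
    up = d_small * d_big     # projection back out to the head's width (assumed)
    return {"embed": embed, "head": head, "down": down, "up": up,
            "total": embed + head + down + up}


counts = attached_param_count(vocab=32_000, d_big=8192, d_small=1024)
for name, n in counts.items():
    print(f"{name:>5}: {n:>12,}")
```
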
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | 25 | acc | 0.1775 | ± | 0.0112 |
| | | none | 25 | acc_norm | 0.2133 | ± | 0.0120 |
| truthfulqa_mc2 | Yaml | none | 0 | acc | 0.4457 | ± | 0.0152 |
| winogrande | Yaml | none | 5 | acc | 0.5154 | ± | 0.014 |
| hellaswag | Yaml | none | 10 | acc | 0.2832 | ± | 0.0045 |
| | | none | 10 | acc_norm | 0.3024 | ± | 0.0046 |
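
The Stderr column can be sanity-checked by hand: for a 0/1 accuracy metric, the reported values match the standard error of the mean computed with the sample (n - 1) variance, `sqrt(p * (1 - p) / (n - 1))`. The item counts used here are assumed standard split sizes (1,172 ARC-Challenge test questions, 10,042 HellaSwag validation examples, 100 MMLU abstract_algebra test questions), not figures stated in this card, and `binary_acc_stderr` is a hypothetical helper.

```python
import math


def binary_acc_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 outcomes, using the sample (n - 1) variance."""
    return math.sqrt(acc * (1 - acc) / (n - 1))


# Item counts are assumed standard split sizes, not taken from this card.
print(round(binary_acc_stderr(0.1775, 1172), 4))   # arc_challenge acc -> 0.0112
print(round(binary_acc_stderr(0.2832, 10042), 4))  # hellaswag acc -> 0.0045
print(round(binary_acc_stderr(0.2200, 100), 4))    # MMLU abstract_algebra -> 0.0416
```

Small tasks such as individual MMLU subjects therefore carry stderrs of around 3 to 4 points, so differences between neighbouring subjects are mostly noise.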

MMLU (average accuracy: 26.17%)

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| abstract_algebra | Yaml | none | 5 | acc | 0.2200 | ± | 0.0416 |
| anatomy | Yaml | none | 5 | acc | 0.2222 | ± | 0.0359 |
| astronomy | Yaml | none | 5 | acc | 0.1776 | ± | 0.0311 |
| business_ethics | Yaml | none | 5 | acc | 0.2300 | ± | 0.0423 |
| clinical_knowledge | Yaml | none | 5 | acc | 0.2415 | ± | 0.0263 |
| college_biology | Yaml | none | 5 | acc | 0.3194 | ± | 0.0390 |
| college_chemistry | Yaml | none | 5 | acc | 0.2000 | ± | 0.0402 |
| college_computer_science | Yaml | none | 5 | acc | 0.2800 | ± | 0.0451 |
| college_mathematics | Yaml | none | 5 | acc | 0.2800 | ± | 0.0451 |
| college_medicine | Yaml | none | 5 | acc | 0.2254 | ± | 0.0319 |
| college_physics | Yaml | none | 5 | acc | 0.2157 | ± | 0.0409 |
| computer_security | Yaml | none | 5 | acc | 0.2200 | ± | 0.0416 |
| conceptual_physics | Yaml | none | 5 | acc | 0.2553 | ± | 0.0285 |
| econometrics | Yaml | none | 5 | acc | 0.2368 | ± | 0.0400 |
| electrical_engineering | Yaml | none | 5 | acc | 0.2345 | ± | 0.0353 |
| elementary_mathematics | Yaml | none | 5 | acc | 0.2646 | ± | 0.0227 |
| formal_logic | Yaml | none | 5 | acc | 0.2302 | ± | 0.0376 |
| global_facts | Yaml | none | 5 | acc | 0.1700 | ± | 0.0378 |
| high_school_biology | Yaml | none | 5 | acc | 0.2903 | ± | 0.0258 |
| high_school_chemistry | Yaml | none | 5 | acc | 0.2611 | ± | 0.0309 |
| high_school_computer_science | Yaml | none | 5 | acc | 0.2300 | ± | 0.0423 |
| high_school_european_history | Yaml | none | 5 | acc | 0.2788 | ± | 0.0350 |
| high_school_geography | Yaml | none | 5 | acc | 0.3081 | ± | 0.0329 |
| high_school_government_and_politics | Yaml | none | 5 | acc | 0.3731 | ± | 0.0349 |
| high_school_macroeconomics | Yaml | none | 5 | acc | 0.2923 | ± | 0.0231 |
| high_school_mathematics | Yaml | none | 5 | acc | 0.2630 | ± | 0.0268 |
| high_school_microeconomics | Yaml | none | 5 | acc | 0.3403 | ± | 0.0308 |
| high_school_physics | Yaml | none | 5 | acc | 0.2715 | ± | 0.0363 |
| high_school_psychology | Yaml | none | 5 | acc | 0.2881 | ± | 0.0194 |
| high_school_statistics | Yaml | none | 5 | acc | 0.4722 | ± | 0.0340 |
| high_school_us_history | Yaml | none | 5 | acc | 0.3529 | ± | 0.0335 |
| high_school_world_history | Yaml | none | 5 | acc | 0.2532 | ± | 0.0283 |
| human_aging | Yaml | none | 5 | acc | 0.2108 | ± | 0.0274 |
| human_sexuality | Yaml | none | 5 | acc | 0.2672 | ± | 0.0388 |
| international_law | Yaml | none | 5 | acc | 0.2479 | ± | 0.0394 |
| jurisprudence | Yaml | none | 5 | acc | 0.2500 | ± | 0.0419 |
| logical_fallacies | Yaml | none | 5 | acc | 0.2393 | ± | 0.0335 |
| machine_learning | Yaml | none | 5 | acc | 0.2946 | ± | 0.0433 |
| management | Yaml | none | 5 | acc | 0.1650 | ± | 0.0368 |
| marketing | Yaml | none | 5 | acc | 0.1923 | ± | 0.0258 |
| medical_genetics | Yaml | none | 5 | acc | 0.3000 | ± | 0.0461 |
| miscellaneous | Yaml | none | 5 | acc | 0.2720 | ± | 0.0159 |
| moral_disputes | Yaml | none | 5 | acc | 0.1936 | ± | 0.0213 |
| moral_scenarios | Yaml | none | 5 | acc | 0.2380 | ± | 0.0142 |
| nutrition | Yaml | none | 5 | acc | 0.2484 | ± | 0.0247 |
| philosophy | Yaml | none | 5 | acc | 0.2283 | ± | 0.0238 |
| prehistory | Yaml | none | 5 | acc | 0.2346 | ± | 0.0236 |
| professional_accounting | Yaml | none | 5 | acc | 0.2589 | ± | 0.0261 |
| professional_law | Yaml | none | 5 | acc | 0.2445 | ± | 0.0110 |
| professional_medicine | Yaml | none | 5 | acc | 0.4485 | ± | 0.0302 |
| professional_psychology | Yaml | none | 5 | acc | 0.2614 | ± | 0.0178 |
| public_relations | Yaml | none | 5 | acc | 0.2364 | ± | 0.0407 |
| security_studies | Yaml | none | 5 | acc | 0.4000 | ± | 0.0314 |
| sociology | Yaml | none | 5 | acc | 0.3035 | ± | 0.0325 |
| us_foreign_policy | Yaml | none | 5 | acc | 0.2800 | ± | 0.0451 |
| virology | Yaml | none | 5 | acc | 0.2048 | ± | 0.0314 |
| world_religions | Yaml | none | 5 | acc | 0.1988 | ± | 0.0306 |
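
Averaging the 57 subject accuracies in the MMLU table with equal weight reproduces the 26.17% figure, so it appears to be an unweighted (macro) mean over subjects rather than one weighted by question count. A quick check, with the values copied from the table in row order:

```python
# Per-subject 5-shot MMLU accuracies, in the same order as the table rows.
mmlu_acc = [
    0.2200, 0.2222, 0.1776, 0.2300, 0.2415, 0.3194, 0.2000, 0.2800, 0.2800, 0.2254,
    0.2157, 0.2200, 0.2553, 0.2368, 0.2345, 0.2646, 0.2302, 0.1700, 0.2903, 0.2611,
    0.2300, 0.2788, 0.3081, 0.3731, 0.2923, 0.2630, 0.3403, 0.2715, 0.2881, 0.4722,
    0.3529, 0.2532, 0.2108, 0.2672, 0.2479, 0.2500, 0.2393, 0.2946, 0.1650, 0.1923,
    0.3000, 0.2720, 0.1936, 0.2380, 0.2484, 0.2283, 0.2346, 0.2589, 0.2445, 0.4485,
    0.2614, 0.2364, 0.4000, 0.3035, 0.2800, 0.2048, 0.1988,
]
assert len(mmlu_acc) == 57  # one entry per MMLU subject

# Macro average: every subject counts equally, regardless of its size.
macro_avg = sum(mmlu_acc) / len(mmlu_acc)
print(f"{macro_avg:.2%}")  # 26.17%
```

An average weighted by question count would differ, since subjects such as professional_law contain far more questions than most others.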