---
license: mit
base_model:
- meta-llama/Llama-2-7b-hf
---

This model is derived from Llama-2-7b-hf by pruning with LLM-Streamline **(Streamlining Redundant Layers to Compress Large Language Models, ICLR 2025 Spotlight)**. The entire training process required only 0.06B tokens.
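Since the pruned checkpoint is assumed to retain the standard Llama architecture (pass `trust_remote_code=True` if the repository ships custom code), it should load with the usual `transformers` API. The sketch below uses a placeholder repo id:

```python
# Minimal usage sketch; "your-org/Llama-2-4.7B" is a hypothetical repo id,
# substitute this model's actual Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Llama-2-4.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 inference
    device_map="auto",
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```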
Below are the evaluation results (accuracy, %) obtained with lm-eval (EleutherAI's lm-evaluation-harness):

| Model        | ARC-c | ARC-e | BoolQ | HellaSwag | OpenBookQA | RTE  | WinoGrande | Avg. |
|--------------|-------|-------|-------|-----------|------------|------|------------|------|
| Llama-2-7B   | 43.3  | 76.4  | 77.7  | 57.2      | 31.4       | 62.8 | 69.1       | 59.7 |
| Llama-2-4.7B | 34.0  | 64.6  | 74.7  | 49.8      | 27.4       | 61.7 | 66.4       | 54.1 |
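A rough way to reproduce this kind of evaluation with the harness is sketched below; the harness version, few-shot setting, batch size, and repo id are assumptions rather than the configuration actually used for the table:

```python
# Evaluation sketch with lm-evaluation-harness (pip install lm_eval).
# Settings here are assumptions, not the exact setup behind the table above.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/Llama-2-4.7B",  # hypothetical repo id
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "rte", "winogrande",
    ],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```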