---
license: mit
base_model:
- meta-llama/Llama-2-7b-hf
---

This model is derived from Llama-2-7b-hf by pruning with LLM-Streamline **(Streamlining Redundant Layers to Compress Large Language Models, ICLR 2025 Spotlight)**. The entire training process required only 0.06B tokens.
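Since the pruned checkpoint is assumed to retain the standard Llama architecture (pass `trust_remote_code=True` if the repository ships custom code), it should load with the usual `transformers` API. The sketch below uses a placeholder repo id:

```python
# Minimal usage sketch; "your-org/Llama-2-4.7B" is a hypothetical repo id,
# substitute this model's actual Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Llama-2-4.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 inference
    device_map="auto",
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```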
Below are the evaluation results (accuracy, %) obtained with lm-eval (EleutherAI's lm-evaluation-harness):

| Model        | ARC-c | ARC-e | BoolQ | HellaSwag | OpenBookQA | RTE  | WinoGrande | Avg. |
|--------------|-------|-------|-------|-----------|------------|------|------------|------|
| Llama-2-7B   | 43.3  | 76.4  | 77.7  | 57.2      | 31.4       | 62.8 | 69.1       | 59.7 |
| Llama-2-4.7B | 34.0  | 64.6  | 74.7  | 49.8      | 27.4       | 61.7 | 66.4       | 54.1 |
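A rough way to reproduce this kind of evaluation with the harness is sketched below; the harness version, few-shot setting, batch size, and repo id are assumptions rather than the configuration actually used for the table:

```python
# Evaluation sketch with lm-evaluation-harness (pip install lm_eval).
# Settings here are assumptions, not the exact setup behind the table above.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/Llama-2-4.7B",  # hypothetical repo id
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "rte", "winogrande",
    ],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```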