Update README.md

README.md CHANGED

@@ -25,7 +25,9 @@ base_model: meta-llama/LLaMA-2-7B

### Overview

-This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
+This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
+It was trained on a mix of the OpenWebText and WikiText Raw V1 datasets.
+Knowledge distillation was employed to transfer knowledge from a larger "teacher" model (Meta's 7B LLaMA 2), helping this smaller model mimic the teacher's behavior.
This is the latest version of DistilLlama, which has gone through 5 days of training using two Nvidia A100 80GB GPUs.

### Model Architecture

@@ -98,4 +100,4 @@ The architecture is based on LLaMA 2, with the following parameters:
  url={https://arxiv.org/abs/2308.02019},
}

-*Note: The repository will be updated as training progresses. Last update 2024-
+*Note: The repository will be updated as training progresses. Last update 2024-11-01*
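The Overview text added in this commit describes distilling knowledge from the 7B LLaMA 2 teacher into the roughly 80M-parameter student. For readers unfamiliar with that setup, below is a minimal sketch of a standard knowledge-distillation loss (a softened KL-divergence term against the teacher's logits blended with ordinary cross-entropy). The temperature, `alpha` weighting, and tensor shapes are illustrative assumptions, not values taken from this repository's training code.

```python
# Minimal sketch of a standard knowledge-distillation loss.
# Assumptions (not from this repo): temperature=2.0, alpha=0.5, and that both
# teacher and student produce next-token logits of shape [batch, seq, vocab].
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft term (student mimics teacher) with the usual hard CE term."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradients keep a comparable magnitude.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard next-token cross-entropy against the labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In a typical training step the teacher's logits would be computed under `torch.no_grad()` and only the student's parameters updated.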