Update README.md

README.md CHANGED

@@ -25,7 +25,9 @@ base_model: meta-llama/LLaMA-2-7B

### Overview

-This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
+This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
+It was trained on a mix of the OpenWebText and WikiText Raw V1 datasets.
+Knowledge distillation was employed to transfer knowledge from a larger "teacher" model (Meta's 7B LLaMA 2), helping this smaller model mimic the teacher's behavior.
This is the latest version of DistilLlama, which has gone through 5 days of training using two Nvidia A100 80GB GPUs.

### Model Architecture

@@ -98,4 +100,4 @@ The architecture is based on LLaMA 2, with the following parameters:
  url={https://arxiv.org/abs/2308.02019},
}

-*Note: The repository will be updated as training progresses. Last update 2024-
+*Note: The repository will be updated as training progresses. Last update 2024-11-01*
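The Overview text added in this commit describes distilling knowledge from the 7B LLaMA 2 teacher into the roughly 80M-parameter student. For readers unfamiliar with that setup, below is a minimal sketch of a standard knowledge-distillation loss (a softened KL-divergence term against the teacher's logits blended with ordinary cross-entropy). The temperature, `alpha` weighting, and tensor shapes are illustrative assumptions, not values taken from this repository's training code.

```python
# Minimal sketch of a standard knowledge-distillation loss.
# Assumptions (not from this repo): temperature=2.0, alpha=0.5, and that both
# teacher and student produce next-token logits of shape [batch, seq, vocab].
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft term (student mimics teacher) with the usual hard CE term."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradients keep a comparable magnitude.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard next-token cross-entropy against the labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In a typical training step the teacher's logits would be computed under `torch.no_grad()` and only the student's parameters updated.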