Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ license: apache-2.0
This repo contains the **DeepSeek-VL2-DeepSeekMoE-Tiny** model used to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Unlike traditional LLMs built on dense Transformers, DeepSeekMoE LLMs adopt an efficient sparse Mixture-of-Experts (MoE) architecture: DeepSeek-VL2-DeepSeekMoE-Tiny contains 3B parameters in total, of which only a 0.57B subset is activated for each token during inference. This checkpoint is extracted from the [DeepSeek-VL2-Tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) model.
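
As a rough illustration of why only a fraction of the parameters is active per token, the toy sketch below (plain PyTorch, not DeepSeekMoE's actual implementation; all names are made up) routes each token to its top-k experts, so the remaining experts never run in that token's forward pass.

```python
# Toy sparse-MoE layer: each token is routed to its top_k experts only,
# so most expert parameters are untouched for any given token.
import torch
import torch.nn as nn

class ToySparseMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only top_k experts fire per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = ToySparseMoE()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```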
This checkpoint does not contain speech tokens and should therefore be used in **Stage 1: Vision-language pre-alignment** of EMOVA training.
## Usage
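
A minimal loading sketch, assuming the checkpoint exposes the standard Hugging Face causal-LM interface and ships custom modeling code (hence `trust_remote_code=True`); the model id, dtype, and generation settings below are assumptions for illustration, not official EMOVA instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Emova-ollm/DeepSeek-VL2-DeepSeekMoE-Tiny"  # assumed id; use this repo's actual model id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is sufficient for this 3B checkpoint
    trust_remote_code=True,
).eval()

# Simple text-only generation; speech tokens are not included in this checkpoint.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```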