KaiChen1998 committed
Commit 45ff956 · verified · 1 parent: 52c1877

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -9,7 +9,7 @@ license: apache-2.0
 
  This repo contains the **DeepSeek-VL2-DeepSeekMoE-Tiny** model utilized to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Different from traditional LLMs based on dense Transformers, DeepSeekMoE LLMs utilize an efficient sparse Mixture-of-Experts (MoE) architecture. In total, DeepSeek-VL2-DeepSeekMoE-Tiny contains 3B parameters, while only a 0.57B subset is activated for each token during inference. This DeepSeek-VL2-DeepSeekMoE-Tiny checkpoint is extracted from the [DeepSeek-VL2-Tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) model.
 
- This checkpoint does not contain speech tokens, and thus, should be utilized in the **Stage 1: Vision-language pre-alignment** of EMOVA training.
+ This checkpoint does not contain speech tokens and thus should be utilized in the **Stage 1: Vision-language pre-alignment** of EMOVA training.
 
  ## Usage
 
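The Usage section itself is cut off in this diff. As a minimal, hypothetical sketch (the repo id, dtype, and prompt below are assumptions, not taken from the README), loading a custom-architecture checkpoint like this one with Hugging Face `transformers` generally requires `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with this checkpoint's actual Hub path.
MODEL_ID = "Emova-ollm/DeepSeek-VL2-DeepSeekMoE-Tiny"

# DeepSeekMoE is a custom architecture, so its modeling code must be
# fetched from the repo via trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 3B parameters total; only ~0.57B active per token
    trust_remote_code=True,
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because only a small expert subset fires per token, inference compute is closer to that of a ~0.6B dense model, although all 3B parameters must still fit in memory.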