Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ license: apache-2.0
This repo contains the **DeepSeek-VL2-DeepSeekMoE-Tiny** model used to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Unlike traditional LLMs built on dense Transformers, DeepSeekMoE LLMs adopt an efficient sparse Mixture-of-Experts (MoE) architecture: DeepSeek-VL2-DeepSeekMoE-Tiny contains 3B parameters in total, of which only a 0.57B subset is activated for each token during inference. This checkpoint is extracted from the [DeepSeek-VL2-Tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) model.
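
As a rough illustration of why only a fraction of the parameters is active per token, the toy sketch below (plain PyTorch, not DeepSeekMoE's actual implementation; all names are made up) routes each token to its top-k experts, so the remaining experts never run in that token's forward pass.

```python
# Toy sparse-MoE layer: each token is routed to its top_k experts only,
# so most expert parameters are untouched for any given token.
import torch
import torch.nn as nn

class ToySparseMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only top_k experts fire per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = ToySparseMoE()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```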
This checkpoint does not contain speech tokens and should therefore be used in **Stage 1: Vision-language pre-alignment** of EMOVA training.
## Usage
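
A minimal loading sketch, assuming the checkpoint exposes the standard Hugging Face causal-LM interface and ships custom modeling code (hence `trust_remote_code=True`); the model id, dtype, and generation settings below are assumptions for illustration, not official EMOVA instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Emova-ollm/DeepSeek-VL2-DeepSeekMoE-Tiny"  # assumed id; use this repo's actual model id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is sufficient for this 3B checkpoint
    trust_remote_code=True,
).eval()

# Simple text-only generation; speech tokens are not included in this checkpoint.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```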