Improve model card: Add pipeline tag, paper link, GitHub & citation
#2
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,15 +1,16 @@
 ---
+library_name: transformers
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - dllm
 - diffusion
 - llm
-- text_generation
-library_name: transformers
 ---
+
 # LLaDA-MoE
 
-**LLaDA-MoE** is a new and upgraded series of the LLaDA diffusion language model. This pre-release includes two cutting-edge models:
+**LLaDA-MoE** is a new and upgraded series of the LLaDA diffusion language model, developed as part of the `dInfer` framework presented in the paper [dInfer: An Efficient Inference Framework for Diffusion Language Models](https://huggingface.co/papers/2510.08666). This pre-release includes two cutting-edge models:
 
 - `LLaDA-MoE-7B-A1B-Base`: A base pre-trained model designed for research and secondary development.
 - `LLaDA-MoE-7B-A1B-Instruct`: An instruction-tuned model optimized for practical applications.
@@ -20,8 +21,9 @@ library_name: transformers
 <img src="https://raw.githubusercontent.com/Ulov888/LLaDA_Assets/main/benchmarks_details_table.png" width="800" />
 </div>
 
-
-
+## GitHub Repository
+For the complete codebase, training scripts, and more details on the `dInfer` framework, please visit the official GitHub repository:
+[https://github.com/inclusionAI/dInfer](https://github.com/inclusionAI/dInfer)
 
 ## Performance Highlights
 
@@ -175,17 +177,20 @@ input_ids = torch.tensor(input_ids).to(device).unsqueeze(0)
 
 text = generate(model, input_ids, steps=128, gen_length=128, block_length=32, temperature=0., cfg_scale=0., remasking='low_confidence')
 print(tokenizer.batch_decode(text[:, input_ids.shape[1]:], skip_special_tokens=False)[0])
-
-
-
-
 ```
 
+## Citation
 
-
+If you find `dInfer` and LLaDA-MoE useful in your research or applications, please cite our paper:
 
-
-
+```bibtex
+@article{dinfer,
+  title={dInfer: An Efficient Inference Framework for Diffusion Language Models},
+  author={Yuxin Ma, Lun Du, Lanning Wei, Kun Chen, Qian Xu, Kangyu Wang, Guofeng Feng, Guoshan Lu, Lin Liu, Xiaojing Qi, Xinyuan Zhang, Zhen Tao, Haibo Feng, Ziyun Jiang, Ying Xu, Zenan Huang, Yihong Zhuang, Haokai Xu, Jiaqi Hu, Zhenzhong Lan, Junbo Zhao, Jianguo Li, Da Zheng},
+  year={2025},
+  journal={arXiv preprint arXiv:2510.08666}
+}
+```
 
 ---
 
@@ -197,6 +202,6 @@ This project is licensed under the terms of the [Apache License 2.0](https://www
 
 ## Contact & Collaboration
 
-For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base) or open an issue in the [repository](https://github.com/inclusionAI).
+For questions, collaborations, or feedback, please reach out via [Hugging Face](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base) or open an issue in the [repository](https://github.com/inclusionAI/dInfer).
 
 Join us in advancing open, efficient, and intelligent language models!
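The generation hunk above shows only the tail of the model card's Python usage example (its context line is `input_ids = torch.tensor(input_ids).to(device).unsqueeze(0)`). For readers of this PR, here is a minimal sketch of how such a call is typically set up. The checkpoint id, the `trust_remote_code` loading flags, the chat-template step, and the import of a repository-provided `generate()` helper are assumptions based on the visible context lines and the LLaDA codebase, not part of this diff; only the final `generate(...)` and `batch_decode(...)` lines are taken verbatim from the hunk.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: generate() is the diffusion-decoding helper shipped with the
# LLaDA/dInfer code, not transformers' built-in model.generate.
from generate import generate

device = "cuda"
model_id = "inclusionAI/LLaDA-MoE-7B-A1B-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).to(device).eval()

# Build a chat-formatted prompt and move it to the device, matching the
# hunk's context line.
messages = [{"role": "user", "content": "What is a diffusion language model?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(prompt)["input_ids"]
input_ids = torch.tensor(input_ids).to(device).unsqueeze(0)

# The call shown in the diff: block-wise diffusion decoding with
# low-confidence remasking over 128 steps for 128 generated tokens.
text = generate(
    model, input_ids, steps=128, gen_length=128, block_length=32,
    temperature=0., cfg_scale=0., remasking='low_confidence'
)
print(tokenizer.batch_decode(text[:, input_ids.shape[1]:], skip_special_tokens=False)[0])
```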