# DiRL-8B-Instruct
DiRL-8B-Instruct is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is built on SDAR-8B-Chat and trained with the DiRL framework in two stages (SFT followed by RL). DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, outperforming even 32B models on most tasks.
## Highlights
- SOTA Performance: Achieves 83.05% on MATH500, 20.63% on AIME2024, and 20.83% on AIME2025, surpassing all 8B baselines.
- Training Framework: Trained with DiRL, an efficient training framework for diffusion language models.
- Strong Baseline: Built on SDAR-8B-Chat, improving on the base model by +11.20 points on MATH500 and +11.46 points on AIME2024.
## Quick Start

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts in chat format
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure the PyTorch backend for diffusion-LM (dLLM) inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create the inference pipeline and generate
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )
    outputs = pipe(prompts, gen_config=gen_config)
    for output in outputs:
        print(output.text)
```
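The `dllm_*` fields control the block-diffusion decoding loop: the model denoises blocks of `dllm_block_length` tokens over `dllm_denoising_steps` steps, and with the `low_confidence_dynamic` strategy, tokens are unmasked once their confidence clears `dllm_confidence_threshold`. As a rough, untuned sketch (our reading of these options, not an official recommendation), raising the threshold trades speed for more cautious unmasking:

```python
# Illustrative only: same PytorchEngineConfig fields as in the Quick Start,
# with a stricter confidence threshold so fewer tokens are unmasked per
# denoising step (assumed semantics; slower but potentially steadier).
conservative_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.95,
)

# Reuses model_path, prompts, and gen_config from the snippet above.
with pipeline(model_path, backend_config=conservative_config) as pipe:
    outputs = pipe(prompts, gen_config=gen_config)
```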
## Evaluation Results

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| DiRL-8B-Instruct | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |
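When reproducing numbers like these, the usual final step is comparing the model's last answer against the reference. Below is a minimal sketch of a final-answer extractor, assuming the model ends its solution with a LaTeX `\boxed{...}` answer — a common convention for math-reasoning models, though not something this card specifies; the `extract_boxed_answer` helper is hypothetical:

```python
def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`, handling nested braces."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")  # position just past the opening brace
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces


# Example: prints "7"
print(extract_boxed_answer(r"x + 5 = 12, so x = 7. Final answer: \boxed{7}"))
```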
## Citation

If you use this model in your research, please cite:
```bibtex
@misc{zhu2025dirl,
  title={DiRL: An Efficient Training Framework for Diffusion Language Models},
  author={Zhu, Ying and Wan, Jiaxin and Liang, Tianyi and Guo, Xu and Liu, Xiaoran and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
  year={2025},
  institution={Fudan University, Shanghai Innovation Institute},
  url={https://github.com/OpenMOSS/DiRL}
}
```
**Base model:** JetLM/SDAR-8B-Chat