DiRL-8B-Instruct

💻 GitHub Repo

Introduction

DiRL-8B-Instruct is an 8B-parameter diffusion language model specialized for mathematical reasoning, trained with the DiRL framework on top of SDAR-8B-Chat. Through two-stage training (SFT followed by RL), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, outperforming even 32B models on most tasks.

Highlights

  • SOTA Performance: Achieves 83.05% on MATH500, 20.63% on AIME2024, and 20.83% on AIME2025, surpassing all 8B baselines.
  • Training Framework: Trained with DiRL, an efficient training framework for diffusion language models.
  • Strong Baseline: Built on SDAR-8B-Chat, improving on it by +11.20 points on MATH500 and +11.46 points on AIME2024.
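As a quick sanity check, the gains quoted above follow directly from the benchmark scores reported in the Performance table below:

```python
# Scores from the Performance table (percent accuracy).
sdar_base = {"MATH500": 71.85, "AIME2024": 9.17}
dirl = {"MATH500": 83.05, "AIME2024": 20.63}

# Absolute improvement in percentage points over the base model.
for bench in sdar_base:
    gain = dirl[bench] - sdar_base[bench]
    print(f"{bench}: +{gain:.2f}")  # MATH500: +11.20, AIME2024: +11.46
```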

Inference

Using LMDeploy

from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure backend for DLLM inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding; the sampling params above are ignored
        max_new_tokens=8192,
    )
    
    outputs = pipe(prompts, gen_config=gen_config)
    
    for output in outputs:
        print(output.text)
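The dllm_* options control block-wise diffusion decoding: generation proceeds block by block, with each block holding dllm_block_length tokens and taking at most dllm_denoising_steps denoising passes, while the low_confidence_dynamic strategy can unmask tokens early once their confidence exceeds dllm_confidence_threshold. A minimal sketch of the resulting worst-case decoding cost, assuming that per-block reading of the parameters (this is an illustration based on the parameter names, not LMDeploy's documented internals):

```python
import math

# Hypothetical worst-case pass count for block-diffusion decoding
# (assumption from the parameter semantics described above).
def max_denoising_passes(max_new_tokens: int, block_length: int, steps_per_block: int) -> int:
    blocks = math.ceil(max_new_tokens / block_length)
    return blocks * steps_per_block

# With the config above: 8192 tokens in blocks of 4, up to 4 passes per block.
print(max_denoising_passes(8192, block_length=4, steps_per_block=4))  # 8192
```

In practice the dynamic unmasking strategy finishes most blocks in fewer than dllm_denoising_steps passes, so this is an upper bound, not the typical cost.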

Performance

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | 94.03 | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| DiRL-8B-Instruct | 83.05 | 93.03 | 20.63 | 20.83 | 46.40 | 52.79 |
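The Average column is the unweighted mean of the five benchmark scores. A quick check for the DiRL-8B-Instruct row:

```python
# DiRL-8B-Instruct scores from the table above.
scores = {
    "MATH500": 83.05, "GSM8K": 93.03, "AIME2024": 20.63,
    "AIME2025": 20.83, "OlympiadBench": 46.40,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 52.79
```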

Citation

If you use this model in your research, please cite:

@misc{zhu2025dirl,
  title={DiRL: An Efficient Training Framework for Diffusion Language Models},
  author={Zhu, Ying and Wan, Jiaxin and Liang, Tianyi and Guo, Xu and Liu, Xiaoran and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
  year={2025},
  institution={Fudan University, Shanghai Innovation Institute},
  url={https://github.com/OpenMOSS/DiRL}
}