# DiRL-8B-Instruct
DiRL-8B-Instruct is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is built on SDAR-8B-Chat and trained with the DiRL framework in two stages (SFT followed by RL). DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, outperforming even 32B models on most tasks.
## Highlights
- SOTA Performance: Achieves 83.05% on MATH500, 20.63% on AIME2024, and 20.83% on AIME2025, surpassing all 8B baselines.
- Training Framework: Trained with DiRL, an efficient training framework for diffusion language models.
- Strong Baseline: Built on SDAR-8B-Chat, improving on the base model by +11.20 points on MATH500 and +11.46 points on AIME2024.
## Quick Start

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts in chat format
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure the PyTorch backend for diffusion-LM (dLLM) inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create the inference pipeline and generate
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )
    outputs = pipe(prompts, gen_config=gen_config)
    for output in outputs:
        print(output.text)
```
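The `dllm_*` fields control the block-diffusion decoding loop: the model denoises blocks of `dllm_block_length` tokens over `dllm_denoising_steps` steps, and with the `low_confidence_dynamic` strategy, tokens are unmasked once their confidence clears `dllm_confidence_threshold`. As a rough, untuned sketch (our reading of these options, not an official recommendation), raising the threshold trades speed for more cautious unmasking:

```python
# Illustrative only: same PytorchEngineConfig fields as in the Quick Start,
# with a stricter confidence threshold so fewer tokens are unmasked per
# denoising step (assumed semantics; slower but potentially steadier).
conservative_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.95,
)

# Reuses model_path, prompts, and gen_config from the snippet above.
with pipeline(model_path, backend_config=conservative_config) as pipe:
    outputs = pipe(prompts, gen_config=gen_config)
```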
## Evaluation Results

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| DiRL-8B-Instruct | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |
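When reproducing numbers like these, the usual final step is comparing the model's last answer against the reference. Below is a minimal sketch of a final-answer extractor, assuming the model ends its solution with a LaTeX `\boxed{...}` answer — a common convention for math-reasoning models, though not something this card specifies; the `extract_boxed_answer` helper is hypothetical:

```python
def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`, handling nested braces."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")  # position just past the opening brace
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces


# Example: prints "7"
print(extract_boxed_answer(r"x + 5 = 12, so x = 7. Final answer: \boxed{7}"))
```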
## Citation

If you use this model in your research, please cite:
```bibtex
@misc{zhu2025dirl,
  title={DiRL: An Efficient Training Framework for Diffusion Language Models},
  author={Zhu, Ying and Wan, Jiaxin and Liang, Tianyi and Guo, Xu and Liu, Xiaoran and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
  year={2025},
  institution={Fudan University, Shanghai Innovation Institute},
  url={https://github.com/OpenMOSS/DiRL}
}
```
**Base model:** JetLM/SDAR-8B-Chat