BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.
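
This repo ships the 4-bit weights only; the quantization script itself is not included. A minimal sketch of how such an export is typically produced with transformers and bitsandbytes (the base repo id, NF4 quant type, and compute dtype are assumptions, not confirmed settings of this checkpoint):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed base repo id

# 4-bit BitsAndBytes config; NF4 with bf16 compute is a common choice, assumed here
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Serialize the quantized weights (4-bit safetensors serialization requires
# a recent transformers/bitsandbytes)
model.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")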

Tested successfully with vLLM 0.7.2 using the following parameters:

import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # required for the vLLM V1 engine
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
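
Once loaded, generation goes through the standard vLLM API. A minimal usage sketch (the prompt and sampling values are illustrative, not recommendations from this card):

from vllm import SamplingParams

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

outputs = llm_model.generate(
    ["What is 17 * 24? Think step by step."],
    sampling,
)
print(outputs[0].outputs[0].text)
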
Safetensors: 4B params · Tensor types: F16, F32, U8
