BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.
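
This repo ships the 4-bit weights only; the quantization script itself is not included. A minimal sketch of how such an export is typically produced with transformers and bitsandbytes (the base repo id, NF4 quant type, and compute dtype are assumptions, not confirmed settings of this checkpoint):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed base repo id

# 4-bit BitsAndBytes config; NF4 with bf16 compute is a common choice, assumed here
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Serialize the quantized weights (4-bit safetensors serialization requires
# a recent transformers/bitsandbytes)
model.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")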

Tested successfully with vLLM 0.7.2 using the following parameters:

import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # required for the vLLM V1 engine
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
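
Once loaded, generation goes through the standard vLLM API. A minimal usage sketch (the prompt and sampling values are illustrative, not recommendations from this card):

from vllm import SamplingParams

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

outputs = llm_model.generate(
    ["What is 17 * 24? Think step by step."],
    sampling,
)
print(outputs[0].outputs[0].text)
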
Safetensors: 4B params · Tensor types: F16, F32, U8
