---
library_name: vllm
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
- togethercomputer/mistral-3.2-instruct-2506
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our Privacy Policy.
tags:
- mistral-common
- quantized
model_type: mistral
quantization: bitsandbytes
---

# Mistral-Small-3.2-24B-Instruct-2506 (Quantized)

This is a quantized version of [togethercomputer/mistral-3.2-instruct-2506](https://huggingface.co/togethercomputer/mistral-3.2-instruct-2506), optimized for reduced memory usage while maintaining performance.

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

## Quantization Details

This model has been quantized to reduce memory requirements while preserving model quality. Quantization significantly reduces the model size compared to the original fp16/bf16 weights.

## Base Model Improvements

Small-3.2 improves in the following categories:

- **Instruction following**: Small-3.2 is better at following precise instructions.
- **Repetition errors**: Small-3.2 produces fewer infinite generations and repetitive answers.
- **Function calling**: Small-3.2's function-calling template is more robust.

In all other categories, Small-3.2 should match or slightly improve on [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

## Key Features

- Same as [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503#key-features)
- Reduced memory footprint through quantization
- Optimized for inference with maintained quality

## Usage

The quantized model can be used with the following frameworks:

- [`vllm (recommended)`](https://github.com/vllm-project/vllm)
- [`transformers`](https://github.com/huggingface/transformers)

Example snippets for both frameworks are sketched at the end of this card.

**Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.

**Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs.

### Memory Requirements

This quantized version requires significantly less GPU memory than the original model:

- Original: ~55 GB of GPU RAM in bf16 or fp16
- Quantized: reduced memory footprint; exact requirements depend on the quantization configuration used (e.g., 4-bit vs. 8-bit bitsandbytes)

## License

This model inherits the license of the base model: Apache-2.0.

## Original Model

For benchmark results and detailed usage examples, please refer to the original model: [togethercomputer/mistral-3.2-instruct-2506](https://huggingface.co/togethercomputer/mistral-3.2-instruct-2506)
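
## Example Usage Sketches

### vLLM

A minimal sketch of serving this checkpoint with vLLM and querying it through the OpenAI-compatible API, following the serving flags documented for the upstream Mistral-Small-3.2 card. The model id `<this-repo-id>` is a placeholder for this repository; the flags and parallelism settings are assumptions and may need adjustment for your hardware and the actual quantization format.

```bash
# Serve the quantized model (placeholder model id; adjust flags as needed).
vllm serve <this-repo-id> \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tensor-parallel-size 1
```

```python
# Query the running vLLM server via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<this-repo-id>",  # placeholder: replace with this repository's id
    temperature=0.15,        # low temperature, as recommended above
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize model quantization in one sentence."},
    ],
)
print(response.choices[0].message.content)
```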
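
### Transformers

Since the card metadata lists `bitsandbytes` quantization, a transformers load might look like the sketch below. The `BitsAndBytesConfig` values (4-bit NF4, bf16 compute) are assumptions, not a description of how this checkpoint was actually quantized; adjust them to match the stored weights, and replace the placeholder model id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "<this-repo-id>"  # placeholder: replace with this repository's id

# Assumed 4-bit NF4 configuration; change if the checkpoint uses a different scheme.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# System prompt plus low temperature, per the usage notes above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantization in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, temperature=0.15, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```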