---
library_name: vllm
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
- togethercomputer/mistral-3.2-instruct-2506
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our Privacy Policy.
tags:
- mistral-common
- quantized
model_type: mistral
quantization: bitsandbytes
---

# Mistral-Small-3.2-24B-Instruct-2506 (Quantized)

This is a quantized version of [togethercomputer/mistral-3.2-instruct-2506](https://huggingface.co/togethercomputer/mistral-3.2-instruct-2506), optimized for reduced memory usage while maintaining performance.

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

## Quantization Details

This model has been quantized to reduce memory requirements while preserving model quality. Quantization significantly reduces the model size compared to the original fp16/bf16 weights.

## Base Model Improvements

Small-3.2 improves in the following categories:

- **Instruction following**: Small-3.2 is better at following precise instructions.
- **Repetition errors**: Small-3.2 produces fewer infinite generations and repetitive answers.
- **Function calling**: Small-3.2's function-calling template is more robust.

In all other categories, Small-3.2 should match or slightly improve on [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

## Key Features

- Same as [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503#key-features)
- Reduced memory footprint through quantization
- Optimized for inference with maintained quality

## Usage

The quantized model can be used with the following frameworks:

- [`vllm (recommended)`](https://github.com/vllm-project/vllm)
- [`transformers`](https://github.com/huggingface/transformers)

Example snippets for both frameworks are sketched at the end of this card.

**Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.

**Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs.

### Memory Requirements

This quantized version requires significantly less GPU memory than the original model:

- Original: ~55 GB of GPU RAM in bf16 or fp16
- Quantized: reduced memory footprint; exact requirements depend on the quantization configuration used (e.g., 4-bit vs. 8-bit bitsandbytes)

## License

This model inherits the license of the base model: Apache-2.0.

## Original Model

For benchmark results and detailed usage examples, please refer to the original model: [togethercomputer/mistral-3.2-instruct-2506](https://huggingface.co/togethercomputer/mistral-3.2-instruct-2506)
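
## Example Usage Sketches

### vLLM

A minimal sketch of serving this checkpoint with vLLM and querying it through the OpenAI-compatible API, following the serving flags documented for the upstream Mistral-Small-3.2 card. The model id `<this-repo-id>` is a placeholder for this repository; the flags and parallelism settings are assumptions and may need adjustment for your hardware and the actual quantization format.

```bash
# Serve the quantized model (placeholder model id; adjust flags as needed).
vllm serve <this-repo-id> \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tensor-parallel-size 1
```

```python
# Query the running vLLM server via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<this-repo-id>",  # placeholder: replace with this repository's id
    temperature=0.15,        # low temperature, as recommended above
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize model quantization in one sentence."},
    ],
)
print(response.choices[0].message.content)
```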
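
### Transformers

Since the card metadata lists `bitsandbytes` quantization, a transformers load might look like the sketch below. The `BitsAndBytesConfig` values (4-bit NF4, bf16 compute) are assumptions, not a description of how this checkpoint was actually quantized; adjust them to match the stored weights, and replace the placeholder model id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "<this-repo-id>"  # placeholder: replace with this repository's id

# Assumed 4-bit NF4 configuration; change if the checkpoint uses a different scheme.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# System prompt plus low temperature, per the usage notes above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantization in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, temperature=0.15, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```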