---
license: llama2
base_model: meta-llama/Llama-2-70b-chat-hf
---
# Llama-2-70b-chat-hf-WMXFP4FP8-AMXFP4FP8-AMP-KVFP8
- ## Introduction
This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
- ## Quantization Strategy
- ***Quantized Layers***: All linear layers excluding "lm_head"
  - ***Weight***: Auto Mixed Precision quantized by Quark; each weight uses one of the following candidate quantization schemes:
- FP8 symmetric per-tensor
- OCP Microscaling (MX) FP4
  - ***Activation***: Auto Mixed Precision quantized by Quark; each activation input uses the same quantization scheme as its corresponding weight, i.e., one of:
- FP8 symmetric per-tensor
- OCP Microscaling (MX) FP4
- ***KV Cache***: FP8 symmetric per-tensor
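The OCP MX FP4 scheme above can be illustrated with a small pseudo-quantization sketch: a block of values shares one power-of-two (E8M0) scale, and each element is rounded to the nearest FP4 (E2M1) value. This is an assumption-laden illustration only; the block size, scale selection, and rounding mode here are not taken from Quark, whose actual implementation may differ:

```python
import numpy as np

# Non-negative magnitudes representable by FP4 (E2M1); signs are handled separately.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(x: np.ndarray) -> np.ndarray:
    """Pseudo-quantize a 1-D block (e.g. 32 elements) to MX FP4 and back.

    Illustrative sketch: shared power-of-two scale + round-to-nearest.
    """
    amax = np.max(np.abs(x))
    if amax == 0.0:
        return np.zeros_like(x)
    # Shared E8M0 scale: power of two chosen so the block maximum lands
    # near the FP4 maximum of 6.0 (6 = 1.5 * 2**2, hence the "- 2").
    shared_exp = int(np.floor(np.log2(amax))) - 2
    scale = 2.0 ** shared_exp
    scaled = x / scale
    # Round each magnitude to the nearest representable FP4 value, keep the sign.
    idx = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]), axis=1)
    return np.sign(scaled) * FP4_GRID[idx] * scale
```

Running a block through this round trip shows the characteristic MX FP4 behavior: values snap to a coarse grid whose resolution is set by the block's largest magnitude.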
- ## Quick Start
1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. [TODO] Example scripts for running auto mixed precision (AMP) quantization will be provided later.
## Deployment
Quark-quantized Auto Mixed Precision (AMP) models can now be deployed directly on the vLLM backend (vLLM-compatible).
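As a sketch, a locally downloaded quantized checkpoint could be served with vLLM's CLI. The model path is a placeholder and the exact flags depend on your vLLM version; this is an assumed invocation, not an official command for this model:

```shell
# Hypothetical invocation; adjust the path and flags for your setup.
vllm serve /path/to/quantized/model \
    --kv-cache-dtype fp8
```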
## Evaluation
The quantization evaluation results are obtained in pseudo-quantization mode, which may differ slightly from the actual quantized inference accuracy. These results are provided for reference only.
#### Evaluation scores
| Quant scheme | arc challenge (acc) ↑ | recovery rate | gsm8k (strict-match) ↑ | recovery rate | mmlu (acc) ↑ | recovery rate | winogrande (acc) ↑ | recovery rate |
|--------------|----------------|---------------|----------------|---------------|----------------|---------------|----------------|---------------|
| **FP16** | 0.5290 | 100.0% | 0.5049 | 100.0% | 0.6110 | 100.0% | 0.7490 | 100.0% |
| **FP8** | 0.5265 | 99.5% | 0.5262 | 104.2% | 0.6107 | 100.0% | 0.7451 | 99.5% |
| **AMP** | 0.5273 | 99.7% | 0.5125 | 101.5% | 0.6007 | 98.3% | 0.7324 | 97.8% |
| **MXFP4** | 0.5094 | 96.3% | 0.4572 | 90.6% | 0.5869 | 96.1% | 0.7316 | 97.7% |
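The recovery rate in the table is simply the quantized score divided by the FP16 baseline, expressed as a percentage. A minimal sketch, using the FP8 and FP16 rows above:

```python
# Baseline (FP16) and FP8 scores copied from the evaluation table above.
fp16 = {"arc_challenge": 0.5290, "gsm8k": 0.5049, "mmlu": 0.6110, "winogrande": 0.7490}
fp8  = {"arc_challenge": 0.5265, "gsm8k": 0.5262, "mmlu": 0.6107, "winogrande": 0.7451}

def recovery_rate(quant_score: float, baseline: float) -> float:
    """Percentage of the FP16 baseline score retained after quantization."""
    return round(100.0 * quant_score / baseline, 1)

rates = {task: recovery_rate(fp8[task], fp16[task]) for task in fp16}
# rates["gsm8k"] is 104.2 — a quantized model can occasionally score above its baseline
```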
#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Built with Meta Llama.
Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.