---
license: llama2
base_model: meta-llama/Llama-2-70b-chat-hf
---

# Llama-2-70b-chat-hf-WMXFP4FP8-AMXFP4FP8-AMP-KVFP8

## Introduction

This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.

## Quantization Strategy

- ***Quantized Layers***: All linear layers excluding "lm_head"
- ***Weight***: Auto mixed precision quantized by Quark; each weight uses one quantization scheme from the candidates:
  - FP8 symmetric per-tensor
  - OCP Microscaling (MX) FP4
- ***Activation***: Auto mixed precision quantized by Quark; each activation input uses the same quantization scheme as its weight, i.e., one of:
  - FP8 symmetric per-tensor
  - OCP Microscaling (MX) FP4
- ***KV Cache***: FP8 symmetric per-tensor

## Quick Start

1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. [TODO] We will provide example script(s) for running auto mixed precision (AMP) quantization later.

## Deployment

Quark-quantized auto mixed precision (AMP) models are now supported for easy deployment on the vLLM backend (vLLM-compatible).

## Evaluation

The quantization evaluation results were obtained in pseudo-quantization mode, which may differ slightly from the actual quantized-inference accuracy. These results are provided for reference only.

#### Evaluation scores

| Quant scheme | arc challenge (acc) ↑ | | gsm8k (strict-match) ↑ | | mmlu (acc) ↑ | | winogrande (acc) ↑ | |
|--------------|----------------|---------------|----------------|---------------|----------------|---------------|----------------|---------------|
| | absolute value | recovery rate | absolute value | recovery rate | absolute value | recovery rate | absolute value | recovery rate |
| **FP16** | 0.5290 | 100.0% | 0.5049 | 100.0% | 0.6110 | 100.0% | 0.7490 | 100.0% |
| **FP8** | 0.5265 | 99.5% | 0.5262 | 104.2% | 0.6107 | 100.0% | 0.7451 | 99.5% |
| **AMP** | 0.5273 | 99.7% | 0.5125 | 101.5% | 0.6007 | 98.3% | 0.7324 | 97.8% |
| **MXFP4** | 0.5094 | 96.3% | 0.4572 | 90.6% | 0.5869 | 96.1% | 0.7316 | 97.7% |

#### License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Built with Meta Llama. Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
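For the vLLM deployment mentioned above, a minimal serving sketch follows. This is an assumption-laden example, not an official recipe from this card: it assumes a vLLM build with Quark checkpoint support, uses a placeholder for the checkpoint path, and the tensor-parallel size is only a typical choice for a 70B model.

```shell
# Hedged sketch: serve the quantized checkpoint with vLLM's OpenAI-compatible server.
# <path-or-hub-id-of-this-checkpoint> is a placeholder; substitute your local path or hub id.
# --kv-cache-dtype fp8 matches the FP8 KV-cache scheme described in this card;
# --tensor-parallel-size 8 is an assumed multi-GPU setting, adjust to your hardware.
vllm serve <path-or-hub-id-of-this-checkpoint> \
    --kv-cache-dtype fp8 \
    --tensor-parallel-size 8
```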
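For clarity on the evaluation table, each recovery rate is simply the quantized score divided by the FP16 baseline, expressed as a percentage. The sketch below reproduces the FP8 row from the scores in the table; the `recovery` helper is defined here for illustration and is not part of Quark or the evaluation harness.

```python
# Recovery rate = quantized metric / FP16 baseline, as a percentage.
# Absolute scores are copied from the evaluation table above.
fp16 = {"arc_challenge": 0.5290, "gsm8k": 0.5049, "mmlu": 0.6110, "winogrande": 0.7490}
fp8  = {"arc_challenge": 0.5265, "gsm8k": 0.5262, "mmlu": 0.6107, "winogrande": 0.7451}

def recovery(quant, baseline):
    """Per-task recovery rates, rounded to one decimal place."""
    return {k: round(100 * quant[k] / baseline[k], 1) for k in baseline}

print(recovery(fp8, fp16))
# → {'arc_challenge': 99.5, 'gsm8k': 104.2, 'mmlu': 100.0, 'winogrande': 99.5}
```

Note that recovery can exceed 100% (e.g., FP8 on gsm8k), since quantization noise occasionally nudges a benchmark score above the baseline.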