Request to Add AWQ Quantization Model

#13
by wunu - opened

AWQ (Activation-aware Weight Quantization) is a quantization technique that can significantly reduce the memory footprint and computational cost of large language models while maintaining high accuracy. Publishing an AWQ-quantized version of this model would enable more efficient deployment, especially in resource-constrained environments. Could you please consider providing one? Thank you.
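For reference, producing an AWQ checkpoint with the AutoAWQ library typically looks something like the sketch below. The repo ID and output path are placeholders, not actual paths from this repo, and the quantization settings shown are just the common 4-bit defaults:

```python
# Minimal sketch of AWQ quantization using AutoAWQ
# (https://github.com/casper-hansen/AutoAWQ).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "org/model-name"   # placeholder: the full-precision repo ID
quant_path = "model-name-awq"   # placeholder: output dir for quantized weights

# Common 4-bit AWQ settings: zero-point quantization, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run activation-aware calibration and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for upload to the Hub.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```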

I’d like to see that too; it would be very helpful for deployment.
