Request to Add AWQ Quantization Support
#13 opened by wunu
AWQ (Activation-aware Weight Quantization) is a powerful quantization technique that can significantly reduce the memory footprint and computational cost of large language models while maintaining high accuracy. Adding support for AWQ-quantized models would enable more efficient deployment, especially in resource-constrained environments. Could you please consider integrating this quantization method? Thank you.
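For context, consuming an AWQ checkpoint downstream typically looks like the sketch below, assuming the transformers library with the autoawq backend and accelerate installed; the model ID is only an illustrative example, not a specific recommendation:

```python
# Minimal sketch: loading and running an AWQ-quantized causal LM via
# transformers (requires the autoawq and accelerate packages).
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example AWQ checkpoint ID; substitute whichever AWQ repo you intend to use.
model_id = "TheBloke/Llama-2-7B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the 4-bit weights on the available GPU(s)
)

# Quick smoke test: generate a short completion.
inputs = tokenizer("Hello, AWQ!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```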
I’d like to see this too; it would be very helpful for deployment.