# yujiepan/Meta-Llama-3-8B-gptq-w4g64

> **Note:** this repo has low accuracy and is under investigation.
This model applies AutoGPTQ to meta-llama/Meta-Llama-3-8B with the following settings:
- 4-bit symmetric weight-only quantization
- group_size=64
- calibration set: ptb-new
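The scheme above can be sketched numerically. The snippet below is an illustrative round-trip of symmetric 4-bit group quantization (one scale per 64 weights), not the AutoGPTQ implementation, which additionally minimizes layer output error using the calibration data:

```python
import numpy as np

def quantize_w4_symmetric(w, group_size=64):
    """Symmetric 4-bit weight-only quantization with one scale per group.

    Illustrative sketch of the w4g64 scheme, not AutoGPTQ itself.
    """
    groups = w.reshape(-1, group_size)
    # symmetric: the scale maps the largest magnitude in each group to 7
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7)  # int4 range [-8, 7]
    return (q * scale).reshape(w.shape)           # dequantized weights

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
w_hat = quantize_w4_symmetric(w)
print(np.abs(w - w_hat).max())  # per-group rounding error, bounded by scale/2
```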
## Accuracy

| model | precision | wikitext ppl (↓) |
|---|---|---|
| meta-llama/Meta-Llama-3-8B | FP16 | 9.179 |
| yujiepan/Meta-Llama-3-8B-gptq-w4g64 | w4g64 | 14.949 |
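Perplexity here is the exponential of the mean per-token negative log-likelihood on wikitext (lower is better). A minimal sketch of that formula, using a hypothetical list of precomputed token log-probabilities rather than actual model logits:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_logprobs` holds the natural-log probabilities the model
    assigns to each ground-truth token (hypothetical values here; in
    practice they come from the model's logits over the eval corpus).
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# a model assigning probability 1/9.179 to every token scores ppl ~9.179
print(perplexity([math.log(1 / 9.179)] * 4))
```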
## Code
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit symmetric quantization, group size 64, calibrated on ptb-new
quantization_config = GPTQConfig(
    bits=4,
    group_size=64,
    dataset="ptb-new",
    tokenizer=tokenizer,
)

# quantization runs during loading when quantization_config is passed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
model.push_to_hub("yujiepan/Meta-Llama-3-8B-gptq-w4g64")
```