yhyu13
committed on
Commit · 955bb3b
Parent(s): 62e3564
Update readme
README.md CHANGED
@@ -2,6 +2,17 @@
 license: llama2
 ---
 
+# Edit
+
+PS : https://github.com/Cornell-RelaxML/quip-sharp/issues/13
+
+As mentioned in the above issue thread:
+
+- for accurate Hessian generation, use a larger devset (e.g., 4096 samples) and consider accumulating Hessians in fp32 if consumer GPUs with fast fp64 are not available (a sketch follows below);
+- change the Hessian dataset from a natural-language dataset to a mathematical one, since this is a math model.
+
+---
+
 Experimental QUIP 2-bit E8P12 version that works in textgen-webui with the QUIP model loader
 
 Generated by using scripts from https://gitee.com/yhyu13/llama_-tools
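The first bullet above is easy to get wrong in practice, so here is a minimal sketch of fp32 Hessian accumulation, assuming QUIP's proxy Hessian for a linear layer is the second moment of its inputs, H = E[x xᵀ]. This is an illustration, not the QUIP repo's actual code; `activation_batches` is a hypothetical iterable of captured layer inputs.

```python
import torch

def accumulate_hessian(activation_batches, dtype=torch.float32):
    """Proxy Hessian H = E[x x^T], accumulated in fp32.

    Consumer GPUs have slow fp64, so the running sum is kept in fp32;
    a larger devset (e.g., 4096 sequences) simply means more batches
    flowing through this loop.
    """
    H, n = None, 0
    for x in activation_batches:  # x: (n_tokens, d_in) inputs to one linear layer
        x = x.to(dtype)
        if H is None:
            H = torch.zeros(x.shape[-1], x.shape[-1], dtype=dtype, device=x.device)
        H += x.T @ x              # running sum of outer products
        n += x.shape[0]
    return H / max(n, 1)          # mean outer product over all tokens
```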
@@ -12,7 +23,7 @@ GPTQ 4bit : https://huggingface.co/Yhyu13/Xwin-Math-7B-V1.0-GPTQ-4bit
 
 ---
 
-
+This repo used `hessian_offline_llama.py` provided by the QUIP repo to generate Hessians specifically for the original model before applying QUIP quantization.
 
 It took quite a long time to generate Hessians for all 31 layers, about 6 hours for a 7B model on a single RTX 3090. I am not sure if I made any error.
 
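Given the "I am not sure if I made any error" above, one cheap sanity check on the uploaded Hessian byproducts: E[x xᵀ] must be symmetric and positive semi-definite, so large violations point at an accumulation bug. The file path below is hypothetical; adjust it to the actual byproduct layout in this repo.

```python
import torch

# Hypothetical path; point this at one of the uploaded Hessian byproducts.
H = torch.load("hessians/layer_0_qkv.pt", map_location="cpu").float()

# A valid E[x x^T] is symmetric positive semi-definite; large violations
# suggest overflow or a bug in the accumulation pass.
sym_err = (H - H.T).abs().max().item()
min_eig = torch.linalg.eigvalsh(H).min().item()
print(f"symmetry error: {sym_err:.3e}, smallest eigenvalue: {min_eig:.3e}")
```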
@@ -21,12 +32,16 @@ QUIP byproducts are also uploaded.
 
 Perplexity calculated using `eval_ppl.py` provided by the QUIP repo
 QUIP PPL:
-
-
+
+- wikitext2 perplexity: 11.247852325439453
+
+- c4 perplexity: 16.275997161865234
 
 Original model PPL:
-
-
+
+- wikitext2 perplexity: 6.042122840881348
+
+- c4 perplexity: 8.430611610412598
 
 Looks like something is wrong; the quantized model is a disaster.
 
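For reference, the perplexities above are the exponential of the mean per-token cross-entropy over the test set. Below is a generic non-overlapping-window sketch using transformers; it assumes nothing about the QUIP repo's `eval_ppl.py`, and the model id and context length are placeholders.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Xwin-LM/Xwin-Math-7B-V1.0"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Tokenize the whole test split as one long stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

ctx = 2048                                # evaluation context length (placeholder)
nlls, n_tokens = [], 0
for i in range(0, ids.shape[1] - 1, ctx):
    chunk = ids[:, i : i + ctx].to(model.device)
    if chunk.shape[1] < 2:                # need at least one prediction target
        continue
    with torch.no_grad():
        # labels=chunk makes HF shift internally and return mean NLL per token
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss * (chunk.shape[1] - 1))  # back to total NLL for the chunk
    n_tokens += chunk.shape[1] - 1

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"wikitext2 perplexity: {ppl.item():.4f}")
```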