yhyu13 committed
Commit 955bb3b · 1 Parent(s): 62e3564

Update readme

Files changed (1): README.md (+20 −5)
README.md CHANGED
@@ -2,6 +2,17 @@
 license: llama2
 ---
 
+# Edit
+
+PS: https://github.com/Cornell-RelaxML/quip-sharp/issues/13
+
+As mentioned in the issue thread above:
+
+- For accurate Hessian generation, use a larger devset (e.g., 4096 samples), and consider accumulating Hessians in fp32 when GPUs with fast fp64 are not available (as on consumer cards).
+- Consider changing the Hessian dataset from a natural-language corpus to a mathematical one, since this is a math model.
+
+---
+
 Experimental QUIP 2-bit E8P12 version that works in textgen-webui with the QUIP model loader
 
 Generated using scripts from https://gitee.com/yhyu13/llama_-tools
@@ -12,7 +23,7 @@ GPTQ 4bit: https://huggingface.co/Yhyu13/Xwin-Math-7B-V1.0-GPTQ-4bit
 
 ---
 
-I used `hessian_offline_llama.py` provided by the QUIP repo to generate Hessians specifically for the original model before applying QUIP quantization.
+This repo used `hessian_offline_llama.py` provided by the QUIP repo to generate Hessians specifically for the original model before applying QUIP quantization.
 
 Generating Hessians for all 31 layers took quite a long time, about 6 hours for a 7B model on a single RTX 3090. I am not sure if I made any error.
@@ -21,12 +32,16 @@ QUIP byproducts are also uploaded.
 
 Perplexity calculated using `eval_ppl.py` provided by the QUIP repo.
 QUIP PPL:
-wikitext2 perplexity: 11.247852325439453
-c4 perplexity: 16.275997161865234
+
+- wikitext2 perplexity: 11.247852325439453
+- c4 perplexity: 16.275997161865234
 
 Original model PPL:
-wikitext2 perplexity: 6.042122840881348
-c4 perplexity: 8.430611610412598
+
+- wikitext2 perplexity: 6.042122840881348
+- c4 perplexity: 8.430611610412598
 
 Looks like something is wrong; the quantized model is a disaster.
 
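To make the first suggestion in the `# Edit` section concrete, here is a minimal sketch of fp32 proxy-Hessian accumulation. This is not the QUIP repo's `hessian_offline_llama.py`; the function name and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of fp32 proxy-Hessian accumulation (an illustration,
# not the QUIP repo's hessian_offline_llama.py).
import torch

def accumulate_proxy_hessian(activation_batches, dtype=torch.float32):
    """Accumulate H = (1/n) * sum_i x_i x_i^T over calibration activations.

    fp64 accumulation is more accurate but slow without datacenter GPUs,
    so fp32 is the practical fallback on cards like the RTX 3090.
    """
    H, n = None, 0
    for x in activation_batches:                 # x: (batch, seq, hidden)
        x = x.reshape(-1, x.shape[-1]).to(dtype)
        if H is None:
            H = torch.zeros(x.shape[-1], x.shape[-1],
                            dtype=dtype, device=x.device)
        H += x.T @ x                             # rank-k update in fp32
        n += x.shape[0]
    return H / max(n, 1)
```

The other half of that suggestion, a larger devset (e.g., 4096 sequences), simply feeds more batches through this loop, reducing the variance of the estimate.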
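The second suggestion, calibrating on math text rather than natural language, could look roughly like the sketch below. The choice of GSM8K and the helper itself are assumptions for illustration, not something the QUIP repo ships.

```python
# Hypothetical sketch of a math-flavored Hessian devset; the GSM8K choice
# and this helper are assumptions, not part of the QUIP repo.
from datasets import load_dataset
from transformers import AutoTokenizer

def math_devset(model_id, nsamples=4096, seqlen=2048):
    tok = AutoTokenizer.from_pretrained(model_id)
    ds = load_dataset("gsm8k", "main", split="train")
    text = "\n\n".join(ex["question"] + "\n" + ex["answer"] for ex in ds)
    ids = tok(text, return_tensors="pt").input_ids[0]
    # Chop the corpus into fixed-length calibration sequences.
    nchunks = min(nsamples, ids.numel() // seqlen)
    return [ids[i * seqlen:(i + 1) * seqlen] for i in range(nchunks)]
```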
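For reference, the perplexity numbers in the diff come from the standard non-overlapping-window recipe, which looks roughly like this generic sketch (not the QUIP repo's actual `eval_ppl.py`):

```python
# Generic perplexity sketch; the QUIP repo's eval_ppl.py is the
# authoritative version, this only illustrates the standard recipe.
import torch

@torch.no_grad()
def perplexity(model, input_ids, seqlen=2048):
    """exp of the mean token NLL over non-overlapping seqlen windows."""
    nlls = []
    for i in range(input_ids.numel() // seqlen):
        chunk = input_ids[:, i * seqlen:(i + 1) * seqlen].to(model.device)
        loss = model(chunk, labels=chunk).loss   # mean cross-entropy
        nlls.append(loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen)).item()
```

The gap reported above (11.25 vs. 6.04 on wikitext2) is what the note flags as wrong, which is why the Hessian suggestions in the edit are the first thing to revisit.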