cpatonn committed on
Commit bc895b0 · verified · 1 Parent(s): ca77181

Update README.md

  1. README.md +14 -0
README.md CHANGED
@@ -28,6 +28,20 @@ base_model:
  - **Calibration Dataset:** [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset)
  - **Quantization Tool:** [llm-compressor](https://github.com/vllm-project/llm-compressor)
 
+ ### Memory Usage
+
+ | **Type** | **MiniMax-M2-REAP-162B-A10B** | **MiniMax-M2-REAP-162B-A10B-AWQ-4bit** |
+ |:---------------:|:----------------:|:----------------:|
+ | **Memory Size** | 152.1 GB | 86.6 GB |
+ | **KV Cache per Token** | 124.0 kB | 31.0 kB |
+ | **KV Cache per Context** | 23.3 GB | 5.8 GB |
+
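The per-context rows in the Memory Usage table are consistent with multiplying the per-token KV cache size by the context length. The following is a minimal sketch of that arithmetic, not part of the commit. It assumes a context window of 196,608 tokens and binary units (1 kB = 1024 bytes, 1 GB = 1024³ bytes); neither assumption is stated in the README, and `kv_cache_per_context_gb` is a hypothetical helper name.

```python
# Assumptions (not stated in the README): binary units and a
# 196,608-token context window.
KIB = 1024
GIB = 1024 ** 3
ASSUMED_CONTEXT_TOKENS = 196_608


def kv_cache_per_context_gb(kb_per_token: float,
                            context_tokens: int = ASSUMED_CONTEXT_TOKENS) -> float:
    """Return the KV cache size for one full context, in binary GB."""
    return kb_per_token * KIB * context_tokens / GIB


# Per-token values copied from the Memory Usage table.
base = kv_cache_per_context_gb(124.0)  # MiniMax-M2-REAP-162B-A10B
awq = kv_cache_per_context_gb(31.0)    # MiniMax-M2-REAP-162B-A10B-AWQ-4bit

# Ratio of the two "Memory Size" entries from the same table.
weights_ratio = 86.6 / 152.1

print(f"KV cache per context: base={base:.2f} GB, awq={awq:.2f} GB")
print(f"AWQ memory size / base memory size: {weights_ratio:.2f}")
```

Under these assumptions the results are 23.25 GB and about 5.81 GB, which agree with the table's 23.3 GB and 5.8 GB to within rounding. The memory size ratio works out to roughly 0.57.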
+ ### Evaluations
+
+ | **Benchmarks** | **MiniMax-M2-REAP-162B-A10B** | **MiniMax-M2-REAP-162B-A10B-AWQ-4bit** |
+ |:---------------:|:----------------:|:----------------:|
+ | **Perplexity** | 1.75134 | 1.75138 |
+
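To put the two perplexity figures in the Evaluations table in proportion, the gap can be expressed as an absolute and a relative difference. This is a small sketch using only the two values from the table; the variable names are illustrative and the README does not state which evaluation set produced these numbers.

```python
# Perplexity values copied from the Evaluations table.
ppl_base = 1.75134  # MiniMax-M2-REAP-162B-A10B
ppl_awq = 1.75138   # MiniMax-M2-REAP-162B-A10B-AWQ-4bit

# Absolute gap and the same gap as a percentage of the baseline.
abs_delta = ppl_awq - ppl_base
rel_delta_pct = 100.0 * abs_delta / ppl_base

print(f"absolute difference: {abs_delta:.5f}")
print(f"relative difference: {rel_delta_pct:.4f}%")
```

The absolute difference is about 0.00004, which is roughly 0.0023% of the baseline value.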
  ## Inference
 
  ### Prerequisite