Update experimental PR build instructions
README.md
CHANGED
@@ -130,18 +130,15 @@ export model=/mnt/models/ubergarm/Hunyuan-A13B-Instruct-GGUF/Hunyuan-A13B-Instru
 ```
 
 ## *NOTE* Building Experimental PRs
-This PR is based on
+This PR is based on currently un-released PRs so is quite experimental. To build it before PRs are merged try something like this:
 ```bash
 # get the code setup
 cd projects
 git clone https://github.com/ikawrakow/ik_llama.cpp.git
 git ik_llama.cpp
-git fetch origin
 git remote add ubergarm https://github.com/ubergarm/ik_llama.cpp
 git fetch ubergarm
 git checkout ug/hunyuan-moe-2
-git checkout -b merge-stuff-here
-git merge ikawrakow/ik/iq3_ks_v2
 
 # build for CUDA
 cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1
@@ -153,6 +150,7 @@ git branch -D merge-stuff-here
 ```
 
 ## VRAM Estimations
+Context length = VRAM use:
 
 * 8k = 3790MiB total with KV self size = 544.00 MiB, K (q8_0): 272.00 MiB, V (q8_0): 272.00 MiB
 * 32k = 5462MiB total with KV self size = 2176.00 MiB, K (q8_0): 1088.00 MiB, V (q8_0): 1088.00 MiB
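For reference, the post-change setup steps from the first hunk can be sketched as a small script. This is only a sketch under stated assumptions: the README line `git ik_llama.cpp` is not a valid git command and is read here as `cd ik_llama.cpp`, which is a guess at the intent; the `run` helper and the `DRY_RUN` variable are hypothetical and not part of the README. The steps that follow the `cmake -B build` configure line fall outside the hunks shown and are not reproduced.

```shell
#!/usr/bin/env bash
set -eu

# run: hypothetical helper (not in the README). With DRY_RUN=1 (the default
# here) it only prints the command; with DRY_RUN=0 it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_experimental() {
  # get the code setup
  run cd projects
  run git clone https://github.com/ikawrakow/ik_llama.cpp.git
  # Assumption: the README's `git ik_llama.cpp` is meant to enter the clone.
  run cd ik_llama.cpp
  run git remote add ubergarm https://github.com/ubergarm/ik_llama.cpp
  run git fetch ubergarm
  run git checkout ug/hunyuan-moe-2

  # configure for CUDA (flags copied from the README)
  run cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON \
    -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF \
    -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1
}

build_experimental
```

Running it as-is only prints the commands, which makes it easy to check the sequence before setting `DRY_RUN=0` to execute them for real.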