cturan committed
Commit a44c65b · verified · 1 Parent(s): 4b15118

Update README.md

Files changed (1): README.md +29 -1
README.md CHANGED

library_name: transformers
base_model:
- MiniMaxAI/MiniMax-M2
---
Test GGUF for this model. It will not work with standard llama.cpp; this is experimental only. Compile the fork at https://github.com/cturan/llama.cpp/tree/minimax instead.

For example, on Ubuntu 22.04 with CUDA:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
sudo apt install cmake
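
Optionally, check that the toolkit and driver are visible from the current shell before building (this assumes the exports above are active and that an NVIDIA driver is already installed):

nvcc --version
nvidia-smi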

git clone --branch minimax --single-branch https://github.com/cturan/llama.cpp.git
cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF
cmake --build . --config Release --parallel $(nproc --all)

All done; the binaries are now in llama.cpp/build/bin.
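
If you still need the model file, here is a minimal sketch for fetching the quantized GGUF from this repository; the repository id cturan/MiniMax-M2-GGUF is an assumption, and the file name is taken from the run command below (check the actual repository id, file name, and whether the quant is split into several parts):

# adjust the repository id and file name to match this model repo
wget https://huggingface.co/cturan/MiniMax-M2-GGUF/resolve/main/minimax-m2-Q4_K.gguf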

Run it like this:

./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 32000 --reasoning-format auto

This offloads the MoE experts to the CPU (--cpu-moe) while the remaining layers go to the GPU (-ngl 999), so about 16 GB of VRAM is enough.
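
Once the server is up, you can send it a request over llama-server's OpenAI-compatible HTTP API; a minimal check, assuming the default listening address of 127.0.0.1:8080:

# adjust host/port if you changed them when starting llama-server
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Hello"}]}'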