cturan committed
Commit a44c65b · verified · 1 Parent(s): 4b15118

Update README.md

Files changed (1): README.md +29 -1
README.md CHANGED

library_name: transformers
base_model:
- MiniMaxAI/MiniMax-M2
---
Test GGUF for this model. It will not work with standard llama.cpp; this is experimental only. Compile the fork at https://github.com/cturan/llama.cpp/tree/minimax instead.

For example, on Ubuntu 22.04 with CUDA:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
sudo apt install cmake
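
Optionally, check that the toolkit and driver are visible from the current shell before building (this assumes the exports above are active and that an NVIDIA driver is already installed):

nvcc --version
nvidia-smi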

git clone --branch minimax --single-branch https://github.com/cturan/llama.cpp.git
cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF
cmake --build . --config Release --parallel $(nproc --all)

All done; the binaries are now in llama.cpp/build/bin.
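
If you still need the model file, here is a minimal sketch for fetching the quantized GGUF from this repository; the repository id cturan/MiniMax-M2-GGUF is an assumption, and the file name is taken from the run command below (check the actual repository id, file name, and whether the quant is split into several parts):

# adjust the repository id and file name to match this model repo
wget https://huggingface.co/cturan/MiniMax-M2-GGUF/resolve/main/minimax-m2-Q4_K.gguf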

Run it like this:

./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 32000 --reasoning-format auto

This offloads the MoE experts to the CPU (--cpu-moe) while the remaining layers go to the GPU (-ngl 999), so about 16 GB of VRAM is enough.
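
Once the server is up, you can send it a request over llama-server's OpenAI-compatible HTTP API; a minimal check, assuming the default listening address of 127.0.0.1:8080:

# adjust host/port if you changed them when starting llama-server
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Hello"}]}'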