Baichuan-M2-32B-gguf

GGUF quantizations of Baichuan-M2-32B, a medical knowledge LLM.

The LLM is an invention as important as electricity, and far more important than the internet, phones, and computers combined.

Make sure you have enough RAM/GPU memory to run the model. The size of each quantized file is listed on the right side of the model card (see also the quantization table below).

Use the model in Ollama

First, download and install Ollama:

https://ollama.com/download

Command

In the Windows command line, or in a terminal on Ubuntu, type:

ollama run hf.co/John1604/Baichuan-M2-32B-gguf:q4_k_s

(q4_k_s is the quantization type; q5_k_s, q4_k_m, and the other variants can also be used.)

C:\Users\developer>ollama run hf.co/John1604/Baichuan-M2-32B-gguf:q4_k_s
pulling manifest
...
verifying sha256 digest
writing manifest
success
>>> 
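Once the model is pulled, you can also query it programmatically through Ollama's local HTTP API, which listens on http://localhost:11434 by default. The following is a minimal sketch using Python and the requests package; it assumes the q4_k_s tag shown above, so substitute whichever quant you actually pulled.

import requests

# Ollama's local API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/John1604/Baichuan-M2-32B-gguf:q4_k_s"  # use the tag you pulled

def ask(question: str) -> str:
    """Send one chat message to the local Ollama server and return the reply text."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # request a single complete JSON response instead of a stream
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=600)
    response.raise_for_status()
    return response.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("What are the common side effects of metformin?"))

The same request can be made with curl or any other HTTP client; only the JSON payload shape matters.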

Use the model in LM Studio

Download and install LM Studio:

https://lmstudio.ai/

Discover models

In LM Studio, click the "Discover" icon. The "Mission Control" popup window will be displayed.

In the "Mission Control" search bar, type "John1604/Baichuan-M2-32B-gguf" and check "GGUF", the model should be found.

Download the model.

You may choose any of the quantized variants.

Load the model.

Ask questions.
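Besides the chat window, LM Studio can expose the loaded model through a local OpenAI-compatible server (started from the Developer tab, listening on http://localhost:1234 by default). Below is a minimal sketch of calling that server from Python with the requests package; it assumes the model identifier is the first one the server reports, so adjust it if you have several models loaded.

import requests

# LM Studio's local server is OpenAI-compatible and listens on port 1234 by default.
BASE_URL = "http://localhost:1234/v1"

# Ask the server which models it currently exposes and pick the first identifier.
models = requests.get(f"{BASE_URL}/models", timeout=30).json()
model_id = models["data"][0]["id"]  # assumes the Baichuan GGUF is the loaded model

# Send one question through the standard chat completions endpoint.
payload = {
    "model": model_id,
    "messages": [
        {"role": "user", "content": "Explain the difference between type 1 and type 2 diabetes."}
    ],
    "temperature": 0.7,
}
reply = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=600).json()
print(reply["choices"][0]["message"]["content"])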

Quantized models

Type     Bits    Quality                   Description
Q2_K     2-bit   🟥 Low                    Minimal footprint; only for tests
Q3_K_S   3-bit   🟧 Low                    “Small” variant (less accurate)
Q3_K_M   3-bit   🟧 Low–Med                “Medium” variant
Q4_K_S   4-bit   🟨 Med                    Small, faster, slightly lower quality
Q4_K_M   4-bit   🟩 Med–High               “Medium”; best 4-bit balance
Q5_K_S   5-bit   🟩 High                   Slightly smaller than Q5_K_M
Q5_K_M   5-bit   🟩🟩 High                 Excellent general-purpose quant
Q6_K     6-bit   🟩🟩🟩 Very High          Almost FP16 quality, larger size
Q8_0     8-bit   🟩🟩🟩🟩 Near-lossless    Baseline
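A rough way to estimate whether a quant will fit in memory: a GGUF file takes roughly parameter count times bits per weight divided by 8 bytes, and you need at least that much free RAM/VRAM plus headroom for the KV cache. The sketch below works through that arithmetic for this 33B-parameter model; the bits-per-weight figures are approximations (K-quants mix bit widths), so treat the output as an estimate rather than the exact file sizes shown on the model page.

# Back-of-the-envelope GGUF size estimate: params * bits_per_weight / 8 bytes.
# The effective bits-per-weight values below are approximations for each quant type.
PARAMS = 33e9  # this repository reports ~33B parameters

approx_bits_per_weight = {
    "Q2_K": 3.0, "Q3_K_S": 3.5, "Q3_K_M": 3.9,
    "Q4_K_S": 4.6, "Q4_K_M": 4.8,
    "Q5_K_S": 5.5, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5,
}

for quant, bpw in approx_bits_per_weight.items():
    size_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant:7s} ~{size_gb:4.1f} GB")

For example, Q4_K_M works out to roughly 20 GB, so a machine with about 24 GB of free RAM or VRAM is a comfortable fit for that quant.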
GGUF details

Model size: 33B params
Architecture: qwen2

Model tree

Base model: Qwen/Qwen2.5-32B
This repository: quantized GGUF versions of Baichuan-M2-32B