Baichuan-M2-32B-gguf

GGUF quantizations of Baichuan-M2-32B, a medical knowledge LLM.

The LLM is an invention as important as electricity, and far more important than the internet, phones, and computers combined.

Make sure you have enough RAM/GPU memory to run the model. The size of each quantized file is listed on the right side of the model card (see also the quantization table below).

Use the model in Ollama

First, download and install Ollama:

https://ollama.com/download

Command

In the Windows command line, or in a terminal on Ubuntu, type:

ollama run hf.co/John1604/Baichuan-M2-32B-gguf:q4_k_s

(q4_k_s is the quantization type; q5_k_s, q4_k_m, and the other variants can also be used.)

C:\Users\developer>ollama run hf.co/John1604/Baichuan-M2-32B-gguf:q4_k_s
pulling manifest
...
verifying sha256 digest
writing manifest
success
>>> 
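Once the model is pulled, you can also query it programmatically through Ollama's local HTTP API, which listens on http://localhost:11434 by default. The following is a minimal sketch using Python and the requests package; it assumes the q4_k_s tag shown above, so substitute whichever quant you actually pulled.

import requests

# Ollama's local API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/John1604/Baichuan-M2-32B-gguf:q4_k_s"  # use the tag you pulled

def ask(question: str) -> str:
    """Send one chat message to the local Ollama server and return the reply text."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # request a single complete JSON response instead of a stream
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=600)
    response.raise_for_status()
    return response.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("What are the common side effects of metformin?"))

The same request can be made with curl or any other HTTP client; only the JSON payload shape matters.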

Use the model in LM Studio

Download and install LM Studio:

https://lmstudio.ai/

Discover models

In LM Studio, click the "Discover" icon. The "Mission Control" popup window will be displayed.

In the "Mission Control" search bar, type "John1604/Baichuan-M2-32B-gguf" and check "GGUF", the model should be found.

Download the model.

You may choose any of the quantized variants.

Load the model.

Ask questions.
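Besides the chat window, LM Studio can expose the loaded model through a local OpenAI-compatible server (started from the Developer tab, listening on http://localhost:1234 by default). Below is a minimal sketch of calling that server from Python with the requests package; it assumes the model identifier is the first one the server reports, so adjust it if you have several models loaded.

import requests

# LM Studio's local server is OpenAI-compatible and listens on port 1234 by default.
BASE_URL = "http://localhost:1234/v1"

# Ask the server which models it currently exposes and pick the first identifier.
models = requests.get(f"{BASE_URL}/models", timeout=30).json()
model_id = models["data"][0]["id"]  # assumes the Baichuan GGUF is the loaded model

# Send one question through the standard chat completions endpoint.
payload = {
    "model": model_id,
    "messages": [
        {"role": "user", "content": "Explain the difference between type 1 and type 2 diabetes."}
    ],
    "temperature": 0.7,
}
reply = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=600).json()
print(reply["choices"][0]["message"]["content"])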

Quantized models

Type     Bits    Quality                   Description
Q2_K     2-bit   🟥 Low                    Minimal footprint; only for tests
Q3_K_S   3-bit   🟧 Low                    “Small” variant (less accurate)
Q3_K_M   3-bit   🟧 Low–Med                “Medium” variant
Q4_K_S   4-bit   🟨 Med                    Small, faster, slightly lower quality
Q4_K_M   4-bit   🟩 Med–High               “Medium”; best 4-bit balance
Q5_K_S   5-bit   🟩 High                   Slightly smaller than Q5_K_M
Q5_K_M   5-bit   🟩🟩 High                 Excellent general-purpose quant
Q6_K     6-bit   🟩🟩🟩 Very High          Almost FP16 quality, larger size
Q8_0     8-bit   🟩🟩🟩🟩 Near-lossless    Baseline
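A rough way to estimate whether a quant will fit in memory: a GGUF file takes roughly parameter count times bits per weight divided by 8 bytes, and you need at least that much free RAM/VRAM plus headroom for the KV cache. The sketch below works through that arithmetic for this 33B-parameter model; the bits-per-weight figures are approximations (K-quants mix bit widths), so treat the output as an estimate rather than the exact file sizes shown on the model page.

# Back-of-the-envelope GGUF size estimate: params * bits_per_weight / 8 bytes.
# The effective bits-per-weight values below are approximations for each quant type.
PARAMS = 33e9  # this repository reports ~33B parameters

approx_bits_per_weight = {
    "Q2_K": 3.0, "Q3_K_S": 3.5, "Q3_K_M": 3.9,
    "Q4_K_S": 4.6, "Q4_K_M": 4.8,
    "Q5_K_S": 5.5, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5,
}

for quant, bpw in approx_bits_per_weight.items():
    size_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant:7s} ~{size_gb:4.1f} GB")

For example, Q4_K_M works out to roughly 20 GB, so a machine with about 24 GB of free RAM or VRAM is a comfortable fit for that quant.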
GGUF details

Model size: 33B params
Architecture: qwen2

Model tree

Base model: Qwen/Qwen2.5-32B
This repository: quantized GGUF versions of Baichuan-M2-32B