---
tags:
  - mlx
  - apple-silicon
  - text-generation
  - gemma3
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Gemma3-Callous-Calla-4B
---

# Gemma3-Callous-Calla-4B — MLX builds (Apple Silicon)

This repo hosts MLX-converted variants of [Daizee/Gemma3-Callous-Calla-4B](https://huggingface.co/Daizee/Gemma3-Callous-Calla-4B) for fast, local inference on Apple Silicon (M-series).
The tokenizer and config files live at the repo root; the MLX weight folders live under `mlx/`.

**Note on vocab padding:** For MLX compatibility, the tokenizer and embeddings were padded to the next multiple of 64 tokens.
In this build the padded vocabulary is 262,208 tokens (64 placeholder tokens named `<pad_ex_*>` were added).
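The round-up itself is simple arithmetic; a minimal sketch (the `pad_to_multiple` helper name is illustrative, not part of this repo):

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the next multiple (MLX-friendly padding)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# The padded vocabulary in this build is a clean multiple of 64:
assert 262_208 % 64 == 0

# Any vocab size in (262_144, 262_208] pads up to the same boundary:
print(pad_to_multiple(262_145))  # → 262208
```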

## Variants

| Path | Bits | Group size | Notes |
| --- | --- | --- | --- |
| `mlx/g128/` | int4 | 128 | Smallest & fastest |
| `mlx/g64/` | int4 | 64 | Slightly larger, often steadier |
| `mlx/int8/` | int8 | n/a | Closest to fp16 quality (slower) |
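The size ordering in the table follows from the per-group quantization overhead. A back-of-the-envelope estimate, assuming MLX-style affine quantization stores one fp16 scale and one fp16 bias per group; the ~4B parameter count and the int8 group size of 64 are assumptions for illustration:

```python
def effective_bits(bits: int, group_size: int) -> float:
    """Bits per weight, counting one fp16 scale + one fp16 bias per group."""
    return bits + 2 * 16 / group_size

def approx_weight_gb(n_params: float, bits: int, group_size: int) -> float:
    """Rough weight-storage footprint in GB (excludes KV cache and activations)."""
    return n_params * effective_bits(bits, group_size) / 8 / 1e9

N = 4e9  # ~4B parameters (assumed)
print(f"int4, g128: {approx_weight_gb(N, 4, 128):.2f} GB")  # → ~2.12 GB
print(f"int4, g64:  {approx_weight_gb(N, 4, 64):.2f} GB")   # → ~2.25 GB
print(f"int8 (g64 assumed): {approx_weight_gb(N, 8, 64):.2f} GB")  # → ~4.25 GB
```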

## Quickstart (MLX-LM)

### Run from Hugging Face (no cloning needed)

```shell
python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
```
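The same model can be driven from Python via the `mlx-lm` API. A minimal sketch, following the usual `load`/`generate` pattern; exact sampling keywords vary across `mlx-lm` versions, so treat the call shape as indicative rather than exact:

```python
# Requires Apple Silicon with mlx-lm installed: pip install mlx-lm
from mlx_lm import load, generate

# Load one quantized variant (local path or Hugging Face repo path)
model, tokenizer = load("Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64")

prompt = "Summarize the Bill of Rights for 7th graders in 4 bullet points."
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

text = generate(model, tokenizer, prompt=prompt, max_tokens=180)
print(text)
```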