Gemma3-Callous-Calla-4B – MLX builds (Apple Silicon)

This repo hosts MLX-converted variants of Daizee/Gemma3-Callous-Calla-4B for fast, local inference on Apple Silicon (M-series).
Tokenizer/config are included at the repo root. MLX weight folders live under mlx/.

Note on vocab padding: for MLX compatibility, the tokenizer/embeddings were padded to the next multiple of 64 tokens.
In this build the vocabulary is 262,208 tokens (64 placeholder tokens named <pad_ex_*> were added).
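
A quick way to confirm the padded size from a local copy of this repo is sketched below; it assumes the root config.json exposes a vocab_size field, which is standard for Hugging Face configs but not verified here.

import json

# Read the padded vocabulary size from the config at the repo root.
with open("config.json") as f:
    cfg = json.load(f)

vocab_size = cfg["vocab_size"]
print(vocab_size, vocab_size % 64 == 0)  # expected: 262208 True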

Variants

Path        Bits   Group Size   Notes
mlx/g128/   4      128          Smallest & fastest
mlx/g64/    4      64           Slightly larger, often steadier
mlx/int8/   8      –            Closest to fp16 quality (slower)
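
If you want a local copy of a single variant, one way is to fetch it with huggingface_hub as sketched below; the allow_patterns globs are an assumption about which root files hold the tokenizer/config, so adjust them to the actual file list.

from huggingface_hub import snapshot_download

# Fetch the shared tokenizer/config from the repo root plus one weight folder.
local_dir = snapshot_download(
    repo_id="Daizee/Gemma3-Callous-Calla-4B-mlx",
    allow_patterns=["*.json", "*.model", "mlx/g64/*"],
)
print(local_dir)  # local snapshot path; the g64 weights sit under mlx/g64 inside it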

Quickstart (MLX-LM)

Run from Hugging Face (no cloning needed)

python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
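
The same thing through the mlx_lm Python API, as a minimal sketch; the local path is illustrative and assumes you have downloaded a variant folder (e.g. with huggingface_hub as above).

from mlx_lm import load, generate

# Point load() at a local copy of one variant (or at a Hugging Face repo id).
model, tokenizer = load("path/to/Gemma3-Callous-Calla-4B-mlx/mlx/g64")

text = generate(
    model,
    tokenizer,
    prompt="Summarize the Bill of Rights for 7th graders in 4 bullet points.",
    max_tokens=180,
)
print(text)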