---
tags:
  - mlx
  - apple-silicon
  - text-generation
  - gemma3
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Gemma3-Callous-Calla-4B
---

# Gemma3-Callous-Calla-4B — MLX builds (Apple Silicon)

This repo hosts MLX-converted variants of [Daizee/Gemma3-Callous-Calla-4B](https://huggingface.co/Daizee/Gemma3-Callous-Calla-4B) for fast, local inference on Apple Silicon (M-series).
The tokenizer and config files live at the repo root; the MLX weight folders live under `mlx/`.

**Note on vocab padding:** For MLX compatibility, the tokenizer and embeddings were padded to the next multiple of 64 tokens.
In this build the padded vocabulary is 262,208 tokens (64 placeholder tokens named `<pad_ex_*>` were added).
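The round-up itself is simple arithmetic; a minimal sketch (the `pad_to_multiple` helper name is illustrative, not part of this repo):

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the next multiple (MLX-friendly padding)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# The padded vocabulary in this build is a clean multiple of 64:
assert 262_208 % 64 == 0

# Any vocab size in (262_144, 262_208] pads up to the same boundary:
print(pad_to_multiple(262_145))  # → 262208
```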

## Variants

| Path | Bits | Group size | Notes |
| --- | --- | --- | --- |
| `mlx/g128/` | int4 | 128 | Smallest & fastest |
| `mlx/g64/` | int4 | 64 | Slightly larger, often steadier |
| `mlx/int8/` | int8 | n/a | Closest to fp16 quality (slower) |
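The size ordering in the table follows from the per-group quantization overhead. A back-of-the-envelope estimate, assuming MLX-style affine quantization stores one fp16 scale and one fp16 bias per group; the ~4B parameter count and the int8 group size of 64 are assumptions for illustration:

```python
def effective_bits(bits: int, group_size: int) -> float:
    """Bits per weight, counting one fp16 scale + one fp16 bias per group."""
    return bits + 2 * 16 / group_size

def approx_weight_gb(n_params: float, bits: int, group_size: int) -> float:
    """Rough weight-storage footprint in GB (excludes KV cache and activations)."""
    return n_params * effective_bits(bits, group_size) / 8 / 1e9

N = 4e9  # ~4B parameters (assumed)
print(f"int4, g128: {approx_weight_gb(N, 4, 128):.2f} GB")  # → ~2.12 GB
print(f"int4, g64:  {approx_weight_gb(N, 4, 64):.2f} GB")   # → ~2.25 GB
print(f"int8 (g64 assumed): {approx_weight_gb(N, 8, 64):.2f} GB")  # → ~4.25 GB
```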

## Quickstart (MLX-LM)

### Run from Hugging Face (no cloning needed)

```shell
python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
```
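The same model can be driven from Python via the `mlx-lm` API. A minimal sketch, following the usual `load`/`generate` pattern; exact sampling keywords vary across `mlx-lm` versions, so treat the call shape as indicative rather than exact:

```python
# Requires Apple Silicon with mlx-lm installed: pip install mlx-lm
from mlx_lm import load, generate

# Load one quantized variant (local path or Hugging Face repo path)
model, tokenizer = load("Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64")

prompt = "Summarize the Bill of Rights for 7th graders in 4 bullet points."
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

text = generate(model, tokenizer, prompt=prompt, max_tokens=180)
print(text)
```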