---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Gemma3-Callous-Calla-4B
---

# Gemma3-Callous-Calla-4B — **MLX** builds (Apple Silicon)

This repo hosts **MLX-converted** variants of **Daizee/Gemma3-Callous-Calla-4B** for fast, local inference on Apple Silicon (M-series).

Tokenizer/config are included at the repo root. MLX weight folders live under `mlx/`.

> **Note on vocab padding:** For MLX compatibility, the tokenizer/embeddings were padded to the next multiple of 64 tokens.
> In this build: **262,208 tokens** (added 64 placeholder tokens named ``).

## Variants

| Path        | Bits | Group Size | Notes                            |
|-------------|------|------------|----------------------------------|
| `mlx/g128/` | int4 | 128        | Smallest & fastest               |
| `mlx/g64/`  | int4 | 64         | Slightly larger, often steadier  |
| `mlx/int8/` | int8 | —          | Closest to fp16 quality (slower) |

## Quickstart (MLX-LM)

### Run from Hugging Face (no cloning needed)

```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
```
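The vocab-padding note can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not the conversion code itself: the base vocab size of 262,144 is inferred from the stated total (262,208) minus the 64 added placeholders, and the helper name is hypothetical.

```python
def pad_to_next_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Return the strictly-next multiple of `multiple` above vocab_size.

    Hypothetical helper mirroring the padding described in the note:
    even an exact multiple gets bumped up by one full step, which is
    why a base vocab of 262,144 (itself divisible by 64) still gains
    64 placeholder tokens.
    """
    return (vocab_size // multiple + 1) * multiple

base_vocab = 262_144  # inferred: stated total 262,208 minus the 64 added tokens
padded = pad_to_next_multiple(base_vocab)
print(padded)               # padded size matching this build's stated vocab
print(padded - base_vocab)  # number of placeholder tokens added
```

Under this assumption, `padded` comes out to 262,208 and the difference to 64, matching the numbers quoted in the note above.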