---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Gemma3-Callous-Calla-4B
---

# Gemma3-Callous-Calla-4B — **MLX** builds (Apple Silicon)

This repo hosts **MLX-converted** variants of **Daizee/Gemma3-Callous-Calla-4B** for fast, local inference on Apple Silicon (M-series). Tokenizer and config files are included at the repo root; the MLX weight folders live under `mlx/`.

> **Note on vocab padding:** For MLX compatibility, the tokenizer/embeddings were padded to the next multiple of 64 tokens.
> In this build: **262,208 tokens** (64 placeholder tokens named `<pad_ex_*>` were added).
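
The round-up behind that padding is plain ceiling division; a minimal sketch (the helper name is illustrative, not taken from the actual conversion script):

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`
    (ceiling division); quantized embedding tables prefer aligned sizes."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# Generic examples of the round-up behaviour:
print(pad_to_multiple(100))  # -> 128
print(pad_to_multiple(128))  # -> 128 (already aligned)
```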

## Variants

| Path        | Bits | Group Size | Notes                            |
|-------------|------|------------|----------------------------------|
| `mlx/g128/` | int4 | 128        | Smallest & fastest               |
| `mlx/g64/`  | int4 | 64         | Slightly larger, often steadier  |
| `mlx/int8/` | int8 | —          | Closest to fp16 quality (slower) |

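To pull down a single variant (plus the shared tokenizer/config files at the repo root) instead of the whole repo, `huggingface_hub.snapshot_download` with `allow_patterns` is one option. A sketch — the repo id and file patterns are assumptions based on the layout above:

```python
def variant_patterns(variant: str) -> list[str]:
    """Glob patterns selecting one MLX variant folder plus the shared
    tokenizer/config files at the repo root (patterns are assumptions)."""
    return [f"mlx/{variant}/*", "*.json", "tokenizer*"]

if __name__ == "__main__":
    # Assumed repo id, mirroring the CLI example in this card.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="Daizee/Gemma3-Callous-Calla-4B-mlx",
        allow_patterns=variant_patterns("g64"),
    )
    print(local_dir)  # local cache directory holding the downloaded files
```
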
## Quickstart (MLX-LM)

### Run from Hugging Face (no cloning needed)

```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
```
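
The same generation can be driven from Python via the `mlx-lm` API; a sketch assuming `mlx-lm` is installed on an Apple Silicon Mac and the `g64` variant has been downloaded locally (the local path and the prompt helper are illustrative):

```python
def build_prompt(topic: str, audience: str, bullets: int) -> str:
    """Illustrative prompt builder mirroring the CLI example above."""
    return f"Summarize {topic} for {audience} in {bullets} bullet points."

if __name__ == "__main__":
    # Requires Apple Silicon and `pip install mlx-lm`; the path assumes the
    # g64 variant folder has been downloaded next to this script.
    from mlx_lm import load, generate

    model, tokenizer = load("./mlx/g64")
    text = generate(
        model,
        tokenizer,
        prompt=build_prompt("the Bill of Rights", "7th graders", 4),
        max_tokens=180,
    )
    print(text)
```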