---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Gemma3-Callous-Calla-4B
---

# Gemma3-Callous-Calla-4B — **MLX** builds (Apple Silicon)

This repo hosts **MLX-converted** variants of **Daizee/Gemma3-Callous-Calla-4B** for fast, local inference on Apple Silicon (M-series). Tokenizer and config files are included at the repo root; the MLX weight folders live under `mlx/`.

> **Note on vocab padding:** For MLX compatibility, the tokenizer/embeddings were padded to the next multiple of 64 tokens.
> In this build: **262,208 tokens** (64 placeholder tokens named `<pad_ex_*>` were added).
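
The round-up behind that padding is plain ceiling division; a minimal sketch (the helper name is illustrative, not taken from the actual conversion script):

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`
    (ceiling division); quantized embedding tables prefer aligned sizes."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# Generic examples of the round-up behaviour:
print(pad_to_multiple(100))  # -> 128
print(pad_to_multiple(128))  # -> 128 (already aligned)
```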

## Variants

| Path        | Bits | Group Size | Notes                            |
|-------------|------|------------|----------------------------------|
| `mlx/g128/` | int4 | 128        | Smallest & fastest               |
| `mlx/g64/`  | int4 | 64         | Slightly larger, often steadier  |
| `mlx/int8/` | int8 | —          | Closest to fp16 quality (slower) |

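To pull down a single variant (plus the shared tokenizer/config files at the repo root) instead of the whole repo, `huggingface_hub.snapshot_download` with `allow_patterns` is one option. A sketch — the repo id and file patterns are assumptions based on the layout above:

```python
def variant_patterns(variant: str) -> list[str]:
    """Glob patterns selecting one MLX variant folder plus the shared
    tokenizer/config files at the repo root (patterns are assumptions)."""
    return [f"mlx/{variant}/*", "*.json", "tokenizer*"]

if __name__ == "__main__":
    # Assumed repo id, mirroring the CLI example in this card.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="Daizee/Gemma3-Callous-Calla-4B-mlx",
        allow_patterns=variant_patterns("g64"),
    )
    print(local_dir)  # local cache directory holding the downloaded files
```
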
## Quickstart (MLX-LM)

### Run from Hugging Face (no cloning needed)

```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
```
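
The same generation can be driven from Python via the `mlx-lm` API; a sketch assuming `mlx-lm` is installed on an Apple Silicon Mac and the `g64` variant has been downloaded locally (the local path and the prompt helper are illustrative):

```python
def build_prompt(topic: str, audience: str, bullets: int) -> str:
    """Illustrative prompt builder mirroring the CLI example above."""
    return f"Summarize {topic} for {audience} in {bullets} bullet points."

if __name__ == "__main__":
    # Requires Apple Silicon and `pip install mlx-lm`; the path assumes the
    # g64 variant folder has been downloaded next to this script.
    from mlx_lm import load, generate

    model, tokenizer = load("./mlx/g64")
    text = generate(
        model,
        tokenizer,
        prompt=build_prompt("the Bill of Rights", "7th graders", 4),
        max_tokens=180,
    )
    print(text)
```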