**CURRENTLY UPLOADING**

*Notice will be removed once complete*

**See Kimi-K2-Instruct-0905 Dynamic MLX in action - [https://youtu.be/Ia-q3Ll4tAY](https://youtu.be/Ia-q3Ll4tAY)**

*q3.824bit dynamic quant typically achieves 1.256 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
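The "slots closer to q4 than q3" claim above can be checked with a quick sketch, using only the perplexity figures quoted in this README:

```python
# Perplexity figures quoted above for the Kimi-K2-Instruct-0905 MLX quants.
dynamic_q3_824 = 1.256  # q3.824bit dynamic quant
q4 = 1.168
q3 = 1.900

# Distance of the dynamic quant from each fixed-width neighbor.
gap_to_q4 = dynamic_q3_824 - q4
gap_to_q3 = q3 - dynamic_q3_824

# The dynamic quant sits far closer to q4 quality than to q3.
print(f"gap to q4: {gap_to_q4:.3f}, gap to q3: {gap_to_q3:.3f}")
# → gap to q4: 0.088, gap to q3: 0.644
```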
| Quantization | Perplexity |
|:------------:|:----------:|
| **q2**       | 41.293     |
|
| 28 |
|
| 29 |
* Runs on a single M3 Ultra 512GB RAM using [Inferencer app](https://inferencer.com)
|
| 30 |
* Does not require expanding VRAM limit
|
| 31 |
+
* However expanding it will allow you to use larger context windows:
|
| 32 |
* `sudo sysctl iogpu.wired_limit_mb=507000`
|
| 33 |
* Expect ~20 tokens/s
|
| 34 |
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
|
| 35 |
+
* For more details see [demonstration video](https://youtu.be/Ia-q3Ll4tAY) or visit [Kimi K2](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905).
|
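The VRAM-limit step above can be sketched as a short macOS shell session. The `507000` value is the one quoted in this README; note (as an assumption about standard `iogpu` behavior, not something this card states) that the setting takes effect immediately and resets on reboot:

```shell
# Inspect the current GPU wired-memory limit on Apple Silicon macOS
# (0 means the system default, roughly 75% of unified memory)
sysctl iogpu.wired_limit_mb

# Raise the limit to ~507 GB so more of the 512 GB of unified memory
# can be wired for the GPU, allowing larger context windows
sudo sysctl iogpu.wired_limit_mb=507000

# Restore the default at any time (or simply reboot)
sudo sysctl iogpu.wired_limit_mb=0
```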