**CURRENTLY UPLOADING**

*Notice will be removed once complete*

**See Kimi-K2-Instruct-0905 Dynamic MLX in action - [https://youtu.be/Ia-q3Ll4tAY](https://youtu.be/Ia-q3Ll4tAY)**

*q3.824bit dynamic quant typically achieves 1.256 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
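The "slots closer to q4 than q3" claim above can be checked with a quick sketch, using only the perplexity figures quoted in this README:

```python
# Perplexity figures quoted above for the Kimi-K2-Instruct-0905 MLX quants.
dynamic_q3_824 = 1.256  # q3.824bit dynamic quant
q4 = 1.168
q3 = 1.900

# Distance of the dynamic quant from each fixed-width neighbor.
gap_to_q4 = dynamic_q3_824 - q4
gap_to_q3 = q3 - dynamic_q3_824

# The dynamic quant sits far closer to q4 quality than to q3.
print(f"gap to q4: {gap_to_q4:.3f}, gap to q3: {gap_to_q3:.3f}")
# → gap to q4: 0.088, gap to q3: 0.644
```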
| Quantization | Perplexity |
|:------------:|:----------:|
| **q2**       | 41.293     |
|
| 28 |
|
| 29 |
* Runs on a single M3 Ultra 512GB RAM using [Inferencer app](https://inferencer.com)
|
| 30 |
* Does not require expanding VRAM limit
|
| 31 |
+
* However expanding it will allow you to use larger context windows:
|
| 32 |
* `sudo sysctl iogpu.wired_limit_mb=507000`
|
| 33 |
* Expect ~20 tokens/s
|
| 34 |
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
|
| 35 |
+
* For more details see [demonstration video](https://youtu.be/Ia-q3Ll4tAY) or visit [Kimi K2](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905).
|
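The VRAM-limit step above can be sketched as a short macOS shell session. The `507000` value is the one quoted in this README; note (as an assumption about standard `iogpu` behavior, not something this card states) that the setting takes effect immediately and resets on reboot:

```shell
# Inspect the current GPU wired-memory limit on Apple Silicon macOS
# (0 means the system default, roughly 75% of unified memory)
sysctl iogpu.wired_limit_mb

# Raise the limit to ~507 GB so more of the 512 GB of unified memory
# can be wired for the GPU, allowing larger context windows
sudo sysctl iogpu.wired_limit_mb=507000

# Restore the default at any time (or simply reboot)
sudo sysctl iogpu.wired_limit_mb=0
```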