Update README.md

README.md (changed):
````diff
@@ -15,13 +15,13 @@ tags:
 
 # OpenHermes 2.5 Mistral 7B - DeepSparse
 
-This repo contains
+This repo contains model files for [Teknium's OpenHermes 2.5 Mistral 7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
 
 This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
 
 ## Inference
 
-Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse):
+Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
 ```
 pip install deepsparse-nightly[llm]
 ```
````
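The hunk stops at the install command, but the model output quoted in the next hunk header ("That's a difficult question as there are many people who inspire me...") indicates the README runs a text-generation example in the unchanged lines that follow. A minimal sketch of DeepSparse inference with this model, assuming the `TextGeneration` pipeline from the DeepSparse README; the prompt and the model stub are placeholders, not taken from this diff:

```python
from deepsparse import TextGeneration

# OpenHermes 2.5 models use the ChatML prompt format.
prompt = "Who inspires you the most?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

# The "hf:" stub below is a placeholder; substitute this repo's actual model ID.
model = TextGeneration(model="hf:neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-quant")
print(model(formatted_prompt, max_new_tokens=100).generations[0].text)
```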
````diff
@@ -52,7 +52,7 @@ That's a difficult question as there are many people who inspire me. However, on
 
 ## Sparsification
 
-
+For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
 
 ```bash
 git clone https://github.com/neuralmagic/sparseml
````
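The diff elides README lines 59-61 between this hunk and the next. Judging by SparseML's OBCQ workflow and the `export.py` invocation quoted in the next hunk header, the elided commands presumably resemble the sketch below; the calibration dataset and flags are assumptions, not taken from this diff:

```bash
pip install -e "sparseml[transformers]"

# One-shot SparseGPT pruning/quantization driven by the repo's recipe.yaml
# (dataset name and flags here are assumptions).
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py \
    teknium/OpenHermes-2.5-Mistral-7B open_platypus \
    --recipe recipe.yaml --save True

# ONNX export for DeepSparse; the next hunk header truncates this command.
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py \
    --task text-generation --model_path obcq_deployment
```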
````diff
@@ -62,6 +62,7 @@ python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task t
 cp deployment/model.onnx deployment/model-orig.onnx
 ```
 
+Run this kv-cache injection afterwards:
 ```python
 import os
 import onnx
````
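The hunk cuts the Python block off after its imports. In SparseML's workflow, the kv-cache injection step typically continues along these lines; a sketch assuming SparseML's `KVCacheInjector`, so the README's actual script may differ:

```python
import os

import onnx
from sparseml.exporters.kv_cache_injector import KVCacheInjector

input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"

# Rewrite the exported ONNX graph with KV-cache inputs/outputs so that
# DeepSparse can reuse attention state during autoregressive decoding.
model = onnx.load(input_file, load_external_data=False)
model = KVCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
print(f"Modified model saved to: {output_file}")
```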