Update README.md

README.md (changed):
````diff
@@ -15,13 +15,13 @@ tags:
 
 # OpenHermes 2.5 Mistral 7B - DeepSparse
 
-This repo contains
+This repo contains model files for [Teknium's OpenHermes 2.5 Mistral 7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
 
 This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
 
 ## Inference
 
-Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse):
+Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
 ```
 pip install deepsparse-nightly[llm]
 ```
````
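The hunk stops at the install command, but the model output quoted in the next hunk header ("That's a difficult question as there are many people who inspire me...") indicates the README runs a text-generation example in the unchanged lines that follow. A minimal sketch of DeepSparse inference with this model, assuming the `TextGeneration` pipeline from the DeepSparse README; the prompt and the model stub are placeholders, not taken from this diff:

```python
from deepsparse import TextGeneration

# OpenHermes 2.5 models use the ChatML prompt format.
prompt = "Who inspires you the most?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

# The "hf:" stub below is a placeholder; substitute this repo's actual model ID.
model = TextGeneration(model="hf:neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-quant")
print(model(formatted_prompt, max_new_tokens=100).generations[0].text)
```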
````diff
@@ -52,7 +52,7 @@ That's a difficult question as there are many people who inspire me. However, on
 
 ## Sparsification
 
-
+For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
 
 ```bash
 git clone https://github.com/neuralmagic/sparseml
````
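The diff elides README lines 59-61 between this hunk and the next. Judging by SparseML's OBCQ workflow and the `export.py` invocation quoted in the next hunk header, the elided commands presumably resemble the sketch below; the calibration dataset and flags are assumptions, not taken from this diff:

```bash
pip install -e "sparseml[transformers]"

# One-shot SparseGPT pruning/quantization driven by the repo's recipe.yaml
# (dataset name and flags here are assumptions).
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py \
    teknium/OpenHermes-2.5-Mistral-7B open_platypus \
    --recipe recipe.yaml --save True

# ONNX export for DeepSparse; the next hunk header truncates this command.
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py \
    --task text-generation --model_path obcq_deployment
```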
````diff
@@ -62,6 +62,7 @@ python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task t
 cp deployment/model.onnx deployment/model-orig.onnx
 ```
 
+Run this kv-cache injection afterwards:
 ```python
 import os
 import onnx
````
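The hunk cuts the Python block off after its imports. In SparseML's workflow, the kv-cache injection step typically continues along these lines; a sketch assuming SparseML's `KVCacheInjector`, so the README's actual script may differ:

```python
import os

import onnx
from sparseml.exporters.kv_cache_injector import KVCacheInjector

input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"

# Rewrite the exported ONNX graph with KV-cache inputs/outputs so that
# DeepSparse can reuse attention state during autoregressive decoding.
model = onnx.load(input_file, load_external_data=False)
model = KVCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
print(f"Modified model saved to: {output_file}")
```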