Update README.md
README.md CHANGED

````diff
@@ -67,10 +67,10 @@ Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) f
 
 ## Inference
 
-You can use the Together API to try out Llama-2-7B-32K-beta for inference.
-The updated inference stack allows for efficient
+You can use the [Together API](https://together.ai/blog/api-announcement) to try out Llama-2-7B-32K-beta for inference.
+The updated inference stack allows for efficient inference.
 
-To run the model locally, we strongly recommend to install Flash Attention V2:
+To run the model locally, we strongly recommend to install Flash Attention V2, which is necessary to obtain the best performance:
 ```
 # Please update the path of `CUDA_HOME`
 export CUDA_HOME=/usr/local/cuda-11.8
````
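For reference, a minimal sketch of trying the model over the Together API might look like the following. The endpoint path, the model ID `togethercomputer/Llama-2-7B-32K-beta`, and the response shape are assumptions based on Together's OpenAI-compatible completions API, not details taken from this README.

```python
import os

import requests

# Assumed OpenAI-compatible completions endpoint; adjust if your account uses a different one.
API_URL = "https://api.together.xyz/v1/completions"
API_KEY = os.environ["TOGETHER_API_KEY"]  # export your key before running

payload = {
    "model": "togethercomputer/Llama-2-7B-32K-beta",  # assumed model ID
    "prompt": "Summarize the main ideas of the attention mechanism in two sentences.",
    "max_tokens": 128,
    "temperature": 0.7,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# Assumed OpenAI-style response: the first completion text lives under choices[0].text.
print(response.json()["choices"][0]["text"])
```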
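Once Flash Attention V2 is installed, a minimal local-inference sketch with Hugging Face `transformers` could look like this. The Hub model ID and the `attn_implementation="flash_attention_2"` argument (available in recent `transformers` releases, with `flash-attn` and `accelerate` installed) are assumptions rather than instructions from this README.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/Llama-2-7B-32K-beta"  # assumed Hub repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # uses the flash-attn v2 kernels
    device_map="auto",                        # requires the accelerate package
)

inputs = tokenizer("The three laws of robotics are", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```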