Update README.md
README.md CHANGED

````diff
@@ -63,13 +63,21 @@ Features of this architecture:
 
 ### Step 1: Environment Setup
 
-Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on Pytorch2.5 and other related dependencies,
+Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on PyTorch 2.5 and other related dependencies, we provide two ways to set up the environment:
+
+- **[Local install]** Install the related packages using our provided `setup.sh` (supports CUDA 12.1/12.4):
 
 ```
 wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/nvidia/Hymba-1.5B-Base/resolve/main/setup.sh
 bash setup.sh
 ```
 
+- **[Docker]** A Docker image is provided with all of Hymba's dependencies preinstalled. You can pull the image and start a container using the following commands:
+```
+docker pull ghcr.io/tilmto/hymba:v1
+docker run --gpus all -v /home/$USER:/home/$USER -it ghcr.io/tilmto/hymba:v1 bash
+```
+
 
 ### Step 2: Chat with Hymba-1.5B-Instruct
 After setting up the environment, you can use the following script to chat with our Model
@@ -99,7 +107,7 @@ stopping_criteria = StoppingCriteriaList([StopStringCriteria(tokenizer=tokenizer
 outputs = model.generate(
     tokenized_chat,
     max_new_tokens=256,
-    do_sample=
+    do_sample=False,
     temperature=0.7,
     use_cache=True,
     stopping_criteria=stopping_criteria
````
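Whichever route is used, the prerequisite that actually gates the install is FlexAttention, which ships with PyTorch 2.5. A quick sanity check of the resulting environment, assuming only the public `torch.nn.attention.flex_attention` module, might look like this:

```python
# Sketch: verify the environment meets Hymba's FlexAttention prerequisite.
import torch

print(torch.__version__)          # expect 2.5.0 or newer
print(torch.cuda.is_available())  # expect True under the CUDA 12.1/12.4 setup

# flex_attention is importable from PyTorch 2.5 onward; an ImportError here
# means the environment does not satisfy the requirement stated above.
from torch.nn.attention.flex_attention import flex_attention
print("FlexAttention available:", callable(flex_attention))
```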
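The second hunk shows only fragments of the Step 2 chat script: a truncated `StopStringCriteria` line in the hunk header and the patched `model.generate` call. A self-contained sketch that assembles those fragments is below. The repo id `nvidia/Hymba-1.5B-Instruct`, the `trust_remote_code=True` flag, the example prompt, and the `"</s>"` stop string are assumptions filled in for illustration; the diff itself does not show them.

```python
# Sketch assembling the generate() fragments visible in the second hunk.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteriaList,
    StopStringCriteria,
)

repo = "nvidia/Hymba-1.5B-Instruct"  # assumed repo id for the Instruct model
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).cuda()

# The hunk header truncates the real stop strings; "</s>" is a placeholder.
stopping_criteria = StoppingCriteriaList(
    [StopStringCriteria(tokenizer=tokenizer, stop_strings="</s>")]
)

messages = [{"role": "user", "content": "Who are you?"}]  # illustrative prompt
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=256,
    do_sample=False,   # greedy decoding, per the patched line
    temperature=0.7,   # only takes effect when do_sample=True
    use_cache=True,
    stopping_criteria=stopping_criteria,
)
print(tokenizer.decode(outputs[0][tokenized_chat.shape[1]:], skip_special_tokens=True))
```

Note that with `do_sample=False` decoding is greedy, so the `temperature=0.7` argument is inert here; it would only shape the output distribution if sampling were enabled.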