Update README.md
README.md
This is a large language model that was released by Meta on 2024-07-23.
As of its release date, this is the largest and most complex open
weights model available. This is the base model: it hasn't been
fine-tuned to follow your instructions. See also
[Meta-Llama-3.1-405B-Instruct-llamafile](https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-Instruct-llamafile)
for a friendlier and more useful version of this model.

- Model creator: [Meta](https://huggingface.co/meta-llama/)
- Original model: [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)

## Quickstart

Running the following on a desktop OS will launch a tab in your web
browser. The smallest weights available are Q2\_K, which should work
fine on systems with at least 150 GB of RAM. Due to Hugging Face's 50 GB
upload limit, this llamafile has to be downloaded in multiple files and
then concatenated back together locally, so you'll need at least 400 GB
of free disk space.

```
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat0.llamafile
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat1.llamafile
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat2.llamafile
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat3.llamafile
cat Meta-Llama-3.1-405B.Q2_K.cat{0,1,2,3}.llamafile >Meta-Llama-3.1-405B.Q2_K.llamafile
rm Meta-Llama-3.1-405B.Q2_K.cat*.llamafile
chmod +x Meta-Llama-3.1-405B.Q2_K.llamafile
./Meta-Llama-3.1-405B.Q2_K.llamafile
```

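The parts must be joined byte-for-byte, in order, for the combined file
to work. If you want to convince yourself of what the `cat` step does
before committing hundreds of gigabytes, you can try the same
brace-expansion pattern on tiny stand-in files (the `demo.*` names below
are made up for illustration):

```
# Create four tiny stand-in parts (hypothetical demo files, not the
# real ~50 GB llamafile parts above).
printf 'part0' > demo.cat0.bin
printf 'part1' > demo.cat1.bin
printf 'part2' > demo.cat2.bin
printf 'part3' > demo.cat3.bin

# Same pattern as the real command: bash expands {0,1,2,3} in order,
# so the parts are concatenated in sequence.
cat demo.cat{0,1,2,3}.bin > demo.bin

cat demo.bin   # prints: part0part1part2part3

# Clean up the demo files.
rm demo.cat*.bin demo.bin
```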
You can then use the completion mode of the GUI to experiment with this
model. You can prompt the model for completions on the command line too:

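For example (a sketch, assuming the llama.cpp-style `-p` and `-n` flags
that llamafiles accept; the prompt text here is made up):

```
# Hypothetical invocation: -p supplies the text to complete,
# -n caps the number of tokens generated.
./Meta-Llama-3.1-405B.Q2_K.llamafile -p 'The capital of France is' -n 32
```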
This model has a max context window size of 128k tokens. By default, a
context window size of 4096 tokens is used. You can use a larger context
window by passing the `-c 8192` flag. The software currently has
limitations in its llama v3.1 support that may prevent scaling to the
full 128k size.

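For instance, to launch with the larger context window (a sketch; `-c`
is the flag named above, applied to the Q2\_K build from the
Quickstart):

```
# Launch with an 8192-token context window instead of the 4096 default.
./Meta-Llama-3.1-405B.Q2_K.llamafile -c 8192
```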
On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card