Align tokenizer with mistral-common (#45)
- Align tokenizer with mistral-common (53f216c52ce4534a38a71c21861acd514fa8a904)
- Defend the honour of the Hugging Face tokenizer (684c1751c210aa11e0b187c0eac1b7b2bd4d7967)
- Update to tokenizer v3 with correct proper special tokens (106a1b0c338ddbd0e3e42dbeb63634bc85d6f71b)
- Re-add chat template (3256c7e7ea279386e0cdd18553202ed78c4d735b)
Co-authored-by: Matthew Carrigan <[email protected]>
- README.md +0 -5
- tokenizer.json +0 -0
- tokenizer.model +2 -2
- tokenizer_config.json +0 -0
README.md
CHANGED
@@ -13,11 +13,6 @@ extra_gated_description: If you want to learn more about how we process your per
 
 # Model Card for Codestral-22B-v0.1
 
-###
-
-> [!WARNING]
-> 🚫
-> The `transformers` tokenizer is not properly configured. Make sure that your encoding and decoding is correct by using `mistral-common` as shown below:
 
 ## Encode and Decode with `mistral_common`
 
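The removed warning told readers to confirm that encoding matches `mistral-common`. After this commit, a quick way to check that claim is to encode the same chat request with both tokenizers and compare the resulting token id lists.

One list could come from `MistralTokenizer.v3().encode_chat_completion(request).tokens` in `mistral-common`. The other could come from the `transformers` tokenizer's `apply_chat_template(messages)`. Verify both call patterns against your installed versions.

The sketch below covers only the comparison step and uses just the standard library. The helper names `first_mismatch` and `report` are hypothetical.

```python
from typing import Optional, Sequence


def first_mismatch(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token id, or None if identical.

    If one sequence is a strict prefix of the other, the returned index is the
    length of the shorter one (the first position where only one side has a token).
    """
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


def report(expected: Sequence[int], actual: Sequence[int]) -> str:
    """Summarize whether two token id sequences agree (hypothetical helper)."""
    idx = first_mismatch(expected, actual)
    if idx is None:
        return f"match ({len(expected)} tokens)"
    # Use "<end>" when one sequence has already run out of tokens.
    exp_tok = expected[idx] if idx < len(expected) else "<end>"
    act_tok = actual[idx] if idx < len(actual) else "<end>"
    return f"mismatch at position {idx}: expected {exp_tok}, got {act_tok}"
```

Reporting the first divergent position, rather than a plain equality result, helps locate problems such as a missing or misplaced special token, which is the kind of discrepancy this commit addresses.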
tokenizer.json
CHANGED
The diff for this file is too large to render.
tokenizer.model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
+size 587591
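`tokenizer.model` is stored with Git LFS, so the diff above changes only a small text pointer. The pointer has three lines: `version`, `oid sha256:<hex digest>` and `size <bytes>`.

Below is a minimal sketch, assuming only the standard library, that builds and parses such a pointer. It also checks whether a local copy of the file matches the new pointer from this commit. The function names and the `NEW_TOKENIZER_POINTER` constant are hypothetical; only the pointer values come from the diff.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"

# New pointer for tokenizer.model, copied from the diff above.
NEW_TOKENIZER_POINTER = (
    f"version {LFS_SPEC}\n"
    "oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33\n"
    "size 587591\n"
)


def make_pointer(data: bytes) -> str:
    """Build a pointer for `data`: version, then oid, then size, each newline-terminated."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_pointer(text: str) -> dict:
    """Split each non-empty 'key value' line of a pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """Return True if `data` has the sha256 digest and byte size recorded in the pointer."""
    fields = parse_pointer(pointer_text)
    if fields.get("version") != LFS_SPEC:
        return False
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256":
        return False
    size = fields.get("size", "")
    # An empty or non-numeric size (as in the old, elided pointer) never matches.
    if not size.isdigit() or int(size) != len(data):
        return False
    return hashlib.sha256(data).hexdigest() == digest
```

To check a downloaded file, pass its bytes, for example `matches_pointer(open("tokenizer.model", "rb").read(), NEW_TOKENIZER_POINTER)`. The size check runs first because it is cheaper than hashing.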
tokenizer_config.json
CHANGED
The diff for this file is too large to render.