Align tokenizer with mistral-common (#45)

- Align tokenizer with mistral-common (53f216c52ce4534a38a71c21861acd514fa8a904)
- Defend the honour of the Hugging Face tokenizer (684c1751c210aa11e0b187c0eac1b7b2bd4d7967)
- Update to tokenizer v3 with correct proper special tokens (106a1b0c338ddbd0e3e42dbeb63634bc85d6f71b)
- Re-add chat template (3256c7e7ea279386e0cdd18553202ed78c4d735b)

Co-authored-by: Matthew Carrigan <[email protected]>

Files changed (4)

README.md CHANGED

@@ -13,11 +13,6 @@ extra_gated_description: If you want to learn more about how we process your per
 # Model Card for Codestral-22B-v0.1
-###
-> [!WARNING]
-> 🚫
-> The `transformers` tokenizer is not properly configured. Make sure that your encoding and decoding is correct by using `mistral-common` as shown below:
 ## Encode and Decode with `mistral_common`



tokenizer.json CHANGED

The diff for this file is too large to render. See raw diff

tokenizer.model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
-size 587404
+oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
+size 587591
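
The `tokenizer.model` hunk above only swaps a Git LFS pointer (hash and byte size), not the file contents. As a minimal sketch, assuming the standard three-field pointer layout shown in the diff, the pointers can be parsed and a downloaded file checked against them. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of any Git LFS tooling.

```python
import hashlib

# Pointer contents copied from the diff above.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
size 587591
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (hypothetical helper) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

# Difference in bytes between the new and old tokenizer.model.
size_delta = new["size"] - old["size"]
print(f"tokenizer.model size change: {size_delta:+d} bytes")
```

To check a local copy, read the file in binary mode and pass its bytes to `matches_pointer` together with `new`; a `False` result suggests the file predates this commit or was truncated during download.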

tokenizer_config.json CHANGED

The diff for this file is too large to render. See raw diff