Update README.md
README.md
CHANGED
|
@@ -33,6 +33,12 @@ model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # PyTorch
|
|
| 33 |
model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # Tensorflow
|
| 34 |
```
|
| 35 |
|
| 36 |
## Benchmarks
|
| 37 |
|
| 38 |
The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
|
|
@@ -69,12 +75,12 @@ Headers in the tables below link to original data sources. Scores link to the mo
|
|
| 69 |
|
| 70 |
```bibtex
|
| 71 |
@misc{devries2019bertje,
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
}
|
| 80 |
```
|
|
|
|
| 33 |
model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # Tensorflow
|
| 34 |
```
|
| 35 |
|
| 36 |
+
**WARNING:** The vocabulary size of BERTje changed in 2021. If you use an older fine-tuned model and experience problems with the `GroNLP/bert-base-dutch-cased` tokenizer, use the following tokenizer:
|
| 37 |
+
|
| 38 |
+
```python
|
| 39 |
+
tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased", revision="v1") # v1 is the old vocabulary
|
| 40 |
+
```
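The warning above can be turned into a quick compatibility check. This is a minimal sketch, assuming that a vocabulary mismatch shows up as a difference between the size the fine-tuned model was built for and the number of entries in the tokenizer. The helper name `choose_tokenizer_revision` and the sizes in the example calls are hypothetical and are not taken from the BERTje repository.

```python
from typing import Optional


def choose_tokenizer_revision(model_vocab_size: int, tokenizer_vocab_size: int) -> Optional[str]:
    """Suggest which tokenizer revision to request for a fine-tuned model.

    Hypothetical helper: it assumes that any size mismatch means the model
    was fine-tuned on the old vocabulary. Returns None when the sizes agree,
    meaning the default tokenizer revision can be used.
    """
    if model_vocab_size != tokenizer_vocab_size:
        return "v1"  # "v1" is the old-vocabulary revision named in the warning
    return None


# Illustrative sizes only, not the real BERTje vocabulary sizes.
print(choose_tokenizer_revision(model_vocab_size=100, tokenizer_vocab_size=120))
print(choose_tokenizer_revision(model_vocab_size=100, tokenizer_vocab_size=100))
```

With `transformers`, the two sizes would typically come from `model.config.vocab_size` and `len(tokenizer)` after loading both. Whether the `v1` tokenizer actually matches a given older model should still be verified on a few sample sentences.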
|
| 41 |
+
|
| 42 |
## Benchmarks
|
| 43 |
|
| 44 |
The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
|
|
|
|
| 75 |
|
| 76 |
```bibtex
|
| 77 |
@misc{devries2019bertje,
|
| 78 |
+
	title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
|
| 79 |
+
	shorttitle = {{BERTje}},
|
| 80 |
+
	author = {de Vries, Wietse and van Cranenburgh, Andreas and Bisazza, Arianna and Caselli, Tommaso and Noord, Gertjan van and Nissim, Malvina},
|
| 81 |
+
	year = {2019},
|
| 82 |
+
	month = dec,
|
| 83 |
+
	howpublished = {arXiv:1912.09582},
|
| 84 |
+
	url = {http://arxiv.org/abs/1912.09582},
|
| 85 |
}
|
| 86 |
```
|