# SciBERT

This is the pretrained model presented in [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/), which is a BERT model trained on scientific text.

The training corpus consists of 1.14M papers (3.1B tokens) taken from [Semantic Scholar](https://www.semanticscholar.org). We use the full text of the papers in training, not just the abstracts.

SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
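
To see what scivocab does in practice, here is a minimal sketch (assuming the `transformers` library; the side-by-side comparison with `bert-base-uncased` and the example sentence are ours, for illustration) that contrasts how the two vocabularies split a scientific sentence:

```python
from transformers import AutoTokenizer

# scivocab tokenizer shipped with this model
scibert_tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
# stock BERT tokenizer, for comparison
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

sentence = "The phosphorylation of tyrosine residues modulates kinase activity."

# scivocab tends to keep domain terms in fewer wordpieces
print(scibert_tok.tokenize(sentence))
print(bert_tok.tokenize(sentence))
```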

Available models include:
* `scibert_scivocab_cased`
* `scibert_scivocab_uncased`
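
As a quick usage sketch with the Hugging Face `transformers` library (the example sentence is ours; substitute `allenai/scibert_scivocab_cased` for the cased variant), the uncased model can be loaded and run as follows:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer("The Higgs boson was observed at the LHC.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings, shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```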

The original repo can be found [here](https://github.com/allenai/scibert).

If using these models, please cite the following paper:
```
@inproceedings{beltagy-etal-2019-scibert,
    title = "SciBERT: A Pretrained Language Model for Scientific Text",
    author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
    booktitle = "EMNLP",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1371"
}
```