# SciBERT

This is the pretrained model presented in [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/), which is a BERT model trained on scientific text.

The training corpus consists of 1.14M papers (3.1B tokens) taken from [Semantic Scholar](https://www.semanticscholar.org). We use the full text of the papers in training, not just the abstracts.

SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
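
To see what scivocab does in practice, here is a minimal sketch (assuming the `transformers` library; the side-by-side comparison with `bert-base-uncased` and the example sentence are ours, for illustration) that contrasts how the two vocabularies split a scientific sentence:

```python
from transformers import AutoTokenizer

# scivocab tokenizer shipped with this model
scibert_tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
# stock BERT tokenizer, for comparison
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

sentence = "The phosphorylation of tyrosine residues modulates kinase activity."

# scivocab tends to keep domain terms in fewer wordpieces
print(scibert_tok.tokenize(sentence))
print(bert_tok.tokenize(sentence))
```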

Available models include:
* `scibert_scivocab_cased`
* `scibert_scivocab_uncased`
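
As a quick usage sketch with the Hugging Face `transformers` library (the example sentence is ours; substitute `allenai/scibert_scivocab_cased` for the cased variant), the uncased model can be loaded and run as follows:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer("The Higgs boson was observed at the LHC.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings, shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```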

The original repo can be found [here](https://github.com/allenai/scibert).

If using these models, please cite the following paper:
```
@inproceedings{beltagy-etal-2019-scibert,
    title = "SciBERT: A Pretrained Language Model for Scientific Text",
    author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
    booktitle = "EMNLP",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1371"
}
```