Improve model card for mmBERT training checkpoints with metadata and usage
#1 by nielsr (HF Staff)
This PR significantly enhances the model card for the mmBERT raw training checkpoints by:
- Updating the `language` tag to `mul` to accurately reflect its multilingual nature (over 1800 languages).
- Adding `pipeline_tag: feature-extraction` to improve discoverability on the Hub and enable the automated widget. While this repository contains raw checkpoints, the associated models are commonly used for feature extraction.
- Specifying `library_name: transformers`, as the linked GitHub repository provides extensive usage examples with the `transformers` library.
- Adding descriptive `tags` like `bert`, `multilingual`, and `encoder-only` for better searchability (see the front-matter sketch after this list).
- Updating the main title to match the paper's title and linking to the Hugging Face paper page.
- Incorporating the paper's abstract.
- Including comprehensive usage examples from the GitHub README's "Quick Start" and "Getting Started" sections, showcasing how to use the associated mmBERT models (e.g., `mmbert-small`, `mmbert-base`) for tasks such as feature extraction, masked language modeling, classification, and retrieval (see the code sketches below).
- Clarifying that this repository contains raw training checkpoints and directing users to the model collection for runnable models.
- Integrating other valuable sections from the GitHub README, such as "Model Family", "Training Details", "Evaluation", "FAQ", and "Limitations", to provide a complete overview of the project.
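For reference, a minimal sketch of the front matter these metadata changes produce. The keys are the standard model card fields; any other fields already present in the card (e.g., a license) are omitted here:

```yaml
---
language:
- mul
pipeline_tag: feature-extraction
library_name: transformers
tags:
- bert
- multilingual
- encoder-only
---
```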
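And a rough sketch of the kind of feature-extraction snippet the usage section adds, with mean pooling over token embeddings. The model id is a placeholder, not the exact id from the card; use the full repository id from the mmBERT collection:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Placeholder id: replace with the full repository id from the mmBERT collection.
model_id = "mmbert-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

texts = ["Hello world!", "Bonjour le monde !"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```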
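Similarly, a masked-language-modeling sketch using the `fill-mask` pipeline. Again the model id is a placeholder, and the mask token is read from the tokenizer rather than hard-coded:

```python
from transformers import pipeline

# Placeholder id: replace with the full repository id from the mmBERT collection.
unmasker = pipeline("fill-mask", model="mmbert-base")
# Use the tokenizer's own mask token instead of assuming its surface form.
text = f"The capital of France is {unmasker.tokenizer.mask_token}."
print(unmasker(text))
```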
These changes make the model card more informative, discoverable, and user-friendly on the Hugging Face Hub.