
Improve model card for mmBERT training checkpoints with metadata and usage

#1
by nielsr - opened

This PR significantly enhances the model card for the mmBERT raw training checkpoints by:

  • Updating the language tag to mul to accurately reflect the model's multilingual coverage (over 1800 languages).
  • Adding pipeline_tag: feature-extraction to improve discoverability on the Hub and enable the automated widget. While this repository contains raw checkpoints, the associated models are commonly used for feature extraction.
  • Specifying library_name: transformers, since the linked GitHub repository provides extensive usage examples with the transformers library.
  • Adding descriptive tags such as bert, multilingual, and encoder-only for better searchability; the resulting front matter is sketched after this list.
  • Updating the main title to reflect the paper's title and linking to the Hugging Face paper page.
  • Incorporating the paper's abstract.
  • Including comprehensive usage examples from the GitHub README's "Quick Start" and "Getting Started" sections, showing how to use the associated mmBERT models (e.g., mmbert-small, mmbert-base) for tasks such as feature extraction, masked language modeling, classification, and retrieval; a minimal sketch of this pattern appears at the end of this description.
  • Clarifying that this repository contains raw training checkpoints and directing users to the model collection for runnable models.
  • Integrating other valuable sections from the GitHub README such as "Model Family", "Training Details", "Evaluation", "FAQ", and "Limitations" to provide a complete overview of the project.
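For reference, here is a minimal sketch of the model card front matter these changes produce, assuming exactly the metadata values listed above (the merged card may carry additional fields such as license or datasets):

```yaml
---
# Model card metadata sketch reflecting the changes described in this PR.
language:
  - mul                            # multilingual tag covering 1800+ languages
pipeline_tag: feature-extraction   # enables the Hub widget and filtering
library_name: transformers         # usage examples rely on the transformers library
tags:
  - bert
  - multilingual
  - encoder-only
---
```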

These changes make the model card more informative, discoverable, and user-friendly on the Hugging Face Hub.
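To give a sense of the usage examples referenced above, the following is a minimal feature-extraction sketch with transformers. The model ID jhu-clsp/mmBERT-base is an assumption here; consult the linked model collection for the exact repository names of the runnable checkpoints.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model ID; substitute a runnable checkpoint from the mmBERT collection.
model_id = "jhu-clsp/mmBERT-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "mmBERT is a multilingual encoder.",
    "mmBERT est un encodeur multilingue.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states over non-padding tokens
# to obtain one fixed-size embedding per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```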
