data update
README.md CHANGED
@@ -208,7 +208,7 @@ model-index:
 # Granite-3.0-2B-Base
 
 ## Model Summary
-**Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains
+**Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)

@@ -280,7 +280,9 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
+This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+* Phase 1 data: The data for phase 1 is sourced from diverse domains such as web, code, academic sources, books, and math data.
+* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
 ## Infrastructure
 We train the Granite Language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
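To make the two-phase split concrete, here is a small sketch of the token budgets the new Training Data section describes. Only the phase totals (10 trillion and 2 trillion tokens) and the domain lists come from the card; the structure, names, and helper function are hypothetical.

```python
# Hypothetical encoding of the two-phase training plan described above.
# Phase totals and domain lists come from the model card; everything else
# (names, structure, the helper) is an illustrative assumption.
PHASES = [
    {
        "name": "phase_1",
        "total_tokens": 10 * 10**12,  # 10 trillion tokens
        "domains": ["web", "code", "academic", "books", "math"],
    },
    {
        "name": "phase_2",
        "total_tokens": 2 * 10**12,  # 2 trillion tokens, higher-quality mix
        "domains": ["web", "code", "academic", "books", "math",
                    "multilingual", "instruction"],
    },
]

def current_phase(tokens_seen: int) -> dict:
    """Return the phase whose cumulative token budget covers `tokens_seen`."""
    budget = 0
    for phase in PHASES:
        budget += phase["total_tokens"]
        if tokens_seen < budget:
            return phase
    return PHASES[-1]
```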