data update
README.md CHANGED
@@ -208,7 +208,7 @@ model-index:
 # Granite-3.0-2B-Base
 
 ## Model Summary
-**Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains
+**Granite-3.0-2B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-2B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)

@@ -280,7 +280,9 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
+This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+* Phase 1 data: The data for phase 1 is sourced from diverse domains such as web, code, academic sources, books, and math data.
+* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
 ## Infrastructure
 We train the Granite Language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
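To make the two-phase split concrete, here is a small sketch of the token budgets the new Training Data section describes. Only the phase totals (10 trillion and 2 trillion tokens) and the domain lists come from the card; the structure, names, and helper function are hypothetical.

```python
# Hypothetical encoding of the two-phase training plan described above.
# Phase totals and domain lists come from the model card; everything else
# (names, structure, the helper) is an illustrative assumption.
PHASES = [
    {
        "name": "phase_1",
        "total_tokens": 10 * 10**12,  # 10 trillion tokens
        "domains": ["web", "code", "academic", "books", "math"],
    },
    {
        "name": "phase_2",
        "total_tokens": 2 * 10**12,  # 2 trillion tokens, higher-quality mix
        "domains": ["web", "code", "academic", "books", "math",
                    "multilingual", "instruction"],
    },
]

def current_phase(tokens_seen: int) -> dict:
    """Return the phase whose cumulative token budget covers `tokens_seen`."""
    budget = 0
    for phase in PHASES:
        budget += phase["total_tokens"]
        if tokens_seen < budget:
            return phase
    return PHASES[-1]
```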