|
|
--- |
|
|
language: en |
|
|
license: mit |
|
|
library_name: transformers |
|
|
tags: |
|
|
- climate-change |
|
|
- domain-adaptation |
|
|
- masked-language-modeling |
|
|
- scientific-nlp |
|
|
- transformer |
|
|
- BERT |
|
|
- ClimateBERT |
|
|
metrics: |
|
|
- f1 |
|
|
model-index: |
|
|
- name: SciClimateBERT |
|
|
results: |
|
|
- task: |
|
|
type: text-classification |
|
|
name: Climate NLP Tasks (ClimaBench) |
|
|
dataset: |
|
|
name: ClimaBench |
|
|
type: benchmark |
|
|
metrics: |
|
|
- type: f1 |
|
|
name: Macro F1 (avg) |
|
|
value: 57.829 |
|
|
--- |
|
|
|
|
|
# SciClimateBERT 🌎🔬 |
|
|
|
|
|
**SciClimateBERT** is a domain-adapted version of [**ClimateBERT**](https://huggingface.co/climatebert/distilroberta-base-climate-f), further pretrained on peer-reviewed scientific papers focused on climate change. While ClimateBERT is tuned for general climate-related text, SciClimateBERT narrows the focus to high-quality academic content, aiming to improve performance on scientific NLP applications.
|
|
|
|
|
## 🔍 Overview |
|
|
|
|
|
- **Base Model**: ClimateBERT (RoBERTa-based architecture) |
|
|
- **Pretraining Method**: Continued pretraining (domain adaptation) with Masked Language Modeling (MLM) |
|
|
- **Corpus**: Scientific climate change literature from top-tier journals |
|
|
- **Tokenizer**: ClimateBERT tokenizer (unchanged) |
|
|
- **Language**: English |
|
|
- **Domain**: Scientific climate change research |
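
Continued pretraining uses the standard BERT-style MLM objective. As an illustration only (not the authors' training code), here is a minimal from-scratch sketch of the masking scheme: 15% of tokens are selected; of those, 80% are replaced by the mask token, 10% by a random token, and 10% left unchanged. Labels are set to `-100` at unselected positions so the loss ignores them. All names and parameters here are illustrative.

```python
import random

def mlm_mask(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=0):
    """BERT-style masking: 15% of positions are selected; of those,
    80% -> mask token, 10% -> random token, 10% kept unchanged.
    Labels are -100 (ignored by the loss) except at selected positions."""
    rng = random.Random(seed)
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels.append(tok)              # model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id         # 80%: replace with mask token
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token
        else:
            labels.append(-100)             # position excluded from the loss
    return inputs, labels
```

In practice the same behaviour is provided by `transformers.DataCollatorForLanguageModeling(mlm=True, mlm_probability=0.15)` during continued pretraining.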
|
|
|
|
|
## 📊 Performance |
|
|
|
|
|
Evaluated on **ClimaBench**, a benchmark suite for climate-focused NLP tasks: |
|
|
|
|
|
| Metric         | Value   |
|----------------|---------|
| Macro F1 (avg) | 57.83   |
| Tasks won      | 0/7     |
| Avg. std. dev. | 0.01747 |
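
The table above reports macro F1, i.e. the unweighted mean of per-class F1 scores, averaged across the ClimaBench tasks. A minimal sketch of the metric on hypothetical labels (equivalent to `sklearn.metrics.f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical predictions: class 0 gets F1 = 2/3, class 1 gets F1 = 4/5
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))  # mean = 11/15 ≈ 0.7333
```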
|
|
|
|
|
While based on ClimateBERT, this model focuses on structured scientific input, making it well suited for downstream applications in climate science and research automation.
|
|
|
|
|
Climate performance model card: |
|
|
| SciClimateBERT                          |                              |
|-----------------------------------------|------------------------------|
| 1. Model publicly available?            | Yes                          |
| 2. Time to train final model            | 300 h                        |
| 3. Time for all experiments             | 1,226 h (~51 days)           |
| 4. Power of GPU and CPU                 | 0.250 kW + 0.013 kW          |
| 5. Location for computations            | Croatia                      |
| 6. Energy mix at location               | 224.71 gCO<sub>2</sub>eq/kWh |
| 7. CO<sub>2</sub>eq for final model     | 18 kg CO<sub>2</sub>         |
| 8. CO<sub>2</sub>eq for all experiments | 74 kg CO<sub>2</sub>         |
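
Items 2-4 and 6 approximately reproduce the reported emissions via energy (kWh) = hours × power (kW) and CO<sub>2</sub>eq (kg) = energy × grid intensity / 1000; the small remaining gap is plausibly rounding or datacenter overhead. A quick arithmetic check:

```python
# Figures taken from the climate performance table above
POWER_KW = 0.250 + 0.013   # item 4: GPU + CPU power draw
INTENSITY = 224.71         # item 6: grid intensity in gCO2eq/kWh

final_kg = 300 * POWER_KW * INTENSITY / 1000   # item 2: final model, ~17.7 kg
all_kg = 1226 * POWER_KW * INTENSITY / 1000    # item 3: all experiments, ~72.5 kg
print(round(final_kg, 1), round(all_kg, 1))
```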
|
|
|
|
|
## 🧪 Intended Uses |
|
|
|
|
|
**Use for:** |
|
|
- Scientific climate change text classification and extraction |
|
|
- Knowledge base and graph construction in climate policy and research domains |
|
|
|
|
|
**Not suitable for:** |
|
|
- Non-scientific general-purpose text |
|
|
- Multilingual applications |
|
|
|
|
|
Example: |
|
|
```python
|
|
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline |
|
|
import torch |
|
|
|
|
|
# Load the pretrained model and tokenizer |
|
|
model_name = "P0L3/clirebert_clirevocab_uncased" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForMaskedLM.from_pretrained(model_name) |
|
|
|
|
|
# Move model to GPU if available |
|
|
device = 0 if torch.cuda.is_available() else -1 |
|
|
|
|
|
# Create a fill-mask pipeline |
|
|
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer, device=device) |
|
|
|
|
|
# Example input from scientific climate literature |
|
|
text = "The increase in greenhouse gas emissions has significantly affected the <mask> balance of the Earth." |
|
|
|
|
|
# Run prediction |
|
|
predictions = fill_mask(text) |
|
|
|
|
|
# Show top predictions |
|
|
print(text) |
|
|
print(10*">") |
|
|
for p in predictions: |
|
|
print(f"{p['sequence']} — {p['score']:.4f}") |
|
|
``` |
|
|
Output: |
|
|
```shell
|
|
The increase in greenhouse gas emissions has significantly affected the <mask> balance of the Earth. |
|
|
>>>>>>>>>> |
|
|
The increase in greenhouse gas ... affected the energy balance of the Earth. — 0.7897 |
|
|
The increase in greenhouse gas ... affected the radiation balance of the Earth. — 0.0522 |
|
|
The increase in greenhouse gas ... affected the mass balance of the Earth. — 0.0401 |
|
|
The increase in greenhouse gas ... affected the water balance of the Earth. — 0.0359 |
|
|
The increase in greenhouse gas ... affected the carbon balance of the Earth. — 0.0190 |
|
|
``` |
|
|
|
|
|
## ⚠️ Limitations |
|
|
- May reflect scientific publication biases |
|
|
|
|
|
## 🧾 Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@Article{Poleksic2025,
|
|
author={Poleksi{\'{c}}, Andrija |
|
|
and Martin{\v{c}}i{\'{c}}-Ip{\v{s}}i{\'{c}}, Sanda}, |
|
|
title={Pretraining and evaluation of BERT models for climate research}, |
|
|
journal={Discover Applied Sciences}, |
|
|
year={2025}, |
|
|
month={Oct}, |
|
|
day={24}, |
|
|
volume={7}, |
|
|
number={11}, |
|
|
pages={1278}, |
|
|
issn={3004-9261}, |
|
|
doi={10.1007/s42452-025-07740-5}, |
|
|
url={https://doi.org/10.1007/s42452-025-07740-5} |
|
|
} |
|
|
|
|
|
|
|
|
|