---
language: en
license: mit
library_name: transformers
tags:
- climate-change
- domain-adaptation
- masked-language-modeling
- scientific-nlp
- transformer
- BERT
- ClimateBERT
metrics:
- f1
model-index:
- name: SciClimateBERT
  results:
  - task:
      type: text-classification
      name: Climate NLP Tasks (ClimaBench)
    dataset:
      name: ClimaBench
      type: benchmark
    metrics:
    - type: f1
      name: Macro F1 (avg)
      value: 57.829
---
# SciClimateBERT
**SciClimateBERT** is a domain-adapted version of [**ClimateBERT**](https://huggingface.co/climatebert/distilroberta-base-climate-f), further pretrained on peer-reviewed scientific papers focused on climate change. While ClimateBERT is tuned for general climate-related text, SciClimateBERT narrows the focus to high-quality academic content, with the aim of improving performance in scientific NLP applications.
## Overview
- **Base Model**: ClimateBERT (RoBERTa-based architecture)
- **Pretraining Method**: Continued pretraining (domain adaptation) with Masked Language Modeling (MLM); a sketch of this setup follows the list
- **Corpus**: Scientific climate change literature from top-tier journals
- **Tokenizer**: ClimateBERT tokenizer (unchanged)
- **Language**: English
- **Domain**: Scientific climate change research
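For reference, the snippet below shows what a continued-pretraining (domain-adaptation) run with the MLM objective looks like using the Hugging Face `Trainer`. It is a minimal sketch, not the actual training script: the corpus file, hyperparameters, and sequence length are illustrative placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical plain-text corpus of scientific climate paragraphs, one per line
corpus = load_dataset("text", data_files={"train": "climate_papers.txt"})

base_model = "climatebert/distilroberta-base-climate-f"
tokenizer = AutoTokenizer.from_pretrained(base_model)  # ClimateBERT tokenizer, unchanged
model = AutoModelForMaskedLM.from_pretrained(base_model)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask tokens on the fly for the MLM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="sciclimatebert-mlm",   # placeholder output directory
    per_device_train_batch_size=16,    # illustrative hyperparameters
    num_train_epochs=3,
    learning_rate=5e-5,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```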
## Performance
Evaluated on **ClimaBench**, a benchmark suite for climate-focused NLP tasks:
| Metric | Value |
|----------------|--------------|
| Macro F1 (avg) | 57.83|
| Tasks won | 0/7 |
| Avg. Std Dev | 0.01747|
While based on ClimateBERT, this model focuses on structured scientific input, making it well suited for downstream applications in climate science and research automation.
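For context, a ClimaBench-style evaluation boils down to fine-tuning the checkpoint as a sequence classifier and scoring it with macro F1. The sketch below is a generic illustration of that setup, not the exact protocol behind the numbers above; the data files, label count, and hyperparameters are placeholders.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model_name = "P0L3/clirebert_clirevocab_uncased"  # checkpoint used in the usage example below

# Hypothetical CSV files with "text" and "label" columns for one classification task
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Macro F1, the metric reported in the table above
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"macro_f1": f1_score(labels, preds, average="macro")}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf", per_device_train_batch_size=16, num_train_epochs=3),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```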
Climate performance model card:
|                                 | SciClimateBERT      |
|---------------------------------|---------------------|
| 1. Model publicly available?    | Yes                 |
| 2. Time to train final model    | 300 h               |
| 3. Time for all experiments     | 1,226 h (~51 days)  |
| 4. Power of GPU and CPU         | 0.250 kW + 0.013 kW |
| 5. Location for computations    | Croatia             |
| 6. Energy mix at location       | 224.71 gCO2eq/kWh   |
| 7. CO2eq for final model        | 18 kg CO2           |
| 8. CO2eq for all experiments    | 74 kg CO2           |
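The emissions figures follow from rows 2–6 of the card (combined GPU and CPU power × runtime × grid carbon intensity). A quick back-of-the-envelope check, assuming the full combined power draw over the entire runtime:

```python
# Combined power draw (kW) x runtime (h) gives energy in kWh;
# energy x grid intensity (gCO2eq/kWh) gives emissions.
power_kw = 0.250 + 0.013    # GPU + CPU (row 4)
grid_intensity = 224.71     # gCO2eq/kWh (row 6)

final_kwh = power_kw * 300  # final model, row 2 -> ~78.9 kWh
all_kwh = power_kw * 1226   # all experiments, row 3 -> ~322.4 kWh

print(final_kwh * grid_intensity / 1000)  # ~17.7 kg CO2eq (reported: 18 kg)
print(all_kwh * grid_intensity / 1000)    # ~72.5 kg CO2eq (reported: 74 kg)
```

The small differences from the reported 18 kg and 74 kg presumably come from rounding of the runtime and power figures.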
## Intended Uses
**Use for:**
- Scientific climate change text classification and extraction
- Knowledge base and graph construction in climate policy and research domains
**Not suitable for:**
- Non-scientific general-purpose text
- Multilingual applications
Example:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
import torch

# Load the pretrained model and tokenizer
model_name = "P0L3/clirebert_clirevocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Use the GPU if available
device = 0 if torch.cuda.is_available() else -1

# Create a fill-mask pipeline
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer, device=device)

# Example input from scientific climate literature, with the word to predict
# replaced by the tokenizer's mask token
text = f"The increase in greenhouse gas emissions has significantly affected the {tokenizer.mask_token} balance of the Earth."

# Run prediction
predictions = fill_mask(text)

# Show top predictions
print(text)
print(10 * ">")
for p in predictions:
    print(f"{p['sequence']} → {p['score']:.4f}")
```
Output:
```shell
The increase in greenhouse gas emissions has significantly affected the <mask> balance of the Earth.
>>>>>>>>>>
The increase in greenhouse gas ... affected the energy balance of the Earth. → 0.7897
The increase in greenhouse gas ... affected the radiation balance of the Earth. → 0.0522
The increase in greenhouse gas ... affected the mass balance of the Earth. → 0.0401
The increase in greenhouse gas ... affected the water balance of the Earth. → 0.0359
The increase in greenhouse gas ... affected the carbon balance of the Earth. → 0.0190
```
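For the knowledge-base and graph-construction use cases listed under Intended Uses, the encoder can also act as a feature extractor. The snippet below is a minimal sketch (mean pooling over the last hidden states is an illustrative choice, not something prescribed by this card):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "P0L3/clirebert_clirevocab_uncased"  # same checkpoint as in the example above
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
encoder.eval()

sentences = [
    "Sea level rise accelerates under high-emission scenarios.",
    "Soil carbon sequestration can partially offset agricultural emissions.",
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state                 # (batch, seq_len, hidden_dim)
    mask = batch["attention_mask"].unsqueeze(-1)                # ignore padding positions
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean pooling

print(embeddings.shape)  # e.g. torch.Size([2, 768])
```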
## Limitations
- May reflect scientific publication biases
## Citation
If you use this model, please cite:
```bibtex
@Article{Poleksić2025,
author={Poleksi{\'{c}}, Andrija
and Martin{\v{c}}i{\'{c}}-Ip{\v{s}}i{\'{c}}, Sanda},
title={Pretraining and evaluation of BERT models for climate research},
journal={Discover Applied Sciences},
year={2025},
month={Oct},
day={24},
volume={7},
number={11},
pages={1278},
issn={3004-9261},
doi={10.1007/s42452-025-07740-5},
url={https://doi.org/10.1007/s42452-025-07740-5}
}
```