---
language: en
license: mit
library_name: transformers
tags:
- climate-change
- domain-adaptation
- masked-language-modeling
- scientific-nlp
- transformer
- BERT
- ClimateBERT
metrics:
- f1
model-index:
- name: SciClimateBERT
  results:
  - task:
      type: text-classification
      name: Climate NLP Tasks (ClimaBench)
    dataset:
      name: ClimaBench
      type: benchmark
    metrics:
    - type: f1
      name: Macro F1 (avg)
      value: 57.829
---

# SciClimateBERT 🌎🔬

**SciClimateBERT** is a domain-adapted version of [**ClimateBERT**](https://huggingface.co/climatebert/distilroberta-base-climate-f), further pretrained on peer-reviewed scientific papers focused on climate change. While ClimateBERT is tuned for general climate-related text, SciClimateBERT narrows the focus to high-quality academic content, improving performance in scientific NLP applications.

## 🔍 Overview

- **Base Model**: ClimateBERT (RoBERTa-based architecture)
- **Pretraining Method**: Continued pretraining (domain adaptation) with Masked Language Modeling (MLM)
- **Corpus**: Scientific climate change literature from top-tier journals
- **Tokenizer**: ClimateBERT tokenizer (unchanged)
- **Language**: English
- **Domain**: Scientific climate change research

## 📊 Performance

Evaluated on **ClimaBench**, a benchmark suite for climate-focused NLP tasks:

| Metric         | Value   |
|----------------|---------|
| Macro F1 (avg) | 57.83   |
| Tasks won      | 0/7     |
| Avg. Std Dev   | 0.01747 |

While based on ClimateBERT, this model is adapted to structured scientific input, making it well suited for downstream applications in climate science and research automation.

Climate performance model card:

|                                  | SciClimateBERT      |
|----------------------------------|---------------------|
| 1. Model publicly available?     | Yes                 |
| 2. Time to train final model     | 300 h               |
| 3. Time for all experiments      | 1,226 h (~51 days)  |
| 4. Power of GPU and CPU          | 0.250 kW + 0.013 kW |
| 5. Location for computations     | Croatia             |
| 6. Energy mix at location        | 224.71 gCO2eq/kWh   |
| 7. CO2eq for final model         | 18 kg CO2           |
| 8. CO2eq for all experiments     | 74 kg CO2           |

## 🧪 Intended Uses

**Use for:**
- Scientific climate change text classification and extraction
- Knowledge base and graph construction in climate policy and research domains

**Not suitable for:**
- Non-scientific general-purpose text
- Multilingual applications

Example:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
import torch

# Load the pretrained model and tokenizer
model_name = "P0L3/clirebert_clirevocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Use the GPU for the pipeline if one is available
device = 0 if torch.cuda.is_available() else -1

# Create a fill-mask pipeline
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer, device=device)

# Example input from scientific climate literature (note the mask token)
text = (
    "The increase in greenhouse gas emissions has significantly affected "
    f"the {tokenizer.mask_token} balance of the Earth."
)

# Run prediction
predictions = fill_mask(text)

# Show top predictions
print(text)
print(10 * ">")
for p in predictions:
    print(f"{p['sequence']} - {p['score']:.4f}")
```

Output:

```shell
The increase in greenhouse gas emissions has significantly affected the <mask> balance of the Earth.
>>>>>>>>>>
The increase in greenhouse gas ... affected the energy balance of the Earth. - 0.7897
The increase in greenhouse gas ... affected the radiation balance of the Earth. - 0.0522
The increase in greenhouse gas ... affected the mass balance of the Earth. - 0.0401
The increase in greenhouse gas ... affected the water balance of the Earth. - 0.0359
The increase in greenhouse gas ... affected the carbon balance of the Earth. - 0.0190
```
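Since the card evaluates SciClimateBERT on ClimaBench text-classification tasks and classification is the primary intended use, a fine-tuning sketch may help as well. The snippet below is a minimal illustration, not the ClimaBench setup: the two-example toy dataset, the binary label scheme, the output directory, and the hyperparameters are placeholders to replace with a real task.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model_name = "P0L3/clirebert_clirevocab_uncased"  # same checkpoint as in the example above
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Toy placeholder data: swap in a real climate classification dataset
# (e.g. one of the ClimaBench tasks) for meaningful results.
train_dataset = Dataset.from_dict({
    "text": [
        "Rising sea levels increase flood risk for coastal infrastructure.",
        "The board approved the quarterly marketing budget.",
    ],
    "label": [1, 0],  # hypothetical binary scheme: 1 = climate-relevant, 0 = not
})

def tokenize(batch):
    # Truncate only; padding is handled dynamically by the data collator
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_dataset = train_dataset.map(tokenize, batched=True)

# Pretrained encoder plus a freshly initialized classification head
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

training_args = TrainingArguments(
    output_dir="sciclimatebert-finetuned",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```

Only the classification head is newly initialized; the encoder weights come from the pretrained checkpoint. For a realistic run, substitute one of the ClimaBench task datasets and tune the hyperparameters accordingly.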
## ⚠️ Limitations

- May reflect scientific publication biases

## 🧾 Citation

If you use this model, please cite:

```bibtex
@Article{Poleksić2025,
  author={Poleksi{\'{c}}, Andrija and Martin{\v{c}}i{\'{c}}-Ip{\v{s}}i{\'{c}}, Sanda},
  title={Pretraining and evaluation of BERT models for climate research},
  journal={Discover Applied Sciences},
  year={2025},
  month={Oct},
  day={24},
  volume={7},
  number={11},
  pages={1278},
  issn={3004-9261},
  doi={10.1007/s42452-025-07740-5},
  url={https://doi.org/10.1007/s42452-025-07740-5}
}
```