---
language: en
license: mit
library_name: transformers
tags:
- climate-change
- domain-adaptation
- masked-language-modeling
- scientific-nlp
- transformer
- BERT
- ClimateBERT
metrics:
- f1
model-index:
- name: SciClimateBERT
  results:
  - task:
      type: text-classification
      name: Climate NLP Tasks (ClimaBench)
    dataset:
      name: ClimaBench
      type: benchmark
    metrics:
    - type: f1
      name: Macro F1 (avg)
      value: 57.83
---

# SciClimateBERT 🌎🔬

**SciClimateBERT** is a domain-adapted version of **ClimateBERT**, further pretrained on peer-reviewed scientific papers focused on climate change. While ClimateBERT is tuned for general climate-related text, SciClimateBERT narrows the focus to high-quality academic content, improving performance in scientific NLP applications.

## 🔍 Overview

- **Base Model**: ClimateBERT (RoBERTa-based architecture)
- **Pretraining Method**: Continued pretraining (domain adaptation) with Masked Language Modeling (MLM)
- **Corpus**: Scientific climate change literature from top-tier journals
- **Tokenizer**: ClimateBERT tokenizer (unchanged)
- **Language**: English
- **Domain**: Scientific climate change research

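Since the model keeps the MLM head from continued pretraining, it can be queried directly with the `transformers` fill-mask pipeline. A minimal sketch, assuming the checkpoint is on the Hugging Face Hub (`SciClimateBERT` below is a placeholder repo id, and `<mask>` assumes the RoBERTa-style ClimateBERT tokenizer):

```python
from transformers import pipeline

def suggest_fillers(text: str, model_id: str = "SciClimateBERT", k: int = 5):
    """Return the top-k token suggestions for a single <mask> in `text`."""
    # "fill-mask" loads the checkpoint with its masked-language-modeling head.
    fill = pipeline("fill-mask", model=model_id, top_k=k)
    return [pred["token_str"].strip() for pred in fill(text)]

if __name__ == "__main__":
    # Placeholder example sentence; any climate-science text works.
    print(suggest_fillers("Rising <mask> levels threaten coastal ecosystems."))
```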
## 📊 Performance

Evaluated on **ClimaBench**, a benchmark suite for climate-focused NLP tasks:

| Metric          | Value   |
|-----------------|---------|
| Macro F1 (avg)  | 57.83   |
| Tasks won       | 0/7     |
| Avg. Std Dev    | 0.01747 |

While based on ClimateBERT, this model focuses on structured scientific input, making it well suited to downstream applications in climate science and research automation.

## 🧪 Intended Uses

**Use for:**
- Scientific climate change text classification and extraction
- NLP-powered climate science discovery tools
- Knowledge base and graph construction in climate policy and research domains

**Not suitable for:**
- Non-scientific general-purpose text
- Multilingual applications

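For the classification use case above, the encoder can be loaded with a fresh classification head and fine-tuned with the standard `transformers` Trainer. A sketch under assumptions: `SciClimateBERT` is a placeholder repo id, and the label count depends on your task:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def build_classifier(model_id: str = "SciClimateBERT", num_labels: int = 2):
    """Load the domain-adapted encoder with a new classification head."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The MLM head is discarded; the classification head is freshly initialized.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels)
    return tokenizer, model

# Typical fine-tuning loop (dataset preparation omitted):
# from transformers import Trainer, TrainingArguments
# args = TrainingArguments(output_dir="out", num_train_epochs=3)
# Trainer(model=model, args=args, train_dataset=..., eval_dataset=...).train()
```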
## ⚠️ Limitations

- May reflect scientific publication biases

## 🧾 Citation

If you use this model, please cite:

```bibtex
@article{poleksic_etal_2025,
  title={Climate Research Domain BERTs: Pretraining, Adaptation, and Evaluation},
  author={Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  journal={None},
  year={2025}
}
```