placeholder files (v1.0)


Files changed (12)

.gitattributes +6 -0
README.md +332 -3
aifs-single-mse-1.0.ckpt +3 -0
assets/aifs_diagram.png +3 -0
assets/decoder_graph.jpeg +3 -0
assets/encoder_graph.jpeg +3 -0
assets/radiation_cloudcover.gif +3 -0
assets/scorecard_single1.0_vs_ifs_2024.png +3 -0
assets/scorecard_single1.0_vs_single0.2.1_2023.png +3 -0
config_finetuning.yaml +483 -0
config_pretraining.yaml +492 -0
run_AIFS_v1.ipynb +0 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+assets/aifs_diagram.png filter=lfs diff=lfs merge=lfs -text
+assets/decoder_graph.jpeg filter=lfs diff=lfs merge=lfs -text
+assets/encoder_graph.jpeg filter=lfs diff=lfs merge=lfs -text
+assets/radiation_cloudcover.gif filter=lfs diff=lfs merge=lfs -text
+assets/scorecard_single1.0_vs_ifs_2024.png filter=lfs diff=lfs merge=lfs -text
+assets/scorecard_single1.0_vs_single0.2.1_2023.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED Viewed

@@ -1,3 +1,332 @@
----
-license: cc-by-4.0
----

+---
+license: cc-by-4.0
+metrics:
+- mse
+pipeline_tag: graph-ml
+language:
+- en
+library_name: anemoi
+---
+# AIFS Single - v1.0
+<!-- Provide a quick summary of what the model is/does. -->
+Here, we introduce the **Artificial Intelligence Forecasting System (AIFS)**, a data-driven forecast
+model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF).
+The release of AIFS Single v1.0 marks the first operationally supported AIFS model. Version 1
+supersedes the existing experimental version, [0.2.1 AIFS-single](https://huggingface.co/ecmwf/aifs-single-0.2.1).
+Version 1.0 brings many changes to the AIFS Single model, including:
+- Improved performance for upper-level atmospheric variables (AIFS Single still uses 13 pressure levels, so this improvement mainly refers to 50 and 100 hPa).
+- Improved skill for total precipitation.
+- Additional output variables, including 100 metre winds, snowfall, surface solar radiation and land variables such as soil moisture and soil temperature.
+<div style="display: flex; justify-content: center;">
+  <img src="assets/radiation_cloudcover.gif" alt="AIFS 10 days Forecast" style="width: 50%;"/>
+</div>
+AIFS produces highly skilled forecasts for upper-air variables, surface weather parameters and
+tropical cyclone tracks. AIFS Single is run four times daily alongside ECMWF’s physics-based NWP model and forecasts
+are available to the public under ECMWF’s open data policy (https://www.ecmwf.int/en/forecasts/datasets/open-data).
+Note that due to the non-determinism of GPUs, users will be unable to exactly reproduce an official AIFS forecast
+when running AIFS Single themselves.
+For more details please refer to https://confluence.ecmwf.int/display/FCST/Implementation+of+AIFS+Single+v1
+## Data Details
+### Data parameters
+#### New parameters
+More detailed information about the new parameters introduced with AIFS Single v1.0 is provided in the table below.
+| Short Name | Name | Units | Component Type | Lev.Type |
+|:----------:|:----:|:-----:|:--------------:|:--------:|
+| ssrd  | Surface short-wave (solar) radiation downwards | \\(J m^{-2}\\) | AIFS | sfc |
+| strd  | Surface long-wave (thermal) radiation downwards | \\(J m^{-2}\\) | AIFS | sfc |
+| lcc   | Low cloud cover | \\((0 - 1)\\) | AIFS | sfc |
+| mcc   | Medium cloud cover | \\((0 - 1)\\) | AIFS | sfc |
+| hcc   | High cloud cover | \\((0 - 1)\\) | AIFS | sfc |
+| sf    | Snowfall water equivalent | \\(kg m^{-2}\\) | AIFS | sfc |
+| tcc   | Total cloud cover | \\((0 - 1)\\) | AIFS | sfc |
+| 100u  | 100 metre U wind component | \\(m s^{-1}\\) | AIFS | sfc |
+| 100v  | 100 metre V wind component | \\(m s^{-1}\\) | AIFS | sfc |
+| rowe  | Runoff water equivalent (surface plus subsurface) | \\(kg m^{-2}\\) | AIFS | sfc |
+| vsw   | Volumetric soil moisture | \\(m^3 m^{-3}\\) | AIFS | sol |
+| sot   | Soil temperature | \\(K\\) | AIFS | sol |
+#### Changes to existing parameters
+There are no changes to existing parameters already introduced with AIFS Single v0.2.1.
+**Note**
+Regarding precipitation units, note that the AIFS model was trained on \\(m^{3}/m^{2}\\) and therefore produces precipitation in those units.
+Precipitation retrieved from ECMWF Open Data is instead given in \\(mm\\).
+#### Discontinued parameters
+No parameters have been discontinued relative to the previous version, AIFS Single v0.2.1.
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+AIFS is based on a graph neural network (GNN) encoder and decoder, and a sliding window transformer processor,
+and is trained on ECMWF’s ERA5 re-analysis and ECMWF’s operational numerical weather prediction (NWP) analyses.
+<div style="display: flex; justify-content: center;">
+  <img src="assets/encoder_graph.jpeg" alt="Encoder graph" style="width: 50%;"/>
+  <img src="assets/decoder_graph.jpeg" alt="Decoder graph" style="width: 50%;"/>
+</div>
+It has a flexible and modular design and supports several levels of parallelism to enable training on
+high resolution input data. AIFS forecast skill is assessed by comparing its forecasts to NWP analyses
+and direct observational data.
+- **Developed by:** ECMWF
+- **Model type:** Encoder-processor-decoder model
+- **License:** These model weights are published under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.
+To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
+The notebooks and other script files are published under an Apache 2.0 licence. To view a copy of this licence, visit https://www.apache.org/licenses/LICENSE-2.0.txt.
+### Model resolution
+There are no changes in resolution compared to the previous version, AIFS Single v0.2.1.
+| | Component | Horizontal Resolution [km] | Vertical Resolution [levels] |
+|---|:---:|:---:|:---:|
+| Atmosphere | AIFS-single v1.0 | ~ 31 |  13 |
+### Model Sources
+<!-- Provide the basic links for the model. -->
+- **Repository:** [Anemoi](https://anemoi.readthedocs.io/en/latest/index.html) is an open-source framework for
+  creating machine learning (ML) weather forecasting systems, which ECMWF and a range of national meteorological
+  services across Europe have co-developed.
+- **Paper:** https://arxiv.org/pdf/2406.01465
+## How to Get Started with the Model
+To generate a new forecast using AIFS, you can use [anemoi-inference](https://github.com/ecmwf/anemoi-inference). In the [following notebook](run_AIFS_v1.ipynb), a
+step-by-step workflow is specified to run the AIFS using the HuggingFace model:
+1. **Install Required Packages and Imports**
+2. **Retrieve Initial Conditions from ECMWF Open Data**
+  - Select a date
+  - Get the data from the [ECMWF Open Data API](https://www.ecmwf.int/en/forecasts/datasets/open-data)
+  - Get input fields
+  - Add the single levels fields and pressure levels fields
+  - Convert geopotential height into geopotential
+  - Create the initial state
+3. **Load the Model and Run the Forecast**
+  - Download the Model's Checkpoint from Hugging Face
+  - Create a runner
+  - Run the forecast using anemoi-inference
+4. **Inspect the generated forecast**
+  - Plot a field
+🚨  **Note** we train AIFS using `flash_attention` (https://github.com/Dao-AILab/flash-attention).
+The Flash Attention package imposes certain software and hardware requirements, which are listed under "Installation and Features" in https://github.com/Dao-AILab/flash-attention
+🚨 **Note** the `aifs-single-mse-1.0.ckpt` checkpoint contains only the model’s weights.
+It does not contain any information about the optimizer states, lr-scheduler states, etc.
+## How to train AIFS Single v1.0
+To train this model you can use the configuration files included in this repository and the following Anemoi packages:
+```
+anemoi-training==0.3.1
+anemoi-models==0.4.0
+anemoi-graphs==0.4.4
+```
+and run the pretraining stage as follows,
+```
+export DATASETS_PATH=???????
+export OUTPUT_DIR=???????
+anemoi-training train --config-name=config_pretraining.yaml
+```
+Now you can fine-tune your model for rollout using the `run_id` of your pretraining run.
+Note: this is the value to export as `PRETRAINING_RUN_ID`; you can find it in the checkpoint folder path of that run.
+For more details, please refer to https://anemoi.readthedocs.io/projects/training/en/latest/user-guide/training.html#restarting-a-training-run
+```
+export PRETRAINING_RUN_ID=???????
+anemoi-training train --config-name=config_finetuning.yaml
+```
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+AIFS is trained to produce 6-hour forecasts. It receives as input a representation of the atmospheric states
+at \\(t_{−6h}\\), \\(t_{0}\\), and then forecasts the state at time \\(t_{+6h}\\).
+<div style="display: flex; justify-content: center;">
+  <img src="assets/aifs_diagram.png" alt="AIFS 2m Temperature" style="width: 80%;"/>
+</div>
+The full list of input and output fields is shown below:
+| Field                                                                                                                                                       | Level type                                                                   | Input/Output |
+|-------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|--------------|
+| Geopotential, horizontal and vertical wind components, specific humidity, temperature                                                                       | Pressure level: 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000 | Both         |
+| Surface pressure, mean sea-level pressure, skin temperature, 2 m temperature, 2 m dewpoint temperature, 10 m horizontal wind components, total column water | Surface                                                                      | Both         |
+| Soil moisture and soil temperature (layers 1 & 2) | Surface | Both |
+| 100 m horizontal wind components, radiation (surface short-wave (solar) radiation downwards and surface long-wave (thermal) radiation downwards), cloud variables (tcc, hcc, mcc, lcc), runoff and snowfall | Surface | Output |
+| Total precipitation, convective precipitation                                                                                                               | Surface                                                                      | Output       |
+| Land-sea mask, orography, standard deviation of sub-grid orography, slope of sub-scale orography, insolation, latitude/longitude, time of day/day of year   | Surface                                                                      | Input        |
+Input and output states are normalised to unit variance and zero mean for each level. Some of
+the forcing variables, like orography, are min-max normalised.
+### Training Procedure
+Based on the experiments we have run, the final training recipe for AIFS Single v1.0 deviates slightly
+from the one used for AIFS Single v0.2.1: we found that we could get a well-trained model by skipping the ERA5
+rollout and doing the rollout directly on the extended operational-analysis dataset. By 'extended' we mean
+that AIFS Single v0.2.1 used operational-analysis data from 2019 to 2021 only, while for this
+release the fine-tuning uses data from 2016 to 2022.
+The other important change in the fine-tuning stage concerns the optimiser. For AIFS Single v0.2.1, the optimiser was
+not restarted after the 6-hour model training (i.e. rollout was done with the minimal LR of \\(3 × 10^{-7}\\)). For this release we found
+that restarting the optimiser for the rollout improves the model's performance. For the operational fine-tuning rollout
+stage, the learning rate cycle is restarted, gradually decreasing to the minimum value at the end of rollout.
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+- **Pre-training**: It was performed on ERA5 for the years 1979 to 2022 with a cosine learning rate (LR) schedule and a
+total of 260,000 steps. The LR is increased from 0 to \\(10^{-4}\\) during the first 1000 steps, then it is annealed to a
+minimum of \\(3 × 10^{-7}\\). The local learning rate used for this stage is \\(3.125 × 10^{-5}\\).
+- **Fine-tuning**: The pre-training is then followed by rollout on operational real-time IFS NWP analyses for the years
+2016 to 2022, this time with a local learning rate of \\(8 × 10^{−7}\\), which is decreased to \\(3 × 10^{−7}\\). The number of rollout steps
+increases with each epoch. In this second stage the warm-up period of the optimiser is 100 steps, to account for the shorter length
+of this stage. The number of optimizer steps is 7900 (12 epochs with ~630 steps per epoch).
+As in the previous version of AIFS Single, for fine-tuning and initialisation of the model during inference, IFS fields
+are interpolated from their native O1280 resolution (approximately \\(0.1°\\)) down to N320 (approximately \\(0.25°\\)).
+#### Training Hyperparameters
+- **Optimizer:** We use *AdamW* (Loshchilov and Hutter [2019]) with the \\(β\\)-coefficients set to 0.9 and 0.95.
+- **Loss function:** The loss function is an area-weighted mean squared error (MSE) between the target atmospheric state
+and prediction.
+- **Loss scaling:** A loss scaling is applied for each output variable. The scaling was chosen empirically such that
+all prognostic variables have roughly equal contributions to the loss, with the exception of the vertical velocities,
+for which the weight was reduced. The loss weights also decrease linearly with height, which means that levels in
+the upper atmosphere (e.g., 50 hPa) contribute relatively little to the total loss value.
+#### Speeds, Sizes, Times
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+Data parallelism is used for training, with a batch size of 16. One model instance is split across four 40GB A100
+GPUs within one node. Training is done using mixed precision (Micikevicius et al. [2018]), and the entire process
+takes about one week, with 64 GPUs in total. The checkpoint size is 1.19 GB and as mentioned above, it does not include the optimizer
+state.
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+AIFS is evaluated against ECMWF IFS (Integrated Forecast System) for 2022. The results of such evaluation are summarized in
+the scorecard below that compares different forecast skill measures across a range of
+variables. For verification, each system is compared against the operational ECMWF analysis from which the forecasts
+are initialised. In addition, the forecasts are compared against radiosonde observations of geopotential, temperature
+and windspeed, and SYNOP observations of 2 m temperature, 10 m wind and 24 h total precipitation. The definition
+of the metrics, such as ACC (ccaf), RMSE (rmsef) and forecast activity (standard deviation of forecast anomaly,
+sdaf) can be found in e.g. Ben Bouallegue et al. [2024].
+### AIFS Single v1.0 vs AIFS Single v0.2.1 (2023)
+<div style="display: flex; justify-content: center;">
+  <img src="assets/scorecard_single1.0_vs_single0.2.1_2023.png" alt="Scorecard comparing forecast scores of AIFS Single v1.0 versus AIFS Single v0.2.1 (2023)" style="width: 80%;"/>
+</div>
+### AIFS Single v1.0 vs IFS (2024)
+<div style="display: flex; justify-content: center;">
+  <img src="assets/scorecard_single1.0_vs_ifs_2024.png" alt="Scorecard comparing forecast scores of AIFS Single v1.0 versus IFS (2024)" style="width: 80%;"/>
+</div>
+Forecasts are initialised at 00 and 12 UTC. The scorecards show relative score changes as a function of lead time (day 1 to 10) for northern extra-tropics (n.hem),
+southern extra-tropics (s.hem), tropics and Europe. Blue colours mark score improvements and red colours score
+degradations. Purple colours indicate an increase in standard deviation of forecast anomaly, while green colours
+indicate a reduction. Framed rectangles indicate 95% significance level. Variables are geopotential (z), temperature
+(t), wind speed (ff), mean sea level pressure (msl), 2 m temperature (2t), 10 m wind speed (10ff) and 24 hr total
+precipitation (tp). Numbers behind variable abbreviations indicate variables on pressure levels (e.g., 500 hPa), and
+suffix indicates verification against IFS NWP analyses (an) or radiosonde and SYNOP observations (ob). Scores
+shown are anomaly correlation (ccaf), SEEPS (seeps, for precipitation), RMSE (rmsef) and standard deviation of
+forecast anomaly (sdaf, see text for more explanation).
+## Known limitations
+- This version of AIFS shares certain limitations with some of the other data-driven weather forecast models that are trained with a weighted MSE loss, such as blurring of the forecast fields at longer lead times.
+- AIFS exhibits reduced forecast skill in the stratosphere, partially due to a low model top.
+- AIFS currently provides reduced intensity of some high-impact systems such as tropical cyclones.
+Please refer to https://confluence.ecmwf.int/display/FCST/Known+AIFS+Forecasting+Issues for further details.
+## Technical Specifications
+### Hardware
+<!--  {{ hardware_requirements | default("[More Information Needed]", true)}} -->
+We acknowledge PRACE for awarding us access to Leonardo, CINECA, Italy. In particular, this version of the AIFS has been trained
+on 64 A100 GPUs (40GB).
+### Software
+The model was developed and trained using the [Anemoi framework](https://anemoi-docs.readthedocs.io/en/latest/index.html).
+Anemoi is a framework for developing machine learning weather forecasting models. It comprises components or packages
+for preparing training datasets, conducting ML model training and a registry for datasets and trained models. Anemoi
+provides tools for operational inference, including interfacing to verification software. As a framework it seeks to
+handle many of the complexities that meteorological organisations will share, allowing them to easily train models from
+existing recipes but with their own data.
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+If you use this model in your work, please cite it as follows:
+**BibTeX:**
+```
+@article{lang2024aifs,
+  title={AIFS-ECMWF's data-driven forecasting system},
+  author={Lang, Simon and Alexe, Mihai and Chantry, Matthew and Dramsch, Jesper and Pinault, Florian and Raoult, Baudouin and Clare, Mariana CA and Lessig, Christian and Maier-Gerber, Michael and Magnusson, Linus and others},
+  journal={arXiv preprint arXiv:2406.01465},
+  year={2024}
+}
+```
+**APA:**
+```
+Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., ... & Rabier, F. (2024). AIFS-ECMWF's data-driven forecasting system. arXiv preprint arXiv:2406.01465.
+```
+## More Information
+[Find the paper here](https://arxiv.org/abs/2406.01465)
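Reviewer note on the README's "Training Procedure" section: the pre-training learning-rate schedule it describes (warm-up from 0 to a peak over the first 1000 steps, then cosine annealing to a minimum of 3e-7 over 260,000 steps) can be sketched as below. This is only an illustration under stated assumptions: the warm-up is assumed to be linear, the function name `cosine_lr` is hypothetical, and the scheduler actually used by anemoi-training may differ in detail.

```python
import math


def cosine_lr(step, peak_lr, min_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warm-up (assumed), then cosine annealing."""
    if step < warmup_steps:
        # Warm-up phase: ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    # Cosine decay from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Values quoted in the README for the pre-training stage.
PEAK_LR = 1e-4
MIN_LR = 3e-7
WARMUP_STEPS = 1000
TOTAL_STEPS = 260_000

if __name__ == "__main__":
    for step in (0, 500, 1000, 130_000, 260_000):
        lr = cosine_lr(step, PEAK_LR, MIN_LR, WARMUP_STEPS, TOTAL_STEPS)
        print(f"step {step:>7}: lr = {lr:.3e}")
```

The same function covers the fine-tuning stage by passing its README values instead (peak 8e-7, minimum 3e-7, 100 warm-up steps, 7900 steps).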

aifs-single-mse-1.0.ckpt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1fed399c097c0127d5bbe074f4f8bbc123759736145d990699c215ff07543ccd
+size 994084883
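Reviewer note: the pointer above records the SHA-256 digest (`oid`) and byte size of the real checkpoint, so a downloaded copy can be checked against it. A minimal sketch follows; the helper name `verify_lfs_object` and the local path in the commented usage line are hypothetical.

```python
import hashlib
import os


def verify_lfs_object(path, expected_oid, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` matches a Git LFS pointer's oid and size."""
    # Cheap check first: the pointer's `size` is the byte length of the real file.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream in chunks so a large checkpoint is never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values copied from the pointer file above.
EXPECTED_OID = "1fed399c097c0127d5bbe074f4f8bbc123759736145d990699c215ff07543ccd"
EXPECTED_SIZE = 994084883

# Hypothetical local path; adjust to wherever the checkpoint was downloaded.
# ok = verify_lfs_object("aifs-single-mse-1.0.ckpt", EXPECTED_OID, EXPECTED_SIZE)
```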

assets/aifs_diagram.png ADDED Viewed

Git LFS Details

SHA256: 8c37f8858399f9fef6918fdb9517f18ffb6bc1ed0a2fc45476b4f775f48bfb0d
Pointer size: 131 Bytes
Size of remote file: 470 kB

assets/decoder_graph.jpeg ADDED Viewed

Git LFS Details

SHA256: a306a3f914ed55b70bb2d8fe89b7070857e963115f1bfa8d58f6d8910b820b3c
Pointer size: 131 Bytes
Size of remote file: 215 kB

assets/encoder_graph.jpeg ADDED Viewed

Git LFS Details

SHA256: fe96aa418e231a623dba209aa048248b6506d8c3106df39ac82e3bef2098ba1f
Pointer size: 131 Bytes
Size of remote file: 168 kB

assets/radiation_cloudcover.gif ADDED Viewed

Git LFS Details

SHA256: f36b9511bd8ede55847ad294b401b86fccef6b0aa6a0264058eb97f3f3488a40
Pointer size: 131 Bytes
Size of remote file: 246 kB

assets/scorecard_single1.0_vs_ifs_2024.png ADDED Viewed

Git LFS Details

SHA256: adfe6cde5b14515238f6d5b183b3e47652d12d54f7733f7f6383d3ed97fab2ef
Pointer size: 131 Bytes
Size of remote file: 271 kB

assets/scorecard_single1.0_vs_single0.2.1_2023.png ADDED Viewed

Git LFS Details

SHA256: 9095c229583e4c1752b99507511c69f64bd41ed1bf35f573f474994f339882c7
Pointer size: 131 Bytes
Size of remote file: 269 kB

config_finetuning.yaml ADDED Viewed

	@@ -0,0 +1,483 @@

+data:
+  format: zarr
+  resolution: n320
+  frequency: 6h
+  timestep: 6h
+  forcing:
+    - cos_latitude
+    - cos_longitude
+    - sin_latitude
+    - sin_longitude
+    - cos_julian_day
+    - cos_local_time
+    - sin_julian_day
+    - sin_local_time
+    - insolation
+    - lsm
+    - sdor
+    - slor
+    - z
+  diagnostic:
+    - tp
+    - cp
+    - sf
+    - tcc
+    - hcc
+    - lcc
+    - mcc
+    - ro
+    - ssrd
+    - strd
+    - 100u
+    - 100v
+  remapped: null
+  normalizer:
+    default: mean-std
+    remap:
+      cp: tp
+      sf: tp
+    std:
+      - tp
+      - cp
+      - sf
+      - ro
+      - tcw
+      - ssrd
+      - q_50
+      - q_100
+      - q_150
+      - q_200
+      - q_250
+      - q_300
+      - q_400
+      - q_500
+      - q_600
+      - q_700
+      - q_850
+      - q_925
+      - q_1000
+    min-max: null
+    max:
+      - sdor
+      - slor
+      - z
+    none:
+      - cos_latitude
+      - cos_longitude
+      - sin_latitude
+      - sin_longitude
+      - cos_julian_day
+      - cos_local_time
+      - sin_julian_day
+      - sin_local_time
+      - insolation
+      - lsm
+      - tcc
+      - mcc
+      - hcc
+      - lcc
+      - swvl1
+      - swvl2
+  imputer:
+    default: none
+  remapper:
+    default: none
+  processors:
+    normalizer:
+      _target_: anemoi.models.preprocessing.normalizer.InputNormalizer
+      _convert_: all
+      config:
+        default: mean-std
+        remap:
+          cp: tp
+          sf: tp
+        std:
+          - tp
+          - cp
+          - sf
+          - ro
+          - tcw
+          - ssrd
+          - q_50
+          - q_100
+          - q_150
+          - q_200
+          - q_250
+          - q_300
+          - q_400
+          - q_500
+          - q_600
+          - q_700
+          - q_850
+          - q_925
+          - q_1000
+        min-max: null
+        max:
+          - sdor
+          - slor
+          - z
+        none:
+          - cos_latitude
+          - cos_longitude
+          - sin_latitude
+          - sin_longitude
+          - cos_julian_day
+          - cos_local_time
+          - sin_julian_day
+          - sin_local_time
+          - insolation
+          - lsm
+          - tcc
+          - mcc
+          - hcc
+          - lcc
+          - swvl1
+          - swvl2
+  num_features: 115
+dataloader:
+  prefetch_factor: 2
+  pin_memory: True
+  read_group_size: 4
+  num_workers:
+    training: 8
+    validation: 8
+    test: 8
+    predict: 8
+  batch_size:
+    training: 1
+    validation: 1
+    test: 4
+    predict: 4
+  limit_batches:
+    training: 1000
+    validation: 10
+    test: 20
+    predict: 20
+  dataset: ${hardware.paths.data}/${hardware.files.dataset}
+  land_dataset: ${hardware.paths.data}/${hardware.files.dataset_land}
+  land_variables: [100u, 100v, swvl1, swvl2, stl1, stl2, tcc, lcc, mcc, hcc, sf, ro, strd, ssrd]
+  training:
+    dataset:
+      - dataset: ${dataloader.dataset}
+        start: null
+        end: 2022
+        frequency: ${data.frequency}
+        drop: []
+      - dataset: ${dataloader.land_dataset}
+        start: null
+        end: 2022
+        frequency: ${data.frequency}
+        select: ${dataloader.land_variables}
+    start: null
+    end: 2022
+    drop: []
+  validation:
+    dataset:
+      - dataset: ${dataloader.dataset}
+        start: 2022
+        end: 2022
+        frequency: ${data.frequency}
+        drop: []
+      - dataset: ${dataloader.land_dataset}
+        start: 2022
+        end: 2022
+        frequency: ${data.frequency}
+        select: ${dataloader.land_variables}
+    start: 2022
+    end: 2022
+    drop: []
+  validation_rollout: 1
+diagnostics:
+  plot:
+    asynchronous: False
+    datashader: True
+    frequency:
+      batch: 750
+      epoch: 10
+    parameters: [tp]
+    sample_idx: 0
+    precip_and_related_fields: [tp, cp]
+    callbacks: []
+    enabled: True
+    scatter: False
+    mode: asyncio
+  callbacks: {}
+  benchmark_profiler:
+    memory:
+      enabled: True
+      steps: 5
+      warmup: 2
+      extra_plots: False
+      trace_rank0_only: False
+    time:
+      enabled: True
+      verbose: False
+    speed:
+      enabled: True
+    system:
+      enabled: True
+    model_summary:
+      enabled: True
+    snapshot:
+      enabled: True
+      steps: 4
+      warmup: 0
+  debug:
+    anomaly_detection: False
+  profiler: False
+  enable_checkpointing: True
+  checkpoint:
+    every_n_minutes:
+      save_frequency: 30
+      num_models_saved: 3
+    every_n_epochs:
+      save_frequency: 1
+      num_models_saved: 3
+    every_n_train_steps:
+      save_frequency: null
+      num_models_saved: 0
+  log:
+    wandb:
+      enabled: False
+    tensorboard:
+      enabled: False
+    mlflow:
+      enabled: False
+    interval: 100
+  enable_progress_bar: True
+  print_memory_summary: False
+hardware:
+  paths:
+    data: ${oc.decode:${oc.env:DATASETS_PATH}}
+    output: ${oc.decode:${oc.env:OUTPUT_DIR}}
+    logs:
+      base: ${hardware.paths.output}/logs
+      wandb: ${hardware.paths.output}/logs/wandb
+      mlflow: ${hardware.paths.output}/logs/mlflow
+      tensorboard: ${hardware.paths.output}/logs/tensorboard
+    checkpoints: ${hardware.paths.output}/checkpoint/
+    plots: ${hardware.paths.output}/plots/
+    profiler: ${hardware.paths.output}/profiler/
+    graph: ${hardware.paths.output}/graphs/
+  files:
+    dataset: aifs-od-an-oper-0001-mars-n320-2016-2023-6h-v6.zarr
+    dataset_land: aifs-od-an-oper-0001-mars-n320-2016-2023-6h-v1-land.zarr
+    graph: graph_enc_proc_dec_n320.pt
+    checkpoint:
+      every_n_epochs: aifs-by_epoch-epoch_{epoch:03d}-val_wmse_{val_wmse:.3e}
+      every_n_train_steps: aifs-by_step-epoch_{epoch:03d}-step_{step:06d}
+      every_n_minutes: aifs-by_time-epoch_{epoch:03d}-step_{step:06d}
+    warm_start: null
+  accelerator: auto
+  num_gpus_per_node: 4
+  num_nodes: 16
+  num_gpus_per_model: 4
+graph:
+  overwrite: True
+  data: data
+  hidden: hidden
+  nodes:
+    data:
+      node_builder:
+        _target_: anemoi.graphs.nodes.ZarrDatasetNodes
+        dataset: ${dataloader.dataset}
+      attributes:
+        area_weight:
+          _target_: anemoi.graphs.nodes.attributes.AreaWeights
+          norm: unit-max
+    hidden:
+      node_builder:
+        _target_: anemoi.graphs.nodes.ReducedGaussianGridNodes
+        grid: o96
+  edges:
+    - source_name: data
+      target_name: hidden
+      edge_builder:
+        _target_: anemoi.graphs.edges.CutOffEdges
+        cutoff_factor: 0.6
+      attributes:
+        edge_length:
+          _target_: anemoi.graphs.edges.attributes.EdgeLength
+          norm: unit-std
+        edge_dirs:
+          _target_: anemoi.graphs.edges.attributes.EdgeDirection
+          norm: unit-std
+    - source_name: hidden
+      target_name: data
+      edge_builder:
+        _target_: anemoi.graphs.edges.KNNEdges
+        num_nearest_neighbours: 3
+      attributes:
+        edge_length:
+          _target_: anemoi.graphs.edges.attributes.EdgeLength
+          norm: unit-std
+        edge_dirs:
+          _target_: anemoi.graphs.edges.attributes.EdgeDirection
+          norm: unit-std
+  attributes:
+    nodes:
+      area_weight:
+        _target_: anemoi.graphs.nodes.attributes.AreaWeights
+        norm: unit-max
+    edges:
+      edge_length:
+        _target_: anemoi.graphs.edges.attributes.EdgeLength
+        norm: unit-std
+      edge_dirs:
+        _target_: anemoi.graphs.edges.attributes.EdgeDirection
+        norm: unit-std
+model:
+  activation: GELU
+  num_channels: 1024
+  model:
+    _target_: anemoi.models.models.encoder_processor_decoder.AnemoiModelEncProcDec
+  processor:
+    _target_: anemoi.models.layers.processor.TransformerProcessor
+    _convert_: all
+    activation: GELU
+    num_layers: 16
+    num_chunks: 2
+    mlp_hidden_ratio: 4
+    num_heads: 16
+    window_size: 1120
+    dropout_p: 0.0
+  encoder:
+    _target_: anemoi.models.layers.mapper.GraphTransformerForwardMapper
+    _convert_: all
+    trainable_size: 8
+    sub_graph_edge_attributes: [edge_length, edge_dirs]
+    activation: GELU
+    num_chunks: 1
+    mlp_hidden_ratio: 4
+    num_heads: 16
+  decoder:
+    _target_: anemoi.models.layers.mapper.GraphTransformerBackwardMapper
+    _convert_: all
+    trainable_size: 8
+    sub_graph_edge_attributes: [edge_length, edge_dirs]
+    activation: GELU
+    num_chunks: 1
+    mlp_hidden_ratio: 4
+    num_heads: 16
+  trainable_parameters:
+    data: 8
+    hidden: 8
+    data2hidden: 8
+    hidden2data: 8
+  attributes:
+    edges: [edge_length, edge_dirs]
+    nodes: []
+  node_loss_weight: area_weight
+  bounding:
+    - _target_: anemoi.models.layers.bounding.ReluBounding
+      variables:
+        - tp
+        - ro
+        - tcw
+        - ssrd
+        - q_50
+        - q_100
+        - q_150
+        - q_200
+        - q_250
+        - q_300
+        - q_400
+        - q_500
+        - q_600
+        - q_700
+        - q_850
+        - q_925
+        - q_1000
+    - _target_: anemoi.models.layers.bounding.HardtanhBounding
+      variables: [tcc, swvl1, swvl2]
+      min_val: 0
+      max_val: 1
+    - _target_: anemoi.models.layers.bounding.FractionBounding
+      variables: [cp, sf]
+      min_val: 0
+      max_val: 1
+      total_var: tp
+    - _target_: anemoi.models.layers.bounding.FractionBounding
+      variables: [lcc, mcc, hcc]
+      min_val: 0
+      max_val: 1
+      total_var: tcc
+training:
+  run_id: null
+  fork_run_id: ${oc.decode:${oc.env:PRETRAINING_RUN_ID}}
+  load_weights_only: True
+  deterministic: False
+  precision: 16-mixed
+  multistep_input: 2
+  accum_grad_batches: 1
+  num_sanity_val_steps: 6
+  gradient_clip:
+    val: 32.0
+    algorithm: value
+  swa:
+    enabled: False
+    lr: 0.0001
+  zero_optimizer: False
+  training_loss:
+    _target_: anemoi.training.losses.mse.WeightedMSELoss
+    scalars:
+      - variable
+      - loss_weights_mask
+    ignore_nans: False
+  loss_gradient_scaling: False
+  validation_metrics:
+    - _target_: anemoi.training.losses.mse.WeightedMSELoss
+      scalars: []
+      ignore_nans: True
+  rollout:
+    start: 1
+    epoch_increment: 1
+    max: 12
+  max_epochs: 13
+  max_steps: 150000
+  lr:
+    rate: 8.0e-7
+    iterations: 7900
+    min: 3.0e-7
+    warmup_t: 100
+  variable_loss_scaling:
+    default: 1
+    pl:
+      q: 0.6
+      t: 6
+      u: 0.8
+      v: 0.5
+      w: 0.001
+      z: 12
+    sfc:
+      sp: 10
+      10u: 0.5
+      10v: 0.5
+      100u: 0.1
+      100v: 0.1
+      2d: 0.5
+      tp: 0.025
+      cp: 0.0025
+      ro: 0.005
+      sf: 0.025
+      tcc: 0.1
+      mcc: 0.1
+      lcc: 0.1
+      hcc: 0.1
+      swvl2: 200
+      swvl1: 100
+      stl2: 10
+      stl1: 1
+      ssrd: 0.05
+      strd: 0.1
+  metrics: [z_500, t_850, u_850, v_850]
+  pressure_level_scaler:
+    _target_: anemoi.training.data.scaling.ReluPressureLevelScaler
+    minimum: 0.2
+    slope: 0.001
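
Note on the `training` block of config_finetuning.yaml: `rollout` starts at 1 and, with `epoch_increment: 1` and `max: 12`, reaches 12 autoregressive steps within the 13 configured epochs. The sketch below illustrates that schedule together with the pressure-level weighting. It assumes, and this diff does not confirm, that the rollout length grows once per epoch and that `ReluPressureLevelScaler` evaluates `max(minimum, slope * level)`. The helper names are hypothetical and are not anemoi APIs.

```python
# Illustrative helpers only: names and formulas are assumptions based on the
# config values in this diff, not on the anemoi source code.


def rollout_for_epoch(epoch, start=1, epoch_increment=1, max_rollout=12):
    """Rollout length for a zero-based epoch, assuming one increment per epoch."""
    return min(start + epoch * epoch_increment, max_rollout)


def relu_pressure_level_weight(level_hpa, minimum=0.2, slope=0.001):
    """Assumed ReLU-style weight: linear in pressure, floored at `minimum`."""
    return max(minimum, slope * level_hpa)


# Rollout length for each of the 13 fine-tuning epochs (max_epochs: 13).
schedule = [rollout_for_epoch(epoch) for epoch in range(13)]
print("rollout per epoch:", schedule)

# Loss weight per pressure level (hPa) under the assumed formula.
for level in (50, 100, 200, 500, 850, 1000):
    print(level, relu_pressure_level_weight(level))
```

Under these assumptions the final epoch trains at the maximum rollout of 12, and every level at or above 200 hPa in altitude (pressure of 200 hPa or less) shares the floor weight of 0.2.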

config_pretraining.yaml ADDED

@@ -0,0 +1,492 @@

+data:
+  format: zarr
+  resolution: n320
+  frequency: 6h
+  timestep: 6h
+  forcing:
+    - cos_latitude
+    - cos_longitude
+    - sin_latitude
+    - sin_longitude
+    - cos_julian_day
+    - cos_local_time
+    - sin_julian_day
+    - sin_local_time
+    - insolation
+    - lsm
+    - sdor
+    - slor
+    - z
+  diagnostic:
+    - tp
+    - cp
+    - sf
+    - tcc
+    - hcc
+    - lcc
+    - mcc
+    - ro
+    - ssrd
+    - strd
+    - 100u
+    - 100v
+  remapped: null
+  normalizer:
+    default: mean-std
+    remap:
+      cp: tp
+      sf: tp
+    std:
+      - tp
+      - cp
+      - sf
+      - ro
+      - tcw
+      - ssrd
+      - q_50
+      - q_100
+      - q_150
+      - q_200
+      - q_250
+      - q_300
+      - q_400
+      - q_500
+      - q_600
+      - q_700
+      - q_850
+      - q_925
+      - q_1000
+    min-max: null
+    max:
+      - sdor
+      - slor
+      - z
+    none:
+      - cos_latitude
+      - cos_longitude
+      - sin_latitude
+      - sin_longitude
+      - cos_julian_day
+      - cos_local_time
+      - sin_julian_day
+      - sin_local_time
+      - insolation
+      - lsm
+      - tcc
+      - mcc
+      - hcc
+      - lcc
+      - swvl1
+      - swvl2
+  imputer:
+    default: none
+  remapper:
+    default: none
+  processors:
+    normalizer:
+      _target_: anemoi.models.preprocessing.normalizer.InputNormalizer
+      _convert_: all
+      config:
+        default: mean-std
+        remap:
+          cp: tp
+          sf: tp
+        std:
+          - tp
+          - cp
+          - sf
+          - ro
+          - tcw
+          - ssrd
+          - q_50
+          - q_100
+          - q_150
+          - q_200
+          - q_250
+          - q_300
+          - q_400
+          - q_500
+          - q_600
+          - q_700
+          - q_850
+          - q_925
+          - q_1000
+        min-max: null
+        max:
+          - sdor
+          - slor
+          - z
+        none:
+          - cos_latitude
+          - cos_longitude
+          - sin_latitude
+          - sin_longitude
+          - cos_julian_day
+          - cos_local_time
+          - sin_julian_day
+          - sin_local_time
+          - insolation
+          - lsm
+          - tcc
+          - mcc
+          - hcc
+          - lcc
+          - swvl1
+          - swvl2
+  num_features: 115
+dataloader:
+  prefetch_factor: 2
+  pin_memory: True
+  read_group_size: 4
+  num_workers:
+    training: 4
+    validation: 4
+    test: 8
+    predict: 8
+  batch_size:
+    training: 1
+    validation: 1
+    test: 4
+    predict: 4
+  limit_batches:
+    training: null
+    validation: 10
+    test: 20
+    predict: 20
+  dataset: ${hardware.paths.data}/${hardware.files.dataset}
+  land_dataset: ${hardware.paths.data}/${hardware.files.dataset_land}
+  land_variables: [100u, 100v, swvl1, swvl2, stl1, stl2, tcc, lcc, mcc, hcc, sf, ro, strd, ssrd]
+  training:
+    dataset:
+      - dataset: ${dataloader.dataset}
+        start: null
+        end: 2022
+        frequency: ${data.frequency}
+        drop: []
+      - dataset: ${dataloader.land_dataset}
+        start: null
+        end: 2022
+        frequency: ${data.frequency}
+        select: ${dataloader.land_variables}
+    start: null
+    end: 2022
+    drop: []
+  validation:
+    dataset:
+      - dataset: ${dataloader.dataset}
+        start: 2022
+        end: 2022
+        frequency: ${data.frequency}
+        drop: []
+      - dataset: ${dataloader.land_dataset}
+        start: 2022
+        end: 2022
+        frequency: ${data.frequency}
+        select: ${dataloader.land_variables}
+    start: 2022
+    end: 2022
+    drop: []
+  validation_rollout: 1
+diagnostics:
+  plot:
+    asynchronous: False
+    datashader: True
+    frequency:
+      batch: 750
+      epoch: 10
+    parameters: [tp]
+    sample_idx: 0
+    callbacks:
+      - _target_: anemoi.training.diagnostics.callbacks.plot.PlotLoss
+        parameter_groups:
+          moisture: [tp, cp, tcw]
+          sfc_wind: [10u, 10v]
+      - _target_: anemoi.training.diagnostics.callbacks.plot.PlotSample
+        sample_idx: 0
+        per_sample: 6
+        parameters: [tp]
+        accumulation_levels_plot: [0, 0.05, 0.1, 0.25, 0.5, 1, 1.5, 2, 3, 4, 5, 6, 7, 100]
+        cmap_accumulation:
+          - "#ffffff"
+          - "#04e9e7"
+          - "#019ff4"
+          - "#0300f4"
+          - "#02fd02"
+          - "#01c501"
+          - "#008e00"
+          - "#fdf802"
+          - "#e5bc00"
+          - "#fd9500"
+          - "#fd0000"
+          - "#d40000"
+          - "#bc0000"
+          - "#f800fd"
+        precip_and_related_fields: [tp, cp]
+    enabled: True
+    scatter: False
+    mode: asyncio
+  callbacks: {}
+  benchmark_profiler:
+    memory:
+      enabled: True
+      steps: 5
+      warmup: 2
+      extra_plots: False
+      trace_rank0_only: False
+    time:
+      enabled: True
+      verbose: False
+    speed:
+      enabled: True
+    system:
+      enabled: True
+    model_summary:
+      enabled: True
+    snapshot:
+      enabled: True
+      steps: 4
+      warmup: 0
+  debug:
+    anomaly_detection: False
+  profiler: False
+  enable_checkpointing: True
+  checkpoint:
+    every_n_minutes:
+      save_frequency: 30
+      num_models_saved: 3
+    every_n_epochs:
+      save_frequency: 1
+      num_models_saved: 3
+    every_n_train_steps:
+      save_frequency: null
+      num_models_saved: 0
+  log:
+    wandb:
+      enabled: False
+    tensorboard:
+      enabled: False
+    mlflow:
+      enabled: False
+    interval: 100
+  enable_progress_bar: True
+  print_memory_summary: False
+hardware:
+  paths:
+    data: ${oc.decode:${oc.env:DATASETS_PATH}}
+    output: ${oc.decode:${oc.env:OUTPUT_DIR}}
+    logs:
+      base: ${hardware.paths.output}/logs
+      wandb: ${hardware.paths.output}/logs/wandb
+      mlflow: ${hardware.paths.output}/logs/mlflow
+      tensorboard: ${hardware.paths.output}/logs/tensorboard
+    checkpoints: ${hardware.paths.output}/checkpoint
+    plots: ${hardware.paths.output}/plots
+    profiler: ${hardware.paths.output}/profiler
+    graph: ${hardware.paths.output}/graphs
+  files:
+    dataset: aifs-ea-an-oper-0001-mars-n320-1979-2022-6h-v6.zarr
+    dataset_land: aifs-ea-an-oper-0001-mars-n320-1979-2023-6h-v1-land.zarr
+    graph: graph_enc_proc_dec_n320.pt
+    checkpoint:
+      every_n_epochs: aifs-by_epoch-epoch_{epoch:03d}-val_wmse_{val_wmse:.3e}
+      every_n_train_steps: aifs-by_step-epoch_{epoch:03d}-step_{step:06d}
+      every_n_minutes: aifs-by_time-epoch_{epoch:03d}-step_{step:06d}
+    warm_start: null
+  accelerator: auto
+  num_gpus_per_node: 4
+  num_nodes: 16
+  num_gpus_per_model: 4
+graph:
+  overwrite: True
+  data: data
+  hidden: hidden
+  nodes:
+    data:
+      node_builder:
+        _target_: anemoi.graphs.nodes.ZarrDatasetNodes
+        dataset: ${dataloader.dataset}
+      attributes:
+        area_weight:
+          _target_: anemoi.graphs.nodes.attributes.AreaWeights
+          norm: unit-max
+    hidden:
+      node_builder:
+        _target_: anemoi.graphs.nodes.ReducedGaussianGridNodes
+        grid: o96
+  edges:
+    - source_name: data
+      target_name: hidden
+      edge_builder:
+        _target_: anemoi.graphs.edges.CutOffEdges
+        cutoff_factor: 0.6
+      attributes:
+        edge_length:
+          _target_: anemoi.graphs.edges.attributes.EdgeLength
+          norm: unit-std
+        edge_dirs:
+          _target_: anemoi.graphs.edges.attributes.EdgeDirection
+          norm: unit-std
+    - source_name: hidden
+      target_name: data
+      edge_builder:
+        _target_: anemoi.graphs.edges.KNNEdges
+        num_nearest_neighbours: 3
+      attributes:
+        edge_length:
+          _target_: anemoi.graphs.edges.attributes.EdgeLength
+          norm: unit-std
+        edge_dirs:
+          _target_: anemoi.graphs.edges.attributes.EdgeDirection
+          norm: unit-std
+model:
+  activation: GELU
+  num_channels: 1024
+  model:
+    _target_: anemoi.models.models.encoder_processor_decoder.AnemoiModelEncProcDec
+  processor:
+    _target_: anemoi.models.layers.processor.TransformerProcessor
+    _convert_: all
+    activation: GELU
+    num_layers: 16
+    num_chunks: 2
+    mlp_hidden_ratio: 4
+    num_heads: 16
+    window_size: 1120
+    dropout_p: 0
+  encoder:
+    _target_: anemoi.models.layers.mapper.GraphTransformerForwardMapper
+    _convert_: all
+    trainable_size: 8
+    sub_graph_edge_attributes: [edge_length, edge_dirs]
+    activation: GELU
+    num_chunks: 1
+    mlp_hidden_ratio: 4
+    num_heads: 16
+  decoder:
+    _target_: anemoi.models.layers.mapper.GraphTransformerBackwardMapper
+    _convert_: all
+    trainable_size: 8
+    sub_graph_edge_attributes: [edge_length, edge_dirs]
+    activation: GELU
+    num_chunks: 1
+    mlp_hidden_ratio: 4
+    num_heads: 16
+  trainable_parameters:
+    data: 8
+    hidden: 8
+    data2hidden: 8
+    hidden2data: 8
+  attributes:
+    edges: [edge_length, edge_dirs]
+    nodes: []
+  node_loss_weight: area_weight
+  bounding:
+    - _target_: anemoi.models.layers.bounding.ReluBounding
+      variables:
+        - tp
+        - ro
+        - tcw
+        - ssrd
+        - q_50
+        - q_100
+        - q_150
+        - q_200
+        - q_250
+        - q_300
+        - q_400
+        - q_500
+        - q_600
+        - q_700
+        - q_850
+        - q_925
+        - q_1000
+    - _target_: anemoi.models.layers.bounding.HardtanhBounding
+      variables: [tcc, swvl1, swvl2]
+      min_val: 0
+      max_val: 1
+    - _target_: anemoi.models.layers.bounding.FractionBounding
+      variables: [cp, sf]
+      min_val: 0
+      max_val: 1
+      total_var: tp
+    - _target_: anemoi.models.layers.bounding.FractionBounding
+      variables: [lcc, mcc, hcc]
+      min_val: 0
+      max_val: 1
+      total_var: tcc
+training:
+  run_id: null
+  fork_run_id: null
+  load_weights_only: null
+  deterministic: False
+  precision: 16-mixed
+  multistep_input: 2
+  accum_grad_batches: 1
+  num_sanity_val_steps: 6
+  gradient_clip:
+    val: 32
+    algorithm: value
+  swa:
+    enabled: False
+    lr: 0.0001
+  zero_optimizer: False
+  training_loss:
+    _target_: anemoi.training.losses.mse.WeightedMSELoss
+    scalars: [variable, loss_weights_mask]
+    ignore_nans: False
+  loss_gradient_scaling: False
+  validation_metrics:
+    - _target_: anemoi.training.losses.mse.WeightedMSELoss
+      scalars: []
+      ignore_nans: True
+  rollout:
+    start: 1
+    epoch_increment: 0
+    max: 1
+  max_epochs: null
+  max_steps: 260000
+  lr:
+    rate: 0.00003125
+    iterations: 260000
+    min: 3.0e-7
+  variable_loss_scaling:
+    default: 1
+    pl:
+      q: 0.6
+      t: 6
+      u: 0.8
+      v: 0.5
+      w: 0.001
+      z: 12
+    sfc:
+      sp: 10
+      10u: 0.5
+      10v: 0.5
+      100u: 0.1
+      100v: 0.1
+      2d: 0.5
+      tp: 0.025
+      cp: 0.0025
+      ro: 0.005
+      sf: 0.025
+      tcc: 0.1
+      mcc: 0.1
+      lcc: 0.1
+      hcc: 0.1
+      swvl2: 200
+      swvl1: 100
+      stl2: 10
+      stl1: 1
+      ssrd: 0.05
+      strd: 0.1
+  metrics: [z_500, t_850, u_850, v_850]
+  pressure_level_scaler:
+    _target_: anemoi.training.data.scaling.ReluPressureLevelScaler
+    minimum: 0.2
+    slope: 0.001
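
Note on the `model.bounding` and `hardware` blocks of config_pretraining.yaml: the bounding entries constrain outputs in order, so `tp` is made non-negative and `tcc` is clamped to [0, 1] before `cp`/`sf` and `lcc`/`mcc`/`hcc` are expressed as fractions of those totals. The sketch below mimics that behaviour in plain Python. It assumes that `FractionBounding` clamps the raw output to [`min_val`, `max_val`] and multiplies by `total_var`, and that the configured `lr.rate` is scaled by the number of data-parallel model replicas. Both are assumptions about anemoi, and all function names are hypothetical.

```python
# Illustrative sketch only: the formulas are assumptions inferred from the
# config keys in this diff, not copied from the anemoi implementation.


def relu(x):
    """Non-negative bound (ReluBounding-style)."""
    return max(x, 0.0)


def hardtanh(x, min_val=0.0, max_val=1.0):
    """Clamp to [min_val, max_val] (HardtanhBounding-style)."""
    return min(max(x, min_val), max_val)


def fraction_of_total(x, total, min_val=0.0, max_val=1.0):
    """Assumed FractionBounding: clamped fraction times the bounded total."""
    return hardtanh(x, min_val, max_val) * total


def apply_bounding(raw):
    """Apply the bounds in config order so totals are bounded before fractions."""
    out = dict(raw)
    out["tp"] = relu(raw["tp"])
    out["tcc"] = hardtanh(raw["tcc"])
    for name in ("cp", "sf"):
        out[name] = fraction_of_total(raw[name], out["tp"])
    for name in ("lcc", "mcc", "hcc"):
        out[name] = fraction_of_total(raw[name], out["tcc"])
    return out


def effective_lr(rate, num_nodes, num_gpus_per_node, num_gpus_per_model):
    """Assumed scaling of lr.rate by the number of model replicas."""
    return rate * num_nodes * num_gpus_per_node / num_gpus_per_model


# Made-up raw network outputs, chosen to exercise each bound.
raw = {"tp": 2.0, "cp": 1.4, "sf": -0.3, "tcc": 1.2,
       "lcc": 0.5, "mcc": 0.25, "hcc": -0.1}
bounded = apply_bounding(raw)
print(bounded)

# Values taken from the hardware and training.lr blocks of this config.
print(effective_lr(0.00003125, num_nodes=16, num_gpus_per_node=4,
                   num_gpus_per_model=4))
```

With the hardware settings shown (16 nodes, 4 GPUs per node, 4 GPUs per model) there are 16 replicas, so under the stated assumption the configured rate of 3.125e-5 would correspond to an effective rate of 5e-4.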

run_AIFS_v1.ipynb ADDED

The diff for this file is too large to render.