Reconstruction

This page explains how to compute the reconstruction RMSE and histograms of signed errors of a trained autoencoder.

The script for reconstruction is located at experiments/autoencoder/reconstruction.py.

Configuration file

model_path: /path/to/appa/autoencoders/your_ae/lap
data_path: era5
id: valid

checkpoint: last  # or best

start_date: 2000-04-01
end_date: 2000-04-30

batch_size: 1
num_chunks: 8
delete_tmp: true  # delete tmp files after completion

# Signed error histograms
num_bins: 200
bin_ranges:
  2m_temperature: [-10, 10]
  10m_u_component_of_wind: [-10, 10]
  10m_v_component_of_wind: [-10, 10]
  mean_sea_level_pressure: [-7, 7]
  total_precipitation: [-4, 4]
  sea_surface_temperature: [-20, 20]
  temperature: [-10, 10]
  u_component_of_wind: [-10, 10]
  v_component_of_wind: [-10, 10]
  geopotential: [-600, 600]
  specific_humidity: [-5, 5]
multipliers:
  specific_humidity: 1000  # kg/kg to g/kg
  total_precipitation: 1000  # m to mm
  mean_sea_level_pressure: 0.01  # Pa to hPa

hardware:
  backend: slurm  # or async
  account: your_account
  latent_chunk:
    cpus: 8
    ram: 60GB
    time: "1:00:00"
    partition: your_partition
  aggregate:
    cpus: 4
    ram: 60GB
    time: "05:00"
    partition: your_partition

The first section defines the path to the autoencoder (model_path), the ground-truth data to reconstruct (data_path), and the ID of the run (id) used for saving. The results are written to ae_path/reconstruction/id, where ae_path is the autoencoder path and id is the run ID.
checkpoint defines whether to use the best model ("best") or the latest checkpoint ("last").
start_date and end_date define the interval over which you reconstruct samples. This should be your test set range.
batch_size should probably be 1, depending on your resources. Values larger than 1 might also not work, depending on XFormers.
num_chunks divides the data into chunks for faster processing on Slurm clusters.
delete_tmp toggles the deletion of temporary files once the computation has completed. Keeping them (delete_tmp: false) is only recommended when debugging the code.
num_bins defines the number of bins, i.e., the "resolution", of your signed error histograms.
bin_ranges defines the value range for each variable. This is manually defined for better readability of the histograms.
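multipliers defines per-variable scaling factors used to change units, e.g., 1000 to convert specific humidity from kg/kg to g/kg. Variables without an entry keep their original units.
hardware sets the execution backend (slurm or async) and the resources of the latent_chunk and aggregate jobs, in the same way as for the other scripts.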
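
To make the roles of num_bins, bin_ranges, and multipliers more concrete, here is a minimal sketch of how an RMSE and a signed error histogram could be computed for one variable. It is not the code of reconstruction.py: the function names are hypothetical, the sign convention (reconstruction minus ground truth) is an assumption, and so are the choices to apply the multiplier before binning and to drop errors that fall outside the bin range.

```python
import math


def signed_error_histogram(truth, recon, num_bins, bin_range, multiplier=1.0):
    """Count signed errors into num_bins equal-width bins over bin_range.

    Assumptions (not taken from the actual script): the error is
    recon - truth, it is scaled by `multiplier` before binning, and
    errors outside bin_range are ignored.
    """
    low, high = bin_range
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for t, r in zip(truth, recon):
        err = (r - t) * multiplier
        if err < low or err > high:
            continue  # out-of-range errors are dropped in this sketch
        # An error exactly equal to `high` is placed in the last bin.
        idx = min(int((err - low) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def rmse(truth, recon, multiplier=1.0):
    """Root mean squared error between two equally long sequences."""
    squared = [((r - t) * multiplier) ** 2 for t, r in zip(truth, recon)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up specific humidity values in kg/kg, for illustration only.
truth = [0.010, 0.012, 0.008, 0.011]
recon = [0.0115, 0.0095, 0.0085, 0.020]

# Multiplier 1000 converts kg/kg to g/kg, so the range [-5, 5] is in g/kg.
counts = signed_error_histogram(
    truth, recon, num_bins=10, bin_range=(-5, 5), multiplier=1000
)
print(counts)
print(rmse(truth, recon, multiplier=1000))
```

With these toy values, the last sample has an error of about 9 g/kg and therefore does not appear in the histogram. In this sketch, a bin range that is too narrow silently hides large errors, so it is worth checking that the chosen ranges cover the errors you expect.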

You can then simply start the reconstruction process with python reconstruction.py.
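
As a rough picture of what num_chunks does when the script runs, the sketch below splits the configured date range into contiguous pieces that could each be handled by a separate job. This is only an illustration: split_date_range is a hypothetical helper, and the assumptions that chunking happens per day and that end_date is inclusive are not taken from reconstruction.py, which may instead split by time step or by sample.

```python
from datetime import date, timedelta


def split_date_range(start, end, num_chunks):
    """Split the inclusive range [start, end] into contiguous chunks.

    Chunk lengths differ by at most one day. If there are fewer days
    than requested chunks, one chunk per day is returned.
    """
    total_days = (end - start).days + 1
    num_chunks = min(num_chunks, total_days)
    base, extra = divmod(total_days, num_chunks)
    chunks = []
    cursor = start
    for i in range(num_chunks):
        # The first `extra` chunks get one additional day.
        length = base + (1 if i < extra else 0)
        chunk_end = cursor + timedelta(days=length - 1)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


# Values from the example configuration above.
chunks = split_date_range(date(2000, 4, 1), date(2000, 4, 30), num_chunks=8)
for first, last in chunks:
    print(first.isoformat(), "to", last.isoformat())
```

With the example configuration (30 days, 8 chunks), this yields six chunks of four days followed by two chunks of three days.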

Visualization

Reconstruction saves:

RMSE.
Signed error histogram.
Snapshots.

RMSE can be visualized with either scripts/plots/rmse_snapshots.py or scripts/plots/rmse_spectra.py. Signed error histograms can be rendered with scripts/plots/physical_consistency.py. Snapshots can be rendered with scripts/plots/rmse_snapshots.py.

Please see this page for more information.