
Reconstruction

Sacha Lewin edited this page Sep 27, 2025 · 5 revisions

This page explains how to compute the reconstruction RMSE and histograms of signed errors of a trained autoencoder.
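As a rough illustration of these two quantities, the sketch below computes a per-variable RMSE and the signed errors with NumPy. The array shapes, the function names, and the sign convention (reconstruction minus ground truth) are assumptions made for illustration only; the actual script may aggregate or weight errors differently, for example by latitude.

```python
import numpy as np


def signed_error(truth, recon):
    """Signed error; the convention (recon - truth) is an assumption."""
    return recon - truth


def rmse_per_variable(truth, recon):
    """RMSE over every axis except the first (variable) axis."""
    err = signed_error(truth, recon)
    axes = tuple(range(1, err.ndim))
    return np.sqrt(np.mean(err**2, axis=axes))


# Toy data: 2 variables on a 4 x 8 grid (hypothetical shapes).
rng = np.random.default_rng(0)
truth = rng.normal(size=(2, 4, 8))
recon = truth + 0.1  # constant reconstruction bias of 0.1

print(rmse_per_variable(truth, recon))
```

The RMSE discards the sign of the error, whereas a histogram of signed errors also reveals systematic over- or underestimation, such as the constant bias in this toy example.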

The script for reconstruction is located at experiments/autoencoder/reconstruction.py.

Configuration file

model_path: /path/to/appa/autoencoders/your_ae/lap
data_path: era5
id: valid

checkpoint: last  # or best

start_date: 2000-04-01
end_date: 2000-04-30

batch_size: 1
num_chunks: 8
delete_tmp: true  # delete tmp files after completion

# Signed error histograms
num_bins: 200
bin_ranges:
  2m_temperature: [-10, 10]
  10m_u_component_of_wind: [-10, 10]
  10m_v_component_of_wind: [-10, 10]
  mean_sea_level_pressure: [-7, 7]
  total_precipitation: [-4, 4]
  sea_surface_temperature: [-20, 20]
  temperature: [-10, 10]
  u_component_of_wind: [-10, 10]
  v_component_of_wind: [-10, 10]
  geopotential: [-600, 600]
  specific_humidity: [-5, 5]
multipliers:
  specific_humidity: 1000  # kg/kg to g/kg
  total_precipitation: 1000  # m to mm
  mean_sea_level_pressure: 0.01  # Pa to hPa

hardware:
  backend: slurm  # or async
  account: your_account
  latent_chunk:
    cpus: 8
    ram: 60GB
    time: "1:00:00"
    partition: your_partition
  aggregate:
    cpus: 4
    ram: 60GB
    time: "05:00"
    partition: your_partition
  • The first section defines the path to the autoencoder (model_path), the ground-truth data to reconstruct (data_path), and the ID of the run (id), which is used when saving. Results are written to model_path/reconstruction/id.
  • checkpoint defines whether to use the best model ("best") or the latest checkpoint ("last").
  • start_date and end_date define the interval over which you reconstruct samples. This should be your test set range.
  • batch_size should probably be 1, depending on your resources. Values larger than 1 might also not work, depending on xFormers.
  • num_chunks divides the data into chunks for faster processing on Slurm clusters.
  • delete_tmp toggles the deletion of temporary files once the computation completes. Keeping them (delete_tmp: false) is only recommended when debugging the code.
  • num_bins defines the number of bins, i.e., the "resolution", of your signed error histograms.
  • bin_ranges defines the value range for each variable. This is manually defined for better readability of the histograms.
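  • multipliers defines optional per-variable factors used to convert units before building the histograms (e.g., specific_humidity from kg/kg to g/kg).
  • hardware is configured in the same way as for the other scripts.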
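To make the interplay of num_bins, bin_ranges, and multipliers more concrete, here is a minimal sketch of how a signed error histogram could be built with NumPy. The function name, the synthetic data, and the sign convention (reconstruction minus ground truth) are assumptions for illustration; this is not the code used by reconstruction.py.

```python
import numpy as np

# Values copied from the configuration above (two variables only).
num_bins = 200
bin_ranges = {"specific_humidity": [-5, 5], "mean_sea_level_pressure": [-7, 7]}
multipliers = {"specific_humidity": 1000, "mean_sea_level_pressure": 0.01}


def signed_error_histogram(name, truth, recon):
    """Histogram of (recon - truth) after unit conversion (hypothetical helper)."""
    scale = multipliers.get(name, 1.0)  # variables without a multiplier keep their units
    err = (recon - truth) * scale
    # Errors falling outside the configured range are ignored by np.histogram.
    counts, edges = np.histogram(err, bins=num_bins, range=tuple(bin_ranges[name]))
    return counts, edges


# Synthetic specific humidity in kg/kg with errors of roughly 1 g/kg.
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 0.02, size=10_000)
recon = truth + rng.normal(0.0, 0.001, size=10_000)

counts, edges = signed_error_histogram("specific_humidity", truth, recon)
print(counts.sum(), edges[0], edges[-1])
```

Because the bin edges are fixed in advance rather than inferred from the data, counts computed on different batches or chunks share the same bins and can simply be added together.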

You can then simply start the reconstruction process with python reconstruction.py.
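For intuition about what num_chunks does once the job is running, the sketch below splits the configured date range into 8 contiguous chunks that could each be processed by a separate job. The helper name, the daily granularity, and the inclusive end date are assumptions; the actual script may split the data differently.

```python
from datetime import date, timedelta


def split_dates(start, end, num_chunks):
    """Split [start, end] (inclusive, daily steps) into contiguous chunks."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    base, extra = divmod(len(days), num_chunks)
    chunks, pos = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append(days[pos:pos + size])
        pos += size
    return chunks


chunks = split_dates(date(2000, 4, 1), date(2000, 4, 30), 8)
for i, chunk in enumerate(chunks):
    print(i, chunk[0], chunk[-1], len(chunk))
```

With the 30 days of the example configuration, this yields six chunks of 4 days followed by two chunks of 3 days.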

Visualization

Reconstruction saves:

  • RMSE.
  • Signed error histogram.
  • Snapshots.

RMSE can be visualized with scripts/plots/rmse_snapshots.py or scripts/plots/rmse_spectra.py. Signed error histograms can be rendered with scripts/plots/physical_consistency.py. Snapshots can be rendered with scripts/plots/rmse_snapshots.py.

Please see this page for more information.

Next step

Now, you can compute the power spectra of your models.
