Generating Unconditional Samples

Sacha Lewin edited this page Sep 24, 2025 · 8 revisions

This page explains how to generate prior (unconditional) samples with a trained denoiser. This is a good test to check if your denoiser learned properly.

The script is located at experiments/diffusion/generate.py. You will notice that most inference scripts follow a very similar usage pattern.

We generate prior trajectories following the blanket mechanism of Score-based Data Assimilation by Rozet and Louppe (2023). This means that the whole trajectory, of arbitrary size, is denoised simultaneously by composing the scores of a number of blankets, i.e., smaller windows whose scores are computed independently. Note that we use a different mechanism in our forecasting script, where we use autoregressive sampling. We also provide an all-at-once forecasting script with blankets in forecast_aao.py, but we did not include it in the paper due to its inferior performance: the few known states to condition on, compared to the size of the trajectory, significantly limit information flow during denoising.
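The blanket composition described above can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual API: the function names, the `score_fn` signature, and the choice of simply averaging scores in overlapping regions are all assumptions for the sake of the example.

```python
import numpy as np

def blanket_starts(T, K, O):
    """Start indices of overlapping blankets covering a trajectory of length T.

    K is the blanket (window) size and O the overlap between adjacent
    blankets; assumes (T - K) is divisible by (K - O).
    """
    return list(range(0, T - K + 1, K - O))

def compose_scores(x, score_fn, K, O):
    """Compose the trajectory score from independently computed blanket scores.

    x: array of shape (T, d); score_fn(window) -> array of shape (K, d).
    Overlapping regions are averaged here, a simple illustrative choice
    (the exact composition rule in the paper may differ).
    """
    T = x.shape[0]
    total = np.zeros_like(x)
    counts = np.zeros((T, 1))
    for s in blanket_starts(T, K, O):
        total[s:s + K] += score_fn(x[s:s + K])
        counts[s:s + K] += 1
    return total / counts
```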

Configuration file

```yaml
model_path: /path/to/project/autoencoders/your_ae/1/latents/wiki/denoisers/your_denoiser/0
model_target: best  # Options: best, last

diffusion:
  num_steps: 64  # Defaults to model's validation denoising steps
  sampler:
    type: lms
    config: {}

trajectory_sizes: "72"

# Samples for a given window size
num_samples_per_date: 2
start_dates:
  - "2000-04-03 0h"
  - "2000-04-20 12h"

blanket_overlap: 4  # Overlap between blankets

precision: float16  # Options: float32, float16, bfloat16, null = model training precision

hardware:
  backend: slurm
  account: users
  gpus_per_node: 1
  gen:
    cpus: 8
    ram: 60GB
    time: "6:00"
    partition: a5000
    gpus: 1  # TODO: Support for "auto" mode that computes GPUs based on a number of blankets per GPU for scalable experiments.
  aggregate:
    cpus: 8
    ram: 32GB
    time: "4:00"
    partition: a5000
```
  • model_path: Absolute path to your model, including the lap, i.e., the path ends in .../model_id/lap.
  • model_target: Either use the best model checkpoint, according to validation loss, or the latest one saved.
  • diffusion: Diffusion settings. If num_steps is set to null, the number of validation denoising steps from the training configuration is used. Available samplers are pc, ddpm, ddim, rewind, and lms. More information can be found on the Samplers page.
  • trajectory_sizes: Size of the trajectory in number of states. Some scripts save results as {trajectory_size}h even when the dt value is greater than 1 hour, so pay attention to possible inconsistencies if you run models with a larger stride.
  • num_samples_per_date / start_dates: Each inference script follows the literature and generates ensembles of trajectories. Here, we generate two ensembles of two trajectories each: one starting at midnight on April 3rd, and one starting at 12h on April 20th. Make sure these dates are included in your total ERA5 data, as the autoencoder takes in context and timestamps, which are loaded from there. The ground truth will also be retrieved for evaluation and rendering.
  • blanket_overlap: Specifies the number of states over which two adjacent blankets overlap. Less overlap means less information flow between the independent windows and less compute required per denoising step, while more overlap increases the consistency across states but requires computing more scores. The number $N$ of blankets given a trajectory size $T$, blanket size $K$, and blanket overlap $O$ is $N = \frac{T - K}{K - O} + 1$.
  • precision: Specifies the inference precision, either float16, bfloat16, or float32. We did not notice major differences at inference; training in float32 matters more.
  • Hardware settings follow the other scripts. Do not forget to put quotes around times expressed in minutes, or PyYAML might interpret them as hours.
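The blanket-count formula above can be checked numerically. In the sketch below, T = 72 and O = 4 match the example configuration, while the blanket size K = 8 is a hypothetical value chosen so that the division is exact:

```python
def num_blankets(T, K, O):
    """N = (T - K) / (K - O) + 1; assumes exact divisibility."""
    q, r = divmod(T - K, K - O)
    if r != 0:
        raise ValueError("trajectory size incompatible with blanket size/overlap")
    return q + 1

# With trajectory size T = 72 and overlap O = 4 (as in the config above)
# and a hypothetical blanket size K = 8:
print(num_blankets(72, 8, 4))  # -> 17
```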

Rendering

All scripts save results as latent samples to save space, since full states can reach 1GB each, depending on your number of variables. For simplicity, we therefore separate the generation of latent samples from rendering/evaluation, which decodes them at runtime.
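The decode-at-runtime step can be sketched as follows; `decode_fn` is a hypothetical stand-in for your autoencoder's decoder, not an API the repository exposes:

```python
import numpy as np

def decode_latents(latents, decode_fn, batch_size=8):
    """Decode latent samples to full states in small batches.

    latents: array of shape (N, ...) of latent states; decode_fn maps a
    batch of latents to a batch of decoded states. Batching keeps peak
    memory low, since decoded states can be large (up to ~1 GB each).
    """
    out = []
    for i in range(0, len(latents), batch_size):
        out.append(decode_fn(latents[i:i + batch_size]))
    return np.concatenate(out, axis=0)
```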

For rendering your results, please see the dedicated rendering page.

Evaluation

Although evaluating prior samples is not as interesting as evaluating forecasts, the prior acts as a sort of "climatology", i.e., an average over long periods (if trained on several years). It can also be compared to a persistence model, i.e., state(t) = state(t=0) for all t.
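The two baselines mentioned above can be sketched in a few lines of NumPy; the function names and array layouts here are illustrative assumptions, not the evaluation script's actual interface:

```python
import numpy as np

def persistence_forecast(x0, T):
    """Persistence baseline: repeat the initial state x0 for all T steps."""
    return np.repeat(x0[None, ...], T, axis=0)

def climatology(trajectories):
    """Climatology baseline: mean state over a long archive of samples.

    trajectories: array of shape (N, T, ...) of (decoded) trajectories;
    returns the average state, against which priors can be compared.
    """
    return trajectories.mean(axis=(0, 1))
```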

For evaluation, please see the dedicated evaluation page, and for prior generation, see this page.

Next step

Once you have tried generating prior samples, rendering them, and evaluating them, we recommend having a look at conditional inference, such as forecasting or reanalysis.
