# ParticleDA.jl

[`ParticleDA.jl`](https://github.com/Team-RADDISH/ParticleDA.jl) is a Julia package to perform data assimilation with particle filters, distributed using MPI.

## Installation

To install the latest stable version of the package, open the [Julia REPL](https://docs.julialang.org/en/v1/stdlib/REPL/), enter the package manager with `]`, then run the command

```
add ParticleDA
```

If you plan to develop the package (make changes, submit pull requests, etc.), run this command in the package manager mode instead

```
dev ParticleDA
```

This will automatically clone the repository to your local directory `~/.julia/dev/ParticleDA`.

You can exit the package manager mode by pressing `CTRL + C` or, alternatively, the backspace key when there is no input in the prompt.

## Usage

After installing the package, you can start using it in Julia's REPL with

```julia
using ParticleDA
```

To run the particle filter you can use the function `run_particle_filter`:

```@docs
run_particle_filter
```

To simulate observations from the model (which can be used, for example, to test the filtering algorithms) you can use the function `simulate_observations_from_model`

```@docs
simulate_observations_from_model
```

The `filter_type` argument to [`run_particle_filter`](@ref) should be a concrete subtype of the `ParticleFilter` abstract type.

```@docs
ParticleDA.ParticleFilter
BootstrapFilter
OptimalFilter
```

The `summary_stat_type` argument to [`run_particle_filter`](@ref) should be a concrete subtype of the `AbstractSummaryStat` abstract type.

```@docs
ParticleDA.AbstractSummaryStat
ParticleDA.AbstractCustomReductionSummaryStat
ParticleDA.AbstractSumReductionSummaryStat
ParticleDA.MeanAndVarSummaryStat
ParticleDA.MeanSummaryStat
ParticleDA.NaiveMeanAndVarSummaryStat
ParticleDA.NaiveMeanSummaryStat
```

The next section details how to write the interface between the model and the particle filter.
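Putting these pieces together, a typical call pairs a model initialisation function with a filter type and a summary statistic type. The sketch below is illustrative only: the module `MyModel` and the file names are placeholders for a model implementing the interface described in the next section.

```julia
using ParticleDA

# Load a user-defined model implementing the ParticleDA model interface
# (hypothetical file and module; see the next section for the required methods)
include("my_model.jl")
using .MyModel

# Simulate synthetic observations from the model...
simulate_observations_from_model(MyModel.init, "parameters.yaml", "observations.h5")

# ...then run a bootstrap particle filter on them, tracking the ensemble mean
final_states, final_statistics = run_particle_filter(
    MyModel.init, "parameters.yaml", "observations.h5",
    BootstrapFilter, MeanSummaryStat
)
```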
### Interfacing the model

The model needs to define a custom data structure and a few functions that will be used by [`run_particle_filter`](@ref):

* a custom structure which holds the data about the model. This will be used to dispatch the methods to be defined, listed below;
* an initialisation function with the following signature:

  ```julia
  init(params_dict::Dict, n_tasks::Integer) -> model
  ```

  with `params_dict` a dictionary with the parameters of the model and `n_tasks` an integer specifying the maximum number of tasks (coroutines) that parallelisable operations will be scheduled over. This initialisation function should create an instance of the model data structure and return it.

  The value of `n_tasks` can be used to create task-specific buffers to write to when computing the model updates, avoiding reallocation of data structures on each function call. As tasks may be run in parallel over multiple threads, any buffers used in functions called within tasks should be unique to the task. To facilitate this, functions in the model interface (see below) which may be called within tasks scheduled in parallel are passed a `task_index` argument, an integer index in `1:n_tasks` which is guaranteed to be unique to a particular task and so can be used to index into task-specific buffers.
* the following methods, which the model needs to extend using the model type for dispatch:

  ```@docs
  ParticleDA.get_state_dimension
  ParticleDA.get_observation_dimension
  ParticleDA.get_state_eltype
  ParticleDA.get_observation_eltype
  ParticleDA.sample_initial_state!
  ParticleDA.update_state_deterministic!
  ParticleDA.update_state_stochastic!
  ParticleDA.sample_observation_given_state!
  ParticleDA.get_log_density_observation_given_state
  ParticleDA.write_model_metadata
  ```
* optionally, if the model has additive Gaussian observation and state noise, the following methods, again extended using the model type for dispatch, to allow use of the more statistically efficient `OptimalFilter` for filtering:

  ```@docs
  ParticleDA.get_observation_mean_given_state!
  ParticleDA.get_covariance_state_noise
  ParticleDA.get_covariance_observation_noise
  ParticleDA.get_covariance_state_observation_given_previous_state
  ParticleDA.get_covariance_observation_observation_given_previous_state
  ```

Functions are provided to run tests which check that a model correctly implements the expected interfaces:

```@docs
ParticleDA.run_unit_tests_for_generic_model_interface
ParticleDA.run_tests_for_optimal_proposal_model_interface
ParticleDA.run_tests_for_convergence_of_filter_estimates_against_kalman_filter
```

### Other functions you may want to extend

```@docs
ParticleDA.get_state_indices_correlated_to_observations
ParticleDA.get_initial_state_mean!
ParticleDA.write_snapshot
ParticleDA.write_state
ParticleDA.get_covariance_initial_state
ParticleDA.init_filter
ParticleDA.sample_proposal_and_compute_log_weights!
ParticleDA.write_observation
ParticleDA.write_weights
ParticleDA.get_initial_state_mean
```

### Input parameters

You can store the input parameters in a YAML file with the following structure

```yaml
filter:
  key1: value1

model:
  model_name1:
    key2: value2
    key3: value3
  model_name2:
    key4: value4
    key5: value5
```

The parameters under `filter` are related to the particle filter; under `model` you can specify the parameters for different models. The particle filter parameters are saved in the following data structure:

```@docs
ParticleDA.FilterParameters
```

## Example: estimating the state in a tsunami model

A full example of a model interfacing `ParticleDA` is available in `test/models/llw2d.jl`.
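In outline, a model implementation has the following shape. This is a hedged sketch only: the module name `MyModel`, the dimensions, and the exact argument orders are illustrative; consult the docstrings above for the precise signatures each method must have.

```julia
module MyModel

using Random
import ParticleDA

# Structure holding model parameters and any per-task buffers
struct Model
    n_tasks::Int
end

# Initialisation function passed to run_particle_filter
init(params_dict::Dict, n_tasks::Integer) = Model(n_tasks)

# Static properties of the state and observation spaces (toy values here)
ParticleDA.get_state_dimension(::Model) = 2
ParticleDA.get_observation_dimension(::Model) = 1
ParticleDA.get_state_eltype(::Model) = Float64
ParticleDA.get_observation_eltype(::Model) = Float64

# Draw an initial state in place; a standard normal prior, for illustration
function ParticleDA.sample_initial_state!(state, model::Model, rng, task_index)
    randn!(rng, state)
    return state
end

# The remaining interface methods (update_state_deterministic!,
# update_state_stochastic!, sample_observation_given_state!,
# get_log_density_observation_given_state, write_model_metadata)
# are extended analogously.

end
```

The `llw2d` model in the test suite fills in all of these methods for a concrete model.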
This model represents a simple two-dimensional simulation of tsunami dynamics and is partly based on the [tsunami data assimilation code](https://github.com/tktmyd/tdac) by Takuto Maeda. The particle filter can be run with observations simulated from the model as follows

```julia
# Load ParticleDA
using ParticleDA

# Save some variables for later use
test_dir = joinpath(dirname(pathof(ParticleDA)), "..", "test")
llw2d_src = joinpath(test_dir, "models", "llw2d.jl")
input_file = joinpath(test_dir, "integration_test_1.yaml")
observation_file = tempname()

# Instantiate the test environment
using Pkg
Pkg.activate(test_dir)
Pkg.instantiate()

# Include the sample model source code and load it
include(llw2d_src)
using .LLW2d

# Simulate observations from the model to use
simulate_observations_from_model(LLW2d.init, input_file, observation_file)

# Run the (optimal proposal) particle filter with the simulated observations,
# computing the mean and variance of the particle ensemble. On non-Intel
# architectures you may need to use NaiveMeanAndVarSummaryStat instead
final_states, final_statistics = run_particle_filter(
    LLW2d.init, input_file, observation_file, OptimalFilter, MeanAndVarSummaryStat
)
```

### Parameters of the test model

The [Input parameters](@ref) section shows how to pass the input parameters for the filter and the model. For the model included in the test suite, called `llw2d`, you can set the following parameters:

```@docs
LLW2d.LLW2dModelParameters
```

### Observation station coordinates

The coordinates of the observation stations can be set in two different ways.

1. Provide the coordinates in an input file. Set the parameter `station_filename` to the name of your input file. The input file is plain text; the format is one row per station, containing the x and y coordinates in metres.
   Here is a simple example with two stations

   ```
   # Comment lines starting with '#' will be ignored by the code
   # This file contains two stations: at [1km, 1km] and [2km, 2km]
   1.0e3, 1.0e3
   2.0e3, 2.0e3
   ```

2. Provide parameters for an equispaced rectilinear grid of observation stations. The following parameters should then be set:

   * `n_stations_x` : Number of observation stations in the x direction
   * `n_stations_y` : Number of observation stations in the y direction
   * `station_distance_x` : Distance between stations in the x direction (m)
   * `station_distance_y` : Distance between stations in the y direction (m)
   * `station_boundary_x` : Distance between the bottom left edge of the box and the first station in the x direction (m)
   * `station_boundary_y` : Distance between the bottom left edge of the box and the first station in the y direction (m)

   As an example, one could set

   ```julia
   n_stations_x=2,
   n_stations_y=2,
   station_distance_x=1.0e3,
   station_distance_y=1.0e3,
   station_boundary_x=10.0e3,
   station_boundary_y=10.0e3,
   ```

   to generate 4 stations at `[10km, 10km]`, `[10km, 11km]`, `[11km, 10km]` and `[11km, 11km]`.

## Output

If the filter parameter `verbose` is set to `true`, [`run_particle_filter`](@ref) will produce an HDF5 file in the run directory. The file name is `particle_da.h5` by default (this is configurable via the `output_filename` filter parameter). The file contains the summary statistics of the estimated state distribution (by default the mean and variance), the particle weights, the parameters used, and other metadata at each time step at which observations were assimilated. To read the output file, use the [HDF5 library](https://www.hdfgroup.org/solutions/hdf5/).

A basic plotting tool is provided in a [Jupyter notebook](https://github.com/Team-RADDISH/ParticleDA.jl/blob/main/extra/Plot_tdac_output.ipynb). This is intended as a template for building more sophisticated postprocessing tools, but it can be used for some simple analysis.
Set the variable `timestamp` in the third cell to plot different time slices from the output file. More functionality may be added as the package develops.

## Running in parallel

The particle state update is parallelised using both MPI and threading. According to our preliminary tests, both methods work well at small scale.

To use threading, set the environment variable `JULIA_NUM_THREADS` to the number of threads you want to use before starting Julia, and then call the [`run_particle_filter`](@ref) function normally. You can check the number of threads Julia has available by calling in Julia's REPL

```julia
Threads.nthreads()
```

To use the MPI parallelisation, write a Julia script that calls the [`run_particle_filter`](@ref) function for the relevant model and observations and run it in a Unix shell with

```bash
mpirun -np <number_of_processes> julia <your_script.jl>
```

Note that the parallel performance may vary depending on the behaviour of the filtering algorithm. In general, degeneracy of the particle weights will lead to poor load balance and poor parallel performance. See [this issue](https://github.com/Team-RADDISH/ParticleDA.jl/issues/115#issuecomment-675468511) for more details.

## Kalman filters

To allow validation that the filter implementations give the expected results when applied to linear-Gaussian state space models, dense and matrix-free Kalman filter implementations are available in the `ParticleDA.Kalman` module

```@docs
ParticleDA.Kalman.KalmanFilter
ParticleDA.Kalman.MatrixFreeKalmanFilter
ParticleDA.Kalman.run_kalman_filter
ParticleDA.Kalman.lmult_by_observation_matrix!
ParticleDA.Kalman.pre_and_postmultiply_by_state_transition_matrix!
ParticleDA.Kalman.pre_and_postmultiply_by_observation_matrix!
ParticleDA.Kalman.lmult_by_state_transition_matrix!
```

## Testing

We have a basic test suite for `ParticleDA.jl`.
You can run the tests by entering the package manager mode in Julia's REPL with `]` and running the command

```
test ParticleDA
```

## License

The `ParticleDA.jl` package is licensed under the MIT "Expat" License.