

@bouweandela
Member

@bouweandela bouweandela commented Jan 31, 2024

Description

This pull request splits the computation into three stages:

  1. Preprocessor functions are run in parallel using Dask without saving data
  2. Preprocessor files are populated with data in parallel using Dask
  3. Diagnostic scripts are run

At the moment, this only works with max_parallel_tasks: 1.
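A minimal sketch of the three stages, assuming plain Dask arrays instead of Iris cubes and in-memory NumPy arrays instead of NetCDF output files. The names load and preprocess are hypothetical and are not ESMValCore functions. The input data is created inside the delayed task because dask.delayed would compute any Dask array passed to it as an argument, which would defeat the point of stage 1.

```python
import dask
import dask.array as da
import numpy as np


def load(scale):
    """Hypothetical loader: builds a lazy array instead of reading input files."""
    values = np.arange(24.0).reshape(4, 6) * scale
    return da.from_array(values, chunks=(2, 6))


@dask.delayed
def preprocess(scale):
    """Stage 1 task: run a (hypothetical) preprocessor chain.

    The data stays lazy, so the returned Dask array holds a task graph
    rather than computed values.
    """
    data = load(scale)
    return data - data.mean(axis=0)  # example preprocessor step: anomalies


# Stage 1: run the preprocessor functions in parallel, without saving data.
lazy_results = dask.compute(*[preprocess(scale) for scale in (1, 2, 3)])

# Stage 2: populate the outputs in parallel with a single da.store call.
# Plain NumPy arrays stand in for the preprocessor output files here.
targets = [np.empty(result.shape) for result in lazy_results]
da.store(list(lazy_results), targets)

# Stage 3: run the (hypothetical) diagnostics on the stored output.
diagnostics = [float(target.mean()) for target in targets]
print(diagnostics)
```

Storing all outputs in one da.store call lets Dask schedule the chunks of every file together, which is where the stage 2 parallelism comes from in this sketch.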

Ideas for further improvements:

  1. optimize multi-model functions, as these limit parallelism
  2. use one delayed per group in multi-model/ensemble means to increase parallelism
  3. try to make delayed operations 'pure', e.g. by copying the input cubes in preprocess before calling the preprocessor function
  4. see if splitting Dataset.load up into multiple delayeds before the concatenate preprocessor step improves parallelism
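Ideas 2 and 3 could be combined roughly as follows. This is a sketch under assumptions: datasets are plain dictionaries with NumPy data, and ensemble_mean and on_copies are hypothetical helpers. Each group becomes one independent delayed task, and each task works on copies of its inputs so it cannot modify them, which is what makes declaring it pure=True (same inputs, same output, same task key) reasonable.

```python
import copy
import functools
from collections import defaultdict

import dask
import numpy as np


def ensemble_mean(members):
    """Hypothetical multi-model function: average over ensemble members."""
    return np.mean(np.stack(members), axis=0)


def on_copies(func):
    """Call func on deep copies of its arguments so the inputs are never modified."""
    @functools.wraps(func)
    def wrapper(*args):
        return func(*copy.deepcopy(args))
    return wrapper


# Hypothetical input datasets: one entry per model ensemble member.
datasets = [
    {"model": "A", "member": 1, "data": np.array([1.0, 2.0])},
    {"model": "A", "member": 2, "data": np.array([3.0, 4.0])},
    {"model": "B", "member": 1, "data": np.array([10.0, 20.0])},
]

# Group members per model; each group becomes one delayed task, and the
# groups are independent of each other so they can run in parallel.
groups = defaultdict(list)
for dataset in datasets:
    groups[dataset["model"]].append(dataset["data"])

pure_mean = on_copies(ensemble_mean)
tasks = {
    model: dask.delayed(pure_mean, pure=True)(members)
    for model, members in groups.items()
}
(means,) = dask.compute(tasks)
print(means)
```

With a single delayed for the whole multi-model step, nothing else can run until all inputs are ready; splitting per group removes that bottleneck for groups that do not depend on each other.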

Blocking issues

These are things that block this from being used in practice.

  1. ESMPy crashes if you try to use it from a thread other than the main one. Example script that produces the crash:
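    import threading
    
    import numpy as np
    
    
    def run():
        # ESMPy is imported and used inside a non-main thread.
        import esmpy
        # debug=True enables ESMF logging, which writes PET0.ESMF_LogFile.
        m = esmpy.Manager(debug=True)
        # Creating a grid is enough to reproduce the segmentation fault.
        esmpy.Grid(np.array((10, 20)),
                   num_peri_dims=1,
                   staggerloc=[esmpy.StaggerLoc.CENTER])
    
    
    def main():
        # Run the ESMPy calls in a separate thread instead of the main thread.
        thread = threading.Thread(target=run)
        thread.start()
        thread.join()
    
    
    if __name__ == '__main__':
        main()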
    Running this script results in Segmentation fault (core dumped), and ESMF writes a log file called PET0.ESMF_LogFile with the following content:
    20240226 150217.785 INFO             PET0 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    20240226 150217.785 INFO             PET0 !!! THE ESMF_LOG IS SET TO OUTPUT ALL LOG MESSAGES !!!
    20240226 150217.785 INFO             PET0 !!!     THIS MAY CAUSE SLOWDOWN IN PERFORMANCE     !!!
    20240226 150217.785 INFO             PET0 !!! FOR PRODUCTION RUNS, USE:                      !!!
    20240226 150217.785 INFO             PET0 !!!                   ESMF_LOGKIND_Multi_On_Error  !!!
    20240226 150217.785 INFO             PET0 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    20240226 150217.785 INFO             PET0 Running with ESMF Version   : 8.4.2
    20240226 150217.785 INFO             PET0 ESMF library build date/time: "Apr 26 2023" "11:27:56"
    20240226 150217.785 INFO             PET0 ESMF library build location : /home/conda/feedstock_root/build_artifacts/esmf_1682507633250/work
    20240226 150217.785 INFO             PET0 ESMF_COMM                   : mpiuni
    20240226 150217.785 INFO             PET0 ESMF_MOAB                   : enabled
    20240226 150217.785 INFO             PET0 ESMF_LAPACK                 : enabled
    20240226 150217.785 INFO             PET0 ESMF_NETCDF                 : enabled
    20240226 150217.785 INFO             PET0 ESMF_PNETCDF                : disabled
    20240226 150217.785 INFO             PET0 ESMF_PIO                    : disabled
    20240226 150217.785 INFO             PET0 ESMF_YAMLCPP                : enabled
    20240226 150217.785 ERROR            PET0 ESMCI_VM.C:2169 ESMCI::VM::getCurrent() Internal error: Bad condition  - - Could not determine current VM
    
    Issue reported via the ESMF support mailing list.
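Because the crash only happens outside the main thread, one possible stopgap (an assumption, not something tested in this pull request) is to compute graphs that call ESMPy with Dask's synchronous scheduler, which executes every task in the calling thread. The sketch below does not import ESMPy; it only checks which thread a task runs in under each scheduler. The function name runs_in_main_thread is hypothetical.

```python
import threading

import dask


@dask.delayed
def runs_in_main_thread():
    """Report whether this task executes in the interpreter's main thread."""
    return threading.current_thread() is threading.main_thread()


# The threaded scheduler (the default for delayed objects) executes tasks in
# a thread pool, i.e. outside the main thread, which is where ESMPy crashes.
with_threads = runs_in_main_thread().compute(scheduler="threads")

# The synchronous scheduler executes every task in the calling thread.
with dask.config.set(scheduler="synchronous"):
    with_sync = runs_in_main_thread().compute()

print(f"threads: {with_threads}, synchronous: {with_sync}")
```

The trade-off is that the synchronous scheduler gives up all parallelism for the graph it runs, so it would only make sense for the regridding part of the computation, not as a general setting.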

Concerns

These are things that we need to be careful about, but that should not be a problem.

  1. thread safety; known thread-unsafe libraries:
    • NetCDF4 library
  2. custom configuration (config-developer, extra facets, custom cmor tables) may not be available on Dask workers
  3. is provenance correctly updated with results from preprocessing before saving?
  4. the potential for reusing parts of the computation seems limited, because da.store loses dependency information (dask/dask#8380)
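For the thread-safety concern, da.store accepts a lock argument: passing one shared threading.Lock serializes all writes, so only one thread touches the underlying library at a time. The sketch below assumes that a single process-wide lock is enough to protect NetCDF4 writes; NotThreadSafeTarget is a hypothetical stand-in for a netCDF4 variable that records how many writes overlapped. It also uses compute=False, in which case da.store returns a single Delayed object for the whole store operation.

```python
import threading
import time

import dask
import dask.array as da
import numpy as np


class NotThreadSafeTarget:
    """Hypothetical stand-in for a variable that must not be written concurrently.

    It records the peak number of writes that were in progress at the same time.
    """

    def __init__(self, shape):
        self.data = np.zeros(shape)
        self.peak_concurrent_writes = 0
        self._active = 0
        self._bookkeeping = threading.Lock()

    def __setitem__(self, index, value):
        with self._bookkeeping:
            self._active += 1
            self.peak_concurrent_writes = max(self.peak_concurrent_writes, self._active)
        time.sleep(0.01)  # widen the window in which overlapping writes would show up
        self.data[index] = value
        with self._bookkeeping:
            self._active -= 1


source = da.from_array(np.arange(16.0).reshape(4, 4), chunks=(1, 4))
target = NotThreadSafeTarget(source.shape)

# One lock shared by all writes, so only one thread writes at a time.
netcdf_lock = threading.Lock()

# With compute=False, da.store returns one Delayed that performs all writes.
store_task = da.store(source, target, lock=netcdf_lock, compute=False)
dask.compute(store_task, scheduler="threads")

print(target.peak_concurrent_writes)
```

The lock only serializes the writes; computing the chunks that are being written can still happen in parallel.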

Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number of pull requests:

@bouweandela bouweandela requested a review from fnattino January 31, 2024 15:27
@bouweandela bouweandela changed the title Proof of concept of running metadata and data computations on Dask Proof of concept: running metadata and data computations on Dask Feb 5, 2024
@valeriupredoi
Contributor

I dig this PR 😍 we should talk about the bigger picture though - may be able to suggest some novel stuffs 🍺

@github-actions
Contributor

In order to maintain a backlog of relevant pull requests, we automatically label them as stale after 180 days of inactivity.

If this pull request is still important to you, please comment below to remove the stale label. Otherwise, this pull request will be automatically closed in 60 days. If this pull request only suffers from a lack of reviewers, please tag the @ESMValGroup/technical-lead-development-team so they can help you find a suitable reviewer.

@github-actions github-actions bot added the Stale label Jun 27, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in ESiWACE3 ESMValTool service Jun 27, 2025