Particle creation efficiency #665

@angus-g

Description

I realise #263 sort of touched on this issue, but I think there's still a lot to be gained here. I made a pretty minimal benchmark for doing nothing but creating JITParticle objects from a list:

import numpy as np
import parcels

n = 10000
l = np.zeros(n)  # all particles start at lon=0, lat=0

# trivial 1x1 flow field with zero velocities
fs = parcels.FieldSet.from_data(
    {"U": np.array([[0.0]]), "V": np.array([[0.0]])},
    {"lon": np.array([0.0]), "lat": np.array([0.0])},
)

# the operation being benchmarked: constructing n JITParticles
ps = parcels.ParticleSet.from_list(fs, parcels.JITParticle, l, l)

For 10,000 particles, this takes 2.5 s (0.25 ms/particle). The cost does seem to scale linearly: 100,000 particles take 25 s. In my opinion, however, this is quite slow.
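For reference, one way to reproduce those timing figures, continuing from the snippet above (just a plain wall-clock measurement around the ParticleSet construction; nothing parcels-specific):

import time

start = time.perf_counter()
ps = parcels.ParticleSet.from_list(fs, parcels.JITParticle, l, l)
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f} s total, {1e3 * elapsed / n:.3f} ms per particle")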

As some context, parcels uses the array-of-structures (AoS) method of storing particles (https://en.wikipedia.org/wiki/AoS_and_SoA), which is the most intuitive and natural approach, but it also triggers two big pain points in Python: creating objects and direct loops. I've been using parcels for simulations with high particle counts and frequent initialisation of particle sets. For this, I transposed the particle storage to SoA (angus-g@5d85fb5). Running the same benchmark on my branch takes 2.6 s for 10,000,000 particles (a ~1000x speedup; that many particles would take something like 40 minutes normally).
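To make the storage difference concrete, here is a minimal sketch of the two layouts (hypothetical class and attribute names, not parcels' actual ones): the AoS version constructs one Python object per particle, while the SoA version only allocates a handful of arrays regardless of particle count.

import numpy as np

n = 10_000

# AoS: one Python object per particle; cost grows with n object constructions
class Particle:
    def __init__(self, lon, lat):
        self.lon = lon
        self.lat = lat
        self.time = 0.0

aos = [Particle(0.0, 0.0) for _ in range(n)]

# SoA: one array per attribute, shared by all particles; a fixed number of allocations
soa = {
    "lon": np.zeros(n),
    "lat": np.zeros(n),
    "time": np.zeros(n),
}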

I don't think this should be taken as advocacy for switching the particle storage scheme, but rather as motivation for exploring the different methods available. One thought that comes to mind is to use Cython rather than JIT particles: the interpolation routines could all be written in Cython-flavoured Python, particles could be an AoS of fast extension types, and the need to generate C code on the fly might be removed entirely.

Initial field sampling

The other big performance killer during particle creation is specifying a variable on a derived particle class whose initial value is sampled from a field. Because this happens on the Python side, it is very expensive. My workaround has been to initialise the variable value to zero, then run a sample-only kernel for zero time. This has the desired effect, utilising the much more efficient interpolation routines, and is scalable to high particle counts. Perhaps this mechanism could be formalised (at least for JIT particles).
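As an illustration of that workaround (a sketch only: the variable and kernel names are made up, it reuses the fieldset fs and positions l from the benchmark above, and the exact execute() arguments for a zero-time run may differ between parcels versions):

class SampledParticle(parcels.JITParticle):
    # cheap constant initial value instead of sampling a field on the Python side
    u0 = parcels.Variable("u0", initial=0.0)

def SampleU(particle, fieldset, time):
    particle.u0 = fieldset.U[time, particle.depth, particle.lat, particle.lon]

ps = parcels.ParticleSet.from_list(fs, SampledParticle, l, l)
ps.execute(SampleU, dt=0)  # sample-only kernel, run for zero time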

Other possible improvements

While particle creation was the big showstopper for me, there are a couple of other pieces of pretty low-hanging fruit. There are a lot of loops over particles in Kernel, in the form of list comprehensions (L320-321, L347-348). These aren't even part of the core advection flow, yet they completely dominate runtime at higher particle counts.
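As a rough, hypothetical illustration of why such loops hurt (this is not the actual Kernel code, just the general pattern of filtering particles by a state flag):

import numpy as np

n = 100_000
state = np.zeros(n, dtype=np.int32)  # one status code per particle

# object/loop style: a Python-level pass over every particle
bad = [i for i in range(n) if state[i] != 0]

# array style: the same filter as a single vectorized mask
bad = np.flatnonzero(state != 0)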

The same issue arises, at perhaps a lower priority, during particle writes, which involve the same looping over particles to extract the data. The recent change to using numpy dumps as an intermediate format has helped with this, and at some point the actual IO cost dominates anyway.
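For completeness, a sketch of the shape of that write path (hypothetical particle objects and file name, not parcels' actual output code): the per-particle gather is still a Python loop, but the dump itself becomes one bulk operation.

import numpy as np
from types import SimpleNamespace

# stand-in particles with just lon/lat attributes
particles = [SimpleNamespace(lon=0.0, lat=0.0) for _ in range(10_000)]

# the remaining per-particle cost: gathering attributes in Python
lons = np.array([p.lon for p in particles])
lats = np.array([p.lat for p in particles])

# the write itself is a single cheap bulk operation
np.save("pset_dump_0000.npy", np.column_stack([lons, lats]))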
