Avoid overhead in 2-D (or higher) hist operations #3443

@SimonHeybrock

hist uses bin internally when more than one dimension is involved. If the data array has many auxiliary coordinates that do not participate in the bin or the subsequent hist operation, bin still has to handle them, which includes copying all of their elements. This can become costly:

import scipp as sc

da = sc.data.table_xyz(100_000_000)
da.variances = da.values
da.masks["mask1"] = da.coords["y"] > 0.5 * sc.Unit("m")
da.masks["mask2"] = da.coords["z"] > 0.5 * sc.Unit("m")

dummy = [f"dummy{i}" for i in range(10)]
for name in dummy:
    da.coords[name] = da.coords["x"].copy()

x = sc.linspace('x', 0, 1, 14*32+1, unit='m')
da.hist(x=x, y=100)  # 2 s (dummy coords present)
da.drop_coords(dummy).hist(x=x, y=100)  # 1 s (dummy coords dropped)

It should be simple to avoid this by changing the implementation of hist on the Python side.
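One possible shape for such a change, as a minimal sketch and not scipp's actual implementation: drop every coordinate that is not named in the hist call before delegating to hist. The helper name hist_dropping_unused_coords is hypothetical, it only handles keyword-style edges such as x=x, y=100, and it assumes dense (non-binned) table data like the example above. It relies only on da.coords, coord.dims, da.drop_coords and da.hist.

```python
def hist_dropping_unused_coords(da, **edges):
    """Hypothetical helper: histogram `da` after dropping unused coords.

    Assumption: coords that share a dimension with the histogrammed coords
    but are not named in `edges` do not affect the result, so they can be
    dropped before the (internally bin-based) hist call. Masks are kept.
    """
    # Dimensions spanned by the coordinates we histogram by.
    used_dims = set()
    for name in edges:
        if name in da.coords:
            used_dims.update(da.coords[name].dims)

    # Coords along those dimensions that hist does not need.
    unused = [
        name
        for name, coord in da.coords.items()
        if name not in edges and used_dims.intersection(coord.dims)
    ]

    # Coords along unrelated dimensions are left alone.
    if unused:
        da = da.drop_coords(unused)
    return da.hist(**edges)
```

With the example above, hist_dropping_unused_coords(da, x=x, y=100) would be expected to behave like da.drop_coords(dummy).hist(x=x, y=100), without the caller having to know which coordinates are unused.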

Labels: optimisation (Increases performance (hopefully))

Status: Done