Is the mechanism for controlling whether hist replaces dimensions too indirect? #3435

@SimonHeybrock

According to the docstring:

When histogramming a dimension with an existing dimension-coord, the binning for
the dimension is modified, i.e., the input and the output will have the same
dimension labels.

When histogramming by non-dimension-coords, the output will have new dimensions
given by the names of these coordinates. These new dimensions replace the
dimensions the input coordinates depend on.

In practice this means that:

  • Whether a prior transform_coords call was made with or without the rename_dims option affects the outcome of a subsequent hist.
  • It is possible to indirectly control which dimensions are removed by renaming and/or flattening dimensions first, as in this example:

import scipp as sc

table = sc.data.table_xyz(1000)
binned = table.bin(x=3, y=4)  # sizes {'x': 3, 'y': 4}
binned.rename_dims(y='z').hist(z=5)  # sizes {'x': 3, 'z': 5}
binned.flatten(to='z').hist(z=5)  # sizes {'z': 5}

The mechanism was introduced because it lets hist either add a new dimension or replace an existing one. But is it too confusing when working with multi-dimensional data?

Would it suffice to improve the docstring (I am thinking of adding concrete examples on how to control the behavior), or do we need to think of something else?

Note that related functions such as bin are also affected.


Status

Done
