Skip to content

Operations on transposed (or similar) binned data return incorrect values in bin-element coords/masks/attrs #3277

@SimonHeybrock

Description

@SimonHeybrock

Consider

class BinVariableMakerDataArray : public variable::BinVariableMaker<DataArray> {
private:
Variable call_make_bins(const Variable &parent, const Variable &indices,
const Dim dim, const DType type,
const Dimensions &dims, const units::Unit &unit,
const bool variances) const override {
const auto &source = buffer(parent);
if (parent.dims() !=
indices
.dims()) // would need to select and copy slices from source coords
throw std::runtime_error(
"Shape changing operations with bucket<DataArray> not supported yet");
// TODO This may also fail if the input buffer has extra capacity (rows not
// in any bucket).
auto buffer = DataArray(
variable::variableFactory().create(type, dims, unit, variances),
copy(source.coords()), copy(source.masks()), copy(source.attrs()));
// TODO is the copy needed?
return make_bins(copy(indices), dim, std::move(buffer));
}

Note in particular the first TODO. This means

  1. We cannot apply operations to a slice of binned data (without copying first), instead we get an exception since the buffer length is not matching up. This is not nice, but at least correct.
  2. If indices are not in order, coords (and masks) are copied directly into the output buffer without reordering. The implicit length check is not a sufficient safe guard here. This is a bad bug.

Example:

import scipp as sc

table = sc.data.table_xyz(10)
da = table.bin(x=2, y=2)
print('data:')
print(da['x', 1]['y', 0].bins.data.value.values)
print((da.transpose() * 1)['x', 1]['y', 0].bins.data.value.values)
print((da.transpose().copy())['x', 1]['y', 0].bins.data.value.values)
print('coords:')
print(da['x', 1]['y', 0].bins.coords['x'].value.values)
print((da.transpose() * 1)['x', 1]['y', 0].bins.coords['x'].value.values)
print((da.transpose().copy())['x', 1]['y', 0].bins.coords['x'].value.values)

Output:

data (correct):
[1.08584904 1.00868831]
[1.08584904 1.00868831]
[1.08584904 1.00868831]
coords (broken!):
[0.97669977 0.96407925]
[0.38019574 0.26169242]
[0.97669977 0.96407925]

Note the copy() uses a different code path and is thus correct.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions