-
Notifications
You must be signed in to change notification settings - Fork 21
Closed
Labels
Description
Consider
scipp/lib/dataset/variable_instantiate_bin_elements.cpp
Lines 90 to 109 in 5b12dc3
| class BinVariableMakerDataArray : public variable::BinVariableMaker<DataArray> { | |
| private: | |
| Variable call_make_bins(const Variable &parent, const Variable &indices, | |
| const Dim dim, const DType type, | |
| const Dimensions &dims, const units::Unit &unit, | |
| const bool variances) const override { | |
| const auto &source = buffer(parent); | |
| if (parent.dims() != | |
| indices | |
| .dims()) // would need to select and copy slices from source coords | |
| throw std::runtime_error( | |
| "Shape changing operations with bucket<DataArray> not supported yet"); | |
| // TODO This may also fail if the input buffer has extra capacity (rows not | |
| // in any bucket). | |
| auto buffer = DataArray( | |
| variable::variableFactory().create(type, dims, unit, variances), | |
| copy(source.coords()), copy(source.masks()), copy(source.attrs())); | |
| // TODO is the copy needed? | |
| return make_bins(copy(indices), dim, std::move(buffer)); | |
| } |
Note in particular the first TODO. This means
- We cannot apply operations to a slice of binned data (without copying first), instead we get an exception since the buffer length is not matching up. This is not nice, but at least correct.
- If indices are not in order, coords (and masks) are copied directly into the output buffer without reordering. The implicit length check is not a sufficient safe guard here. This is a bad bug.
Example:
import scipp as sc
table = sc.data.table_xyz(10)
da = table.bin(x=2, y=2)
print('data:')
print(da['x', 1]['y', 0].bins.data.value.values)
print((da.transpose() * 1)['x', 1]['y', 0].bins.data.value.values)
print((da.transpose().copy())['x', 1]['y', 0].bins.data.value.values)
print('coords:')
print(da['x', 1]['y', 0].bins.coords['x'].value.values)
print((da.transpose() * 1)['x', 1]['y', 0].bins.coords['x'].value.values)
print((da.transpose().copy())['x', 1]['y', 0].bins.coords['x'].value.values)Output:
data (correct):
[1.08584904 1.00868831]
[1.08584904 1.00868831]
[1.08584904 1.00868831]
coords (broken!):
[0.97669977 0.96407925]
[0.38019574 0.26169242]
[0.97669977 0.96407925]
Note the copy() uses a different code path and is thus correct.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Done