Skip to content

Conversation

@SimonHeybrock
Copy link
Member

This was not implemented initially for simplicity, but downstream use showed that is it essential.

@SimonHeybrock SimonHeybrock requested a review from jl-wynen June 6, 2024 05:45

if isinstance(obj, scipp.Variable):
return ScippDataArrayAdapter(scipp.DataArray(obj))
return ScippDataArrayAdapter(scipp.DataArray(obj), scipp=scipp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit awkward that we have to pass the module as an arg everywhere...
Was it to avoid importing scipp in multiple places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, making the code cleaner.

Comment on lines +756 to +759
{
'a': sc.array(dims=['x'], values=[1, 2, 3]),
'b': sc.array(dims=['x'], values=[11, 12, 13]),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm a little confused because the name above is a ScippDataArrayAdapter, so I thought it would be a sc.DataArray, like the pandas DataFrame above.
But I guess you just need a mapping of key to ArrayLike? Does this mean that you never use the .data in the DataArray?
I assume you need a structure that can be sliced/indexed. Could it be a DataGroup instead of a DataArray internally?

I realize this is besides the point of the current PR...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScippDataArrayAdapter also handles scipp.Variable, interpreting the latter as an data array without coords.

I assume you need a structure that can be sliced/indexed. Could it be a DataGroup instead of a DataArray internally?

Using a dictcurrently, could in principle add support for Dataset and DataGroup. But not "instead of": The DataArray holds the values for a single node, the dict (or DataGroup, ...) maps to multiple nodes, just like the columns of pandas.DataFrame vs. its rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I see now


graph = cb.Graph(g)
mapped = graph.map(node_values).reduce('c', name='d')
mapped['x'] = mapped['d']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the 'x' here is unrelated to the 'x' dimension in

        {
            'a': sc.array(dims=['x'], values=[1, 2, 3]),
            'b': sc.array(dims=['x'], values=[11, 12, 13]),
        }

above.

If so, can we use a different name here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asking because at first it confused me, as I thought they were related.

g.add_edge('a', 'b')
graph = cb.Graph(g)
mapped1 = graph.map({'a': [1, 2]}).reduce('b', name='d')
mapped2 = graph.map({'a': sc.array(dims=('x',), values=[1, 2])}).reduce(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why this is different from the above? In mapped1 you have [1, 2] and in mapped2 a Scipp Variable. Isn't it the same as with the numpy array, that the types are different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it actually raises for different reasons, this test is to ensure that both code paths work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question was why is it raising for a different reason, I would have expected to raise with the same reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an artifact of the usual problem of having two checks in a particular order, so depending on the exact properties of the input you get one exception or the other.

@nvaytet nvaytet self-assigned this Jun 6, 2024
@jl-wynen
Copy link
Member

jl-wynen commented Jun 7, 2024

If you still want my review, can you explain what this PR is supposed to do? This is hard to tell from the diff alone.

Base automatically changed from fix-ancestor-removal-logic to main June 10, 2024 03:18
@SimonHeybrock
Copy link
Member Author

If you still want my review, can you explain what this PR is supposed to do? This is hard to tell from the diff alone.

It allows compatible mapped values in __setitem__. Previously it was impossible to use this it the graph being set, originated, e.g., from a subgraph of the same graph, and map was applied beforehand. This shows up in practice, e.g., when mapping a reduction workflow graph over a filename, with subsequent reduce calls in several places, to be recombined into a single graph.

@pytest.mark.parametrize(
'node_values',
[
{'a': [1, 2, 3], 'b': [11, 12, 13]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that {'a': [1, 2, 3]} is sufficient to make the test fail with the code on main. So there seems to be no need to have multiple mapped nodes to test this behaviour. Can you add a test that only maps one node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

SimonHeybrock and others added 2 commits June 10, 2024 12:36
Co-authored-by: Jan-Lukas Wynen <j-l.wynen@hotmail.de>
@SimonHeybrock SimonHeybrock merged commit 3b67e2d into main Jun 10, 2024
@SimonHeybrock SimonHeybrock deleted the allow-compat-indicies-and-values-insetitem branch June 10, 2024 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants