Skip to content

Conversation

@jokasimr
Copy link
Contributor

Fixes #3685

With this change, this is the output from the MRE in the issue:

<scipp.DataArray>
Dimensions: Sizes[x:4, ]
Coordinates:
  comment                    string        <no unit>  ()  "abc"
* x                           int64  [dimensionless]  (x)  [0, 1, 2, 3]
Data:
                              int64  [dimensionless]  (x)  [0, 1, 2, 3]

 <scipp.DataArray>
Dimensions: Sizes[x:4, ]
Coordinates:
  comment                    string        <no unit>  ()  "abc"
* x                           int64  [dimensionless]  (x)  [0, 1, 2, 3]
Data:
                              int64  [dimensionless]  (x)  [0, 2, 4, 6]

 <scipp.DataArray>
Dimensions: Sizes[x:4, ]
Coordinates:
  comment                    string        <no unit>  ()  "123"
* x                           int64  [dimensionless]  (x)  [0, 1, 2, 3]
Data:
                              int64  [dimensionless]  (x)  [0, 1, 2, 3]


<scipp.DataArray>
Dimensions: Sizes[b:2, x:4, ]
Coordinates:
  comment                    string        <no unit>  ()  "abc"
* x                           int64  [dimensionless]  (x)  [0, 1, 2, 3]
Data:
                              int64  [dimensionless]  (b, x)  [0, 1, ..., 4, 6]


<scipp.DataArray>
Dimensions: Sizes[b:2, x:4, ]
Coordinates:
* x                           int64  [dimensionless]  (x)  [0, 1, 2, 3]
Data:
                              int64  [dimensionless]  (b, x)  [0, 2, ..., 6, 9]

This seems more reasonable because comment is not made aligned.
But I'm not sure if this behavior makes most sense here in every case.

How do we actually want this to work?

@jokasimr jokasimr marked this pull request as draft April 22, 2025 11:09
@jokasimr
Copy link
Contributor Author

jokasimr commented Apr 24, 2025

Hm, the test case that makes this fail is:

assert_identical(
    sc.concat((a['x', 0], a['x', 1]), dim='x'),
    a['x', :2]
)

and that doesn't hold now because a['x', 0].coords['x'] is unaligned and then sc.concat((a['x', 0], a['x', 1]), dim='x').coords['x'] is unaligned, while a['x', :2].coords['x'] is aligned.

So unless we want to make a['x', 0].coords['x'] aligned, or remove the identity in the example above, the approach in this PR is not going to work.

@jokasimr
Copy link
Contributor Author

jokasimr commented Apr 24, 2025

xarray behaves the same as scipp if the unaligned coordinate is renamed to the same as the concatenation dimension:

import numpy as np
import xarray as xr

da1 = xr.DataArray(
    np.arange(4),
    {'x': np.arange(4), 'b': np.array('abc')},
    dims=('x',)
)
da2 = xr.DataArray(
    2 * np.arange(4),
    {'x': np.arange(4), 'b': np.array('abc')},
    dims=('x',)
)
da3 = xr.DataArray(
    np.arange(4),
    {'x': np.arange(4), 'b': np.array('123')},
    dims=('x',)
)
print(da1, da2 ,da3)

comb = xr.concat([da1, da2], dim='b')
print(comb)

print(comb + da3)
<xarray.DataArray (x: 4)>
array([0, 1, 2, 3])
Coordinates:
  * x        (x) int64 0 1 2 3
    b        <U3 'abc' <xarray.DataArray (x: 4)>
array([0, 2, 4, 6])
Coordinates:
  * x        (x) int64 0 1 2 3
    b        <U3 'abc' <xarray.DataArray (x: 4)>
array([0, 1, 2, 3])
Coordinates:
  * x        (x) int64 0 1 2 3
    b        <U3 '123'
<xarray.DataArray (b: 2, x: 4)>
array([[0, 1, 2, 3],
       [0, 2, 4, 6]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * b        (b) <U3 'abc' 'abc'
<xarray.DataArray (b: 2, x: 4)>
array([[0, 2, 4, 6],
       [0, 3, 6, 9]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * b        (b) <U3 'abc' 'abc'

in the sense that b from d3 is dropped in favor of b from concat((d1, d2), dim='b').

@SimonHeybrock
Copy link
Member

Hm, the test case that makes this fail is:

assert_identical(
    sc.concat((a['x', 0], a['x', 1]), dim='x'),
    a['x', :2]
)

and that doesn't hold now because a['x', 0].coords['x'] is unaligned and then sc.concat((a['x', 0], a['x', 1]), dim='x').coords['x'] is unaligned, while a['x', :2].coords['x'] is aligned.

So unless we want to make a['x', 0].coords['x'] aligned, or remove the identity in the example above, the approach in this PR is not going to work.

  • a['x', 0].coords['x'] needs to be unaligned since otherwise many ops such as a / a['x', 0] will fail.
  • concat was designed to make the coord aligned again in this case, and I think that makes sense for such "dimension coords" (Xarray terminology)?

@jokasimr
Copy link
Contributor Author

concat was designed to make the coord aligned again in this case, and I think that makes sense for such "dimension coords" (Xarray terminology)?

Yes this is how I understand it too. Xarray behaves same as scipp when the coord is a 'dimension coord', and different otherwise. But scipp doesn't have the concept of a dimension coord', so we have to choose one behavior.

It seems to me we can't do much about this.

@SimonHeybrock
Copy link
Member

concat was designed to make the coord aligned again in this case, and I think that makes sense for such "dimension coords" (Xarray terminology)?

Yes this is how I understand it too. Xarray behaves same as scipp when the coord is a 'dimension coord', and different otherwise. But scipp doesn't have the concept of a dimension coord', so we have to choose one behavior.

Well, if the coord has the same name as the dimension then it is a dimension coord. Others are not.

It seems to me we can't do much about this.

Can't we not set the alignment flag for non-dimension coords (that where not aligned in the input)?

@jokasimr
Copy link
Contributor Author

Can't we not set the alignment flag for non-dimension coords (that where not aligned in the input)?

Aha I didn't know we could take the name of the coord into account, in that case yeah we can do that. I'll try it out

@jokasimr jokasimr marked this pull request as ready for review April 25, 2025 08:42
@jokasimr
Copy link
Contributor Author

Not sure if we want to make the coord aligned if all instances of the coord in the inputs are aligned, or if we want to make the coord aligned if any instances of the coord in the input are aligned.

When all instances of the coord in the input have the same alignment then all and any are going to produce the same result.

If the different instances of the coord in the input have different alignment, should we prefer to set the coord on the concatenated array aligned or unaligned?

@SimonHeybrock
Copy link
Member

Not sure if we want to make the coord aligned if all instances of the coord in the inputs are aligned, or if we want to make the coord aligned if any instances of the coord in the input are aligned.

If any, because we should never remove and "aligned" flag, since users may rely on it for safety.

When all instances of the coord in the input have the same alignment then all and any are going to produce the same result.

If the different instances of the coord in the input have different alignment, should we prefer to set the coord on the concatenated array aligned or unaligned?

@github-project-automation github-project-automation bot moved this to In progress in Development Board Apr 25, 2025
@jokasimr jokasimr moved this from In progress to Selected in Development Board Apr 25, 2025
@SimonHeybrock
Copy link
Member

SimonHeybrock commented Apr 25, 2025

I think this may have worked correctly before the alignment flag was introduced (when coords meant "aligned"andattrs` meant "unaligned). Please check if the behavior after this fix brings us back to how it worked before: https://github.com/scipp/scipp/pull/3153/files#diff-ab8d4f264378e9dc974da66797785e393da2f84544f571ba8b5a5394c1dffc45. I think it is correct now, as it was?

Edit: Have to correct myself, it looks like it was broken before, since meta was used in the code?

@jokasimr jokasimr enabled auto-merge (squash) April 25, 2025 13:23
@jokasimr jokasimr merged commit 3cfbd32 into main Apr 25, 2025
4 checks passed
@jokasimr jokasimr deleted the concat-alignment branch April 25, 2025 13:28
@github-project-automation github-project-automation bot moved this from Selected to Done in Development Board Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Unaligned coords become aligned in concat

3 participants