Avoid making many dimension labels #3748

jl-wynen · 2025-08-20T06:52:44Z

This still generates new dimension labels. But for it to lead to using too many labels, someone would have to pass the same variable through 2^16 nested calls of functions using new_dim_for. That seems highly unlikely.

Fixes #3747.

nvaytet · 2025-08-20T06:56:17Z

src/scipp/core/dimensions.py

+_USED_AUX_DIMS = []
+
+
+def new_dim_for(*data: Variable | DataArray) -> str:


Instead of uuid, can we simply use something like str(x.dims)?

Sure, but what does that gain us? If the same dim gets reused for another variable later on, str(x.dims) for the previous x no longer has any meaning.

> but what does that gain us

Avoid having to store _USED_AUX_DIMS?
As far as I understand, the new dims are created locally and thrown away immediately after? (Maybe I missed something?)

> If the same dim gets reused for another variable later on, str(x.dims) for the previous x no longer has any meaning.

I didn't get that part. Can you explain maybe with an example?

My implementation stores all labels that it generates and tries to reuse them whenever possible. So the dims are not thrown away but kept for later, sort of like the underlying mechanism keeps the dim ids around as well.

If we use str(x.dims), then we actually throw away the dim after using it. But the id will stick around. That is fine if we generate the dim repeatedly for inputs that have the same dims. But if the input dims change, then we generate a new dim. My proposed implementation does not.

> I didn't get that part. Can you explain maybe with an example?

I thought you wanted to keep the result of str(x.dims) in _USED_AUX_DIMS and reuse it. In that case, you might, e.g., generate the dim from a var with ['x', 'y'] and get 'x-y' but then later use that dim for another variable with dims ['t']. So the temporary label would bear no relation to that new variable.
I initially thought that you wanted to do this to get more understandable dim labels, in which case this could lead to confusion. But ultimately, it does not matter.

> I thought you wanted to keep the result of str(x.dims) in _USED_AUX_DIMS and reuse it.

Yeah no, I was thinking of doing the same kind of fix I did in Tof.

Yes, I understand now. But I think that this still ultimately leads to more allocated dim labels. Not 2^16 but still more than needed.
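
A rough way to see the difference in allocated labels, as a sketch under assumptions: `label_from_dims` is a hypothetical reading of the `str(x.dims)` suggestion, `pooled_label` is a simplified stand-in for the store-and-reuse approach, and plain tuples stand in for the dims of real inputs.

```python
def label_from_dims(dims):
    # Hypothetical reading of the str(x.dims) idea: derive the label from the input dims.
    return str(dims)


pool = []  # labels handed out so far by the reuse strategy


def pooled_label(dims):
    # Reuse any pooled label that is not among the input dims.
    for dim in pool:
        if dim not in dims:
            return dim
    dim = f'aux-{len(pool)}'  # hypothetical naming scheme, for illustration only
    pool.append(dim)
    return dim


inputs = [('x', 'y'), ('t',), ('x', 'y'), ('event', 'pixel')]

derived = {label_from_dims(d) for d in inputs}
for d in inputs:
    pooled_label(d)

n_derived = len(derived)  # one label per distinct set of input dims
n_pooled = len(pool)      # one label in total for these inputs
```

With these four inputs the derived strategy allocates a label per distinct dims tuple, while the pool stays at a single label.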

I like the approach to store and reuse all the uuids, but maybe that is just because I had the same idea. 😬

Can we document and make this part of the public API? Downstream packages may want to and should use this, if we put in the effort to have a proven and tested solution.

This breaks an import cycle.

nvaytet · 2025-08-20T09:41:48Z

src/scipp/core/dimensions.py

Can you make a few unit tests (e.g. that it works with scalar, Variable, and DataArray)?

SimonHeybrock · 2025-08-20T10:01:16Z

Include new_dim_for in the API docs?

SimonHeybrock · 2025-08-20T11:08:31Z

src/scipp/core/dimensions.py

+    """Return a dimension label that is not in the input's dimensions.
+
+    The returned label is intended for temporarily reshaping an array and should
+    not become visible to users.
+    The label is guaranteed to not be present in the input, but it may be used in
+    other variables.


Maybe we should explain here why using uuid or other generated labels is problematic, i.e., why one should use this function?

jokasimr · 2025-08-20T12:07:35Z

src/scipp/core/dimensions.py

+    :
+        A dimension label that is not in any variable or data array in ``data``.
+    """
+    used = {*(x.dims for x in data)}


I don't understand this.
From what I can tell:

- used is a set of tuples
- _USED_AUX_DIMS is a list of strings
- therefore dim not in used will always evaluate to True

I imagine that is not the intended behavior?

You are right, this does not unpack the dims properly. I updated it to use chain.
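
The difference between the two ways of building the set can be checked in isolation. This is a sketch: `FakeVar` is a hypothetical stand-in that only carries `dims`, and `chain` is assumed to mean `itertools.chain`.

```python
from itertools import chain


class FakeVar:
    """Hypothetical stand-in for a Variable or DataArray; only `dims` matters."""

    def __init__(self, *dims):
        self.dims = dims


data = (FakeVar('x', 'y'), FakeVar('t'))

# As in the reviewed hunk: each element is a whole dims tuple.
used_buggy = {*(x.dims for x in data)}

# Flattened: each element is a single dimension label.
used_fixed = set(chain.from_iterable(x.dims for x in data))

# A string label is never equal to a tuple, so the membership test
# against the buggy set cannot detect a clash.
clash_detected_buggy = 'x' in used_buggy
clash_detected_fixed = 'x' in used_fixed
```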

Do we have a test that checks if the problem is resolved? Maybe something like the experiment Celine did.

No. How would you do that? We can make 2**16 variables using

    def test_new_dim_reuses_labels() -> None:
        # Scipp supports up to 2**16 different dim labels.
        # Make more arrays than that to test that new_dim_for reuses labels.
        for _ in range(2**16 + 10):
            dim = sc.new_dim_for(sc.arange('x', 1000))
            # This should not raise:
            sc.array(dims=[dim], values=[1, 2, 3])

But that test alone takes 7s with a debug build.

We cannot simply check that _USED_AUX_DIMS does not grow during a given test because it may be used concurrently by a test on a different thread.

I can get it down to 3s by pulling sc.arange('x', 1000) out of the loop. Is this acceptable?

Hmm, _USED_AUX_DIMS is not thread safe, right? Does this mean that, e.g., using Scipp with Dask can break badly?

> I can get it down to 3s ... Is this acceptable?

Yeah, seems acceptable to me. But if we're confident the implementation works as expected and is unlikely to break we don't necessarily need to run it every test run. I think it's fine to only run expensive tests more rarely.

jl-wynen requested a review from nvaytet August 20, 2025 06:52

nvaytet reviewed Aug 20, 2025

jl-wynen force-pushed the no-uuid branch from 3fbe15a to b08addd Compare August 20, 2025 09:15

jl-wynen added 2 commits August 20, 2025 11:21

Avoid making many dimension labels

1850c9f

Import bins in function

01d464d

This breaks an import cycle.

jl-wynen force-pushed the no-uuid branch from b08addd to 01d464d Compare August 20, 2025 09:27

Export new_dim_for at top level

89f42e1

nvaytet reviewed Aug 20, 2025

Add tests for new_dim_for

b686fdf

Add new_dim_for to API ref

d23e880

SimonHeybrock reviewed Aug 20, 2025

jokasimr mentioned this pull request Aug 20, 2025

fix: remove uuid usage, replace by fixed aux dimension names #3749

Closed

jokasimr reviewed Aug 20, 2025

Correctly build set

2770e34

jl-wynen force-pushed the no-uuid branch from 9c3dd0b to d5675a4 Compare August 20, 2025 13:22

Explain why new_dim_for is good

32ce770

jl-wynen force-pushed the no-uuid branch from d5675a4 to 32ce770 Compare August 20, 2025 13:39

jl-wynen mentioned this pull request Aug 21, 2025

Make tmp dims with hard-coded strings instead of generator #3750

Merged

jl-wynen closed this Aug 22, 2025

5 participants
