Skip to content

Conversation

@jl-wynen
Copy link
Member

@jl-wynen jl-wynen commented Aug 21, 2025

Fixes #3747.
Alternative to #3748

#3748 is not thread safe which may be a problem especially for beamlime. The specific usages of tmp dims in Scipp are in non-recursive functions. So using hard-coded dims should be fine.

I chose dim labels that contain . because we encourage using valid Python identifiers which should reduce the risk of collisions with user provided labels. And I chose names starting with _ to indicate that these are protected names.

Comment on lines 74 to 83


def _make_extra_dim(avoid: Sequence[_O]) -> str:
used = set(chain(*(x.dims for x in avoid)))
for i in range(1000):
dim = f"_reduce.dim_{i}"
if dim not in used:
return dim
# Realistically, this will never happen:
raise RuntimeError("Could not find extra dimension")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not just calling uuid once on import to make a unique dim?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would work?
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not just calling uuid once on import to make a unique dim?

Could do. That would not be guaranteed to work but it would be highly likely.

Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...

That would work but would lead to more generated labels in total. But probably well below 2^16.

Everyone seems to have their favourite solution. I don't really care which one we go with. Just pick one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just felt that the iteration seemed wasteful. What is wrong with a single unique but fixed label? Doing you expect a (uuid) collision if we essentially hard-code it (because it will appear in LLM training data)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(because it will appear in LLM training data)?

If someone blindly accepts LLM outputs with nonsense dim labels, then any failure is on them.

I just figured, I'd reduce the risk of collisions. But that should be low enough with uuid4, so I can remove the iteration.

dims=unchanged_dims, shape=unchanged_shape
)
params = params.flatten(to=uuid.uuid4().hex)
params = params.flatten(to="_combine_bins.flat_dim")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I missed something, I am not sure why we need to find a name here?
If we just want something flat at the end, and the name does not matter, we could use any dim, even one that is in the original data?
So couldn't we just use an empty string "" as the dim? (same in the case below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would that gain us?

Copy link
Member

@nvaytet nvaytet Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing from a functionality point of view. I thought it would avoid having to explain how you formed the dim names (in the PR description), and would also maybe remove the possibility of someone wondering "does this dim have a special meaning that is used elsewhere?".

But I think i'm thinking too much into it.
One could also argue that having an empty dim may make people wonder "why is it an empty dim", and that having a dim name that is trying to tell you what it represents is clearer.

I really don't have a strong opinion here, it was just a thought.

Comment on lines 74 to 83


def _make_extra_dim(avoid: Sequence[_O]) -> str:
used = set(chain(*(x.dims for x in avoid)))
for i in range(1000):
dim = f"_reduce.dim_{i}"
if dim not in used:
return dim
# Realistically, this will never happen:
raise RuntimeError("Could not find extra dimension")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would work?
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...

@jl-wynen jl-wynen merged commit ece3c8c into main Aug 22, 2025
4 checks passed
@jl-wynen jl-wynen deleted the no-uuid-2 branch August 22, 2025 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Errors from using uuid for creating dimension labels

4 participants