Skip to content

Generate deterministic ids when formatting notebooks#9359

Merged
zanieb merged 7 commits intomainfrom
zb/notebook-id-fmt
Jan 4, 2024
Merged

Generate deterministic ids when formatting notebooks#9359
zanieb merged 7 commits intomainfrom
zb/notebook-id-fmt

Conversation

@zanieb
Copy link
Copy Markdown
Member

@zanieb zanieb commented Jan 2, 2024

When formatting notebooks, we populate the id field for cells that do not have one. Previously, we generated a UUID v4 which resulted in non-deterministic formatting. Here, we generate the UUID from a seeded random number generator instead of using true randomness. For example, here are the first five ids it would generate:

7fb27b94-1602-401d-9154-2211134fc71a
acae54e3-7e7d-407b-bb7b-55eff062a284
9a63283c-baf0-4dbc-ab1f-6479b197f3a8
8dd0d809-2fe7-4a7c-9628-1538738b07e2
72eea511-9410-473a-a328-ad9291626812

We also add a check that an id is not present in another cell to prevent accidental introduction of duplicate ids.

The specification is lax, and we could just use incrementing integers e.g. 0, 1, ... but I have a minor preference for retaining the UUID format. Some discussion here — I'm happy to go either way though.

Discovered via #9293

Comment thread crates/ruff_notebook/src/notebook.rs Outdated
Comment thread crates/ruff_notebook/src/notebook.rs Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 2, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@zanieb zanieb force-pushed the zb/notebook-id-fmt branch from 411f988 to f4edaa6 Compare January 3, 2024 02:28
@zanieb zanieb marked this pull request as ready for review January 3, 2024 03:38
@zanieb zanieb requested review from dhruvmanila and konstin January 3, 2024 16:40
Comment thread crates/ruff_notebook/src/notebook.rs Outdated
// https://github.com/jupyter/enhancement-proposals/blob/master/62-cell-id/cell-id.md#questions
if raw_notebook.nbformat == 4 && raw_notebook.nbformat_minor >= 5 {
// We use a mock random number generator to generate deterministic uuids
let mut rng = rand::rngs::mock::StepRng::new(0, 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a seeded number generator rather than the StepRng one, so that the IDs are deterministic but look random rather than structured as they do now? Or was this the only option for deterministic UUIDs?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's another option for a seeded number generator, this one was just the most obvious way I saw.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we can use the StdRng seeded with 0

7fb27b94-1602-401d-9154-2211134fc71a
acae54e3-7e7d-407b-bb7b-55eff062a284
9a63283c-baf0-4dbc-ab1f-6479b197f3a8
8dd0d809-2fe7-4a7c-9628-1538738b07e2
72eea511-9410-473a-a328-ad9291626812

@zanieb zanieb added the formatter Related to the formatter label Jan 3, 2024
Copy link
Copy Markdown
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with natural numbers for simplicity but it doesn't matter much since the users shouldn't see that id anyway.

@zanieb zanieb merged commit aaa0097 into main Jan 4, 2024
@zanieb zanieb deleted the zb/notebook-id-fmt branch January 4, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

formatter Related to the formatter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants