Switch UUIDs to UUIDv7 #4666

Merged
milosgajdos merged 1 commit into distribution:main from binaryfire:main
Aug 8, 2025

Conversation

@binaryfire

@binaryfire binaryfire commented Jul 2, 2025

Contributor

This PR switches UUIDs to UUIDv7. UUIDv7s are time-ordered, which makes them more efficient to store and query.
Here are some UUIDv4 vs UUIDv7 benchmarks with Postgres:

There's no downside to switching - these are still valid UUIDs so the change is fully backwards compatible. Being able to store things like events with UUIDv7 IDs will be beneficial to everyone.

Also, this makes it easier to change the UUID version in the future.

Closes: #4665

@thaJeztah

thaJeztah commented Jul 4, 2025

Member

Do you have any information where querying these UUIDs would be relevant and changing would benefit performance? As far as I can see, these UUIDs are not stored anywhere, and only used for OTEL traces, and event IDs (which needed something unique).

Other uses are as part of tests, which just needed a sample value.

I think v7 is even slower (although only very marginally):

package main

import (
	"testing"

	"github.com/google/uuid"
)

// BenchmarkUUIDv4 measures generating a random (v4) UUID and formatting it as a string.
func BenchmarkUUIDv4(b *testing.B) {
	b.ReportAllocs()
	for range b.N {
		uuid.NewString()
	}
}

// BenchmarkUUIDv7 measures generating a time-ordered (v7) UUID and formatting it as a string.
func BenchmarkUUIDv7(b *testing.B) {
	b.ReportAllocs()
	for range b.N {
		uuid.Must(uuid.NewV7()).String()
	}
}

BenchmarkUUIDv4-10    	 3664420	       320.2 ns/op	      64 B/op	       2 allocs/op
BenchmarkUUIDv7-10    	 3247585	       370.2 ns/op	      64 B/op	       2 allocs/op

@binaryfire

binaryfire commented Jul 4, 2025

Contributor Author

@thaJeztah Good point. The main thing I'm looking for is UUIDv7s for event IDs. Reason being we store these in our Postgres db after processing incoming webhooks.

If you check out the links I shared above you'll see that insertion performance with Postgres is 30% better with UUIDv7s vs UUIDv4s (i.e. more or less as fast as bigints):
[image: Postgres insertion benchmark, UUIDv4 vs UUIDv7]

The only reason I replaced all occurrences with the v7 logic was for consistency. But another option could be leaving everything as is and just using v7s for event IDs? Since they're probably the only thing people store in their own backends.
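
As a rough intuition for why time-ordered keys tend to insert faster into an ordered index, the toy model below counts how often a new key lands at the very end of a sorted slice. This is only a sketch: the sorted slice stands in for a B-tree index, `tailInserts` is a hypothetical helper, and nothing here measures Postgres itself.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// tailInserts inserts keys one by one into a sorted slice (a stand-in
// for an ordered index) and counts how many landed at the very end.
func tailInserts(keys []uint64) int {
	index := make([]uint64, 0, len(keys))
	tail := 0
	for _, k := range keys {
		// Find the first position whose key is >= k.
		pos := sort.Search(len(index), func(i int) bool { return index[i] >= k })
		if pos == len(index) {
			tail++
		}
		// Shift the suffix right by one and place k at pos.
		index = append(index, 0)
		copy(index[pos+1:], index[pos:])
		index[pos] = k
	}
	return tail
}

func main() {
	const n = 5000
	ordered := make([]uint64, n)
	random := make([]uint64, n)
	rng := rand.New(rand.NewSource(1)) // fixed seed so runs are repeatable
	for i := range ordered {
		ordered[i] = uint64(i)   // monotonically increasing, like a timestamp prefix
		random[i] = rng.Uint64() // uniformly random, like a v4 UUID
	}
	fmt.Println("time-ordered keys appended at the end:", tailInserts(ordered), "of", n)
	fmt.Println("random keys appended at the end:", tailInserts(random), "of", n)
}
```

In this model every increasing key is a pure append, while random keys almost always land somewhere in the middle, which is the access pattern that forces a real index to touch many different pages.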

@thaJeztah

Member

Oops sorry for the delay, I missed your last comment!

Reason being we store these in our Postgres db after processing incoming webhooks.

Thanks, with that context, this change makes a lot more sense to me! Sorry if my earlier comment came across bad; mostly trying to avoid code changes for "theoretical cases" - I've run into those in various projects, so context matters!

Based on this, I think the change looks reasonable.

Some quick digging; if my information is correct, UUIDv7 has less entropy (62-74 bit vs 122 bit) - not sure if that matters here, but just in case it's a concern.

The only reason I replaced all occurrences with the v7 logic was for consistency. But another option could be leaving everything as is and just using v7s for event IDs? Since they're probably the only thing people store in their own backends.

It's probably fine to keep the changes as-is, even if not strictly needed for all. The only concern I had was the utility package (but mostly from a perspective that this project used to have a uuid package, that others found, and which now became a public API used externally). This PR keeps it inside internal/ so that should not be a concern, other than "it may be tempting for someone to open a PR to make it public", but that's a future concern 😂

cc @milosgajdos - in case you have thoughts

@binaryfire

Contributor Author

Thanks, with that context, this change makes a lot more sense to me! Sorry if my earlier comment came across bad; mostly trying to avoid code changes for "theoretical cases" - I've run into those in various projects, so context matters!

No worries at all! Totally understand.

Some quick digging; if my information is correct, UUIDv7 has less entropy (62-74 bit vs 122 bit) - not sure if that matters here, but just in case it's a concern.

Yeah there's a bit less entropy because of the timestamp data but they're still considered unique. The chance of collisions is infinitesimally small. The aim with v7 was to solve the index performance issues while maintaining uniqueness. Here's a case study which outlines the real world benefits. A couple of years old but a good read: https://buildkite.com/resources/blog/goodbye-integers-hello-uuids/
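
For a sense of scale on the entropy figures, the sketch below works out the bit budget and a birthday-bound estimate of the collision probability. The 74-bit figure is what remains after the 48-bit timestamp, version, and variant bits; 62 bits applies if an implementation spends the 12 bits after the version on a counter or sub-millisecond timestamp, which RFC 9562 permits. The load of one million IDs in a single millisecond is an assumed, deliberately extreme number, and `collisionProb` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the birthday-bound probability that n values
// drawn uniformly from 2^bits possibilities contain at least one duplicate:
// p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
func collisionProb(n float64, bits int) float64 {
	// Expm1 keeps precision when the probability is extremely small.
	return -math.Expm1(-n * (n - 1) / math.Exp2(float64(bits+1)))
}

func main() {
	// Bit budget: 128 bits minus 4 version bits and 2 variant bits.
	fmt.Println("v4 random bits:", 128-4-2)    // 122
	fmt.Println("v7 random bits:", 128-4-2-48) // 74, after the 48-bit timestamp

	// Assumed load: one million IDs generated within the same millisecond
	// (v7) or overall (v4).
	fmt.Printf("v7, 74 random bits:  %.3g\n", collisionProb(1e6, 74))
	fmt.Printf("v7, 62 random bits:  %.3g\n", collisionProb(1e6, 62))
	fmt.Printf("v4, 122 random bits: %.3g\n", collisionProb(1e6, 122))
}
```

For v7, only IDs that share the same millisecond timestamp can collide, so the relevant `n` is the per-millisecond generation rate rather than the total number of IDs ever issued.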

@milosgajdos milosgajdos left a comment

Member

I'm fine with these changes. LGTM. Thanks

PTAL @thaJeztah

@milosgajdos

Member

Actually @binaryfire mind squashing commits, please

@thaJeztah

Member

Actually @binaryfire mind squashing commits, please

Ah, yes; LGTM after it's squashed

Signed-off-by: Raj Siva-Rajah <raj@zapzap.cloud>
@binaryfire

binaryfire commented Jul 31, 2025

Contributor Author

Commits have been squashed.

@milosgajdos milosgajdos requested a review from thaJeztah August 8, 2025 05:19
@milosgajdos

Member

@thaJeztah PTAL

@thaJeztah thaJeztah left a comment

Member

thanks for the nudge

LGTM, thank you!

@milosgajdos milosgajdos merged commit f9fa205 into distribution:main Aug 8, 2025
21 checks passed
Development

Successfully merging this pull request may close these issues.

Performance improvement: Use UUIDv7 from google/uuid instead of UUIDv4

4 participants