@shashidhar-bm
Contributor

Which issue does this PR close?

Rationale for this change

Completes the hex encoding optimization work from #19568 by replacing `write!` format strings with lookup tables in the remaining call sites: the `hex` and `sha1` functions in the spark module.

What changes are included in this PR?

Replace `write!` with a format string by lookup-table hex encoding in the spark `hex` and `sha1` functions, which avoids the per-byte formatting overhead.
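
For readers unfamiliar with the technique, here is a minimal sketch of the difference between the two approaches. It is not the code from this PR: the names `HEX_CHARS_LOWER`, `hex_encode_write`, and `hex_encode_lookup` are hypothetical, and lowercase output is assumed purely for illustration.

```rust
use std::fmt::Write;

/// Hypothetical lookup table: maps a nibble (0..=15) to its lowercase ASCII hex digit.
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Baseline: every byte goes through the `write!` formatting machinery.
pub fn hex_encode_write(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(&mut out, "{b:02x}").unwrap();
    }
    out
}

/// Lookup-table version: two table reads per byte, no format-string handling.
pub fn hex_encode_lookup(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // High nibble first, then low nibble.
        out.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
        out.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
    }
    out
}
```

Both functions should produce identical output for any input, for example `hex_encode_lookup(b"Spark")` and `hex_encode_write(b"Spark")` both yield `537061726b`. The lookup version simply skips the formatter on each byte.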

Are these changes tested?

Yes

Are there any user-facing changes?

No.

@github-actions github-actions bot added the spark label Dec 31, 2025
Member

@andygrove andygrove left a comment

LGTM. Thanks @ShashidharM0118.

@getChan
Contributor

getChan commented Jan 1, 2026

Have you benchmarked it? I’m curious about the performance impact.

@shashidhar-bm
Contributor Author

> Have you benchmarked it? I’m curious about the performance impact.

I ran benchmarks for the supported input types (Binary/String for `hex`, Binary for `sha1`):

Array (1024 rows)

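| Function | Input | Before | After | Improvement |
|----------|--------|---------|---------|-------------|
| hex | Binary | 1.13 ms | 0.22 ms | 5.1x |
| hex | String | 3.01 ms | 0.67 ms | 4.5x |
| sha1 | Binary | 0.71 ms | 0.34 ms | 2.1x |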

Scalar (Single Value)

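| Function | Input | Before | After | Improvement |
|----------|--------|---------|---------|-------------|
| hex | Binary | 2.98 µs | 1.83 µs | 1.6x |
| hex | String | 3.13 µs | 1.92 µs | 1.6x |
| sha1 | Binary | 2.96 µs | 1.90 µs | 1.5x |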

@comphead comphead added this pull request to the merge queue Jan 1, 2026
Merged via the queue into apache:main with commit 9a9ff8d Jan 1, 2026
28 checks passed
@shashidhar-bm shashidhar-bm deleted the optimize-hex-spark branch January 2, 2026 12:05