perf: Improve performance of hex encoding in spark functions #19586

shashidhar-bm · 2025-12-31T18:02:25Z

Which issue does this PR close?

Part of Review uses of "{b:02x}" to hex-encode bytes #19569

Rationale for this change

Completes the hex encoding optimization work from #19568 by replacing `write!` format strings with lookup tables in the remaining instances (the `hex` and `sha1` functions in the spark module).

What changes are included in this PR?

Stop hex-encoding bytes through `write!` with a format string; use a lookup table instead, which is more efficient.
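For readers who have not seen the pattern, here is a minimal sketch of the two approaches side by side. It is not the code from this PR: the names `HEX_CHARS_LOWER`, `hex_encode_with_write`, and `hex_encode_with_table` are hypothetical, and the lowercase alphabet is an assumption taken from the `{b:02x}` format string mentioned in #19569.

```rust
use std::fmt::Write;

/// Hypothetical lookup table: maps a 4-bit value to its lowercase hex digit.
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Baseline: every byte goes through the `write!` formatting machinery.
fn hex_encode_with_write(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so unwrap is fine here.
        write!(&mut out, "{b:02x}").unwrap();
    }
    out
}

/// Lookup-table variant: two table reads per byte, no format handling.
fn hex_encode_with_table(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // High nibble first, then low nibble.
        out.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
        out.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
    }
    out
}
```

Both functions should return the same string for any input; for example, `hex_encode_with_table(b"Spark SQL")` yields `537061726b2053514c`. An uppercase variant would only need a different table.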

Are these changes tested?

Yes

Are there any user-facing changes?

No.

andygrove

LGTM. Thanks @ShashidharM0118.

getChan · 2026-01-01T02:50:48Z

Have you benchmarked it? I’m curious about the performance impact.

shashidhar-bm · 2026-01-01T09:19:25Z

> Have you benchmarked it? I’m curious about the performance impact.

I ran benchmarks for the supported input types (Binary/String for `hex`, Binary for `sha1`):

Array (1024 rows)

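| Function | Input | Before | After | Improvement |
|----------|--------|---------|---------|-------------|
| `hex` | Binary | 1.13 ms | 0.22 ms | 5.1x |
| `hex` | String | 3.01 ms | 0.67 ms | 4.5x |
| `sha1` | Binary | 0.71 ms | 0.34 ms | 2.1x |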

Scalar (Single Value)

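| Function | Input | Before | After | Improvement |
|----------|--------|---------|---------|-------------|
| `hex` | Binary | 2.98 µs | 1.83 µs | 1.6x |
| `hex` | String | 3.13 µs | 1.92 µs | 1.6x |
| `sha1` | Binary | 2.96 µs | 1.90 µs | 1.5x |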
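The thread does not show the harness behind these numbers. As a rough way to reproduce the array-versus-scalar comparison outside the project, a standard-library-only timing sketch could look like the following. Everything here is an assumption for illustration: the helper names (`encode_with_write`, `encode_with_table`, `time_rows`, `make_rows`), the 32-byte row width, and the use of `Instant` rather than a proper benchmarking framework. Absolute timings from it will not match the tables above.

```rust
use std::fmt::Write;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical lowercase hex alphabet used by the table-based encoder.
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// "Before" style: format each byte with `write!`.
fn encode_with_write(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        write!(&mut out, "{b:02x}").unwrap();
    }
    out
}

/// "After" style: look up each nibble in a table.
fn encode_with_table(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
        out.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
    }
    out
}

/// Encodes every row `iters` times and returns the total elapsed time.
/// `black_box` keeps the optimizer from discarding the work.
fn time_rows(rows: &[Vec<u8>], iters: u32, f: fn(&[u8]) -> String) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        for row in rows {
            black_box(f(black_box(row.as_slice())));
        }
    }
    start.elapsed()
}

/// Builds `n` rows of 32 deterministic bytes each (row width is an assumption).
fn make_rows(n: usize) -> Vec<Vec<u8>> {
    (0..n)
        .map(|i| (0..32).map(|j| ((i * 31 + j) % 256) as u8).collect())
        .collect()
}
```

Calling `time_rows(&make_rows(1024), 100, encode_with_table)` and the same with `encode_with_write` approximates the array case; `make_rows(1)` approximates the scalar case. Build with optimizations enabled before comparing the two.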

perf: optimize hex encoding using lookup tables

c896204

github-actions bot added the spark label Dec 31, 2025

andygrove approved these changes Jan 1, 2026

comphead added this pull request to the merge queue Jan 1, 2026

Merged via the queue into apache:main with commit 9a9ff8d Jan 1, 2026
28 checks passed

shashidhar-bm deleted the optimize-hex-spark branch January 2, 2026 12:05

andygrove mentioned this pull request Jan 2, 2026

Improve performance of sha hashing expressions apache/datafusion-comet#3029

Open
