-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54693][CORE][TESTS] Add LZ4TPCDSDataBenchmark #53453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and is about 5% slower on lz4 decompression even with migrating to suggested safeDecompressor (#53454) Here is my local test result: lz4-java 1.8.0 with fastDecompressor (state before aa65bda) lz4-java 1.10.1 with fastDecompressor (current master state, significant decompress perf drop!!!) lz4-java 1.10.1 with safeDecompressor (#53454) |
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
…21, Scala 2.13, split 1 of 1)
…17, Scala 2.13, split 1 of 1)
… (JDK 21, Scala 2.13, split 1 of 1)
… (JDK 17, Scala 2.13, split 1 of 1)
|
@LuciferYang, I did a refactor to make |
| /** | ||
| * Any code before running any benchmark, e.g., data preparation | ||
| */ | ||
| def beforeAll(): Unit = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is not to add the beforeAll() method in the current pr If we find it to be widely applicable, we can add this method in a subsequent pr and also extract beforeAll() for more microbenchmark scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in f02e559
|
Merged into master. Thanks @pan3793 |
|
Although I didn't get a chance to take a look at the detail, thank you so much for adding additional benchmark, @pan3793 and @LuciferYang ! |
What changes were proposed in this pull request?
Add
LZ4TPCDSDataBenchmark, testLZ4CompressionCodecagainst TPCDScatalog_sales.dat(SF1), the size is about 283M.Why are the changes needed?
Add a benchmark to measure the perf impact of lz4 security upgrading.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added benchmark result. Since the change has a refactor that touched
ZSTDTPCDSDataBenchmark, also updated its benchmark results.Was this patch authored or co-authored using generative AI tooling?
No.