Skip to content

ESQL: Data sources: ZSTD, BZIP2#143228

Merged
bpintea merged 3 commits intoelastic:mainfrom
bpintea:esql/ds/zstd_bzip2
Feb 27, 2026
Merged

ESQL: Data sources: ZSTD, BZIP2#143228
bpintea merged 3 commits intoelastic:mainfrom
bpintea:esql/ds/zstd_bzip2

Conversation

@bpintea
Copy link
Copy Markdown
Contributor

@bpintea bpintea commented Feb 27, 2026

Add two more decompression codec plugins.
Update the tests so all the fixture (compressed) files are generated on the fly.

🤖 Developed AI-assisted.

Add two more decompression codec plugins.
Update the tests so all the fixture files are generated on the fly.
@bpintea bpintea requested a review from costin February 27, 2026 11:01
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 27, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@bpintea bpintea enabled auto-merge (squash) February 27, 2026 12:39
@bpintea bpintea disabled auto-merge February 27, 2026 12:52
@bpintea bpintea enabled auto-merge (squash) February 27, 2026 12:52
Copy link
Copy Markdown
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review comments

usesDefaultDistribution("to be triaged")
maxParallelForks = 1
enabled = buildParams.snapshotBuild

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this removal intentional? This makes javaRestTest run on all builds, not just snapshots. If that's the goal, great, just want to make sure it's not accidental since it's unrelated to the compression changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentional, but wrong. :)
It was intending to align the testing with the other modules, but look at the wrong gradle file.

Added back in #143236

DecompressingStorageObject decompressing = new DecompressingStorageObject(rawObject, codec);
try (InputStream stream = decompressing.newStream()) {
byte[] decompressed = stream.readAllBytes();
assertArrayEquals(original, decompressed);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's bzip2 coverage here but no zstd equivalent. Would be good to add a testDecompressStreamZstd() as well so all three codecs are tested at this level.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #143236


// Compression libs for generating .gz, .zst, .bz2 fixtures on the fly
implementation "com.github.luben:zstd-jni:1.5.7-6"
implementation "org.apache.commons:commons-compress:1.26.1"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These versions (1.5.7-6, 1.26.1) are duplicated from the plugin build.gradle files. Worth centralizing them so they don't drift, maybe a shared versions block or reference the plugin's own version declaration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #143236


tasks.named("thirdPartyAudit").configure {
ignoreMissingClasses()
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blanket ignoreMissingClasses() can mask real problems. Can we list the specific classes instead? commons-compress pulls in optional deps like org.tukaani.xz.*, org.brotli.* etc., just ignore those explicitly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #143236

@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
String name = file.getFileName().toString();
if (COMPRESSED_EXTENSIONS.stream().anyMatch(name::endsWith)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer a plain for loop here instead of .stream().anyMatch(), keeps it consistent with the codebase style.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, using streams is often time too heavy in hot paths, but this this is for the testing fixtures only (and streams are heavily used in the file :) )

Copy link
Copy Markdown
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bpintea bpintea merged commit 7875a1a into elastic:main Feb 27, 2026
35 checks passed
@bpintea bpintea deleted the esql/ds/zstd_bzip2 branch February 27, 2026 14:19
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Feb 27, 2026
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Feb 27, 2026
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Feb 27, 2026
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Feb 27, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request Feb 27, 2026
…cations

* upstream/main:
  Warn on API key version mismatch (elastic#143127)
  Fixed wrong malformed value ordering in synthetic source tests (elastic#143187)
  [ML] Fix: required_native_memory_bytes Calculated with Wrong Allocation Count (elastic#143077)
  Add configureBenchmarkLogging calls across the various benchmarks (elastic#143185)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_aggregate_metric_double_implicit_casting} elastic#143292
  Give system role permission to invoke shard refresh (elastic#143190)
  Mute testSyntheticSourceWithTranslogSnapshot (elastic#143260)
  Adds ResumeInfo Tests (elastic#142769)
  Use a static method to configure benchmark logging (elastic#143056)
  add connectors release notes (elastic#142884)
  Add CI triage guidance for AI agents (elastic#142994)
  ESQL: Data sources: ZSTD, BZIP2 (elastic#143228)
  [ES|QL] Channels issue when an agg is called with the same field (elastic#142180) (elastic#142269)
  Add support for project routing in reindex requests (elastic#142240)
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
Add two more decompression codec plugins.
Update the tests so all the fixture (compressed) files are generated on the fly.

🤖 Developed AI-assisted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants