Avoid reading entire bloom filter file on reader open#139374

Merged
fcofdez merged 3 commits into elastic:main from fcofdez:bloom-filter-retrieve-checksum
Jan 7, 2026

Conversation

@fcofdez
Contributor

@fcofdez fcofdez commented Dec 11, 2025

Use retrieveChecksum() instead of checksumEntireFile() in ES93BloomFilterStoredFieldsFormat.
While individual bloom filter files are small, checksumming on every reader open adds up across
many segments.

We can safely defer verification since files are checksummed during merges.

Also switch to using the slice reader for merges since it's already positioned correctly in the file.
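
For readers unfamiliar with the distinction, here is a minimal, self-contained sketch of the trade-off. It does not use Lucene and is not the code from this PR: the class `FooterChecksumSketch`, its method names, and the file layout (payload followed by an 8-byte CRC32 footer) are assumptions made for illustration only. In Lucene itself, `CodecUtil.retrieveChecksum` reads the stored checksum from the footer, while `CodecUtil.checksumEntireFile` reads the whole file to recompute and compare it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical stand-in for a checksummed index file: payload bytes followed by
// an 8-byte big-endian CRC32 footer. This is NOT Lucene's real footer layout.
final class FooterChecksumSketch {

    static final int FOOTER_LENGTH = Long.BYTES;

    /** Writes the payload followed by the CRC32 of the payload. */
    static void write(Path file, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer out = ByteBuffer.allocate(payload.length + FOOTER_LENGTH);
        out.put(payload).putLong(crc.getValue());
        Files.write(file, out.array());
    }

    /** Cheap path (reader open): seek to the footer and read the stored checksum only. */
    static long retrieveStoredChecksum(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < FOOTER_LENGTH) {
                throw new IOException("file too short to contain a footer: " + file);
            }
            raf.seek(raf.length() - FOOTER_LENGTH);
            return raf.readLong(); // the payload bytes are never read
        }
    }

    /** Expensive path (e.g. at merge time): read every byte, recompute, and compare. */
    static long verifyEntireFile(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < FOOTER_LENGTH) {
            throw new IOException("file too short to contain a footer: " + file);
        }
        int payloadLength = all.length - FOOTER_LENGTH;
        CRC32 crc = new CRC32();
        crc.update(all, 0, payloadLength);
        long stored = ByteBuffer.wrap(all, payloadLength, FOOTER_LENGTH).getLong();
        if (crc.getValue() != stored) {
            throw new IOException("checksum mismatch in " + file);
        }
        return stored;
    }
}
```

In this sketch, the cheap path costs one seek and an 8-byte read regardless of file size, so it does not detect payload corruption. That detection is left to the full verification, which corresponds to the description's point that files are checksummed during merges.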

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@tlrx tlrx left a comment


LGTM

Member

@martijnvg martijnvg left a comment


👍

@fcofdez
Contributor Author

fcofdez commented Jan 7, 2026

@elasticmachine update branch

@fcofdez
Contributor Author

fcofdez commented Jan 7, 2026

@elasticmachine update branch

@fcofdez fcofdez merged commit db2fb44 into elastic:main Jan 7, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 7, 2026
* upstream/main: (191 commits)
  Overall Decision for Deciders prioritizes THROTTLE (elastic#140237)
  Apply group by all logic not only to top-level aggregates (elastic#140248)
  [ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper (elastic#139982)
  Avoid reading entire bloom filter file on reader open (elastic#139374)
  Mark bloom filter files for random access (elastic#139375)
  Ensure that the buffer used for ES93BloomFilterStoredFieldsFormat is zeroed (elastic#139034)
  Add busy assertion to avoid race condition for testStalledShardMigrationProperlyDetected (elastic#140230)
  Remove line number check for testTransitiveFindsDeepCallChain (elastic#140228)
  Allow a slight difference in rescored docs (elastic#139931)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  Start exchange sink fetchers concurrently (elastic#140196)
  Allow allocation to replacement target node on vacate completion (elastic#140150)
  Ignore JNA cleaner threads in SecureHdfsRepositoryAnalysisRestIT (elastic#139925)
  DeterministicQueue refactor and enhancement (elastic#140151)
  Always error out if CCS expression shows up when CCS is not supported (elastic#139009)
  Use IllegalArgumentException over RepositoryException for readonly-repository checks (elastic#140200)
  Guard promql capabilities in AnalyzerTests (elastic#140232)
  [Inference API] Fix flaky AuthorizationTaskExecutorIT tests (elastic#139978)
  Cleaning up exitable vector value impls (elastic#140190)
  [Inference API] Fix auth exception listener not called bug (elastic#139966)
  ...
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>