Skip to content

Store ignored source in binary doc values#143784

Merged
Kubik42 merged 8 commits intoelastic:mainfrom
Kubik42:store_synthetic_source_as_binary_dv
Mar 18, 2026
Merged

Store ignored source in binary doc values#143784
Kubik42 merged 8 commits intoelastic:mainfrom
Kubik42:store_synthetic_source_as_binary_dv

Conversation

@Kubik42
Copy link
Copy Markdown
Contributor

@Kubik42 Kubik42 commented Mar 7, 2026

No description provided.

@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch 2 times, most recently from e9f7d99 to e258a5d Compare March 9, 2026 17:01
@Kubik42
Copy link
Copy Markdown
Contributor Author

Kubik42 commented Mar 9, 2026

Buildkite benchmark this with elastic-logs-streams

@Kubik42
Copy link
Copy Markdown
Contributor Author

Kubik42 commented Mar 10, 2026

Hmm, all the benchmarks match what we saw earlier, with the exception of dataset size. Somehow, for the contender we went up from ~63Gb -> 83Gb.
This increase is coming from doc_values. This is probably related to changing the block size. I'll run another benchmark to confirm.

@Kubik42
Copy link
Copy Markdown
Contributor Author

Kubik42 commented Mar 10, 2026

Buildkite benchmark this with elastic-logs-streams

@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch from f427cf0 to 7e956bb Compare March 11, 2026 07:45
@Kubik42
Copy link
Copy Markdown
Contributor Author

Kubik42 commented Mar 11, 2026

Hmm, all the benchmarks match what we saw earlier, with the exception of dataset size. Somehow, for the contender we went up from ~63Gb -> 83Gb. This increase is coming from doc_values. This is probably related to changing the block size. I'll run another benchmark to confirm.

I confirmed this in the latest benchmark run.

@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch 2 times, most recently from f825723 to ae29e1f Compare March 11, 2026 23:00
@@ -0,0 +1,97 @@
/*
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted all ignored source formats into separate classes as IgnoredSourceFieldMapper was getting unwieldy and difficult to read.

@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch 2 times, most recently from d93260b to a75fe13 Compare March 11, 2026 23:37
@Kubik42 Kubik42 marked this pull request as ready for review March 11, 2026 23:41
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@Kubik42 Kubik42 requested a review from a team as a code owner March 12, 2026 20:08
@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch from 4f89d21 to c452ee8 Compare March 12, 2026 21:53
@Kubik42
Copy link
Copy Markdown
Contributor Author

Kubik42 commented Mar 13, 2026

Buildkite benchmark this with elastic-logs-streams

@elasticmachine
Copy link
Copy Markdown
Collaborator

elasticmachine commented Mar 13, 2026

💚 Build Succeeded

This build ran two elastic-logs-streams benchmarks to evaluate performance impact of this PR.

History

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Kubik42! This looks good, butI think we can simplify the IgnoredSourceLeafLoader abstraction.

@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch from 3fd0177 to 5891623 Compare March 16, 2026 21:51
…re using binary doc values for ignored source
@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch from 5891623 to 072bcf3 Compare March 17, 2026 02:39
@Kubik42 Kubik42 force-pushed the store_synthetic_source_as_binary_dv branch from 63b942c to b61ecd4 Compare March 17, 2026 05:03
Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left another question.

Also org.elasticsearch.index.mapper.BlockSourceReaderTests.testArray(...) needs to be updated with the fact that another ignore source format is being used now.

// This path is filtered by the include/exclude rules
continue;
}
objectsWithIgnoredFields.computeIfAbsent(nameValue.getParentFieldName(), k -> new ArrayList<>()).add(nameValue);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we now sometimes need to use nameValue.name() instead of nameValue.getParentFieldName()? Given that the other loadSingleIgnoredField(...) no longer exist, which would do that. I assumed we would need to add a boolean parameter to this method in order to do this. But maybe it isn't needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided not to add a boolean parameter as it felt like a hack to get loadIgnoredFields() to behave like loadSingleIgnoredField().

Instead, this is handled here. This is in the function that originally called loadSingleIgnoredField().

);

// index version alone is not enough, TSDB doc values setting must also be enabled.
assertNotEquals(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe assert what format would be expected?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the format is changed in the future, that would mean we'd have a failing test that needs to be updated. Ultimately, we just want to know that DOC_VALUES_IGNORED_SOURCE is not used when the criteria for it is not met. Whats used in its place is not important in the context of these tests.

@jordan-powers
Copy link
Copy Markdown
Contributor

Since you're adding a feature flag, you'll probably also want to add the >test-release tag to this PR to make sure we aren't breaking release tests with the FF off.

@martijnvg martijnvg added the test-release Trigger CI checks against release build label Mar 18, 2026
Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Kubik42, LGTM!

@Kubik42 Kubik42 merged commit 0ba0f35 into elastic:main Mar 18, 2026
35 of 36 checks passed
@Kubik42 Kubik42 deleted the store_synthetic_source_as_binary_dv branch March 18, 2026 18:31
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
* Introduce a new ignored source format that uses binary doc values

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java

* Force enable index.use_time_series_doc_values_format_large_binary_block_size

* Revert "Force enable index.use_time_series_doc_values_format_large_binary_block_size"

This reverts commit c452ee8.

* Addressed feedback

* Addressed feedback: check that TSDB doc values format is enabled before using binary doc values for ignored source

* Addressed feedback: merge signle value loading and all value loading into one function

* Fix test failure, make FilteredIgnoredSourceDocValues final static

* Renamed LegacyIgnoredSourceEncoding to SingularIgnoredSourceEncoding, added javadocs, added feature flag

---------

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine test-release Trigger CI checks against release build v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants