Add ignored field values to synthetic source#107567
Conversation
|
Hi @kkrik-es, I've created a changelog YAML for you. |
…o fix/synthetic-source/object
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
lkts
left a comment
There was a problem hiding this comment.
Overall LGTM, i have left some suggestions. One note i have is that it feels kind of incidental that IgnoredValuesFieldMapper is actually a mapper. I wonder if we can implement it as some kind of helper instead and it would be simpler. I don't have a good suggestion here though.
...pi-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/20_synthetic_source.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
# Conflicts: # server/src/main/java/module-info.java # server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification
| int additionalFieldsToAdd = getNewFieldsSize() + mapperSize; | ||
| if (indexSettings().isIgnoreDynamicFieldsBeyondLimit()) { | ||
| if (mappingLookup.exceedsLimit(indexSettings().getMappingTotalFieldsLimit(), additionalFieldsToAdd)) { | ||
| if (indexSettings().getMode().isSyntheticSourceEnabled() || SourceFieldMapper.isSynthetic(mappingLookup)) { |
There was a problem hiding this comment.
Is the idea to only apply ignored values for when maximum number of fields have been exceeded in this pr? Or also doing this for when for example object field has been disabled?
There was a problem hiding this comment.
I'd say we add more cases incrementally, to keep this one short.
| return (SourceFieldMapper) in; | ||
| } | ||
|
|
||
| public static boolean isSynthetic(MappingLookup mappingLookup) { |
There was a problem hiding this comment.
I don't think this is needed? I think mappingLookup.isSourceSynthetic() can used instead?
server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/FieldDataParseHelper.java
Outdated
Show resolved
Hide resolved
| new IgnoredValuesFieldMapper.Values(mapper.name(), parentOffset, FieldDataParseHelper.encodeToken(parser())) | ||
| ); | ||
| } catch (IOException e) { | ||
| throw new IllegalArgumentException("failed to parse field [" + mapper.name() + " ]", e); |
There was a problem hiding this comment.
Switched to throwing an exception here, fyi.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public void postParse(DocumentParserContext context) { | ||
| for (Values values : context.getIgnoredFieldValues()) { |
There was a problem hiding this comment.
maybe add an assert here that check that if synthetic source is disabled then there are no ignored field values?
Something like context.mappingLookup().isSourceSynthetic() || (context.mappingLookup().isSourceSynthetic() == false && context.getIgnoredFieldValues().isEmpty())
| * This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating | ||
| * if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable. | ||
| */ | ||
| public class IgnoredValuesFieldMapper extends MetadataFieldMapper { |
There was a problem hiding this comment.
Given that the purpose of the class is to store field values that would be ignored if synthetic source is enabled, maybe IgnoredSyntheticSourceValues is a better name?
There was a problem hiding this comment.
Ignored fields and values are only used for synthetic source, by definition. For instance, IgnoreMalformedStoredValues and IgnoredFieldMapper don't mention synthetic source, even though they're tied to it. I'd say we leave it as is, for simplicity and consistency?
There was a problem hiding this comment.
IgnoredFieldMapper can be used outside the context of synthetic source. Also when source is stored.
This enumerates the fields that have not been indexed, but are available in the source.
IgnoreMalformedStoredValues isn't a field mapper, but more of a helper class to deal reading/writing malformed field values. It is only used in the context of synthetic source. I think MalformedSyntheticSourceValues is a better name for this class.
There was a problem hiding this comment.
It seems to me that field mappers can generally support synthetic source. Adding that to the class name feels leaky; how the values get used is up to the callers of this class. I'm also not a big fan of longer names unless they really help with disambiguation.
There was a problem hiding this comment.
Another attempt :), what about IgnoredSourceFieldMapper? It is shorter than the other name proposal, and ties it to source, this field mapper keeps track of pieces of the _source that ended being ignored.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/DocCountFieldMapperTests.java
Show resolved
Hide resolved
| } | ||
|
|
||
| public void trackObjectsWithIgnoredFields() { | ||
| if (values == null || values.isEmpty()) { |
There was a problem hiding this comment.
Maybe just use null for signaling that nothing needs to happen? That way the values field doesn't need to be initialized with en empty list and there is nu need to set values to null here at line 158?
server/src/main/java/org/elasticsearch/index/mapper/IgnoredValuesFieldMapper.java
Outdated
Show resolved
Hide resolved
…o fix/synthetic-source/object
martijnvg
left a comment
There was a problem hiding this comment.
Thanks for iterating here. I think it is getting close.
| import java.nio.charset.StandardCharsets; | ||
| import java.util.Arrays; | ||
|
|
||
| /** |
There was a problem hiding this comment.
The FieldData part of the name of this class is confusing with the field data abstraction that was use in search for scripting, sorting and aggregations. Maybe XContentDataHelper is a better name?
There was a problem hiding this comment.
Also I think we should make this class package protected. Its only users are in the org.elasticsearch.index.mapper package.
| byte[] nameBytes = values.name.getBytes(StandardCharsets.UTF_8); | ||
| byte[] bytes = new byte[4 + nameBytes.length + values.value.length]; | ||
| ByteUtils.writeIntLE(values.name.length() + PARENT_OFFSET_IN_NAME_OFFSET * values.parentOffset, bytes, 0); | ||
| System.arraycopy(nameBytes, 0, bytes, 4, nameBytes.length); |
There was a problem hiding this comment.
It would be nice if we could reuse the name from _ignored doc values field (Salvatore's pr will store it as doc values instead of stored fields). We end up storing it in _ignored too if number of fields is exceeded.
I think this is tricky, because then we don't know which value from _ignored field belongs to a value from_ignored_values field. In case of multi values for the same field, doc values store in alphabetic order.
There was a problem hiding this comment.
Yeah there's certain duplication here.. I added a note about this in the javadoc. I think this is good for now, let's get some mileage for this and optimize if we find out it's an issue in practice - ignored fields should be the exception after initial setup..
There was a problem hiding this comment.
Agreed, we can keep this in the back of our minds, and consider solutions to reduce storage for the two meta fields.
| * This overlaps with {@link IgnoredFieldMapper} that tracks just the ignored field names. It's worth evaluating | ||
| * if we can replace it for all use cases to avoid duplication, assuming that the storage tradeoff is favorable. | ||
| */ | ||
| public class IgnoredValuesFieldMapper extends MetadataFieldMapper { |
There was a problem hiding this comment.
Another attempt :), what about IgnoredSourceFieldMapper? It is shorter than the other name proposal, and ties it to source, this field mapper keeps track of pieces of the _source that ended being ignored.
martijnvg
left a comment
There was a problem hiding this comment.
Left a few minor comments, LGTM otherwise.
| // (N % PARENT_OFFSET_IN_NAME_OFFSET) | ||
| private static final int PARENT_OFFSET_IN_NAME_OFFSET = 1 << 16; | ||
|
|
||
| public static final String NAME = "_ignored_values"; |
There was a problem hiding this comment.
Also rename _ignored_values to _ignored_source?
|
|
||
| public static final TypeParser PARSER = new FixedTypeParser(context -> INSTANCE); | ||
|
|
||
| static final NodeFeature TRACK_IGNORED_VALUES = new NodeFeature("mapper.track_ignored_values"); |
There was a problem hiding this comment.
and rename this constant as well?
server/src/test/java/org/elasticsearch/index/mapper/XContentDataHelperTests.java
Outdated
Show resolved
Hide resolved
…83110) ## Summary Fixes #182837. Fixes #182514. The number of fields returned by the field caps API is different across ES versions in forward compatibility tests. In ES 8.15.0, the `_ignored_source` field was added (elastic/elasticsearch#107567). This fixes the API integration test for field caps to assert the correct number of fields across versions. Note that in Kibana `8.15` we refactored away from using fields caps directly in this way and removed the corresponding API endpoint and tests (#182588), that's why there's this dedicated `7.17` PR to just fix the assertions on the existing test. To test this locally, the following commands for the functional tests server and runner can be used to run the tests in different forward compatibility scenarios: ``` # 7.17 tests server node scripts/functional_tests_server.js --config x-pack/test/api_integration/config.ts # 7.17 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts # 8.14 tests server ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.14.0/manifest-latest-verified.json" node scripts/functional_tests_server.js # 8.14 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.14.0-SNAPSHOT # 8.15 tests server ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.15.0/manifest-latest-verified.json" node scripts/functional_tests_server.js # 8.15 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.15.0-SNAPSHOT ``` Note in `7.17` the API integration tests are not split up yet into several configs so the commands above will run ALL Kibaan API integration tests. The command to run the tests server for a specific ES version is also shared in the buildkite reports, for example: https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-15-forward-compatibility/builds/20#annotation-es-snapshot-manifest The versions the compatibility tests will currently run against can be found here: https://github.com/elastic/kibana/blob/main/versions.json ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.
…astic#183110) ## Summary Fixes elastic#182837. Fixes elastic#182514. The number of fields returned by the field caps API is different across ES versions in forward compatibility tests. In ES 8.15.0, the `_ignored_source` field was added (elastic/elasticsearch#107567). This fixes the API integration test for field caps to assert the correct number of fields across versions. Note that in Kibana `8.15` we refactored away from using fields caps directly in this way and removed the corresponding API endpoint and tests (elastic#182588), that's why there's this dedicated `7.17` PR to just fix the assertions on the existing test. To test this locally, the following commands for the functional tests server and runner can be used to run the tests in different forward compatibility scenarios: ``` # 7.17 tests server node scripts/functional_tests_server.js --config x-pack/test/api_integration/config.ts # 7.17 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts # 8.14 tests server ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.14.0/manifest-latest-verified.json" node scripts/functional_tests_server.js # 8.14 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.14.0-SNAPSHOT # 8.15 tests server ES_SNAPSHOT_MANIFEST="https://storage.googleapis.com/kibana-ci-es-snapshots-daily/8.15.0/manifest-latest-verified.json" node scripts/functional_tests_server.js # 8.15 tests runner node scripts/functional_test_runner --config x-pack/test/api_integration/config.ts --es-version=8.15.0-SNAPSHOT ``` Note in `7.17` the API integration tests are not split up yet into several configs so the commands above will run ALL Kibaan API integration tests. The command to run the tests server for a specific ES version is also shared in the buildkite reports, for example: https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-15-forward-compatibility/builds/20#annotation-es-snapshot-manifest The versions the compatibility tests will currently run against can be found here: https://github.com/elastic/kibana/blob/main/versions.json ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
This change introduces
IgnoredSourceFieldMapperto track ignored field names and values, and extendsObjectMapperto access these while constructing synthetic source.IgnoredSourceFieldMapperrelies on the generic logic for parsing and writing various supported tokens. This logic is moved toXContentDataHelperto be properly shared withIgnoreMalformedStoredValues.Related to #106825