Implement synthetic source support for annotated text field#107735
Implement synthetic source support for annotated text field#107735lkts merged 10 commits intoelastic:mainfrom
Conversation
|
Documentation preview: |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Related #107734. |
|
Hi @lkts, I've created a changelog YAML for you. |
...t/src/yamlRestTest/resources/rest-api-spec/test/mapper_annotatedtext/20_synthetic_source.yml
Outdated
Show resolved
Hide resolved
|
buildkite test this please |
|
@elasticmachine update branch |
...ext/src/main/java/org/elasticsearch/index/mapper/annotatedtext/AnnotatedTextFieldMapper.java
Show resolved
Hide resolved
| @Override | ||
| protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) { | ||
| throw new AssumptionViolatedException("not supported"); | ||
| assumeFalse("ignore_malformed not supported", ignoreMalformed); |
| @@ -0,0 +1,164 @@ | |||
| --- | |||
There was a problem hiding this comment.
Can we also test a few failure scenarios? Especially because using synthetic source depends on multiple settings combined together (stored, multi-field).
Also I am wondering what happens if we have a field that is both stored and has the keyword multi-filed. I guess in that case we would prefer relying on the stored field because it would allow us to return the contents exactly as they are...including duplicates.
There was a problem hiding this comment.
Failure scenarios are tested in TextFieldFamilySyntheticSourceSupport#invalidExample.
If field is stored, we always prefer that over a multi field. I'll add a test.
| return null; | ||
| public static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) { | ||
| for (Mapper sub : multiFields) { | ||
| if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) { |
There was a problem hiding this comment.
nit: flipping equals reduces the chance for a NPE.
There was a problem hiding this comment.
This is existing code that i moved so i'll leave it as is if you don't mind.
| outPrimary.add(v); | ||
| } | ||
| }); | ||
| List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList()); |
There was a problem hiding this comment.
Hash set -> sort -> can we just use a sorted set?
There was a problem hiding this comment.
This is existing code that i moved so i'll leave it as is if you don't mind.
| List<String> loadBlock; | ||
| if (loadBlockFromSource) { | ||
| // The block loader infrastructure will never return nulls. Just zap them all. | ||
| loadBlock = in.stream().filter(m -> m != null).toList(); |
There was a problem hiding this comment.
can't we just filter for null above when creating in?
There was a problem hiding this comment.
in can have nulls, it is a valid value for a field
This PR adds synthetic source support for
annotated_textfields. Existing implementation fortextis reused including test infrastructure so the majority of the change is moving and making things accessible.Contributes to #106460, #78744.