Handle legacy mappings with placeholder fields#85059
Handle legacy mappings with placeholder fields#85059ywelsch merged 32 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-search (Team:Search) |
…ersion indexCreatedVersion)
romseygeek
left a comment
There was a problem hiding this comment.
I'm happy with the Mapper changes now, I think. Can we have some tests checking serialization of placeholder mappers with unknown fields?
| (n, c) -> new Builder(n, c.indexVersionCreated()), | ||
| notInMultiFields(CONTENT_TYPE) | ||
| notInMultiFields(CONTENT_TYPE), | ||
| Version.CURRENT.minimumCompatibilityVersion() |
There was a problem hiding this comment.
Given that there are a few of these multi-parameter calls, maybe we should add a super constructor that takes the Builder and verifier functions and passes Version.CURRENT.minimumCompatibilityVersion() as a default value?
| mappingsBuilder.startObject("date").field("type", "date").field("format", "yyyy/MM/dd").endObject(); | ||
| mappingsBuilder.endObject().endObject(); | ||
| putMappingsRequest.setJsonEntity(Strings.toString(mappingsBuilder)); | ||
| assertOK(client().performRequest(putMappingsRequest)); |
There was a problem hiding this comment.
Can you clarify why this needs to be removed?
There was a problem hiding this comment.
This is now automated by the mapping conversion. Prior to this PR, you had to manually define the mapping after importing a legacy index in order to access the fields in that index (the original mapping would only be imported to _meta/legacy-mappings section).
I've added such a test in ad99173 in addition to the ones in OldMappingsIT |
romseygeek
left a comment
There was a problem hiding this comment.
I left a couple of nits, but LGTM otherwise. Thanks for your patience on this @ywelsch!
server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/BooleanFieldMapper.java
Outdated
Show resolved
Hide resolved
| "include_in_all", | ||
| "[include_in_all] is deprecated, the _all field have been removed in this version" | ||
| ); | ||
| if (parserContext.indexVersionCreated().isLegacyIndexVersion() == false) { |
There was a problem hiding this comment.
Do we need to guard this with a version check? I think it will still be true even for older mappings, in that we won't generate an _all field type that you can search against even if the underlying lucene field exists in the index?
There was a problem hiding this comment.
Good spot. In an earlier iteration, I was pondering on adding _all support for legacy indices, hence the version check. I've removed it now.
|
@elasticmachine run elasticsearch-ci/part-1 (unrelated failure) |
javanna
left a comment
There was a problem hiding this comment.
I left some nits and some questions, no blockers on my end. LGTM, thanks for all the iterations.
| */ | ||
| private void checkMappingsCompatibility(IndexMetadata indexMetadata) { | ||
| @Nullable | ||
| public Mapping checkMappingsCompatibility(IndexMetadata indexMetadata) { |
There was a problem hiding this comment.
nit: I find it slightly confusing that this check method is now returning the mappings. I can see how it is convenient to return the mappings obtained from the merge operation. Would it make sense renaming the method to adapt it to the updated behaviour?
| * Return a map of the non-legacy mappers that have been registered. The | ||
| * returned map uses the type of the field as a key. | ||
| */ | ||
| public Map<String, Mapper.TypeParser> getMapperParsers() { |
There was a problem hiding this comment.
I think that this method is only used in tests now: is that ok or do we need to adapt tests as a follow-up?
There was a problem hiding this comment.
I've adapted the tests and removed it in 4c86d3c
| * so that the original mapping can be preserved and proper exception messages can | ||
| * be provided when accessing these fields. | ||
| */ | ||
| public class PlaceHolderFieldMapper extends FieldMapper { |
There was a problem hiding this comment.
I tend to get confused with naming here, and maybe it is just me: would a different name like UnknownLegacyFieldMapper better represent what this mapper is used for? Shall we make it final also?
There was a problem hiding this comment.
I prefer the PlaceHolder terminology, as we might know the field type (e.g. it's a completion field, or a search_as_you_type field, which the current version/implementation knows about), but we just not chose to support for legacy indices. I'm open for changing it if we find a better name, but also happy with the current one.
Regarding "final" modifier, it's more of a personal preference, but I like to avoid adding too many restrictive modifiers in our codebase, as we're not developing a library (unlike Lucene), and it feels unnecessary doing so (putting extra focus on things that should not matter).
| this(builderFunction, contextValidator, Version.CURRENT.minimumIndexCompatibilityVersion()); | ||
| } | ||
|
|
||
| public TypeParser( |
There was a problem hiding this comment.
I believe this one could be made private
| this(builderFunction, (n, c) -> {}, Version.CURRENT.minimumIndexCompatibilityVersion()); | ||
| } | ||
|
|
||
| public TypeParser(BiFunction<String, MappingParserContext, Builder> builderFunction, Version minimumCompatibilityVersion) { |
There was a problem hiding this comment.
would it help to add javadocs here to clarify when to use which constructor? as far as I understand, the constructor that takes the version is the one called by all the mappers that are supported on legacy indices, all the rest remains the same?
| validate(); | ||
| } | ||
|
|
||
| protected void handleUnknownParam(String propName, Object propNode) { |
There was a problem hiding this comment.
should we rename it to something that removes any doubt that this is only called for legacy indices? handleLegacyindexUnknownParam ?
There was a problem hiding this comment.
I've renamed the method to handleUnknownParamOnLegacyIndex in f163412
|
|
||
| @Override | ||
| public boolean supportsVersion(Version indexCreatedVersion) { | ||
| return true; |
There was a problem hiding this comment.
this I find a bit surprising especially given that field aliases were added with 6.4 . I wonder if this is never called, in which case maybe we should throw exception instead, or if it could rely on the default impl instead.
There was a problem hiding this comment.
Even though field aliases were only added in ES 6.4, they could be used with older indices as well (i.e. you had a 5.x index for example that you kept using in 6.4 and now added the field alias to the mapping of that 5.x index). As field aliases are more of a runtime property (e.g. just like runtime fields, which you could also use on indices created before the release of runtime fields) and have no connection to the actual data, I see no reason to limit their use.
|
|
||
| @Override | ||
| public boolean supportsVersion(Version indexCreatedVersion) { | ||
| return true; |
There was a problem hiding this comment.
I wonder if overriding these two supportsVersion is needed. Maybe it does because metadata fields are always supported and they don't take a version in like FieldMapper does?
There was a problem hiding this comment.
This was never called, it mainly resulted from the fact that MetadataFieldMapper.TypeParser was extending Mapper.TypeParser (with no good reason). I separated the two in b25369e, which means we no longer have to have this method here.
|
|
||
| @Override | ||
| public boolean supportsVersion(Version indexCreatedVersion) { | ||
| return true; |
There was a problem hiding this comment.
here too, I wonder what this means. Does this get called, or does this simply mean "objects are always supported regardless of the version"?
There was a problem hiding this comment.
The latter (objects are always supported regardless of the version). This gets called indeed, e.g. when you have an object below an object.
|
Thank you for the reviews @romseygeek and @javanna! |
As part of #81210 we would like to add support for handling legacy (Elasticsearch 5 and 6) mappings in newer Elasticsearch versions. The idea is to import old mappings "as-is" into Elasticsearch 8, and adapt the mapper parsers so that they can handle those old mappings. Only a select subset of the legacy mapping will actually be parsed, and fields that are neither known to newer ES version nor supported for search will be mapped as "placeholder fields", i.e., they are still represented as fields in the system so that they can give proper error messages when queried by a user.
Fields that are supported:
5.x indices with mappings that have multiple mapping types are collapsed together on a best-effort basis before they are imported.
Relates #81210