Rewrite SourceToParse with resolved docType #36921
We introduced a typeless API in elastic#35790 where we translate the default docType "_doc" to the user-defined docType. However, we still use the docType from the source rather than the translated type (i.e., the type of the Mapper) when parsing a document. This leads to a situation where we have two translog operations for the same document with different types:
- prvOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='_doc', seqNo=1, primaryTerm=1, version=1, autoGeneratedIdTimestamp=1545125562123}]
- newOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='not_doc', seqNo=1, primaryTerm=1, version=1, autoGeneratedIdTimestamp=-1}]
Closes elastic#36769
Pinging @elastic/es-search
We may have a BWC issue with the typeless API if the primary is on 7.0 while the replicas are on 6.x. I'll test that scenario and post the result.
I'm working on a PR for the bulk API, which just hit this. @mayya-sharipova recently changed … The above detail may not change the status quo on the receiving 6.x end, which you're tackling here (provided the IndexRequest type is _doc but the mapping has a custom type), but it may be useful to know about the related work going on and the CI reproduce line that uncovered my issue: 2db6753
jpountz left a comment:
The change looks good to me.
Yannick talked to me about this issue earlier today, and I do agree that we need to find a more robust way to make typeless APIs work on indices that still have a type whose name is different from _doc.
run Gradle build tests 1
Thanks @jpountz for reviewing and @markharwood for the heads-up.
Today the routing of a SourceToParse is assigned in a separate step after the object is created, which makes it easy to forget to set the routing. With this commit, the routing must be provided in the constructor of SourceToParse. Relates #36921
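A minimal sketch of the constructor-injection idea, using a hypothetical simplified class `ParseSource` (not Elasticsearch's actual `SourceToParse`; its fields and constructor here are assumptions). Because routing is a constructor argument and there is no setter, every caller has to decide on it when the object is created instead of remembering a later call.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a SourceToParse-like value object.
// NOT Elasticsearch's class: the field set and constructor are assumptions.
final class ParseSource {
    private final String index;
    private final String type;
    private final String id;
    private final String source;
    private final String routing; // null means "no custom routing", chosen explicitly

    ParseSource(String index, String type, String id, String source, String routing) {
        this.index = Objects.requireNonNull(index, "index");
        this.type = Objects.requireNonNull(type, "type");
        this.id = Objects.requireNonNull(id, "id");
        this.source = Objects.requireNonNull(source, "source");
        // Routing is a required constructor argument, so it cannot be silently
        // left unset the way a separate setter call can be forgotten.
        this.routing = routing;
    }

    String index() { return index; }
    String type() { return type; }
    String id() { return id; }
    String source() { return source; }
    String routing() { return routing; }
}
```

Passing `null` is still possible in this sketch, but it becomes a visible decision at the call site rather than an omission.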
We introduced a typeless API in #35790 where we translate the default docType "_doc" to the user-defined docType. However, we did not rewrite the SourceToParse with the resolved docType. This leads to a situation where we have two translog operations for the same document with different types:
- prvOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='_doc', seqNo=1, primaryTerm=1, version=1, autoGeneratedIdTimestamp=1545125562123}]
- newOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='not_doc', seqNo=1, primaryTerm=1, version=1, autoGeneratedIdTimestamp=-1}]
Closes #36769
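The rewrite described above can be sketched as follows, under stated assumptions: `TypedDoc` and `TypelessTypeResolver` are hypothetical names, not Elasticsearch classes, and the index is assumed to have at most one mapping type (passed as `null` when none exists yet). The document is rewritten with the resolved type before parsing, so anything derived from it, such as a translog operation, would carry the mapper's type (`not_doc`) rather than the request's `_doc`.

```java
import java.util.Objects;

// Hypothetical, simplified document-to-parse value; not Elasticsearch's SourceToParse.
final class TypedDoc {
    final String index;
    final String type;
    final String id;
    final String source;

    TypedDoc(String index, String type, String id, String source) {
        this.index = Objects.requireNonNull(index, "index");
        this.type = Objects.requireNonNull(type, "type");
        this.id = Objects.requireNonNull(id, "id");
        this.source = Objects.requireNonNull(source, "source");
    }

    // Returns a copy that differs only in its type.
    TypedDoc withType(String newType) {
        return new TypedDoc(index, newType, id, source);
    }
}

// Hypothetical helper illustrating the typeless "_doc" translation.
final class TypelessTypeResolver {
    static final String DEFAULT_TYPE = "_doc";

    private TypelessTypeResolver() {}

    // Assumption: mappingType is the index's single mapping type, or null if none exists yet.
    static String resolveType(String requestedType, String mappingType) {
        if (DEFAULT_TYPE.equals(requestedType) && mappingType != null) {
            return mappingType; // "_doc" is translated to the user-defined type
        }
        return requestedType;
    }

    // Rewrites the document with the resolved type so that parsing, and anything
    // derived from the parsed document, uses the mapper's type.
    static TypedDoc rewrite(TypedDoc doc, String mappingType) {
        String resolved = resolveType(doc.type, mappingType);
        return resolved.equals(doc.type) ? doc : doc.withType(resolved);
    }
}
```

The PR text does not say whether the real fix builds a new object or updates the existing one; returning a copy here only keeps the sketch free of side effects.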