Skip to content

2 phase indexing#14635

Merged
mergify[bot] merged 4 commits intodrop-columnfrom
b/oid-in-src2
Sep 4, 2023
Merged

2 phase indexing#14635
mergify[bot] merged 4 commits intodrop-columnfrom
b/oid-in-src2

Conversation

@BaurzhanSakhariev
Copy link
Contributor

@BaurzhanSakhariev BaurzhanSakhariev commented Aug 30, 2023

This is a pre-requisite for using column OID-s instead of names in the _source.

Phase 1: Collect new columns and do a schema update. Don't create a Lucene document with source at this point.
Phase 2: Update Indexer targets using new AddColumnResponse.

Actually writing OID in the source to be followed in #14636

Supersedes #14617 (same but resolved conflicts after last drop-column rebase (replaced test assertions to use new ReferenceAssert, introduced in #14632 and #14634) + squashed already reviewed commits for the phase 1)

@BaurzhanSakhariev BaurzhanSakhariev mentioned this pull request Aug 30, 2023
2 tasks
This is a pre-requisite for using column OID-s instead of names in the _source.

Phase 1: Collect new columns and do a schema update. Don't create a Lucene document with source at this point.
Phase 2: Index. Target references still might have unassigned OID since this is only a preparation step.
@BaurzhanSakhariev BaurzhanSakhariev force-pushed the b/oid-in-src2 branch 2 times, most recently from 33fb254 to 877a5c3 Compare August 31, 2023 07:40
@BaurzhanSakhariev
Copy link
Contributor Author

BaurzhanSakhariev commented Aug 31, 2023

Actually unreviewed commit is this one - which is about removing minNodeVersion and using only Version.CREATED for deciding whether to use OID in source or not.

Relevant comment from the old PR:
#14617 (comment)

I don't think it's safe to use this condition to decide if oid indexing should be used.

A mixed cluster is temporary, so you can end up with a table that has some documents indexed with oids and some without.

I think to avoid the streaming dependency we could look up the oids in the local state? The cluster state should have updated locally too if we got a response from the add-column action.

And reply:

As discussed internally in https://crate.slack.com/archives/CBPLN6WJK/p1693396898974569,
we can only rely on Version.created.

Also, in a follow up #14636, where OID is actually written to the source, we have kind of Function<Reference, String> sourceKeyWriter, which is either ref -> ref.column.leafName or Long.toString(ref.oid()), depending on Version.CREATED

@BaurzhanSakhariev
Copy link
Contributor Author

Hi @seut, could you please take a look?

In addition to the comment above, I also did version aware response of the TransportAddColumnAction as discussed.

77baaa8

@BaurzhanSakhariev BaurzhanSakhariev requested a review from seut August 31, 2023 12:41
Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice work!

@BaurzhanSakhariev BaurzhanSakhariev added the ready-to-merge Let Mergify merge the PR once approved and checks pass label Sep 4, 2023
@mergify mergify bot merged commit 005ab93 into drop-column Sep 4, 2023
@mergify mergify bot deleted the b/oid-in-src2 branch September 4, 2023 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Let Mergify merge the PR once approved and checks pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants