Create action to migrate the contents of one index to a new index #20024
Closed
nik9000 wants to merge 18 commits into elastic:master
This is currently a very rough WIP. I'd mostly like to get feedback on the general direction before I go too deep down a rabbit hole.
I was using a CountDownLatch like a CyclicBarrier....
Throws an exception on concurrent requests to the same index that differ in some way.
`#equals` isn't quite right, so we make something better. And this time we test it.
You can't reuse requests in different threads or they'll be modified by different threads without any proper synchronization. And we check that the request isn't modified in unexpected ways.
This was referenced Aug 18, 2016
Sorry for leaving this open for so long. A few of us talked verbally and, while this operation would be useful for some folks, it really wouldn't be useful for upgrading indexes on startup. The reasoning is that upgrading an index requires that the cluster be stable for the duration of the upgrade and cluster startup is the time when the cluster is at its most unstable.
The standard way to change an index's mapping is to create a new index with the
new mapping, `_reindex` the documents into the new index, flip the alias from
the old index to the new index, and then remove the old index. Traditionally
this sort of thing has been left as an exercise for those implementing an
application against Elasticsearch but I think now is the time to implement this
in Elasticsearch because:
- `.tasks` index for storing the results of long running tasks. While we were
  fairly careful in designing its mappings, I'm under no illusion that we got
  it right the first try. That just isn't the way software works. We're going
  to want to run this on `.tasks` one day.
- handling upgrades to the format of the data is a concern for Logstash's
  engineers.
In all of these cases the indexes are implementation details of their
application so we'd like to automatically upgrade them on startup rather than
provide upgrade scripts. That means that the application will want to migrate
its data every time it starts up so a user only has to get involved if the data
migration fails.
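For reference, the manual procedure that applications implement today (create the new index, `_reindex`, flip the alias, remove the old index) can be sketched as an ordered list of REST calls. This is a minimal illustration, not code from this PR: the class `ManualMigrationPlan`, the index names, the alias name, and the empty mappings body are all placeholders.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the manual "reindex and flip the alias" procedure as a list of REST calls. */
class ManualMigrationPlan {
    /** One REST call: method, path, and JSON body (empty string when there is no body). */
    static final class Call {
        final String method;
        final String path;
        final String body;

        Call(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return body.isEmpty() ? method + " " + path : method + " " + path + " " + body;
        }
    }

    /** Builds the four steps in order; every argument is a caller-supplied placeholder. */
    static List<Call> plan(String oldIndex, String newIndex, String alias, String mappingsJson) {
        List<Call> calls = new ArrayList<>();
        // 1. Create the new index with the new mapping.
        calls.add(new Call("PUT", "/" + newIndex, "{\"mappings\":" + mappingsJson + "}"));
        // 2. Copy the documents across with _reindex.
        calls.add(new Call("POST", "/_reindex",
            "{\"source\":{\"index\":\"" + oldIndex + "\"},\"dest\":{\"index\":\"" + newIndex + "\"}}"));
        // 3. Flip the alias from the old index to the new one in a single _aliases call.
        calls.add(new Call("POST", "/_aliases",
            "{\"actions\":[{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
            + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}]}"));
        // 4. Remove the old index.
        calls.add(new Call("DELETE", "/" + oldIndex, ""));
        return calls;
    }

    public static void main(String[] args) {
        for (Call call : plan("index_1", "index_2", "index", "{}")) {
            System.out.println(call);
        }
    }
}
```

Every application that owns an index currently has to get this ordering right on its own, which is the duplication the proposed action is meant to remove.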
3 of the 5 applications that will need to do this migration live inside
Elasticsearch (Watcher and Security are a plugin, `.tasks` is in core
Elasticsearch). So it looks like the right place to implement this is in core
Elasticsearch. The other advantage of implementing it there is that it can be
used by the widest range of users.
This PR intends to build an action into core Elasticsearch that:
- `200 OK` when the index is in the desired state already.
- important in "masterless" systems like Logstash so they can invoke this API on
  startup and not have to worry about one node "winning". They all get the same
  response.
- responds with that information rather than some cryptic failure message.
- index steps.
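One way to give every caller the same response, sketched under assumptions: identical requests for the same destination index share a single in-flight result, and a request that differs from the one already running is rejected with an explicit error. `MigrationCoordinator`, `Request`, and the string response are hypothetical names for illustration and are not taken from this PR's code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical coordinator: one in-flight migration per destination index. */
class MigrationCoordinator {
    /** Immutable request with value equality over every field. */
    static final class Request {
        final String source;
        final String dest;
        final String alias;

        Request(String source, String dest, String alias) {
            this.source = source;
            this.dest = dest;
            this.alias = alias;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Request)) {
                return false;
            }
            Request other = (Request) o;
            return source.equals(other.source) && dest.equals(other.dest) && alias.equals(other.alias);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, dest, alias);
        }
    }

    /** A running migration: the request that started it and the result all callers share. */
    private static final class Running {
        final Request request;
        final CompletableFuture<String> result = new CompletableFuture<>();

        Running(Request request) {
            this.request = request;
        }
    }

    private final Map<String, Running> running = new ConcurrentHashMap<>();

    /** Returns the shared result for the destination index, registering the request if none is running. */
    CompletableFuture<String> submit(Request request) {
        Running mine = new Running(request);
        Running existing = running.putIfAbsent(request.dest, mine);
        if (existing == null) {
            return mine.result; // first caller; a real implementation would start the work here
        }
        if (!existing.request.equals(request)) {
            throw new IllegalArgumentException(
                "a different migration into [" + request.dest + "] is already running");
        }
        return existing.result; // identical concurrent request: share the same outcome
    }

    /** Completes and forgets the migration for the destination index, if one is running. */
    void finish(String dest, String response) {
        Running done = running.remove(dest);
        if (done != null) {
            done.result.complete(response);
        }
    }
}
```

Because `Request` is immutable here, sharing an instance across threads is safe in this sketch; the commit notes above describe a mutable request where that does not hold.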
It exposes it with an HTTP request that looks like:
In this example `index_1` is the source index and `index_2` is the destination
index. Unlike a normal create index command the `aliases` section is required.
This is how `_migrate` knows that the process is complete and it is a good
practice anyway. The alias is added to the destination index after all the docs
in the source index are migrated to the destination index and the destination
index has been `_refresh`ed so they are visible.

Like `_reindex` and `_delete_by_query` and `_update_by_query`, these requests
are "big" in that they do many things and we expect them to take a long time if
they operate on a large number of documents. This can't be helped so we want to
make sure that this request integrates well with the task management API. That
means that it should be `"cancellable": true` and its status should be super
expressive, returning the phase of the operation currently being performed and,
if that phase is reindex, the details of the reindex's status.
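A rough sketch of what such a status could carry, assuming the phases follow the steps described above (create the index, reindex, refresh, add the alias). The class `MigrateStatus`, the phase names, and the two progress counters are hypothetical and only illustrate "phase plus reindex details"; they are not the status format from this PR.

```java
/** Hypothetical status for a migrate task: the current phase plus reindex details when relevant. */
class MigrateStatus {
    /** Assumed phases, in the order the operation would pass through them. */
    enum Phase { CREATING_INDEX, REINDEXING, REFRESHING, ADDING_ALIAS, DONE }

    /** Minimal stand-in for the nested reindex status. */
    static final class ReindexProgress {
        final long total;
        final long created;

        ReindexProgress(long total, long created) {
            this.total = total;
            this.created = created;
        }
    }

    final Phase phase;
    final ReindexProgress reindex; // non-null only while phase == REINDEXING

    MigrateStatus(Phase phase, ReindexProgress reindex) {
        if ((phase == Phase.REINDEXING) != (reindex != null)) {
            throw new IllegalArgumentException(
                "reindex details are required in, and only in, the REINDEXING phase");
        }
        this.phase = phase;
        this.reindex = reindex;
    }

    /** One-line summary of the kind a task listing could show. */
    String describe() {
        if (reindex == null) {
            return "phase=" + phase;
        }
        return "phase=" + phase + " created=" + reindex.created + "/" + reindex.total;
    }

    public static void main(String[] args) {
        System.out.println(new MigrateStatus(Phase.REINDEXING, new ReindexProgress(1000, 250)).describe());
    }
}
```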
We try to limit the number of "big" operations in core Elasticsearch because
every one of them feels like a new trap we are setting for unsuspecting users.
We will need to warn users that this can take some time and put some load on
the cluster. For the users all the way at the top of the document we don't
expect this to be a problem though. A Security index with a million documents
is huge but not a ton of work for reindex. We just have to make very very
sure that it is obvious to users that doing this against an index with a
hundred million documents is going to take a long time.