Conflicts: src/main/java/org/elasticsearch/gateway/GatewayMetaState.java
Also adds a nocommit
make flush to a refresh factor out ShadowIndexShard to have IndexShard be identical to the master and least intrusive cleanup abstractions
Conflicts: src/main/java/org/elasticsearch/index/engine/Engine.java
…dler that skips most phases and enforces shard closing on the source before the target opens its engine
Could we clarify this statement? Mapping updates do get replicated; it just takes longer, since the update needs to go to the master and then be published to the replicas, so there is a delay in mapping introduction.
I will clarify this comment
can we use the same method structure between this method and the following? I like isIndex...
@s1monw I added a
should we use the IndexMetaData#isIndexUsingShadowReplicas helper method here?
Yeah, I will change this to use the helper
left really minor comments, it looks great. One note: should we mention in the docs the second phase tasks, like doing primary promotion without failing an engine? If so, I would also add a task so that on get, we automatically set the "go to primary" flag if shadow replicas are used and realtime get is used.
left two more comments, other than that LGTM
Pushed more commits hooking up the
can we use org.elasticsearch.cluster.routing.operation.plain.Preference.PRIMARY.type() here instead?
left one minor comment! LGTM, feel free to push!
pushed to 1.x and master!
These commits add the shadow replicas feature for use on shared filesystems
(it does not include segment replication for non-shared filesystems yet).
If we assume that the data in the index path will already be shared across
multiple nodes, we can create an index with shadow replicas, where each replica
shard simply contains an `IndexReader` that periodically refreshes to pick up
new segments.
All indexing operations will be executed on the primary shard, and will not be
replicated to each replica, since the data will be replicated in a different
way.
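
To make the data flow above concrete, here is a toy model in plain Java. None of these classes exist in Elasticsearch or Lucene; `SharedIndexPath`, `ToyPrimaryShard`, and `ToyShadowReplica` are hypothetical names, and the model assumes that the primary makes segments visible on the shared path by flushing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are illustrative and not part of Elasticsearch.
final class SharedIndexPath {
    private final List<String> flushedSegments = new ArrayList<>();

    // Called by the primary when it flushes a new segment to the shared path.
    synchronized void publish(String segmentName) {
        flushedSegments.add(segmentName);
    }

    // Returns a copy of the segments currently present on the shared path.
    synchronized List<String> snapshot() {
        return new ArrayList<>(flushedSegments);
    }
}

final class ToyPrimaryShard {
    private final SharedIndexPath path;
    private final List<String> buffered = new ArrayList<>();
    private int nextSegment = 0;

    ToyPrimaryShard(SharedIndexPath path) {
        this.path = path;
    }

    // Indexing happens only on the primary; nothing is sent to replicas.
    void index(String doc) {
        buffered.add(doc);
    }

    // Flush writes the buffered documents as one segment on the shared path.
    void flush() {
        if (buffered.isEmpty()) {
            return;
        }
        path.publish("segment_" + nextSegment++);
        buffered.clear();
    }
}

final class ToyShadowReplica {
    private final SharedIndexPath path;
    private List<String> visibleSegments = new ArrayList<>();

    ToyShadowReplica(SharedIndexPath path) {
        this.path = path;
    }

    // Periodic refresh: reopen the reader over whatever is on the shared path.
    void refresh() {
        visibleSegments = path.snapshot();
    }

    int searchableSegmentCount() {
        return visibleSegments.size();
    }
}
```

In this sketch a document becomes searchable on the replica only after the primary has flushed and the replica has refreshed; the replica never receives the indexing operation itself.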
During this phase, creating an index with `index.shadow_replicas: true` and
`number_of_replicas` greater than 0 will cause operations not to undergo
replication to replica shards. An index can have either regular replicas or
shadow replicas; they are mutually exclusive for an index. The
`index.shadow_replicas` setting is set at index creation time and cannot be
changed dynamically.
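
As a rough illustration, the creation-time settings described above could be assembled like this. `ShadowReplicaSettings` is a hypothetical helper, not an Elasticsearch class; it only collects the setting keys into a plain map and does not talk to a cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch: it only assembles the
// settings named in the description into a plain map.
final class ShadowReplicaSettings {

    static Map<String, Object> forIndex(int numberOfReplicas) {
        if (numberOfReplicas <= 0) {
            // Shadow replicas only come into play with at least one replica.
            throw new IllegalArgumentException("number_of_replicas must be greater than 0");
        }
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("index.number_of_replicas", numberOfReplicas);
        // Must be supplied when the index is created; it cannot be changed later.
        settings.put("index.shadow_replicas", true);
        return settings;
    }
}
```

The resulting map would be passed along with the index creation request; how that request is built is left out here.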
The Elasticsearch cluster will still detect the loss of a primary shard, and
transform the replica into a primary in this situation. This transformation will
take slightly longer, since no `IndexWriter` will be maintained for each shadow
replica.
In order to ensure the data is being synchronized in a fast enough manner, the
user will need to tune the flush threshold for the index to a desired number. A
flush is needed to fsync segment files to disk, so they will be visible to all
other replica nodes. Users should test what flush threshold levels they are
comfortable with, as increased flushing can impact indexing performance. This
testing can be performed at any time; there is no need to wait for this feature
to be available first.
Once segments are available on the filesystem where the shadow replica resides,
a regular refresh (governed by the `index.refresh_interval`) can be used to make
the new data searchable.
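
When choosing a flush threshold, a back-of-envelope estimate of how stale a shadow replica can get may help. The sketch below is an assumption, not Elasticsearch behavior: it supposes a size-based flush threshold, a steady indexing rate, and that the worst case is one full flush interval followed by one full refresh interval. `VisibilityDelayEstimate` and its methods are hypothetical names.

```java
// Back-of-envelope model only; the class, method names, and formula are assumptions.
final class VisibilityDelayEstimate {

    // Seconds needed to accumulate flushThresholdBytes at a steady indexing rate.
    static double secondsBetweenFlushes(long flushThresholdBytes, double indexingBytesPerSecond) {
        if (flushThresholdBytes <= 0 || indexingBytesPerSecond <= 0) {
            throw new IllegalArgumentException("threshold and rate must be positive");
        }
        return flushThresholdBytes / indexingBytesPerSecond;
    }

    // Rough upper bound: wait for the next flush on the primary, then for the
    // next refresh on the shadow replica.
    static double worstCaseDelaySeconds(long flushThresholdBytes,
                                        double indexingBytesPerSecond,
                                        double refreshIntervalSeconds) {
        return secondsBetweenFlushes(flushThresholdBytes, indexingBytesPerSecond)
                + refreshIntervalSeconds;
    }
}
```

For example, under these assumptions a 200 MiB threshold at 10 MiB/s of indexing and a 1 second refresh interval gives roughly 21 seconds before new data is searchable on a shadow replica. Real workloads should be measured rather than estimated this way.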
See #8976 for the overall shadow replica plan