
Propagate last node to reinitialized routing tables#91549

Merged
elasticsearchmachine merged 5 commits into elastic:main from
DaveCTurner:2022-11-14-propagate-last-allocated-id-to-reinitialized-routing-tables
Nov 14, 2022

Conversation

@DaveCTurner (Member)

When closing or opening an index, or restoring a snapshot over a closed index, we reinitialize its routing table from scratch and expect the gateway allocators to select the appropriate node for each shard copy. With this commit we also keep track of the last-allocated node ID for each copy which makes it more likely that the desired balance of these shards remains unchanged too.

Closes #91472
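As a rough illustration of the idea in this change, here is a minimal, self-contained sketch using hypothetical types (`ShardCopy`, `UnassignedCopy`, `reinitialize`); the actual change lives in Elasticsearch's routing-table rebuilding logic and its real API differs.

```java
import java.util.List;

// Hypothetical, simplified types for illustration only.
public class LastAllocatedSketch {
    record ShardCopy(int shardId, boolean primary, String currentNodeId) {}
    record UnassignedCopy(int shardId, boolean primary, String lastAllocatedNodeId) {}

    // Rebuilding the routing table from scratch marks every copy unassigned,
    // but carries over the node each copy last lived on, so the gateway
    // allocator is likely to put it back in the same place.
    static List<UnassignedCopy> reinitialize(List<ShardCopy> previous) {
        return previous.stream()
                .map(c -> new UnassignedCopy(c.shardId(), c.primary(), c.currentNodeId()))
                .toList();
    }

    public static void main(String[] args) {
        var previous = List.of(
                new ShardCopy(0, true, "node-1"),
                new ShardCopy(0, false, "node-2"));
        for (var copy : reinitialize(previous)) {
            System.out.println("shard " + copy.shardId()
                    + (copy.primary() ? " primary" : " replica")
                    + " was last on " + copy.lastAllocatedNodeId());
        }
    }
}
```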

@DaveCTurner DaveCTurner added >non-issue :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.6.0 labels Nov 14, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Nov 14, 2022
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen (Contributor) left a comment


One smaller concern; otherwise this looks good.

}
final var previousNodes = new ArrayList<String>(previousShardRoutingTable.size());
previousNodes.add(primaryNode);
for (final var assignedShard : previousShardRoutingTable.assignedShards()) {
@henningandersen (Contributor)


This also includes the target of relocations. I wonder if we should only look at active shards, since anything less will anyway not be considered good enough by the gateway allocator?

The problem I see with this is that if a relocation is ongoing, we risk a copy having a last-allocated node id that is much worse than it could be (i.e., a node that has only just started the recovery)?

@DaveCTurner (Member, Author)


Good point, thanks - see bd12ab9.
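The review point above can be sketched as follows, again with hypothetical types (`ShardRouting`, `previousNodes`) standing in for the real ones: only *active* copies contribute a previous node, so the target of an in-flight relocation, whose recovery has only just started, is never recorded as a copy's last-allocated node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the review fix, not the actual Elasticsearch code.
public class PreviousNodesSketch {
    record ShardRouting(String currentNodeId, boolean active) {}

    static List<String> previousNodes(List<ShardRouting> assignedShards) {
        var nodes = new ArrayList<String>();
        for (var shard : assignedShards) {
            if (shard.active()) { // skip initializing copies, e.g. relocation targets
                nodes.add(shard.currentNodeId());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        var shards = List.of(
                new ShardRouting("node-1", true),   // active copy
                new ShardRouting("node-2", false)); // relocation target, still recovering
        System.out.println(previousNodes(shards)); // prints [node-1]
    }
}
```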

assertThat(shard.unassignedInfo().getReason(), equalTo(expectedUnassignedReason));
final var lastAllocatedNodeId = shard.unassignedInfo().getLastAllocatedNodeId();
if (lastAllocatedNodeId == null) {
// restoring an index may change the number of shards/replicas so no guarantee that lastAllocatedNodeId is populated
@henningandersen (Contributor)


I think only the number of replicas, not the number of shards, can be changed? Probably what you meant with shards/replicas, but removing "shards/" would be better, I think.

Suggested change
// restoring an index may change the number of shards/replicas so no guarantee that lastAllocatedNodeId is populated
// restoring an index may change the number of replicas so no guarantee that lastAllocatedNodeId is populated

@DaveCTurner (Member, Author)


On the contrary, I didn't think there's anything to require that the snapshot has the same number of shards as the index on top of which it's being restored.

@henningandersen (Contributor)


Ahh, right, thanks.
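The point being conceded here, that a restore can change the shard count, is why the test tolerates a missing last-allocated node. A tiny sketch with a hypothetical lookup (`lastAllocatedNodeFor`): a shard id that did not exist in the index being restored over simply has no history.

```java
import java.util.Map;

// Hypothetical sketch, not the actual Elasticsearch code: shard ids that
// are new in the restored index have no previous copy, so the lookup
// yields null and callers must tolerate that.
public class RestoreShardCountSketch {
    static String lastAllocatedNodeFor(Map<Integer, String> previousNodesByShard, int shardId) {
        return previousNodesByShard.get(shardId); // null when the shard id is new
    }

    public static void main(String[] args) {
        var previous = Map.of(0, "node-1"); // the index being restored over had one shard
        // the snapshot being restored has two shards; shard 1 has no history
        System.out.println(lastAllocatedNodeFor(previous, 0)); // prints node-1
        System.out.println(lastAllocatedNodeFor(previous, 1)); // prints null
    }
}
```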

Comment on lines +271 to +272
// both original and restored index must have at least one shard tho
assertTrue(foundAnyNodeIds);
@henningandersen (Contributor)


Can this not go one line up, i.e., we can check this for every shard id?

@DaveCTurner (Member, Author)


Not if the shard count can change in a restore (which AFAIK it can).

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 14, 2022
@henningandersen (Contributor) left a comment


LGTM.

@elasticsearchmachine elasticsearchmachine merged commit 1f72f2e into elastic:main Nov 14, 2022
@DaveCTurner DaveCTurner deleted the 2022-11-14-propagate-last-allocated-id-to-reinitialized-routing-tables branch November 14, 2022 14:40
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 15, 2022
* main: (163 commits)
  [DOCS] Edits frequent items aggregation (elastic#91564)
  Handle providers of optional services in ubermodule classloader (elastic#91217)
  Add `exportDockerImages` lifecycle task for exporting docker tarballs (elastic#91571)
  Fix CSV dependency report output file location in DRA CI job
  Fix variable placeholder for Strings.format calls (elastic#91531)
  Fix output dir creation in ConcatFileTask (elastic#91568)
  Fix declaration of dependencies in DRA snapshots CI job (elastic#91569)
  Upgrade Gradle Enterprise plugin to 3.11.4 (elastic#91435)
  Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern (elastic#91521)
  Fix inter project handling of generateDependenciesReport (elastic#91555)
  [Synthetics] Add synthetics-* read to fleet-server (elastic#91391)
  [ML] Copy more settings when creating DF analytics destination index (elastic#91546)
  Reduce CartesianCentroidIT flakiness (elastic#91553)
  Propagate last node to reinitialized routing tables (elastic#91549)
  Forecast write load during rollovers (elastic#91425)
  [DOCS] Warn about potential overhead of named queries (elastic#91512)
  Datastream unavailable exception metadata (elastic#91461)
  Generate docker images and dependency report in DRA ci job (elastic#91545)
  Support cartesian_bounds aggregation on point and shape (elastic#91298)
  Add support for EQL samples queries (elastic#91312)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Meta label for distributed team. v8.6.0


Development

Successfully merging this pull request may close these issues.

[CI] AwarenessAllocationIT testAwarenessZones failing

4 participants