
Fix race condition in Feature Migration Status API #80572

Merged
AthenaEryma merged 1 commit into elastic:master from AthenaEryma:si/migration/fix-status-race
Nov 10, 2021


@AthenaEryma
Contributor

Prior to this commit, there was a race condition in the Feature Migration
Status API where the returned status could be `MIGRATION_NEEDED` even if
a migration was already in progress (in which case the returned value
should have been `IN_PROGRESS`). This commit adds a test for this case,
which reliably fails without the fix, and fixes the bug.

The fix is straightforward: while we already examined the persistent task
metadata to determine progress, the part of that metadata that we
examined was not updated until the task had been running for a bit.
The *existence* of the task metadata, however, is guaranteed to be in the
cluster state by the time the request to start the migration completes,
and it is removed immediately after the task finishes (which is why we have
separate metadata for the migration results instead of just using the task
state). Checking for the task's existence therefore closes the race.

Fixes #79680
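To make the race concrete, here is a minimal, self-contained Java sketch. It is not Elasticsearch code: `ClusterStateView`, `PersistentTask`, `MIGRATION_TASK_ID`, `racyStatus` and `fixedStatus` are hypothetical names, and the status enum is a simplified stand-in for the API's status values. It models the window described above, where the start-migration request has returned (so the task exists in the cluster state) but the task has not yet published any progress.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the status check; none of these types are
// real Elasticsearch classes.
public class MigrationStatusSketch {

    // Simplified stand-in for the status values returned by the API.
    enum MigrationStatus { NO_MIGRATION_NEEDED, MIGRATION_NEEDED, IN_PROGRESS }

    // Hypothetical persistent task entry. The progress field stays null until
    // the running task publishes its first state update.
    static final class PersistentTask {
        String currentFeature = null;
    }

    // Hypothetical view of the cluster state: persistent task id -> task entry.
    static final class ClusterStateView {
        final Map<String, PersistentTask> persistentTasks = new HashMap<>();
    }

    // Hypothetical id under which the migration task is registered.
    static final String MIGRATION_TASK_ID = "feature-migration";

    // Racy check: only reports IN_PROGRESS once the task has published state,
    // which can lag behind the start request returning.
    static MigrationStatus racyStatus(ClusterStateView state, boolean migrationNeeded) {
        PersistentTask task = state.persistentTasks.get(MIGRATION_TASK_ID);
        if (task != null && task.currentFeature != null) {
            return MigrationStatus.IN_PROGRESS;
        }
        return migrationNeeded ? MigrationStatus.MIGRATION_NEEDED : MigrationStatus.NO_MIGRATION_NEEDED;
    }

    // Fixed check: the task entry exists as soon as the start request completes
    // and is removed when the task finishes, so its presence alone means IN_PROGRESS.
    static MigrationStatus fixedStatus(ClusterStateView state, boolean migrationNeeded) {
        if (state.persistentTasks.containsKey(MIGRATION_TASK_ID)) {
            return MigrationStatus.IN_PROGRESS;
        }
        return migrationNeeded ? MigrationStatus.MIGRATION_NEEDED : MigrationStatus.NO_MIGRATION_NEEDED;
    }

    public static void main(String[] args) {
        ClusterStateView state = new ClusterStateView();
        // Simulate the window right after the start request returns: the task is
        // registered but has not published any progress yet.
        state.persistentTasks.put(MIGRATION_TASK_ID, new PersistentTask());
        System.out.println("racy:  " + racyStatus(state, true));  // MIGRATION_NEEDED
        System.out.println("fixed: " + fixedStatus(state, true)); // IN_PROGRESS
    }
}
```

The existence check is only safe because the task entry is removed as soon as the task finishes; keeping the migration results in separate metadata, as the description notes, is what lets them outlive that removal.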

@elasticmachine added the Team:Core/Infra label Nov 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)


@williamrandolph williamrandolph left a comment


The fix makes sense, and the code LGTM! Thanks for figuring this out. I verified it on the command line too, using the setup I used previously to replicate the issue.

@AthenaEryma
Contributor Author

Thanks, both for the review and independently verifying that this fixes things!

@AthenaEryma AthenaEryma merged commit c5baf47 into elastic:master Nov 10, 2021
AthenaEryma added a commit to AthenaEryma/elasticsearch that referenced this pull request Nov 10, 2021
AthenaEryma added a commit to AthenaEryma/elasticsearch that referenced this pull request Nov 10, 2021
elasticsearchmachine pushed a commit that referenced this pull request Nov 10, 2021
elasticsearchmachine pushed a commit that referenced this pull request Nov 10, 2021

Labels

>bug, :Core/Infra/Core (Core issues without another label), Team:Core/Infra (Meta label for core/infra team), v7.16.0, v8.0.0-rc2, v8.1.0


Development

Linked issue: POST system feature migration request returns before migration is in progress

5 participants