Instances deleted during migration cause exception in _destroy_evacuated_instances

Bug #1155152 reported by Stanislaw Pitucha
This bug affects 2 people
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   High         aeva black
Grizzly                    Fix Released   High         Unassigned

Bug Description

There's a condition involving instance migration and deletion, which I cannot pin down precisely at the moment, but it results in the following state:

- instance host is already changed to the new host
- instance is still visible in the driver of the old host
- on nova-compute restart on the old host, _destroy_evacuated_instances() fails at the _get_instance_nw_info() step with the exception "InstanceNotFound: Instance c04a5ef9-e908-470b-9e49-0107d6327829 could not be found." raised by the network manager

There should be two fixes here really:

1. make _destroy_evacuated_instances() more robust so it doesn't crash the whole nova-compute in such a case
2. find out how that state was created and stop it from occurring

Revision history for this message
Stanislaw Pitucha (stanislaw-pitucha) wrote :

Just to clarify the previous description: the issue is that the list from the driver includes instances which are already marked as deleted. Getting an instance by uuid via _get_instance_nw_info() does not look through those.

This could most likely be resolved in a similar way to what happens to instances which are reaped.
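
For illustration, the mismatch described in this comment can be sketched with a toy model. This is not nova code: the in-memory DB dict, the get_instance() helper, its read_deleted flag, and the uuids are all hypothetical stand-ins, assumed only to show how a driver-side list can contain an instance that a default lookup skips.

```python
class InstanceNotFound(Exception):
    """Stand-in for the exception named in the report."""


# Hypothetical in-memory "database": uuid -> record with a deleted flag.
DB = {
    "uuid-a": {"host": "new-host", "deleted": False},
    "uuid-b": {"host": "new-host", "deleted": True},
}

# What the old host's hypervisor driver still reports locally (assumed).
DRIVER_INSTANCES = ["uuid-a", "uuid-b"]


def get_instance(uuid, read_deleted=False):
    """Look up a record, skipping deleted rows unless read_deleted is set."""
    record = DB.get(uuid)
    if record is None or (record["deleted"] and not read_deleted):
        raise InstanceNotFound("Instance %s could not be found." % uuid)
    return record


def lookup_all(uuids):
    """Split uuids into those the default lookup finds and those it misses."""
    found, missing = [], []
    for uuid in uuids:
        try:
            get_instance(uuid)
            found.append(uuid)
        except InstanceNotFound:
            missing.append(uuid)
    return found, missing
```

In this toy model, lookup_all(DRIVER_INSTANCES) reports "uuid-b" as missing even though the driver still lists it, which mirrors the state described above.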

Changed in nova:
assignee: nobody → Stanislaw Pitucha (stanislaw-pitucha)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/24463

Changed in nova:
status: New → In Progress
Revision history for this message
Stanislaw Pitucha (stanislaw-pitucha) wrote :

I'll be away for a week, so if anyone wants to fix this bug instead, please go ahead.

Changed in nova:
assignee: Stanislaw Pitucha (stanislaw-pitucha) → nobody
aeva black (tenbrae)
Changed in nova:
status: In Progress → Triaged
importance: Undecided → Critical
importance: Critical → High
aeva black (tenbrae)
Changed in nova:
assignee: nobody → Devananda van der Veen (devananda)
Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/24463
Committed: http://github.com/openstack/nova/commit/306046c7d5c20454035fcea22ce3efeac1d11cfc
Submitter: Jenkins
Branch: master

commit 306046c7d5c20454035fcea22ce3efeac1d11cfc
Author: Stanislaw Pitucha <email address hidden>
Date: Thu Mar 14 18:37:35 2013 +0000

    After migrate, catch and remove deleted instances

    On the host init, starting with a deleted instance which has been
    previously evacuated from the host results in an InstanceNotFound
    exception. Catch and log this, and then call driver.destroy() so
    that the hypervisor driver can clean up the deleted instance.

    If we don't do this during host init, it will cause problems during
    periodic tasks.

    Fixes bug 1155152

    Co-authored-by: Devananda van der Veen <email address hidden>

    Change-Id: I979a698b8e739b9335f37b81e789285f91977a8e
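
The approach described in the commit message (catch the exception, log it, then still call the driver's destroy) might look roughly like the sketch below. It is a self-contained illustration, not the merged patch: FakeDriver, destroy_evacuated_instances(), the get_nw_info callable, and the choice of an empty list as fallback network info are all assumptions made for the example.

```python
import logging

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Stand-in for the exception named in the commit message."""


class FakeDriver:
    """Hypothetical hypervisor driver that tracks local instances."""

    def __init__(self, uuids):
        self.local = list(uuids)
        self.destroyed = []

    def destroy(self, uuid, network_info):
        # Remove the instance from the hypervisor's local view.
        self.local.remove(uuid)
        self.destroyed.append((uuid, network_info))


def destroy_evacuated_instances(driver, evacuated_uuids, get_nw_info):
    """Destroy evacuated instances, tolerating ones already deleted."""
    for uuid in list(evacuated_uuids):
        try:
            network_info = get_nw_info(uuid)
        except InstanceNotFound:
            # The instance was deleted elsewhere; log it and fall back
            # to empty network info instead of aborting host init.
            LOG.info("Instance %s was deleted while being evacuated", uuid)
            network_info = []
        driver.destroy(uuid, network_info)
```

The point of the design is that a single missing instance no longer aborts the loop, and the hypervisor still gets a chance to clean up its local copy.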

Changed in nova:
status: In Progress → Fix Committed
tags: added: grizzly-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/26615

tags: removed: grizzly-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/grizzly)

Reviewed: https://review.openstack.org/26615
Committed: http://github.com/openstack/nova/commit/c244d6617cf60a548e3edae8159553b8ad797ada
Submitter: Jenkins
Branch: stable/grizzly

commit c244d6617cf60a548e3edae8159553b8ad797ada
Author: Stanislaw Pitucha <email address hidden>
Date: Thu Mar 14 18:37:35 2013 +0000

    After migrate, catch and remove deleted instances

    On the host init, starting with a deleted instance which has been
    previously evacuated from the host results in an InstanceNotFound
    exception. Catch and log this, and then call driver.destroy() so
    that the hypervisor driver can clean up the deleted instance.

    If we don't do this during host init, it will cause problems during
    periodic tasks.

    Fixes bug 1155152

    Co-authored-by: Devananda van der Veen <email address hidden>

    Change-Id: I979a698b8e739b9335f37b81e789285f91977a8e
    (cherry picked from commit 306046c7d5c20454035fcea22ce3efeac1d11cfc)

Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-1 → 2013.2