Skip to content

Fix HNS policylist error "network not found" during network removal #2620

Merged
thaJeztah merged 3 commits intomoby:masterfrom
trapier:policylist-removal-order
Jan 18, 2023
Merged

Fix HNS policylist error "network not found" during network removal #2620
thaJeztah merged 3 commits intomoby:masterfrom
trapier:policylist-removal-order

Conversation

@trapier
Copy link

@trapier trapier commented Feb 4, 2021

Removal of PolicyLists from Windows VFP must be performed prior to removing the corresponding HNS network. Otherwise PolicyList removal fails with HNS error "The network was not found."

This ordering requirement was introduced to Windows Server 2019 in an update some time in 2020. Have reached out to Microsoft to request additional context with respect to what OS version(s) in the change was shipped with and the rationale for the change.

Accommodate the OS sequencing requirement by delaying network deletion until after cleaning up service bindings.

This PR also:

  • Introduces error logging for the affected HNS calls.
  • Provides a performance improvement to Linux overlay network cleanup by making explicit service loadbalancer dataplane cleanup logic conditional to Windows (where it is required). On Linux, the analogous ipvs dataplane cleanup occurs implicitly with the deletion of the loadbalancer net namespace.

Opening as draft to take a look at the following kv error I'm seeing during deletion on Windows. Am also seeing this on 55e924b from bump_19.03, so it's likely not related to this PR.

2/3/2021 11:04:10 PM Error (Unable to complete atomic operation, key modified) deleting object [endpoint qyuvcy6qzwnsj09ybf9cvs19h c29da54656238a07de515f732e424d9f72e0b466bd83ff0b622ad632cdc2239d], retrying...

Mirantis Ref: FIELD-3310

Trapier Marshall added 3 commits February 4, 2021 03:08
Signed-off-by: Trapier Marshall <tmarshall@mirantis.com>
Removal of PolicyLists from Windows VFP must be performed prior to
removing the HNS network. Otherwise PolicyList removal fails with
HNS error "network not found".

Signed-off-by: Trapier Marshall <tmarshall@mirantis.com>
Make the call to cleanupServiceBindings during network deletion
conditional on Windows (where it is required), thereby providing a
performance improvement to network cleanup on Linux.

Signed-off-by: Trapier Marshall <tmarshall@mirantis.com>
@JohanSpannare
Copy link

Looking forward to this fix.

Good work!

@olljanat
Copy link
Contributor

olljanat commented Mar 30, 2021

Fixes moby/moby#41354

This ordering requirement was introduced to Windows Server 2019 in an update some time in 2020.

I think that it was KB4551853 + fixes done to that one which also caused moby/moby#40998

Have reached out to Microsoft to request additional context with respect to what OS version(s) in the change was shipped with and the rationale for the change.

FYI. I also opened question to microsoft/hcsshim#988 about how HNS changes are tested as part of Windows build process. Would be nice to get some responses to that one too from your Microsoft contacts.

Comment on lines +1088 to +1090
if runtime.GOOS == "windows" {
c.cleanupServiceBindings(n.ID())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed on Linux? Is it needed in other situations? e.g., I see it's called in clusterAgentInit(); that's still needed for Linux? (Was thinking if we'd need different implementations for Linux and Windows)

libnetwork/controller.go

Lines 349 to 358 in b350742

// We are leaving the cluster. Make sure we
// close the gossip so that we stop all
// incoming gossip updates before cleaning up
// any remaining service bindings. But before
// deleting the networks since the networks
// should still be present when cleaning up
// service bindings
c.agentClose()
c.cleanupServiceDiscovery("")
c.cleanupServiceBindings("")

@cpuguy83
Copy link
Member

Note we have migrated this codebase over to github.com/moby/moby/libnetwork.
We are not accepting PR's on this repo anymore except for backports to be included in moby 20.10

@olljanat
Copy link
Contributor

We are not accepting PR's on this repo anymore except for backports to be included in moby 20.10

FYI for everyone who is waiting for this fix. It looks that Mirantis did backport this on using their own copy of this repo with Mirantis/libnetwork#3 and Mirantis/libnetwork#4 and if I understand correctly it is already included Docker EE versions 19.03.16 and 20.10.5

@olljanat
Copy link
Contributor

@thaJeztah this one was included to moby by my PR moby/moby#43502 before 22.06 split so perhaps this original can be them merged as "backport to 20.10" to here?

@neersighted
Copy link
Member

This LGTM to me and should likely be merged as we accepted it on the 20.10 branch. This would additionally let Mirantis stop maintaining a fork of libnetwork and better align what we are doing with upstream.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, right let's get this one in for 20.10

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants