Gracefully remove LB endpoints from services #2112
Codecov Report
@@            Coverage Diff            @@
##             master    #2112   +/-   ##
==========================================
  Coverage          ?   40.35%
==========================================
  Files             ?      139
  Lines             ?    22433
  Branches          ?        0
==========================================
  Hits              ?     9053
  Misses            ?    12051
  Partials          ?     1329
Continue to review full report at Codecov.
fcrisciani
left a comment
Had a brief first pass; overall it makes sense to me. Will spend more time on it as well.
    switch event := ev.(type) {
    case networkdb.CreateEvent:
        nid = event.NetworkID
this switch case can definitely be compacted :)
Mmm... can we? I was hoping so initially, but I'm not sure how we can access the fields of each individual type except through the switch (or equivalent explicit type conversions). It's possible that my understanding of Go is wrong, but as I understand it, event has a different type in each of these separate switch branches. Unless we were doing C-style unsafe data-overlay tricks here, I'm not sure how we could otherwise get at the fields, even though they all have the same names, types, and declaration order. Please do let me know if I'm wrong.
Oh, ok, event is an empty interface, so never mind; if you collapse the two cases the type will remain the same.
Actually, looking at it more closely, this can be compacted as:
    switch event := ev.(type) {
    case networkdb.CreateEvent:
    case networkdb.DeleteEvent:
    case networkdb.UpdateEvent:
        nid = event.NetworkID
        eid = event.Key
        value = event.Value
    default:
        logrus.Errorf("Unexpected update service table event = %#v", event)
        return
    }
All 3 events are actually defined as:

    // CreateEvent generates a table entry create event to the watchers
    type CreateEvent event

    // UpdateEvent generates a table entry update event to the watchers
    type UpdateEvent event

    // DeleteEvent generates a table entry delete event to the watchers
    type DeleteEvent event

so they all share the same base type:

    type event struct {
        Table     string
        NetworkID string
        Key       string
        Value     []byte
    }

which makes collapsing the 3 cases possible.
What you wrote above is legal, but doesn't do what you think it does. :) The CreateEvent and DeleteEvent cases are empty branches, so they will not populate nid, eid, and value. Furthermore, you can't fix that by adding fallthrough to those branches, because fallthrough is specifically forbidden in type switches. However, since they are all type event under the hood, I can compact it with a second type assertion. Will try this out and repost.
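For reference, a minimal standalone sketch of the type-switch semantics under discussion, using hypothetical stand-in types (in the real networkdb package the shared base type event is unexported, which is what ultimately blocks the compaction):

```go
package main

import "fmt"

// Stand-ins mirroring the networkdb event types quoted above; in the real
// package the shared base type `event` is unexported.
type event struct {
	Table     string
	NetworkID string
	Key       string
	Value     []byte
}

type CreateEvent event
type UpdateEvent event
type DeleteEvent event

func handle(ev interface{}) {
	switch e := ev.(type) {
	case CreateEvent, UpdateEvent, DeleteEvent:
		// Listing several types in one case is legal, but then e keeps the
		// static type of ev (interface{}), so e.NetworkID would not compile
		// in this clause; recovering the fields requires a second type
		// assertion, which in the real code would have to target the
		// unexported base type -- the dead end discussed below.
		fmt.Printf("got a table event of concrete type %T\n", e)
	default:
		fmt.Printf("unexpected event: %#v\n", e)
	}
}

func main() {
	handle(CreateEvent{NetworkID: "n1", Key: "ep1", Value: []byte("v")})
}
```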
Actually, I can't do that because networkdb.event is not an exported type. :(
Sorry, I wrote it C++ style. And yes, because event is not exported, it won't work unless you change its scope. Ok, I'm giving up.
service_common.go
Outdated
    lb = &loadBalancer{
        vip:      vip,
        fwMark:   fwMarkCtr,
        backEnds: make(map[string]net.IP),
is there a reason to dup the map rather than enhance the backends to have an enable flag?
That approach would certainly work as well. It would make some things cleaner and others a bit more awkward. I'll give it a quick shot and see how it looks.
Updated as per @fcrisciani's comment to merge the disabled and backEnds maps that track load balancing endpoints.
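For reference, a minimal sketch of what the merged structure might look like (hypothetical field names, not necessarily the final code):

```go
package sketch

import "net"

// Instead of a separate "disabled" map, each backend carries its own flag,
// so a single map tracks both live and draining endpoints.
type lbBackend struct {
	ip       net.IP
	disabled bool // true while the endpoint is draining but still reachable
}

type loadBalancer struct {
	vip      net.IP
	fwMark   uint32
	backEnds map[string]*lbBackend // endpoint ID -> backend state
}

// disableBackend marks an endpoint as draining without removing it, so
// existing connections can finish while no new ones are directed to it.
func (lb *loadBalancer) disableBackend(eid string) {
	if b, ok := lb.backEnds[eid]; ok {
		b.disabled = true
	}
}
```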
@ctelfer there is a CI failure that is suspicious:
Yep... fixed... it was a regression caused by refactoring the load balancing backends, which I didn't catch because it was only in the Windows build.
fcrisciani
left a comment
LGTM
fcrisciani
left a comment
Actually, it looks like the switch can be compacted.
abhi
left a comment
@ctelfer, for starters, thanks for the detailed description of your changes. This is really helpful. I think we should turn that into a README for folks who are interested in testing and making changes as well.
Overall LGTM with one minor nit. I am not well versed in the service-binding removal logic, so I haven't gotten a chance to review that bit end to end.
The only question I have is: in a cluster that might have different versions running, will the unmarshals fail because of the protobuf changes?
endpoint.go
Outdated
    if ep.svcID != "" {
        if err := ep.deleteServiceInfoFromCluster(sb, true, "sbLeave"); err != nil {
            logrus.Warnf("Could not cleanup service info on container %s disconnect: %v", ep.name, err)
nit: Failed to cleanup
Per @abhi's question, I did look into backward compatibility. Protocol buffers are specifically designed so that an older parser can still parse newer messages that add fields: the parser simply skips over the unrecognized TLV-encoded fields. So older endpoints will not choke on the new protocol buffer messages. One thing that will change in a mixed-version environment, however, is that the older engines will receive networkDB update messages, which they have never had to process before. The older engines will notice and discard said messages, since the event handling code already handled this case gracefully. However, they will log a warning, so documentation will ultimately need to be updated to note this for mixed-engine environments should we accept this change.
I have tested and pushed another round of changes based on the feedback here, addressing the nit that @abhi mentioned and also improving the clarity of warning logs for overlapping EnableService/DisableService calls, which can happen during high-churn scenarios. (This was a known, pre-existing problem: see the code in agent.go/addServiceInfoToCluster().)
@fcrisciani and I also took a deeper look at the code as it gets invoked in moby. One thing we noticed was that @abhi added sane behavior to cause moby to call sandbox.DisableService() before sandbox.Delete(). Since sandbox.Delete() (via endpoint.sbLeave()) will now also disable the load balancing services with this patch, that modification can be removed when integrating with moby. Strictly speaking, this removal is not necessary: it just avoids propagating unnecessary swarm messages and making extra system calls to down-weight an endpoint that is microseconds from full removal anyway.
fcrisciani
left a comment
LGTM
abhi
left a comment
LGTM
@ctelfer I think it makes sense. I will review the Moby/moby change to understand the behavior.
ddebroy
left a comment
Had a couple of comments.
    // Update existing record to indicate that the service is disabled
    inBuf, err := a.networkDB.GetEntry(libnetworkEPTable, n.ID(), ep.ID())
    if err != nil {
        logrus.Warnf("disableServiceInNetworkDB GetEntry failed for %s %s err:%s", ep.id, n.id, err)
Should we use ep.ID() and n.ID() in the warnings too (rather than ep.id/n.id)?
That would have been more consistent I guess. But (unless I am misunderstanding something fundamental) they had better produce the same result or something is seriously wrong. :)
    if n.ingress {
        ingressPorts = ep.ingressPorts
    }
    if err := c.rmServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), name, ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "deleteServiceInfoFromCluster", true); err != nil {
Not specific to this PR, but I had a concern since we are touching this area overall: would it be safer to remove the service binding (to stop resolving the names) first and then call disableServiceInNetworkDB/agent.networkDB.DeleteEntry?
That's a fair question. Since the usual order is "look up name, issue connection request, get redirected by NAT, get load balanced, connect to server", it would make sense to disable the name lookup first and then the load balancing. Or, from a code standpoint, since we (1) set up load balancing and then (2) set up name resolution, it makes sense to tear them down in the opposite order. The patch was already merged, but we could create another PR to address this. I'll confess that I haven't looked in detail at why the removal matched, rather than reversed, the order of the "undo"s. We would need to verify there are no issues there.
On a bit further reflection, though, this doesn't get hit as much as I first thought. Clients don't generally connect to the service backends through their backend service names. They connect through the front-end load balancer. The DNS resolution for this doesn't go away until the last backend goes away. So at that point the connection is supposed to fail. The only difference will be whether the client receives a "host unreachable" or a "name resolution failure". And even then, only if it happens to occur within the fractions of a second between removing the front-end IP address and the DNS entry.
Agree with the fact that any negative effect of this will be extremely short lived. The comment was more in the context of proper ordering of cleanup ops. We can address this in a potential cleanup patch later.
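For illustration, a minimal sketch (hypothetical helpers, not the actual libnetwork functions) of the "tear down in reverse order of setup" point discussed above:

```go
package sketch

import "fmt"

// Setup adds the LB backend first and the DNS name second, so the symmetric
// teardown removes the DNS name first and the LB backend second; a client
// should then never resolve a name whose backend is already gone.
func removeServiceEndpoint(removeDNSName, removeLBBackend func() error) error {
	if err := removeDNSName(); err != nil {
		return fmt.Errorf("failed to remove DNS entry: %v", err)
	}
	if err := removeLBBackend(); err != nil {
		return fmt.Errorf("failed to remove LB backend: %v", err)
	}
	return nil
}
```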
        return nil
    }

    func disableServiceInNetworkDB(a *agent, n *network, ep *endpoint) {
The disableServiceInNetworkDB API handles many error conditions, but we are not propagating them to the caller the way deleteServiceInfoFromCluster does. Don't we need to propagate any error messages to the caller?
Fair question. In this case I think "no". The main reason I put that logic in a separate function was code style: to keep both the indent level and the length of deleteServiceInfoFromCluster() from growing too much. However, some of the errors that this helper function can encounter are warning-level and some are error-level. Returning an error to the caller instead of doing the logging directly would require parsing the error type and issuing logs there, which would negate the readability benefit of using the helper function in the first place.
If any of these errors would have altered the control flow of deleteServiceInfoFromCluster(), then returning the error would absolutely have been the right thing to do. However, there is really no way to recover from these errors, as evidenced by the handling in the "if fullremove {" branch above the disableServiceInNetworkDB() call (which was the state of the code before this patch): if that branch encounters an error creating a cluster event, it warns and moves on.
Also, for the record, the error-level logs in disableServiceInNetworkDB() are basically assertions. The only reason they should ever trigger is some critical error in our automatically generated protocol buffer code.
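A minimal sketch of the pattern being described (hypothetical names and signatures, not the actual libnetwork code): the helper logs and absorbs its own failures because the caller could not recover from them anyway.

```go
package sketch

import "github.com/sirupsen/logrus"

// kvGetter abstracts the one networkDB-like operation this sketch needs.
type kvGetter interface {
	GetEntry(table, nid, eid string) ([]byte, error)
}

// disableEntry marks a table entry as disabled. Errors are logged rather
// than returned: they would not change the caller's control flow, and
// handling them here keeps the caller short and flat.
func disableEntry(db kvGetter, table, nid, eid string) {
	buf, err := db.GetEntry(table, nid, eid)
	if err != nil {
		logrus.Warnf("disableEntry: GetEntry failed for %s/%s: %v", nid, eid, err)
		return
	}
	// In the real code the entry would be unmarshalled, marked disabled,
	// re-marshalled, and written back; errors in those steps are logged the
	// same way.
	_ = buf
}
```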
ok. LGTM
This patch attempts to allow endpoints to complete servicing connections
while being removed from a service. The change adds a flag to the
endpoint.deleteServiceInfoFromCluster() method to indicate whether this
removal should fully remove connectivity through the load balancer
to the endpoint or should just disable directing further connections to
the endpoint. If the flag is 'false', then the load balancer assigns
a weight of 0 to the endpoint but does not remove it as a linux load
balancing destination. It does remove the endpoint as a docker load
balancing endpoint but tracks it in a special map of "disabled-but-not-
destroyed" load balancing endpoints. This allows traffic to continue
flowing, at least under Linux. If the flag is 'true', then the code
removes the endpoint entirely as a load balancing destination.
The sandbox.DisableService() method invokes deleteServiceInfoFromCluster()
with the flag set to 'false', while the endpoint.sbLeave() method invokes
it with the flag set to 'true' to complete the removal on endpoint
finalization. Renaming the endpoint invokes deleteServiceInfoFromCluster()
with the flag set to 'true' because renaming attempts to completely
remove and then re-add each endpoint service entry.
The controller.rmServiceBinding() method, which carries out the operation,
similarly gets a new flag for whether to fully remove the endpoint. If
the flag is false, it does the job of moving the endpoint from the
load balancing set to the 'disabled' set. It then removes or
de-weights the entry in the OS load balancing table via
network.rmLBBackend(). It removes the service entirely via said method
ONLY IF there are no more live or disabled load balancing endpoints.
Similarly network.addLBBackend() requires slight tweaking to properly
manage the disabled set.
Finally, this change requires propagating the status of disabled
service endpoints via the networkDB. Accordingly, the patch includes
both code to generate and handle service update messages. It also
augments the service structure with a ServiceDisabled boolean to convey
whether an endpoint should ultimately be removed or just disabled.
This, naturally, required a rebuild of the protocol buffer code as well.
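For illustration, a highly simplified sketch of the removal logic described above (hypothetical types and names; the real code also manipulates IPVS/OS state and carries much more bookkeeping):

```go
package sketch

// lbState maps endpoint ID -> disabled flag for one service's load
// balancer, reusing the merged-map idea from the review discussion
// (true means "draining": weight 0, but still a valid destination).
type lbState struct {
	backends map[string]bool
}

// rmBackend models the two removal modes. With fullRemove=false the backend
// is only disabled, so in-flight connections can complete; with
// fullRemove=true it is deleted outright. The return value says whether the
// whole load balancer should now be torn down, which happens only when no
// live or disabled backends remain.
func (lb *lbState) rmBackend(eid string, fullRemove bool) (removeLB bool) {
	if _, ok := lb.backends[eid]; ok {
		if fullRemove {
			delete(lb.backends, eid) // drop the destination entirely
		} else {
			lb.backends[eid] = true // keep it, but send no new connections
		}
	}
	return len(lb.backends) == 0
}
```

In terms of the description above, sandbox.DisableService() corresponds to the fullRemove=false path, while endpoint.sbLeave() (or an endpoint rename) corresponds to the fullRemove=true path.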
This patch attempts to address the same issue raised in #2074. Manually running the tests described there works for me. I have also performed manual inspections of the IPVS state during update conditions and verified that the entries are properly de-weighted and then eventually removed.
Signed-off-by: Chris Telfer <ctelfer@docker.com>