swarm mode network inspect should provide cluser-wide task details by sanimej · Pull Request #1674 · moby/libnetwork

sanimej · 2017-03-07T23:48:47Z

In swarm mode network inspect shows only the information about the endpoints local to the node. This is because the earlier libnetwork design for overlay relied on the KV store which is not available in the swarm mode. Lack of the cluster-wide visibility makes it difficult to verify if all nodes have a consistent state for the services.

This PR adds the support in libnetwork to add a UX option: docker network inspect --verbose nw which will display all the services in a network with the backend task IP and the host in which the task resides. This is a first step towards better debugging/diagnostics support for swarm mode services. It will used to provide a troubleshooting container that can fetch this control plane state and probe the kernel state to verify if they match.

For example: In a 3 node cluster with two services: S1 with 3 replicas and S2 with 1 replica

vagrant@net-3:~$ docker network inspect --verbose ov1
[
    {
        "Name": "ov1",
        "Id": "ybmyjvao9vtzy3oorxbssj13b",
        "Created": "2017-03-07T22:08:56.471374351Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "5f125d4718f351333decbf34e02c3227a4455731b067cf6921cf9ffaa6779148": {
                "Name": "s1.2.mnzsnw7vadg2ra6mboz4ccuyb",
                "EndpointID": "54a324b9ebe5fc37c65e29cdb8eaa3becfa736e3a8a78a858bd4352a744c6879",
                "MacAddress": "02:42:0a:00:00:04",
                "IPv4Address": "10.0.0.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "net-3-3090671942ce",
                "IP": "192.168.33.13"
            },
            {
                "Name": "net-2-fb80208efd75",
                "IP": "192.168.33.12"
            },
            {
                "Name": "net-1-6ecbc0040a73",
                "IP": "192.168.33.11"
            }
        ],
        "Services": {
            "s1": {
                "VIP": "10.0.0.2",
                "Ports": [],
                "Tasks": [
                    {
                        "Name": "s1.1.530e00o1oj9l1kivoci7oupzt",
                        "EndpointID": "f0a46deaf7ce1d87bf42437188a7ba4d8d6ae3e98fe439db95171d8330e278a0",
                        "EndpointIP": "10.0.0.3",
                        "Info": "Host IP 192.168.33.12"
                    },
                    {
                        "Name": "s1.3.98jplsdb5j7ozibrbv0bfs3fg",
                        "EndpointID": "005c24c13b1ddb0395de122ae7c75c889cebc52d4cc61ee918ef3a9dbffa2b27",
                        "EndpointIP": "10.0.0.5",
                        "Info": "Host IP 192.168.33.11"
                    },
                    {
                        "Name": "s1.2.mnzsnw7vadg2ra6mboz4ccuyb",
                        "EndpointID": "54a324b9ebe5fc37c65e29cdb8eaa3becfa736e3a8a78a858bd4352a744c6879",
                        "EndpointIP": "10.0.0.4",
                        "Info": "Host IP 192.168.33.13"
                    }
                ]
            },
            "s2": {
                "VIP": "10.0.0.6",
                "Ports": [],
                "Tasks": [
                    {
                        "Name": "s2.1.spc20i5to6crco9wdh5t3gzh2",
                        "EndpointID": "9d36c87e72236b0c5e0c5794b6963c84c3449ecbab5648ccb861993c760c30b9",
                        "EndpointIP": "10.0.0.7",
                        "Info": "Host IP 192.168.33.12"
                    }
                ]
            }
        }
    }
]

Signed-off-by: Santhosh Manohar santhosh@docker.com

aboch · 2017-03-08T00:13:04Z

+	// relevant info about the endpoint.
+	d, err := n.driver(true)
+	if err != nil {
+		logrus.Errorf("Can't get the driver for network, %s\n", n.ID())


Please the \n from the log and add some context, like
logrus.Errorf("Can't get the driver for network %s while retrieving services info: %v", n.ID(), err)

aboch · 2017-03-08T00:14:18Z


+	//Services returns a map of services keyed by the service name with the details
+	//of all the tasks that belong to the service. Applicable only in swarm mode.
+	Services() map[string]ServiceInfo


As done for Peers(), please move Services() under NetworkInfo

aboch · 2017-03-08T00:14:18Z


+	//Services returns a map of services keyed by the service name with the details
+	//of all the tasks that belong to the service. Applicable only in swarm mode.
+	Services() map[string]ServiceInfo


As for Peers(), please move Services() under NetworkInfo

aboch · 2017-03-08T00:16:10Z


+func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, string) {
+	if tablename != ovPeerTable {
+		logrus.Errorf("DecodeTableEntry: unexpected table name %s", tablename)


Please replace %s with %q here

%s is just fine here.

aboch · 2017-03-08T00:17:16Z

+
+	var peer PeerRecord
+	if err := proto.Unmarshal(value, &peer); err != nil {
+		logrus.Errorf("DecodeTableEntry: failed to unmarshal peer record: %v for key %s", err, key)


logrus.Errorf("DecodeTableEntry: failed to unmarshal peer record for key %q: %v", key, err)

aboch · 2017-03-08T00:19:41Z

+		return "", ""
+	}
+
+	return key, "Host IP " + peer.TunnelEndpointIP


IMO we should separate the property from the value with a semicolon
return key, "Host IP: " + peer.TunnelEndpointIP

aboch · 2017-03-08T00:31:37Z

 	EventNotify(event EventType, nid string, tableName string, key string, value []byte)

+	// DecodeTableEntry passes the driver a key, value pair from table it registered
+	// with libnetwork and expects the driver to return a {endpoint ID, info string}


I think this function is generic enough and could support the decode for any other hypothetical tables that driver might use in future, besides the endpoint_table. Therefore I would not say in the comment that the driver has to return the endpoint ID in the tuple.

Actually, could there be a case where the first element of the returned tuple would differ from the input parameter key ? Was wondering whether we need the decode function to return again the key or just the decoded value.

Actually, could there be a case where the first element of the returned tuple would differ from the input parameter key ? - Key is what the driver passed to networkdb. It could be anything the driver uses to identify the endpoint. So the driver should be able to get the endpoint ID and return it along with a human readable info string. In the case of overlay driver key happens to be the endpoint ID.

With the current APIs driver doesn't convey what a given table is used for. Even though the DecodeTableEntry API is generic without knowing what item we are dealing with it can't be presented meaningfully in the UX. Since the current usage is only for endpoint (driver creates only one entry during the join) we are using it to associate with the libnetwork's endpoint info. If we allow the driver to create arbitrary tables we have to make some changes to the APIs. So the comment is valid and correct for the current usage.

@sanimej it seems there is an inconsistent assumption here. Neither AddTableEntry nor EventNotify assumes a endpoint-id tie-in. But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. Its very hard to predict how other drivers might use this functionality and hence I would request you to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well (if possible have a typedef of string to be explicit about the fact that it is endpoint-id). That way you are free to make such an assumption here.

Infact your point is valid that AddTableEntry is called only in endpoint related join call. But the APIs dont assume that. Especially EventNotify. Hence, My recommendation would be include the endpoint-id in the existing table operations. Ofcourse, it can be redundant for overlay driver and it need not use it. But if the API dictates as such, it becomes more correct.

@mavenugo But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. This is not correct. The reason why DecodeTableEntry expects a tuple {endpoint ID, info} back is precisely because libnetwork can't make that assumption. In future if a driver starts using networkDB for some random data unrelated to endpoint when DecodeTableEntry is called from the table name and key it can identify what is the data being stored and return an empty string, "",""

to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well - I don't think that would be a good API. What do we do for drivers that want to use networkDB for non-endpoint related data ? For ex: if a driver wants to send a per network data it needs the network id somewhere. So either we end up multiple flavors of the API or a bunch of parameters not relevant for a particular usage.

I think what we can do is add an option to TableEventRegister to indicate what object the drivers want to store in that table, the options can be ENDPOINT_OBJECT, NETWORK_OBJECT, OPAQUE. So in this scenario for example, libnetwork will call DecodeTableEntry only for the table that is associated with ENDPOINT_OBJECT.

Since we haven't exposed networkdb yet to remote drivers may be this is a change we can make now without any backward compatibility issues. But please note that as explained above this PR doesn't make any hard assumptions and the network API enhancement can be done independently.

the options can be ENDPOINT_OBJECT, NETWORK_OBJECT, OPAQUE. - I think this will be quite useful. For ex: overlay driver can create a table for NETWORK_OBJECT, which it can use to store the network level info, for ex: vxlan id. So in the network inspect verbose we can present the vxlan id as well. moby/moby#31559

aboch · 2017-03-08T00:34:42Z

+}
+
+func (n *network) Services() map[string]ServiceInfo {
+	eps := make(map[string]epRecord)


please move this after the two checks

This is just fine. My suggestion is to avoid nit picks on coding style/personal preference unless there is a clear violation of go idioms that are not caught by go vet or a common practice applied consistently across all Docker projects, or at least libnetwork.

aboch · 2017-03-08T00:36:38Z

+	for eid, value := range entries {
+		var epRec EndpointRecord
+		if err := proto.Unmarshal(value.([]byte), &epRec); err != nil {
+			logrus.Errorf("Unmarshal of %s failed", eid)


Please add some context to this log
logrus.Errorf("Unmarshal of endpoint record for %s failed, while retrieving services info for network %s: %v", eid, n.ID(), err)

aboch · 2017-03-08T00:49:36Z

+		for key, value := range entries {
+			epID, info := d.DecodeTableEntry(table, key, value.([]byte))
+			if ep, ok := eps[epID]; !ok {
+				logrus.Errorf("epRecord update attempt for driver info, but missing for %s", epID)


Please reword.
This could be a real discrepancy between driver state and libnetwork core state, but it could also be a genuine difference if a new task was scheduled on this node while running the inspect verbose command.

IMO, given this is for diagnostic, we should also report the spurious driver enpoint entry and mark it as such.
We could have an extra flag in epRecord like as an example Inconsistent which the network inspect cli will print only if true.

I can remove the error message. The main purpose of the the verbose command is to expose the docker daemon's state and use it to match against the kernel state through an out of band tool, like from a separate privileged container. If the driver has any spurious info it should be reflected in the kernel state and caught by that.

In the user output lets not add something that is vague and hard to explain. Obviously it can be added later if there is a compelling reason.

aboch · 2017-03-08T00:50:55Z

+		ports := []string{}
+		if s.Ports == nil {
+			for _, port := range epr.ep.IngressPorts {
+				p := fmt.Sprintf("Target %d, Publish %d", port.TargetPort, port.PublishedPort)


I would separate Target and Publish labels from their value with a semicolon.

aboch · 2017-03-08T00:52:44Z

 }

+type Task struct {
+	Name       string


Would it be helpful to also report the aliases ?

sanimej · 2017-03-08T17:02:30Z

@aboch Fixed the comments and added a const for "endpoint_table". Alias can be included when we provide UX for it. PTAL

mavenugo · 2017-03-09T10:46:16Z

+type ServiceInfo struct {
+	VIP   string
+	Tasks []Task
+	Ports []string


Though it is NOT part of the gossip, I think it will be extremely helpful to have FWMarker part of the ServiceInfo. Since FWMarker plays a major role in the load-balancing mechanism, we can use it for troubleshooting purposes, especially when used in conjunction with an external tool to perform consistency checks.

Agree. Its good to include the FWMarker. Instead of showing it as FWMarker which is very implementation specific we can call it Local LB ID which also conveys the id is local to the node and need not be same across all nodes.

mavenugo · 2017-03-09T11:02:21Z

 	EventNotify(event EventType, nid string, tableName string, key string, value []byte)

+	// DecodeTableEntry passes the driver a key, value pair from table it registered
+	// with libnetwork and expects the driver to return a {endpoint ID, info string}


@sanimej it seems there is an inconsistent assumption here. Neither AddTableEntry nor EventNotify assumes a endpoint-id tie-in. But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. Its very hard to predict how other drivers might use this functionality and hence I would request you to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well (if possible have a typedef of string to be explicit about the fact that it is endpoint-id). That way you are free to make such an assumption here.

Infact your point is valid that AddTableEntry is called only in endpoint related join call. But the APIs dont assume that. Especially EventNotify. Hence, My recommendation would be include the endpoint-id in the existing table operations. Ofcourse, it can be redundant for overlay driver and it need not use it. But if the API dictates as such, it becomes more correct.

mavenugo · 2017-03-09T11:03:55Z

+func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, string) {
+	return "", ""
+}
+


hmm... cross build would have caught this duplicate ?

sanimej · 2017-03-10T22:29:38Z

@mavenugo @aboch Updated PR has these changes..

When drivers register with libnetwork for using networkdb a type has to be associate with the table. Libnetwork can use it to determine if a particular table can be decoded and associated with libnetwork objects.
Service info will also include the fwMarker per service. This value can be different across nodes.

PTAL

aboch · 2017-03-10T22:34:44Z

+			continue
+		}
+		i := n.getController().getLBIndex(epRec.ServiceID, nid, epRec.IngressPorts)
+		fmt.Println("GOT LB INDEX..", i)


Remove the println

aboch · 2017-03-10T22:46:57Z

+		id:    sid,
+		ports: portConfigs(ingressPorts).String(),
+	}
+	s, ok := c.serviceBindings[skey]


Please access serviceBindings under controller lock

aboch · 2017-03-10T22:48:14Z

+	if !ok {
+		return 0
+	}
+	return int(s.loadBalancers[nid].fwMark)


please lock the service before accessing loadBalancers

sanimej · 2017-03-10T23:01:23Z

@aboch Thanks. Addressed the comments..

mavenugo · 2017-03-11T00:10:10Z


+func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
+	return "", nil
+}


Windows overlay driver also uses the networkdb. I guess it is valid to have this function populated ?

Windows overlay seem to be using providerAddress for VTEP which is set differently..

func (d *driver) convertToOverlayNetwork(v *hcsshim.HNSNetwork) *network { n := &network{ id: v.Name, hnsId: v.Id, driver: d, endpoints: endpointTable{}, subnets: []*subnet{}, providerAddress: v.ManagementIP, }

I would prefer leaving this stub for now and @msabansal address it. This can be done as a bug fix.

👍 thats works.

mavenugo · 2017-03-11T00:22:37Z

 	inDelete     bool
 	ingress      bool
-	driverTables []string
+	driverTables []networkDBTable


I have to dig into the changes a bit more. But can you pls confirm if this PR will have any impact with upgrades or interoperability with existing cluster with older versions nodes

Since there are no changes to the external APIs I don't think there is any backward compatibility concern.

ok. thanks for the confirmation.

Signed-off-by: Santhosh Manohar <santhosh@docker.com>

mavenugo

LGTM

aboch · 2017-03-12T00:39:06Z

LGTM

BretFisher · 2018-09-06T00:58:56Z

Does this actually work? docker network inspect --verbose <overlay-name> doesn't change the results.

ctelfer · 2018-09-06T13:48:07Z

Yes, this does work, or at least it has worked for me the past N times I have used it. My understanding is that there should only be a difference in the results if the gossip network state differs from the node's network state (for the node that docker network inspect is run against). This likely only happens under heavy load or some kind of error condition.

selansen · 2018-09-06T13:52:06Z

Are you having any scale setup or running the command on simple cluster ?
Like @ctelfer mentioned I haven't noticed any issue.

fcrisciani · 2018-09-06T15:31:50Z

@BretFisher keep in mind that a node will be part of a network iff it has at least one container on that network (a part from the ingress network that is always joined by a swarm node). Said that the inspect verbose for a network on a node that does not have any container using it, won't give you any details (manager included)

BretFisher · 2018-09-06T17:01:52Z

Ah, ok the part I was missing is that it's a node-local result and the node must have containers on that network. My results are now as expected, thanks!

aboch reviewed Mar 8, 2017

View reviewed changes

Comment thread agent.go

}

type Task struct {

Name string

Copy link
Copy Markdown

Contributor

aboch Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to also report the aliases ?

sanimej force-pushed the inspect branch 6 times, most recently from b2bf511 to 59a2815 Compare March 8, 2017 16:59

mavenugo reviewed Mar 9, 2017

View reviewed changes

sanimej mentioned this pull request Mar 10, 2017

Add verbose flag to network inspect to show all services & tasks in swarm mode moby/moby#31710

Merged

sanimej force-pushed the inspect branch 4 times, most recently from 60250e4 to 176de4a Compare March 10, 2017 22:24

aboch reviewed Mar 10, 2017

View reviewed changes

sanimej force-pushed the inspect branch from 176de4a to cc64b2a Compare March 10, 2017 23:00

mavenugo reviewed Mar 11, 2017

View reviewed changes

sanimej force-pushed the inspect branch from cc64b2a to 12bc0cb Compare March 11, 2017 01:09

swarm mode network inspect should provide cluser-wide task details

4693eab

Signed-off-by: Santhosh Manohar <santhosh@docker.com>

sanimej force-pushed the inspect branch from 12bc0cb to 4693eab Compare March 11, 2017 03:12

mavenugo approved these changes Mar 12, 2017

View reviewed changes

aboch merged commit bb4fe0b into moby:master Mar 12, 2017

AkihiroSuda mentioned this pull request Mar 17, 2017

[nit] overlay: "Host IP" -> "HostIP" for consistency #1692

Closed

Conversation

sanimej commented Mar 7, 2017

Uh oh!

aboch Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aboch Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aboch Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanimej Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanimej Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aboch Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanimej commented Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanimej commented Mar 10, 2017

Uh oh!

aboch Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanimej commented Mar 10, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aboch Mar 8, 2017 •

edited

Loading

aboch Mar 8, 2017 •

edited

Loading

aboch Mar 8, 2017 •

edited

Loading

sanimej Mar 10, 2017 •

edited

Loading

sanimej Mar 8, 2017 •

edited

Loading

aboch Mar 8, 2017 •

edited

Loading

sanimej commented Mar 8, 2017 •

edited

Loading

aboch Mar 10, 2017 •

edited

Loading