Skip to content

swarm mode network inspect should provide cluser-wide task details#1674

Merged
aboch merged 1 commit intomoby:masterfrom
sanimej:inspect
Mar 12, 2017
Merged

swarm mode network inspect should provide cluser-wide task details#1674
aboch merged 1 commit intomoby:masterfrom
sanimej:inspect

Conversation

@sanimej
Copy link
Copy Markdown

@sanimej sanimej commented Mar 7, 2017

In swarm mode network inspect shows only the information about the endpoints local to the node. This is because the earlier libnetwork design for overlay relied on the KV store which is not available in the swarm mode. Lack of the cluster-wide visibility makes it difficult to verify if all nodes have a consistent state for the services.

This PR adds the support in libnetwork to add a UX option: docker network inspect --verbose nw which will display all the services in a network with the backend task IP and the host in which the task resides. This is a first step towards better debugging/diagnostics support for swarm mode services. It will used to provide a troubleshooting container that can fetch this control plane state and probe the kernel state to verify if they match.

For example: In a 3 node cluster with two services: S1 with 3 replicas and S2 with 1 replica

vagrant@net-3:~$ docker network inspect --verbose ov1
[
    {
        "Name": "ov1",
        "Id": "ybmyjvao9vtzy3oorxbssj13b",
        "Created": "2017-03-07T22:08:56.471374351Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "5f125d4718f351333decbf34e02c3227a4455731b067cf6921cf9ffaa6779148": {
                "Name": "s1.2.mnzsnw7vadg2ra6mboz4ccuyb",
                "EndpointID": "54a324b9ebe5fc37c65e29cdb8eaa3becfa736e3a8a78a858bd4352a744c6879",
                "MacAddress": "02:42:0a:00:00:04",
                "IPv4Address": "10.0.0.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "net-3-3090671942ce",
                "IP": "192.168.33.13"
            },
            {
                "Name": "net-2-fb80208efd75",
                "IP": "192.168.33.12"
            },
            {
                "Name": "net-1-6ecbc0040a73",
                "IP": "192.168.33.11"
            }
        ],
        "Services": {
            "s1": {
                "VIP": "10.0.0.2",
                "Ports": [],
                "Tasks": [
                    {
                        "Name": "s1.1.530e00o1oj9l1kivoci7oupzt",
                        "EndpointID": "f0a46deaf7ce1d87bf42437188a7ba4d8d6ae3e98fe439db95171d8330e278a0",
                        "EndpointIP": "10.0.0.3",
                        "Info": "Host IP 192.168.33.12"
                    },
                    {
                        "Name": "s1.3.98jplsdb5j7ozibrbv0bfs3fg",
                        "EndpointID": "005c24c13b1ddb0395de122ae7c75c889cebc52d4cc61ee918ef3a9dbffa2b27",
                        "EndpointIP": "10.0.0.5",
                        "Info": "Host IP 192.168.33.11"
                    },
                    {
                        "Name": "s1.2.mnzsnw7vadg2ra6mboz4ccuyb",
                        "EndpointID": "54a324b9ebe5fc37c65e29cdb8eaa3becfa736e3a8a78a858bd4352a744c6879",
                        "EndpointIP": "10.0.0.4",
                        "Info": "Host IP 192.168.33.13"
                    }
                ]
            },
            "s2": {
                "VIP": "10.0.0.6",
                "Ports": [],
                "Tasks": [
                    {
                        "Name": "s2.1.spc20i5to6crco9wdh5t3gzh2",
                        "EndpointID": "9d36c87e72236b0c5e0c5794b6963c84c3449ecbab5648ccb861993c760c30b9",
                        "EndpointIP": "10.0.0.7",
                        "Info": "Host IP 192.168.33.12"
                    }
                ]
            }
        }
    }
]

Signed-off-by: Santhosh Manohar santhosh@docker.com

Comment thread agent.go Outdated
// relevant info about the endpoint.
d, err := n.driver(true)
if err != nil {
logrus.Errorf("Can't get the driver for network, %s\n", n.ID())
Copy link
Copy Markdown
Contributor

@aboch aboch Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please the \n from the log and add some context, like
logrus.Errorf("Can't get the driver for network %s while retrieving services info: %v", n.ID(), err)

Comment thread network.go Outdated

//Services returns a map of services keyed by the service name with the details
//of all the tasks that belong to the service. Applicable only in swarm mode.
Services() map[string]ServiceInfo
Copy link
Copy Markdown
Contributor

@aboch aboch Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As done for Peers(), please move Services() under NetworkInfo

Comment thread network.go Outdated

//Services returns a map of services keyed by the service name with the details
//of all the tasks that belong to the service. Applicable only in swarm mode.
Services() map[string]ServiceInfo
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for Peers(), please move Services() under NetworkInfo


func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, string) {
if tablename != ovPeerTable {
logrus.Errorf("DecodeTableEntry: unexpected table name %s", tablename)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace %s with %q here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%s is just fine here.

Comment thread drivers/overlay/joinleave.go Outdated

var peer PeerRecord
if err := proto.Unmarshal(value, &peer); err != nil {
logrus.Errorf("DecodeTableEntry: failed to unmarshal peer record: %v for key %s", err, key)
Copy link
Copy Markdown
Contributor

@aboch aboch Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logrus.Errorf("DecodeTableEntry: failed to unmarshal peer record for key %q: %v", key, err)

Comment thread drivers/overlay/joinleave.go Outdated
return "", ""
}

return key, "Host IP " + peer.TunnelEndpointIP
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should separate the property from the value with a semicolon
return key, "Host IP: " + peer.TunnelEndpointIP

Comment thread driverapi/driverapi.go Outdated
EventNotify(event EventType, nid string, tableName string, key string, value []byte)

// DecodeTableEntry passes the driver a key, value pair from table it registered
// with libnetwork and expects the driver to return a {endpoint ID, info string}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is generic enough and could support the decode for any other hypothetical tables that driver might use in future, besides the endpoint_table. Therefore I would not say in the comment that the driver has to return the endpoint ID in the tuple.

Actually, could there be a case where the first element of the returned tuple would differ from the input parameter key ? Was wondering whether we need the decode function to return again the key or just the decoded value.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, could there be a case where the first element of the returned tuple would differ from the input parameter key ? - Key is what the driver passed to networkdb. It could be anything the driver uses to identify the endpoint. So the driver should be able to get the endpoint ID and return it along with a human readable info string. In the case of overlay driver key happens to be the endpoint ID.

With the current APIs driver doesn't convey what a given table is used for. Even though the DecodeTableEntry API is generic without knowing what item we are dealing with it can't be presented meaningfully in the UX. Since the current usage is only for endpoint (driver creates only one entry during the join) we are using it to associate with the libnetwork's endpoint info. If we allow the driver to create arbitrary tables we have to make some changes to the APIs. So the comment is valid and correct for the current usage.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanimej it seems there is an inconsistent assumption here. Neither AddTableEntry nor EventNotify assumes a endpoint-id tie-in. But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. Its very hard to predict how other drivers might use this functionality and hence I would request you to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well (if possible have a typedef of string to be explicit about the fact that it is endpoint-id). That way you are free to make such an assumption here.

Infact your point is valid that AddTableEntry is called only in endpoint related join call. But the APIs dont assume that. Especially EventNotify. Hence, My recommendation would be include the endpoint-id in the existing table operations. Ofcourse, it can be redundant for overlay driver and it need not use it. But if the API dictates as such, it becomes more correct.

Copy link
Copy Markdown
Author

@sanimej sanimej Mar 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mavenugo But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. This is not correct. The reason why DecodeTableEntry expects a tuple {endpoint ID, info} back is precisely because libnetwork can't make that assumption. In future if a driver starts using networkDB for some random data unrelated to endpoint when DecodeTableEntry is called from the table name and key it can identify what is the data being stored and return an empty string, "",""

to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well - I don't think that would be a good API. What do we do for drivers that want to use networkDB for non-endpoint related data ? For ex: if a driver wants to send a per network data it needs the network id somewhere. So either we end up multiple flavors of the API or a bunch of parameters not relevant for a particular usage.

I think what we can do is add an option to TableEventRegister to indicate what object the drivers want to store in that table, the options can be ENDPOINT_OBJECT, NETWORK_OBJECT, OPAQUE. So in this scenario for example, libnetwork will call DecodeTableEntry only for the table that is associated with ENDPOINT_OBJECT.

Since we haven't exposed networkdb yet to remote drivers may be this is a change we can make now without any backward compatibility issues. But please note that as explained above this PR doesn't make any hard assumptions and the network API enhancement can be done independently.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the options can be ENDPOINT_OBJECT, NETWORK_OBJECT, OPAQUE. - I think this will be quite useful. For ex: overlay driver can create a table for NETWORK_OBJECT, which it can use to store the network level info, for ex: vxlan id. So in the network inspect verbose we can present the vxlan id as well. moby/moby#31559

Comment thread agent.go
}

func (n *network) Services() map[string]ServiceInfo {
eps := make(map[string]epRecord)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move this after the two checks

Copy link
Copy Markdown
Author

@sanimej sanimej Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just fine. My suggestion is to avoid nit picks on coding style/personal preference unless there is a clear violation of go idioms that are not caught by go vet or a common practice applied consistently across all Docker projects, or at least libnetwork.

Comment thread agent.go Outdated
for eid, value := range entries {
var epRec EndpointRecord
if err := proto.Unmarshal(value.([]byte), &epRec); err != nil {
logrus.Errorf("Unmarshal of %s failed", eid)
Copy link
Copy Markdown
Contributor

@aboch aboch Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some context to this log
logrus.Errorf("Unmarshal of endpoint record for %s failed, while retrieving services info for network %s: %v", eid, n.ID(), err)

Comment thread agent.go Outdated
for key, value := range entries {
epID, info := d.DecodeTableEntry(table, key, value.([]byte))
if ep, ok := eps[epID]; !ok {
logrus.Errorf("epRecord update attempt for driver info, but missing for %s", epID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reword.
This could be a real discrepancy between driver state and libnetwork core state, but it could also be a genuine difference if a new task was scheduled on this node while running the inspect verbose command.

IMO, given this is for diagnostic, we should also report the spurious driver enpoint entry and mark it as such.
We could have an extra flag in epRecord like as an example Inconsistent which the network inspect cli will print only if true.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the error message. The main purpose of the the verbose command is to expose the docker daemon's state and use it to match against the kernel state through an out of band tool, like from a separate privileged container. If the driver has any spurious info it should be reflected in the kernel state and caught by that.

In the user output lets not add something that is vague and hard to explain. Obviously it can be added later if there is a compelling reason.

Comment thread agent.go Outdated
ports := []string{}
if s.Ports == nil {
for _, port := range epr.ep.IngressPorts {
p := fmt.Sprintf("Target %d, Publish %d", port.TargetPort, port.PublishedPort)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate Target and Publish labels from their value with a semicolon.

Comment thread agent.go
}

type Task struct {
Name string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to also report the aliases ?

@sanimej sanimej force-pushed the inspect branch 6 times, most recently from b2bf511 to 59a2815 Compare March 8, 2017 16:59
@sanimej
Copy link
Copy Markdown
Author

sanimej commented Mar 8, 2017

@aboch Fixed the comments and added a const for "endpoint_table". Alias can be included when we provide UX for it. PTAL

Comment thread agent.go Outdated
type ServiceInfo struct {
VIP string
Tasks []Task
Ports []string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it is NOT part of the gossip, I think it will be extremely helpful to have FWMarker part of the ServiceInfo. Since FWMarker plays a major role in the load-balancing mechanism, we can use it for troubleshooting purposes, especially when used in conjunction with an external tool to perform consistency checks.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Its good to include the FWMarker. Instead of showing it as FWMarker which is very implementation specific we can call it Local LB ID which also conveys the id is local to the node and need not be same across all nodes.

Comment thread driverapi/driverapi.go Outdated
EventNotify(event EventType, nid string, tableName string, key string, value []byte)

// DecodeTableEntry passes the driver a key, value pair from table it registered
// with libnetwork and expects the driver to return a {endpoint ID, info string}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanimej it seems there is an inconsistent assumption here. Neither AddTableEntry nor EventNotify assumes a endpoint-id tie-in. But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. Its very hard to predict how other drivers might use this functionality and hence I would request you to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well (if possible have a typedef of string to be explicit about the fact that it is endpoint-id). That way you are free to make such an assumption here.

Infact your point is valid that AddTableEntry is called only in endpoint related join call. But the APIs dont assume that. Especially EventNotify. Hence, My recommendation would be include the endpoint-id in the existing table operations. Ofcourse, it can be redundant for overlay driver and it need not use it. But if the API dictates as such, it becomes more correct.

func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, string) {
return "", ""
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... cross build would have caught this duplicate ?

@sanimej
Copy link
Copy Markdown
Author

sanimej commented Mar 10, 2017

@mavenugo @aboch Updated PR has these changes..

  • When drivers register with libnetwork for using networkdb a type has to be associate with the table. Libnetwork can use it to determine if a particular table can be decoded and associated with libnetwork objects.
  • Service info will also include the fwMarker per service. This value can be different across nodes.

PTAL

Comment thread agent.go Outdated
continue
}
i := n.getController().getLBIndex(epRec.ServiceID, nid, epRec.IngressPorts)
fmt.Println("GOT LB INDEX..", i)
Copy link
Copy Markdown
Contributor

@aboch aboch Mar 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the println

Comment thread service_common.go
id: sid,
ports: portConfigs(ingressPorts).String(),
}
s, ok := c.serviceBindings[skey]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please access serviceBindings under controller lock

Comment thread service_common.go Outdated
if !ok {
return 0
}
return int(s.loadBalancers[nid].fwMark)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please lock the service before accessing loadBalancers

@sanimej
Copy link
Copy Markdown
Author

sanimej commented Mar 10, 2017

@aboch Thanks. Addressed the comments..


func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows overlay driver also uses the networkdb. I guess it is valid to have this function populated ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows overlay seem to be using providerAddress for VTEP which is set differently..

func (d *driver) convertToOverlayNetwork(v *hcsshim.HNSNetwork) *network {
        n := &network{
                id:              v.Name,
                hnsId:           v.Id,
                driver:          d,
                endpoints:       endpointTable{},
                subnets:         []*subnet{},
                providerAddress: v.ManagementIP,
        }

I would prefer leaving this stub for now and @msabansal address it. This can be done as a bug fix.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thats works.

Comment thread network.go
inDelete bool
ingress bool
driverTables []string
driverTables []networkDBTable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to dig into the changes a bit more. But can you pls confirm if this PR will have any impact with upgrades or interoperability with existing cluster with older versions nodes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are no changes to the external APIs I don't think there is any backward compatibility concern.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. thanks for the confirmation.

Signed-off-by: Santhosh Manohar <santhosh@docker.com>
Copy link
Copy Markdown
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aboch
Copy link
Copy Markdown
Contributor

aboch commented Mar 12, 2017

LGTM

@BretFisher
Copy link
Copy Markdown

Does this actually work? docker network inspect --verbose <overlay-name> doesn't change the results.

@ctelfer
Copy link
Copy Markdown
Contributor

ctelfer commented Sep 6, 2018

Yes, this does work, or at least it has worked for me the past N times I have used it. My understanding is that there should only be a difference in the results if the gossip network state differs from the node's network state (for the node that docker network inspect is run against). This likely only happens under heavy load or some kind of error condition.

@selansen
Copy link
Copy Markdown
Contributor

selansen commented Sep 6, 2018

Are you having any scale setup or running the command on simple cluster ?
Like @ctelfer mentioned I haven't noticed any issue.

@fcrisciani
Copy link
Copy Markdown

@BretFisher keep in mind that a node will be part of a network iff it has at least one container on that network (a part from the ingress network that is always joined by a swarm node). Said that the inspect verbose for a network on a node that does not have any container using it, won't give you any details (manager included)

@BretFisher
Copy link
Copy Markdown

Ah, ok the part I was missing is that it's a node-local result and the node must have containers on that network. My results are now as expected, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants