swarm mode network inspect should provide cluser-wide task details#1674
swarm mode network inspect should provide cluser-wide task details#1674aboch merged 1 commit intomoby:masterfrom
Conversation
| // relevant info about the endpoint. | ||
| d, err := n.driver(true) | ||
| if err != nil { | ||
| logrus.Errorf("Can't get the driver for network, %s\n", n.ID()) |
There was a problem hiding this comment.
Please the \n from the log and add some context, like
logrus.Errorf("Can't get the driver for network %s while retrieving services info: %v", n.ID(), err)
|
|
||
| //Services returns a map of services keyed by the service name with the details | ||
| //of all the tasks that belong to the service. Applicable only in swarm mode. | ||
| Services() map[string]ServiceInfo |
There was a problem hiding this comment.
As done for Peers(), please move Services() under NetworkInfo
|
|
||
| //Services returns a map of services keyed by the service name with the details | ||
| //of all the tasks that belong to the service. Applicable only in swarm mode. | ||
| Services() map[string]ServiceInfo |
There was a problem hiding this comment.
As for Peers(), please move Services() under NetworkInfo
|
|
||
| func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, string) { | ||
| if tablename != ovPeerTable { | ||
| logrus.Errorf("DecodeTableEntry: unexpected table name %s", tablename) |
There was a problem hiding this comment.
Please replace %s with %q here
|
|
||
| var peer PeerRecord | ||
| if err := proto.Unmarshal(value, &peer); err != nil { | ||
| logrus.Errorf("DecodeTableEntry: failed to unmarshal peer record: %v for key %s", err, key) |
There was a problem hiding this comment.
logrus.Errorf("DecodeTableEntry: failed to unmarshal peer record for key %q: %v", key, err)
| return "", "" | ||
| } | ||
|
|
||
| return key, "Host IP " + peer.TunnelEndpointIP |
There was a problem hiding this comment.
IMO we should separate the property from the value with a semicolon
return key, "Host IP: " + peer.TunnelEndpointIP
| EventNotify(event EventType, nid string, tableName string, key string, value []byte) | ||
|
|
||
| // DecodeTableEntry passes the driver a key, value pair from table it registered | ||
| // with libnetwork and expects the driver to return a {endpoint ID, info string} |
There was a problem hiding this comment.
I think this function is generic enough and could support the decode for any other hypothetical tables that driver might use in future, besides the endpoint_table. Therefore I would not say in the comment that the driver has to return the endpoint ID in the tuple.
Actually, could there be a case where the first element of the returned tuple would differ from the input parameter key ? Was wondering whether we need the decode function to return again the key or just the decoded value.
There was a problem hiding this comment.
Actually, could there be a case where the first element of the returned tuple would differ from the input parameter key ? - Key is what the driver passed to networkdb. It could be anything the driver uses to identify the endpoint. So the driver should be able to get the endpoint ID and return it along with a human readable info string. In the case of overlay driver key happens to be the endpoint ID.
With the current APIs driver doesn't convey what a given table is used for. Even though the DecodeTableEntry API is generic without knowing what item we are dealing with it can't be presented meaningfully in the UX. Since the current usage is only for endpoint (driver creates only one entry during the join) we are using it to associate with the libnetwork's endpoint info. If we allow the driver to create arbitrary tables we have to make some changes to the APIs. So the comment is valid and correct for the current usage.
There was a problem hiding this comment.
@sanimej it seems there is an inconsistent assumption here. Neither AddTableEntry nor EventNotify assumes a endpoint-id tie-in. But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. Its very hard to predict how other drivers might use this functionality and hence I would request you to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well (if possible have a typedef of string to be explicit about the fact that it is endpoint-id). That way you are free to make such an assumption here.
Infact your point is valid that AddTableEntry is called only in endpoint related join call. But the APIs dont assume that. Especially EventNotify. Hence, My recommendation would be include the endpoint-id in the existing table operations. Ofcourse, it can be redundant for overlay driver and it need not use it. But if the API dictates as such, it becomes more correct.
There was a problem hiding this comment.
@mavenugo But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. This is not correct. The reason why DecodeTableEntry expects a tuple {endpoint ID, info} back is precisely because libnetwork can't make that assumption. In future if a driver starts using networkDB for some random data unrelated to endpoint when DecodeTableEntry is called from the table name and key it can identify what is the data being stored and return an empty string, "",""
to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well - I don't think that would be a good API. What do we do for drivers that want to use networkDB for non-endpoint related data ? For ex: if a driver wants to send a per network data it needs the network id somewhere. So either we end up multiple flavors of the API or a bunch of parameters not relevant for a particular usage.
I think what we can do is add an option to TableEventRegister to indicate what object the drivers want to store in that table, the options can be ENDPOINT_OBJECT, NETWORK_OBJECT, OPAQUE. So in this scenario for example, libnetwork will call DecodeTableEntry only for the table that is associated with ENDPOINT_OBJECT.
Since we haven't exposed networkdb yet to remote drivers may be this is a change we can make now without any backward compatibility issues. But please note that as explained above this PR doesn't make any hard assumptions and the network API enhancement can be done independently.
There was a problem hiding this comment.
the options can be ENDPOINT_OBJECT, NETWORK_OBJECT, OPAQUE. - I think this will be quite useful. For ex: overlay driver can create a table for NETWORK_OBJECT, which it can use to store the network level info, for ex: vxlan id. So in the network inspect verbose we can present the vxlan id as well. moby/moby#31559
| } | ||
|
|
||
| func (n *network) Services() map[string]ServiceInfo { | ||
| eps := make(map[string]epRecord) |
There was a problem hiding this comment.
please move this after the two checks
There was a problem hiding this comment.
This is just fine. My suggestion is to avoid nit picks on coding style/personal preference unless there is a clear violation of go idioms that are not caught by go vet or a common practice applied consistently across all Docker projects, or at least libnetwork.
| for eid, value := range entries { | ||
| var epRec EndpointRecord | ||
| if err := proto.Unmarshal(value.([]byte), &epRec); err != nil { | ||
| logrus.Errorf("Unmarshal of %s failed", eid) |
There was a problem hiding this comment.
Please add some context to this log
logrus.Errorf("Unmarshal of endpoint record for %s failed, while retrieving services info for network %s: %v", eid, n.ID(), err)
| for key, value := range entries { | ||
| epID, info := d.DecodeTableEntry(table, key, value.([]byte)) | ||
| if ep, ok := eps[epID]; !ok { | ||
| logrus.Errorf("epRecord update attempt for driver info, but missing for %s", epID) |
There was a problem hiding this comment.
Please reword.
This could be a real discrepancy between driver state and libnetwork core state, but it could also be a genuine difference if a new task was scheduled on this node while running the inspect verbose command.
IMO, given this is for diagnostic, we should also report the spurious driver enpoint entry and mark it as such.
We could have an extra flag in epRecord like as an example Inconsistent which the network inspect cli will print only if true.
There was a problem hiding this comment.
I can remove the error message. The main purpose of the the verbose command is to expose the docker daemon's state and use it to match against the kernel state through an out of band tool, like from a separate privileged container. If the driver has any spurious info it should be reflected in the kernel state and caught by that.
In the user output lets not add something that is vague and hard to explain. Obviously it can be added later if there is a compelling reason.
| ports := []string{} | ||
| if s.Ports == nil { | ||
| for _, port := range epr.ep.IngressPorts { | ||
| p := fmt.Sprintf("Target %d, Publish %d", port.TargetPort, port.PublishedPort) |
There was a problem hiding this comment.
I would separate Target and Publish labels from their value with a semicolon.
| } | ||
|
|
||
| type Task struct { | ||
| Name string |
There was a problem hiding this comment.
Would it be helpful to also report the aliases ?
b2bf511 to
59a2815
Compare
|
@aboch Fixed the comments and added a const for "endpoint_table". Alias can be included when we provide UX for it. PTAL |
| type ServiceInfo struct { | ||
| VIP string | ||
| Tasks []Task | ||
| Ports []string |
There was a problem hiding this comment.
Though it is NOT part of the gossip, I think it will be extremely helpful to have FWMarker part of the ServiceInfo. Since FWMarker plays a major role in the load-balancing mechanism, we can use it for troubleshooting purposes, especially when used in conjunction with an external tool to perform consistency checks.
There was a problem hiding this comment.
Agree. Its good to include the FWMarker. Instead of showing it as FWMarker which is very implementation specific we can call it Local LB ID which also conveys the id is local to the node and need not be same across all nodes.
| EventNotify(event EventType, nid string, tableName string, key string, value []byte) | ||
|
|
||
| // DecodeTableEntry passes the driver a key, value pair from table it registered | ||
| // with libnetwork and expects the driver to return a {endpoint ID, info string} |
There was a problem hiding this comment.
@sanimej it seems there is an inconsistent assumption here. Neither AddTableEntry nor EventNotify assumes a endpoint-id tie-in. But the DecodeTableEntry seems to have it with an assumption that overlay driver depends on endpoint-id. Its very hard to predict how other drivers might use this functionality and hence I would request you to narrow down the AddTableEntry and EventNotify to carry the endpoint-id as well (if possible have a typedef of string to be explicit about the fact that it is endpoint-id). That way you are free to make such an assumption here.
Infact your point is valid that AddTableEntry is called only in endpoint related join call. But the APIs dont assume that. Especially EventNotify. Hence, My recommendation would be include the endpoint-id in the existing table operations. Ofcourse, it can be redundant for overlay driver and it need not use it. But if the API dictates as such, it becomes more correct.
| func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, string) { | ||
| return "", "" | ||
| } | ||
|
|
There was a problem hiding this comment.
hmm... cross build would have caught this duplicate ?
60250e4 to
176de4a
Compare
|
@mavenugo @aboch Updated PR has these changes..
PTAL |
| continue | ||
| } | ||
| i := n.getController().getLBIndex(epRec.ServiceID, nid, epRec.IngressPorts) | ||
| fmt.Println("GOT LB INDEX..", i) |
| id: sid, | ||
| ports: portConfigs(ingressPorts).String(), | ||
| } | ||
| s, ok := c.serviceBindings[skey] |
There was a problem hiding this comment.
Please access serviceBindings under controller lock
| if !ok { | ||
| return 0 | ||
| } | ||
| return int(s.loadBalancers[nid].fwMark) |
There was a problem hiding this comment.
please lock the service before accessing loadBalancers
|
@aboch Thanks. Addressed the comments.. |
|
|
||
| func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) { | ||
| return "", nil | ||
| } |
There was a problem hiding this comment.
Windows overlay driver also uses the networkdb. I guess it is valid to have this function populated ?
There was a problem hiding this comment.
Windows overlay seem to be using providerAddress for VTEP which is set differently..
func (d *driver) convertToOverlayNetwork(v *hcsshim.HNSNetwork) *network {
n := &network{
id: v.Name,
hnsId: v.Id,
driver: d,
endpoints: endpointTable{},
subnets: []*subnet{},
providerAddress: v.ManagementIP,
}
I would prefer leaving this stub for now and @msabansal address it. This can be done as a bug fix.
| inDelete bool | ||
| ingress bool | ||
| driverTables []string | ||
| driverTables []networkDBTable |
There was a problem hiding this comment.
I have to dig into the changes a bit more. But can you pls confirm if this PR will have any impact with upgrades or interoperability with existing cluster with older versions nodes
There was a problem hiding this comment.
Since there are no changes to the external APIs I don't think there is any backward compatibility concern.
There was a problem hiding this comment.
ok. thanks for the confirmation.
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
|
LGTM |
|
Does this actually work? |
|
Yes, this does work, or at least it has worked for me the past N times I have used it. My understanding is that there should only be a difference in the results if the gossip network state differs from the node's network state (for the node that |
|
Are you having any scale setup or running the command on simple cluster ? |
|
@BretFisher keep in mind that a node will be part of a network iff it has at least one container on that network (a part from the ingress network that is always joined by a swarm node). Said that the inspect verbose for a network on a node that does not have any container using it, won't give you any details (manager included) |
|
Ah, ok the part I was missing is that it's a node-local result and the node must have containers on that network. My results are now as expected, thanks! |
In swarm mode
network inspectshows only the information about the endpoints local to the node. This is because the earlier libnetwork design for overlay relied on the KV store which is not available in the swarm mode. Lack of the cluster-wide visibility makes it difficult to verify if all nodes have a consistent state for the services.This PR adds the support in libnetwork to add a UX option:
docker network inspect --verbose nwwhich will display all the services in a network with the backend task IP and the host in which the task resides. This is a first step towards better debugging/diagnostics support for swarm mode services. It will used to provide a troubleshooting container that can fetch thiscontrol planestate and probe the kernel state to verify if they match.For example: In a 3 node cluster with two services: S1 with 3 replicas and S2 with 1 replica
Signed-off-by: Santhosh Manohar santhosh@docker.com