Conversation
daemon/events.go
Maybe we should create a func eventTimestamp(swarmapi.Meta, swarmapi.WatchActionKind) time.Time that handles extracting the timestamp. Then we can call it in generateClusterEvent and pass in the timestamp, instead of handling it in each of the different switch statements.
daemon/cluster/noderunner.go
Would it be useful for Watch to support something like Kind: "*"?
In general I want people to filter for the specific events of interest, but the Docker events API is a bit of an unusual case.
We don't want to get task change notifications. There are too many of them, and they are implementation details that we don't want users to rely on.
Test failure looks unrelated. It's already tracked by #33041.
Please review.
daemon/events.go
Can we make these string literals constants?
Ideally they should be defined in swarmkit, but I don't find them there. I don't think they should be added here.
daemon/cluster/cluster.go
Comment doesn't match the field name.
Maybe this can just be Events? I like to avoid putting type information in the field names.
Change to WatchStream.
daemon/cluster/noderunner.go
I can't see any code path where it is possible for n.cluster to be nil. I'd suggest removing this - it is a little misleading to suggest that n.cluster can be nil, when other code would blow up in this case.
daemon/cluster/noderunner.go
logrus.WithError(err).Error("failed to receive changes from store watch API")
daemon/cluster/noderunner.go
Select on this channel write and <-ctx.Done().
When the context is cancelled (<-ctx.Done()), shouldn't watch.Recv() return with an error? If it doesn't return with an error, it may cause a goroutine leak. Adding <-ctx.Done() alongside the channel write wouldn't help, because execution may never reach this point.
daemon/events.go
Don't we need something to check if oldNode is nil? The old value is not guaranteed to be available.
Why is that? The watch API asks for old object. In an update case, the old object should always be available.
Providing the old object is best-effort. We're not able to do it if you're looking at past events. I don't think the events code is using this right now, but the point is that the API doesn't guarantee it can provide the old object. Even if it did, this code should still check. It's no good for the client side of a client/server system to crash if the server omits expected values.
daemon/events.go
Don't we need something to check if oldService is nil? The old value is not guaranteed to be available.
daemon/events.go
Don't we need to check if OldObject is nil before calling GetNode, etc.?
The GetNode function handles it directly.
func (m *Object) GetNode() *Node {
	if x, ok := m.GetObject().(*Object_Node); ok {
		return x.Node
	}
	return nil
}

func (m *Object) GetObject() isObject_Object {
	if m != nil {
		return m.Object
	}
	return nil
}
LGTM
Please rebase
daemon/events.go
Maybe this should be swarm. Not sure though.
I'll update it to swarm.
daemon/events.go
I think we should output node.Role (attributes["role..."]) rather than node.Spec.DesiredRole
Changed to role.new.
The code is still using node.Spec.DesiredRole. I believe @aluzzardi wanted this to look at node.Role instead.
Ping @AkihiroSuda @vdemeester @tonistiigi. Please take a look if you are interested.
cmd/dockerd/daemon.go
There shouldn't be a need to close this. I think there is a possibility that cluster can still attempt to write to this chan after Cleanup has been called (although it's very unlikely, as shutdownDaemon is not fast).
Thanks. Let me remove it.
If it is not closed, the ProcessClusterNotifications goroutine will never exit.
@aaronlehmann It should take a context then. Or the cluster.Cleanup() should close it after it knows there are no writes coming.
Added a context to ProcessClusterNotifications.
cmd/dockerd/daemon.go
No need to change, but I'm wondering why you chose to pass in a channel and handle this in dockerd, instead of defining a LogEvent interface in cluster that would be called to process events.
It was decided to add a watch API to expose cluster changes, but not to add a separate LogEvent interface, for simplicity.
moby/swarmkit#491 (comment)
I meant an interface in Docker's cluster pkg, like cluster.Backend or cluster.NetworkSubnetsProvider, that would be implemented by daemon.Daemon.
I'll look at it in a separate change.
daemon/cluster/noderunner.go
Does this need to also select on the Done method of the context passed to ProcessClusterNotifications, so that this won't block forever if ProcessClusterNotifications exits due to context cancellation?
I don't think it's necessary to do that. c.Cleanup() at https://github.com/moby/moby/blob/master/cmd/dockerd/daemon.go#L278 would trigger a return from this goroutine. The call path is Cluster.Cleanup() -> nodeRunner.Stop() -> set nodeRunner.closed -> nodeRunner.handleNodeExit() -> cancel nodeRunner.handleControlSocketChange context -> store watch API gets cancelled.
store watch API got cancelled
This will never happen if this function is blocked on the channel write instead of calling Recv. I don't see what prevents ProcessClusterNotifications from exiting before this function.
As an aside, I think it's bad for ProcessClusterNotifications to exit on either a channel close or a context cancellation. There should be only one way to shut it down.
Let's use the context from nodeRunner.handleControlSocketChange to make sure it exits.
vdemeester left a comment:
LGTM 🦁 😍
(one small question, but can't wait to have that !!)
daemon/events.go
We don't display label changes; is that by design?
The attributes are extra information for an event. For example, a node update event 2017-04-06T18:07:01.620569322Z node update 0urgktyobae3etk5e4n4331es (availability.new=pause, availability.old=active, name=ip-172-19-147-51) contains a fixed part (timestamp, node update, node ID) and an attributes part (availability.new, availability.old, name). We can add label changes to the attributes if they are useful.
Attributes are added to events in an ad-hoc way. Since there is a lot of information in swarm objects, revealing all changes may reduce readability. For example, a node goes through a list of changes when joining a cluster, which is an internal procedure, and users shouldn't need to inspect those steps.
I'm not very sure how this will be used. We are starting with basic events moby/swarmkit#491 (comment). I think it'll be extended based on user feedback.
I've seen many tools using labels to add configuration that's used when listening for events - labels would be a likely candidate to add
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
This PR is ready for review.
===
This is a work in progress. Tests need to be added. It also needs a change in the SwarmKit Watch API (moby/swarmkit#2099). The initial design discussion is at moby/swarmkit#491. I'd like to get design feedback.
Signed-off-by: Dong Chen dongluo.chen@docker.com
- What I did
Add swarm events to the Docker event stream.
- How I did it
Use the Swarm store Watch API to get change notifications from the Raft store, translate them into the Docker event format, and push them into the Docker event structure.
- How to verify it
On a manager node, docker events includes both the node's local events and cluster events. Cluster events have a global scope while a node's own events are local. Existing event filters apply to cluster events. A new scope filter is added.
- Description for the changelog
Add cluster events to Docker event stream.
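As a usage sketch of the verification step above (assuming a Docker daemon running as a swarm manager, so cluster events are actually produced):

```shell
# Stream only cluster-scoped (swarm) events, e.g. node/service changes:
docker events --filter scope=swarm

# Stream only the node's local events (containers, images, etc.):
docker events --filter scope=local
```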