Skip to content

gossip: log to net/trace events endpoint#8812

Merged
tbg merged 2 commits intocockroachdb:masterfrom
tbg:gossip-events
Aug 25, 2016
Merged

gossip: log to net/trace events endpoint#8812
tbg merged 2 commits intocockroachdb:masterfrom
tbg:gossip-events

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Aug 24, 2016

This has been helpful when debugging a local cluster which did not connect
and can't hurt for #7668 as well.

Add two net/trace event logs to Gossip, one which is used in various places
and one dedicated to the bootstrap() goroutine.

Example output

18:05:41.068322 . 69 ... have clients: false, have sentinel: false
18:05:41.068326 . 4 ... no next bootstrap address; currently bootstrapping: []
18:05:41.068370 . 45 ... sleeping 1s until bootstrap
18:05:42.071958 1.003588 ... idling until bootstrap required
18:05:43.400931 1.328973 ... detected stall; commencing bootstrap
18:05:43.400935 . 4 ... have clients: false, have sentinel: false
18:05:43.709402 .308467 ... sleeping 1s until bootstrap
18:05:44.711241 1.001839 ... idling until bootstrap required

18:05:41.066532 . 10 ... initial resolvers: []
18:05:41.066538 . 6 ... now stalled
18:05:41.445657 .379118 ... NodeID set to 4
18:05:43.400535 1.954878 ... NodeDescriptor set to node_id:4 address:<network_field:"tcp" address_field:"macts.local:26257" > attrs:<>
18:05:43.400877 . 342 ... add bootstrap {tcp macts.local:26263}
18:05:43.400881 . 4 ... add bootstrap {tcp macts.local:26259}
18:05:43.400886 . 5 ... add bootstrap {tcp macts.local:26261}
18:05:43.400891 . 5 ... add resolver &{tcp macts.local:26263}
18:05:43.400895 . 4 ... add resolver &{tcp macts.local:26259}
18:05:43.400897 . 1 ... add resolver &{tcp macts.local:26261}
18:05:43.709373 .308476 ... starting new client to macts.local:26259
18:05:47.164797 3.455424 ... connected
18:06:05.788452 18.623655 ... add resolver &{tcp macts.local:26259}
18:06:05.788494 . 42 ... add bootstrap {tcp macts.local:26259}
18:06:05.788499 . 5 ... add resolver &{tcp macts.local:26261}
18:06:05.788500 . 1 ... add bootstrap {tcp macts.local:26261}
18:06:05.788502 . 2 ... add resolver &{tcp macts.local:26263}
18:06:05.788503 . 1 ... add bootstrap {tcp macts.local:26263}
18:06:05.788644 . 141 ... connected
18:06:47.170653 41.382009 ... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (1m3s: infos 10/20 sent/received, bytes 5662B/2740B sent/received) gossip server (2/3 cur/max conns, infos 78/82 sent/received, bytes 18681B/13985B sent/received) 2: macts.local:26261 (1m0s) 3: macts.local:26263 (59s)
18:07:47.167660 59.997007 ... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (2m3s: infos 19/25 sent/received, bytes 6940B/4012B sent/received) gossip server (2/3 cur/max conns, infos 107/95 sent/received, bytes 22819B/16298B sent/received) 3: macts.local:26263 (1m59s) 2: macts.local:26261 (2m0s)
18:08:47.171295 60.003635 ... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (3m3s: infos 34/32 sent/received, bytes 9066B/5370B sent/received) gossip server (2/3 cur/max conns, infos 150/114 sent/received, bytes 28911B/19450B sent/received) 3: macts.local:26263 (2m59s) 2: macts.local:26261 (3m0s)
18:09:47.171775 60.000480 ... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (4m3s: infos 38/41 sent/received, bytes 9851B/6846B sent/received) gossip server (2/3 cur/max conns, infos 177/133 sent/received, bytes 33077B/22433B sent/received) 3: macts.local:26263 (3m59s) 2: macts.local:26261 (4m0s)
18:10:47.172169 60.000394 ... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (5m3s: infos 41/51 sent/received, bytes 10471B/8232B sent/received) gossip server (2/3 cur/max conns, infos 204/152 sent/received, bytes 37181B/25165B sent/received) 2: macts.local:26261 (5m0s) 3: macts.local:26263 (4m59s)
18:11:47.169247 59.997079 ... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (6m3s: infos 47/59 sent/received, bytes 11385B/9388B sent/received) gossip server (2/3 cur/max conns, infos 231/169 sent/received, bytes 40951B/27495B sent/received) 3: macts.local:26263 (5m59s) 2: macts.local:26261 (6m0s)


This change is Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

can you use this contraption instead of the EventLog directly?

type queueLog struct {

I was just talking to @RaduBerinde about figuring out something such that people wanting to log to such EventsLog can use the normal logging functions with a Context with a span from some magic Tracer in it such that the client is oblivious to the fact that it's going to a EventLog underneath.


Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@knz
Copy link
Copy Markdown
Contributor

knz commented Aug 25, 2016

Reviewed 2 of 2 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


gossip/gossip.go, line 1035 [r1] (raw file):

          log.Infof(ctx, "node has connected to cluster via gossip")
          g.stalled = false
          g.signalConnectedLocked()

This is a logic change not related to logging, right? This call wasn't there before. Perhaps a comment is warranted to explain.


Comments from Reviewable

@RaduBerinde
Copy link
Copy Markdown
Member

Yeah - I plan to make EventLogging happen automatically through the Context when you log (just like tracing). That will be useful in a bunch of of other places. I will refactor this code then. In fact I welcome a simple usecase (the queueLog has some races that we need to fix first).


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@RaduBerinde
Copy link
Copy Markdown
Member

We need to Finish() the EventLog, no? Otherwise it will stay open forever (problematic for tests)


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Aug 25, 2016

I don't think we have to (no goroutine attached to it I think) but we should (Done).
Unless Andrei feels strongly, I'd like to go with what Radu proposed and leave the prettifying to him.


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


gossip/gossip.go, line 1035 [r1] (raw file):

Previously, knz (kena) wrote…

This is a logic change not related to logging, right? This call wasn't there before. Perhaps a comment is warranted to explain.

Agreed, I pulled out a commit and expanded somewhat.

Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

SGTM

On Thu, Aug 25, 2016 at 11:33 AM, Tobias Schottdorf <
notifications@github.com> wrote:

I don't think we have to (no goroutine attached to it I think) but we
should (Done).
Unless Andrei feels strongly, I'd like to go with what Radu proposed and

leave the prettifying to him.

Review status: all files reviewed at latest revision, 1 unresolved

discussion, all commit checks successful.

gossip/gossip.go, line 1035 [r1]
https://reviewable.io:443/reviews/cockroachdb/cockroach/8812#-KQ08PI-TtuhUhiIFptF:-KQ1K7i78dfcpnxUxzdL:b-2i779g
(raw file

g.signalConnectedLocked()
):
Previously, knz (kena) wrote…

This is a logic change not related to logging, right? This call wasn't
there before. Perhaps a comment is warranted to explain.

Agreed, I pulled out a commit and expanded somewhat.


Comments from Reviewable
https://reviewable.io:443/reviews/cockroachdb/cockroach/8812#-:-KQ0x0ae22Fkou3uUOn5:bwksa16


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#8812 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAXBcToNCxrgtRmPLk-RgMWKHuIVmIyMks5qjbW5gaJpZM4Jshsb
.

@tbg tbg force-pushed the gossip-events branch 2 times, most recently from 5df52a6 to ea7bbea Compare August 25, 2016 16:43
tbg added 2 commits August 25, 2016 12:43
This has been helpful when debugging a local cluster which did not connect
and can't hurt for cockroachdb#7668 as well.

Add two net/trace event logs to `Gossip`, one which is used in various places
and one dedicated to the `bootstrap()` goroutine.

Sample output:

18:05:41.068322	 .    69	... have clients: false, have sentinel: false
18:05:41.068326	 .     4	... no next bootstrap address; currently bootstrapping: []
18:05:41.068370	 .    45	... sleeping 1s until bootstrap
18:05:42.071958	1.003588	... idling until bootstrap required
18:05:43.400931	1.328973	... detected stall; commencing bootstrap
18:05:43.400935	 .     4	... have clients: false, have sentinel: false
18:05:43.709402	 .308467	... sleeping 1s until bootstrap
18:05:44.711241	1.001839	... idling until bootstrap required

18:05:41.066532	 .    10	... initial resolvers: []
18:05:41.066538	 .     6	... now stalled
18:05:41.445657	 .379118	... NodeID set to 4
18:05:43.400535	1.954878	... NodeDescriptor set to node_id:4 address:<network_field:"tcp" address_field:"macts.local:26257" > attrs:<>
18:05:43.400877	 .   342	... add bootstrap {tcp macts.local:26263}
18:05:43.400881	 .     4	... add bootstrap {tcp macts.local:26259}
18:05:43.400886	 .     5	... add bootstrap {tcp macts.local:26261}
18:05:43.400891	 .     5	... add resolver &{tcp macts.local:26263}
18:05:43.400895	 .     4	... add resolver &{tcp macts.local:26259}
18:05:43.400897	 .     1	... add resolver &{tcp macts.local:26261}
18:05:43.709373	 .308476	... starting new client to macts.local:26259
18:05:47.164797	3.455424	... connected
18:06:05.788452	18.623655	... add resolver &{tcp macts.local:26259}
18:06:05.788494	 .    42	... add bootstrap {tcp macts.local:26259}
18:06:05.788499	 .     5	... add resolver &{tcp macts.local:26261}
18:06:05.788500	 .     1	... add bootstrap {tcp macts.local:26261}
18:06:05.788502	 .     2	... add resolver &{tcp macts.local:26263}
18:06:05.788503	 .     1	... add bootstrap {tcp macts.local:26263}
18:06:05.788644	 .   141	... connected
18:06:47.170653	41.382009	... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (1m3s: infos 10/20 sent/received, bytes 5662B/2740B sent/received) gossip server (2/3 cur/max conns, infos 78/82 sent/received, bytes 18681B/13985B sent/received) 2: macts.local:26261 (1m0s) 3: macts.local:26263 (59s)
18:07:47.167660	59.997007	... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (2m3s: infos 19/25 sent/received, bytes 6940B/4012B sent/received) gossip server (2/3 cur/max conns, infos 107/95 sent/received, bytes 22819B/16298B sent/received) 3: macts.local:26263 (1m59s) 2: macts.local:26261 (2m0s)
18:08:47.171295	60.003635	... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (3m3s: infos 34/32 sent/received, bytes 9066B/5370B sent/received) gossip server (2/3 cur/max conns, infos 150/114 sent/received, bytes 28911B/19450B sent/received) 3: macts.local:26263 (2m59s) 2: macts.local:26261 (3m0s)
18:09:47.171775	60.000480	... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (4m3s: infos 38/41 sent/received, bytes 9851B/6846B sent/received) gossip server (2/3 cur/max conns, infos 177/133 sent/received, bytes 33077B/22433B sent/received) 3: macts.local:26263 (3m59s) 2: macts.local:26261 (4m0s)
18:10:47.172169	60.000394	... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (5m3s: infos 41/51 sent/received, bytes 10471B/8232B sent/received) gossip server (2/3 cur/max conns, infos 204/152 sent/received, bytes 37181B/25165B sent/received) 2: macts.local:26261 (5m0s) 3: macts.local:26263 (4m59s)
18:11:47.169247	59.997079	... gossip status (ok, 4 nodes) gossip client (1/3 cur/max conns) 1: macts.local:26259 (6m3s: infos 47/59 sent/received, bytes 11385B/9388B sent/received) gossip server (2/3 cur/max conns, infos 231/169 sent/received, bytes 40951B/27495B sent/received) 3: macts.local:26263 (5m59s) 2: macts.local:26261 (6m0s)
This method must be called "regularly" to close Gossip's `Connected` channel
when the first connection is made. It was previously called in the following
locations:

- AddInfo
- handleResponse

Neither of those seems particularly idiomatic (though those calls don't hurt).
The idiomatic location (the one where Gossip state transitions from stalled
to connected) is added here along with a TODO.
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Aug 25, 2016

PTAL

@RaduBerinde
Copy link
Copy Markdown
Member

:lgtm:


Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@tbg tbg merged commit be284fe into cockroachdb:master Aug 25, 2016
@tbg tbg deleted the gossip-events branch August 25, 2016 17:10
@andreimatei
Copy link
Copy Markdown
Contributor

:lgtm:


Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


gossip/gossip.go, line 218 [r2] (raw file):

      events:            trace.NewEventLog("gossip", "gossip"),
  }
  stopper.AddCloser(stop.CloserFn(g.events.Finish))

shouldn't this be inside the New fun above?


gossip/gossip.go, line 872 [r2] (raw file):

func (g *Gossip) bootstrap() {
  g.server.stopper.RunWorker(func() {
      events := trace.NewEventLog("gossip", "bootstrap")

do we really need a new EventLog for bootstrap? Can't you just print begin/end bootstrap events on the main EventLog?


gossip/gossip.go, line 1087 [r3] (raw file):

// (with the exception of the call bootstrap loop) yet necessary. Consolidate
// and add commentary at each callsite.
func (g *Gossip) signalConnectedLocked() {

signal*If*ConnectedLocked ?
or otherwise change the imperative tone of the method name


Comments from Reviewable

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Aug 25, 2016

Reviewed 1 of 2 files at r2, 2 of 2 files at r3.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


gossip/gossip.go, line 218 [r2] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

shouldn't this be inside the New fun above?

Looks like it is in there?

Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


gossip/gossip.go, line 218 [r2] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Looks like it is in there?

yup, parse fail, sorry

Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Aug 25, 2016

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


gossip/gossip.go, line 872 [r2] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do we really need a new EventLog for bootstrap? Can't you just print begin/end bootstrap events on the main EventLog?

The idea was that when Gossip isn't connected, you really want to see the gist of the `bootstrap` loop which might otherwise be flushed out/obscured by stuff on the main thread. It can only hurt if we're likely to want to correlate messages between the two streams, which I think isn't as likely. It's an experiment - we'll see.

gossip/gossip.go, line 1087 [r3] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

signal*If*ConnectedLocked ?
or otherwise change the imperative tone of the method name

yeah, should probably be `maybeSignalConnectedLocked` but then with the last commit I was trying to only call this if we're actually connected (but to get there I would have to invest time understanding Gossip internals again). I was too trigger happy, so it's too late to change now.

Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants