At Lyft we have a plethora of tools for monitoring Envoy that take advantage of its logging and stats output, which are aggregated from every host into a central cluster. Sometimes, though, you just want to see what's happening on a box right now. I hacked together a couple of command-line tools for monitoring Envoy in real time.

curl and grep against the admin endpoints, in combination with watch, are pretty useful at times, so I decided to make a dedicated tool. The other benefit of the tool is that it can diff successive stat values to give you a number per interval for stats like cluster.foo.upstream_rq_2xx.
Here's example output from my tool, similar to iostat, mpstat, vmstat, etc.
$ envoystat -p http.router.downstream 1
2018/01/16 envoy 8cf90bcb/1.6.0-dev/Modified/RELEASE live 354839 354839 0
08:12:44 PM  cx_active'  rq_active'  rq_2xx  rq_4xx  rq_5xx  rq_total
08:12:44 PM         420          40     266       2       0       315
08:12:45 PM         420          29     305       1       0       313
08:12:46 PM         421          35     274       3       0       314
08:12:47 PM         420          27     226       2       0       244
08:12:48 PM         421          21     221       2       0       241
08:12:49 PM         420          29     246       2       0       276
08:12:50 PM         420          35     284       4       0       314
08:12:51 PM         420          29     276       2       0       290
08:12:52 PM         420          28     242       7       0       275
08:12:53 PM         421          24     221       2       0       240
^C
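For anyone curious about the per-interval idea, here is a minimal sketch of it. This is not the actual envoystat code. It assumes the admin endpoint serves plain-text `name: value` lines at `/stats`, and the address `127.0.0.1:9901`, the function names, and the `gauges` parameter are all made up for illustration.

```python
import urllib.request

# Assumption: the admin listener address; use whatever your Envoy config sets.
ADMIN_URL = "http://127.0.0.1:9901/stats"


def fetch_stats(url=ADMIN_URL):
    """Fetch the raw plain-text stats dump from the admin endpoint."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode("utf-8")


def parse_stats(text, prefix=""):
    """Parse 'name: value' lines into a dict, keeping names that start with prefix."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep or not name.startswith(prefix):
            continue
        try:
            stats[name] = int(value)
        except ValueError:
            # Skip anything that is not a plain integer (e.g. histogram summaries).
            continue
    return stats


def interval_values(prev, cur, gauges=()):
    """Report gauges as-is and every other stat as the change since prev."""
    out = {}
    for name, value in cur.items():
        if name in gauges:
            out[name] = value
        else:
            # A stat seen for the first time is reported as 0 for this interval.
            out[name] = value - prev.get(name, value)
    return out
```

Calling `fetch_stats()` once per interval, parsing each sample, and passing consecutive samples to `interval_values` gives one row of output per interval.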
I wrote it in Python in about 30 minutes, and it wouldn't take much to make it more generic and broadly useful for various deployments of Envoy. I'm also planning to add a mode that analyzes the local access log and outputs the top values for various fields (user agent, IP, etc.).
@mccv has also been working on some CLI tools. I'm looking for some details from him, and any other opinions, before building mine out more.
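A rough sketch of what the access log mode could look like. It assumes log lines follow Envoy's default access log format, where the quoted fields appear in the order request line, X-Forwarded-For, user agent, request ID, authority, upstream host. The `FIELDS` mapping and `top_values` are hypothetical names, and a custom log format would need a different mapping.

```python
import re
from collections import Counter

# Matches each double-quoted field on a log line.
QUOTED = re.compile(r'"([^"]*)"')

# Assumption: position of each field among the quoted fields of the
# default access log format.
FIELDS = {"x_forwarded_for": 1, "user_agent": 2, "authority": 4}


def top_values(lines, field, n=10):
    """Return the n most common values of field as (value, count) pairs."""
    idx = FIELDS[field]
    counts = Counter()
    for line in lines:
        quoted = QUOTED.findall(line)
        if len(quoted) > idx:  # ignore lines that do not match the assumed format
            counts[quoted[idx]] += 1
    return counts.most_common(n)
```

Usage would be something like `top_values(open(path), "user_agent")`, where `path` is wherever the local access log is written.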