Jaeger Distributed Tracing Enabled for GitLab.com
This is an epic to track the work for rolling out distributed tracing for GitLab.com. Some high-level ideas have been captured by @joshlambert in https://gitlab.com/gitlab-org/gitlab/-/issues/214752. This epic exists to track more specific steps on the infra side.

Broadly speaking, we need to:

- [x] Evaluate Elastic APM vs Jaeger vs Stackdriver Trace
- [x] Ensure the application is ready to send traces to the chosen destination
- [x] Deploy proof-of-concept to gstg
- [ ] Test and document failure scenarios
- [ ] Decide on sample rates and retention
- [ ] Roll out to production (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/6965)

2021-09-27: writing down pros and cons from a recent discussion

Stackdriver supports OpenTracing, so it was easy to add support for it to Thanos, and it is likely to continue to be supported in the future. Some features that work well in Jaeger are missing in Stackdriver; for example, filtering and labeling are much more versatile in Jaeger. Running Jaeger+ES in-house carries some overhead compared to using a managed solution such as Stackdriver.
epic