tracing: Use 'operation_name' to determine whether client or server span and remove annotations from propagated OtSpanContext #1565

mattklein123 merged 4 commits into envoyproxy:master

Conversation
```diff
   * @param start_time initial start time for the operation captured by the child
   */
- virtual SpanPtr spawnChild(const std::string& name, SystemTime start_time) PURE;
+ virtual SpanPtr spawnChild(const Config& config, const std::string& name,
```
```diff
  // Assumption: Span will have only one annotation when this method is called
  SpanContext context(*this);
- if (context.annotationSet().sr_ && !context.annotationSet().ss_) {
+ if (annotations_[0].value() == ZipkinCoreConstants::get().SERVER_RECV) {
```
is it guaranteed that annotations_ has at least one element?

The existing code accesses the first element when setting the endpoint, and the comment at the top of the method suggests that a single annotation is expected.
```diff
  "config": {
    "tracing": {
-     "operation_name": "ingress"
+     "operation_name": "egress"
```
Can you also check the Zipkin sample configuration?

I've run the zipkin example, but it would be good if someone else could run both examples to confirm.
Same as the zipkin question: shouldn't this be ingress? Because front envoy is the server. What am I missing?
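For context, the setting under discussion sits inside the `http_connection_manager` filter config in the PR's example files. A trimmed sketch (surrounding fields elided; the `generate_request_id` field is taken from the example quoted later in this thread):

```json
{
  "filters": [
    {
      "name": "http_connection_manager",
      "config": {
        "generate_request_id": true,
        "tracing": {
          "operation_name": "egress"
        }
      }
    }
  ]
}
```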
@rshriram Last time I chatted with you about this, you seemed to think that the Envoy configuration generated by Istio uses the …
```cpp
id_ = span.id();
parent_id_ = span.isSetParentId() ? span.parentId() : 0;

for (const Annotation& annotation : span.annotations()) {
```
You are removing the annotations from the Zipkin span context. Are you still propagating the x-ot-span-context header? Even though this header should eventually go away for the Zipkin-related code as part of issue #1364, I am wondering how you are currently handling it. It is not entirely clear to me whether you have changed all parts of the Zipkin code related to that. Could you please comment on that?

I am not removing the annotations themselves; they are still associated with the spans. The only thing that has been removed is the flags that indicate whether a particular annotation was set on the previous context.

The x-ot-span-context header is still being propagated as before, just without the additional annotation flags. This should actually make it easier to do #1364, as otherwise you would have to find another way to transfer these annotation flags, since the standard B3 headers don't carry that information.
```cpp
timestamp_micro =
    std::chrono::duration_cast<std::chrono::microseconds>(timestamp.time_since_epoch()).count();

if (previous_context.annotationSet().sr_ && !previous_context.annotationSet().cs_) {
```
Is it safe to disregard the previous context altogether when making this decision?

In normal application tracing, the previous (parent) span kind is not taken into account. Spans are created to reflect what they represent, i.e. either a client or a server span. In the case of Envoy, what dictates this is whether the communication represents ingress or egress.

Agreed! This makes sense.
```cpp
// Set the timestamp globally for the span
span_ptr->setTimestamp(timestamp_micro);
} else if (previous_context.annotationSet().cs_ && !previous_context.annotationSet().sr_) {
```
Is it safe to disregard the previous context altogether when making this decision?
@objectiser Please see my questions above.

@RomanDzhabarov The changes look good to me, too. My only remaining concern is with the Istio-generated Envoy configuration. I need to check with @rshriram, as I am not yet convinced it is compatible with this change.
mattklein123 left a comment:

small nits, otherwise LGTM. Will wait to hear from @rshriram on Istio.
include/envoy/tracing/http_tracer.h (outdated)

```cpp
 * Create and start a child Span, with this Span as its parent in the trace.
 * @param name operation name captured by the spawned child
 * @param start_time initial start time for the operation captured by the child
 * @param config the tracing configuration
```

nit: please order correctly. (config comes first)
```cpp
}

const std::string SpanContext::serializeToString() {
```

nit: no need to del newline
```diff
   * @param start_time The time indicating the beginning of the span.
   */
- SpanPtr startSpan(const std::string& span_name, SystemTime timestamp);
+ SpanPtr startSpan(const Tracing::Config&, const std::string& span_name, SystemTime timestamp);
```

nit: update doc comments. Same below (and in potentially other places).
rshriram left a comment:

There is an open bug to correctly generate the operation name in Istio, but we will get that fixed before we get to this SHA. A few nits on ingress -> egress.

One other question: this PR is predicated on the assumption that a listener is either ingress or egress, but not both. What if a user uses the same listener for both ingress and egress? In that case, the operation name really has little benefit.
```diff
  "tracing": {
-   "operation_name": "ingress"
+   "operation_name": "egress"
  },
```

shouldn't this be ingress? This is for front envoy (user facing), on port 80.
Prior to this change, the front-envoy generated a client span, which indicates an outbound request, i.e. egress.

I guess I view the front-envoy as being a proxy for the user (i.e. an API gateway), so in that respect it represents the egress from that user app. If we wanted the front-envoy to be reflected as a separate (API gateway) service in its own right, then possibly we need a different operation name that represents both the receiving of the request (ingress) and the routing of that request to the actual service (egress). But at the "application" level I think this front-envoy is just acting as a proxy for the user app.

Maybe the solution would be to move away from the ingress/egress terms and instead use more tracing terminology, i.e. "span_kind": "client" (or "server"), but there would need to be a transition period where the span kind would be inferred from the "operation_name".
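If that direction were taken, the config might look something like this. Purely hypothetical: "span_kind" is a proposed field in this discussion, not an existing Envoy option:

```json
{
  "tracing": {
    "span_kind": "server"
  }
}
```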
@rshriram Could you provide a concrete example of where a single listener would be used both for ingress and egress? As I think this would be a challenge for tracing with or without this change.
In examples/zipkin-tracing/front-envoy-zipkin.json:

From an end user perspective, front Envoy is as much a part of the mesh as the service envoys (same as kube ingress or an API gateway). So it receives requests from users (server receive). Agree that there are multiple perspectives, but what's the default one that a "non-expert" user expects?

I believe that generating a client send span from front Envoy is incorrect. It needs to be server receive, and your PR is the opportune moment to fix this, imo, i.e. keep this as ingress. Thoughts?

~shriram
I was viewing the situation from the perspective of what users would see if there was no service mesh and the application was directly instrumented. In this case, they would just see 'service1' and 'service2', so the first (root) span would be the server span for service1. I see the service mesh as an enabler, a way to remove the tracing from the application itself while still providing the same results. So possibly the change that needs to happen is not to include the tracing from the front-envoy, as it is effectively just an extension of the network carrying the request from the actual client application to the actual service?
Won't the annotations or tags help here?

A simple example is one where I deploy Envoy in all my VMs, listening on port 80. I use virtual hosts to route requests to a service in the local VM or services in other VMs. Inbound requests hit the VM on port 80 and get routed to the local app. Outbound requests from the app go to the local Envoy on port 80 (imagine HTTP_PROXY) and get routed appropriately.

What I am trying to ask is whether we can use a combination of the annotations and the operation names to generate the right span. In Envoy terms, this translates to using the annotations if the x-ot-.. header indicates cs/cr/ss/sr. If it's absent, derive the span type from the operation name (ingress or egress).

Will this be too complicated? There is always a workaround where we force people to use distinct listeners (and hence ports) for ingress and egress, as lyft is doing today. I am totally okay if you think this will result in a complicated implementation. :)

Extending this concept, you could potentially use the x-ot-... header as an optional hint to Envoy to generate the right type of span. If clients omit it, we derive based on the operation name or some sensible defaults.

@adriancole/@fabolive thoughts?

~shriram
+1 would prefer this distinct listener approach, as otherwise I think it does get too complicated.
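The distinct-listener workaround described above might be sketched as two listeners, each with an unambiguous operation name. The addresses, ports, and structure below are illustrative only (loosely following the v1 config style used in this PR's examples), not a tested configuration:

```json
{
  "listeners": [
    {
      "address": "tcp://0.0.0.0:80",
      "filters": [
        {
          "name": "http_connection_manager",
          "config": { "tracing": { "operation_name": "ingress" } }
        }
      ]
    },
    {
      "address": "tcp://127.0.0.1:9001",
      "filters": [
        {
          "name": "http_connection_manager",
          "config": { "tracing": { "operation_name": "egress" } }
        }
      ]
    }
  ]
}
```

With this split, each listener maps to exactly one span kind, so no annotation-based inference is needed.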
@rshriram Out of curiosity I changed the front-envoy back to ingress and ran a test on zipkin and jaeger, and here are the results:

[image: envoy-ingress-zipkin] https://user-images.githubusercontent.com/164562/29874718-8c398a22-8d8f-11e7-88f9-837c138c99e3.png

As you can see, there are two sets of server spans, and the timeline view shows front-envoy instead of service1. I guess the name is based on the first server annotation it finds.

[image: envoy-ingress-jaeger] https://user-images.githubusercontent.com/164562/29874721-901586a0-8d8f-11e7-8047-9c07a664bf15.png

The Jaeger UI misses out the service1 server span.

So I think the options currently are:

1. Within front-envoy, use "operation_name": "egress", which generates tracing emulating the client application (i.e. a "client span").
2. Within front-envoy, use "operation_name": "ingress", which essentially generates a double server span: one at the front-envoy and the other at the service.
3. Don't add tracing to front-envoy, so the tracing only begins (assuming tracing context is not passed in from the client) when the request is received at the service (proxy).

The above three options are possible now without any code change (only a config change), so the user could decide which strategy they want.

A variation of (2): instead of just creating a server span at the front-envoy, also create a client span for the request going from the front-envoy to the service proxy. However, this then introduces tracing of the infrastructure, which may or may not be what a user (who is primarily interested in their application) wants.

Yikes. If you change to egress, do you see a full cycle?

~shriram
Yes, when set to "egress" it acts as before, creating a "client span" for the front-envoy.
tracing: Use 'operation_name' to determine whether client or server span and remove annotations from propagated OtSpanContext

This is a purely internal change that should not affect any users. It primarily localises the decision about whether a client or server span should be created, based on the tracing configuration associated with the filter, i.e. ingress = server span, egress = client span.

Currently this decision is based on which annotations have previously been set and passed with the OtSpanContext header.