tracing: Use 'operation_name' to determine whether client or server span and remove annotations from propagated OtSpanContext #1565

mattklein123 merged 4 commits into envoyproxy:master

Conversation
```diff
   * @param start_time initial start time for the operation captured by the child
   */
- virtual SpanPtr spawnChild(const std::string& name, SystemTime start_time) PURE;
+ virtual SpanPtr spawnChild(const Config& config, const std::string& name,
```
```diff
  // Assumption: Span will have only one annotation when this method is called
  SpanContext context(*this);
- if (context.annotationSet().sr_ && !context.annotationSet().ss_) {
+ if (annotations_[0].value() == ZipkinCoreConstants::get().SERVER_RECV) {
```
is it guaranteed that annotations_ has at least one element?

The existing code accesses the first element when setting the endpoint, and the comment at the top of the method suggests that a single annotation is expected.
```diff
  "config": {
    "tracing": {
-     "operation_name": "ingress"
+     "operation_name": "egress"
```
Can you also check the Zipkin sample configuration?

I've run the zipkin example, but it would be good if someone else could run both examples to confirm.
Same as the zipkin question: shouldn't this be ingress? Because front envoy is the server. What am I missing?
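For context, the setting under discussion sits inside the `http_connection_manager` filter config in the PR's example files. A trimmed sketch (surrounding fields elided; the `generate_request_id` field is taken from the example quoted later in this thread):

```json
{
  "filters": [
    {
      "name": "http_connection_manager",
      "config": {
        "generate_request_id": true,
        "tracing": {
          "operation_name": "egress"
        }
      }
    }
  ]
}
```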
@rshriram Last time I chatted with you about this, you seemed to think that the Envoy configuration generated by Istio uses the …
```cpp
id_ = span.id();
parent_id_ = span.isSetParentId() ? span.parentId() : 0;

for (const Annotation& annotation : span.annotations()) {
```
You are removing the annotations from the Zipkin span context. Are you still propagating the x-ot-span-context header? Even though this header should eventually go away for the Zipkin-related code as part of issue #1364, I am wondering how you are currently handling it. It is not entirely clear to me whether you have changed all parts of the Zipkin code related to that. Could you please comment on that?

I am not removing the annotations themselves; they are still associated with the spans. The only thing that has been removed is the flags that indicate whether a particular annotation was set on the previous context.

The x-ot-span-context header is still being propagated as before, just without the additional annotation flags. This should actually make it easier to do #1364, as otherwise you would have to find another way to transfer these annotation flags, since the standard B3 headers don't carry that information.
```cpp
timestamp_micro =
    std::chrono::duration_cast<std::chrono::microseconds>(timestamp.time_since_epoch()).count();

if (previous_context.annotationSet().sr_ && !previous_context.annotationSet().cs_) {
```
Is it safe to disregard the previous context altogether when making this decision?

In normal application tracing, the previous (parent) span kind is not taken into account. Spans are created to reflect what they represent, i.e. either a client or a server span. In the case of Envoy, what dictates this is whether the communication represents ingress or egress.

Agreed! This makes sense.
```cpp
// Set the timestamp globally for the span
span_ptr->setTimestamp(timestamp_micro);
} else if (previous_context.annotationSet().cs_ && !previous_context.annotationSet().sr_) {
```
Is it safe to disregard the previous context altogether when making this decision?
@objectiser Please see my questions above.

@RomanDzhabarov The changes look good to me, too. My only remaining concern is with the Istio-generated Envoy configuration. I need to check with @rshriram, as I am not yet convinced it is compatible with this change.
mattklein123 left a comment:

small nits, otherwise LGTM. Will wait to hear from @rshriram on Istio.
include/envoy/tracing/http_tracer.h (outdated)

```cpp
 * Create and start a child Span, with this Span as its parent in the trace.
 * @param name operation name captured by the spawned child
 * @param start_time initial start time for the operation captured by the child
 * @param config the tracing configuration
```

nit: please order correctly. (config comes first)
```cpp
}

const std::string SpanContext::serializeToString() {
```

nit: no need to del newline
```diff
   * @param start_time The time indicating the beginning of the span.
   */
- SpanPtr startSpan(const std::string& span_name, SystemTime timestamp);
+ SpanPtr startSpan(const Tracing::Config&, const std::string& span_name, SystemTime timestamp);
```

nit: update doc comments. Same below (and in potentially other places).
rshriram left a comment:

There is an open bug to correctly generate the operation name in Istio, but we will get that fixed before we get to this SHA. A few nits on ingress -> egress.

One other question: this PR is predicated on the assumption that a listener is either ingress or egress, but not both. What if a user uses the same listener for both ingress and egress? In that case, the operation name really has little benefit.
```diff
  "tracing": {
-   "operation_name": "ingress"
+   "operation_name": "egress"
  },
```

shouldn't this be ingress? This is for front envoy (user facing), on port 80.
Prior to this change, the front-envoy generated a client span, which indicates an outbound request, i.e. egress.

I guess I view the front-envoy as being a proxy for the user (i.e. an API gateway), so in that respect it represents the egress from that user app. If we wanted the front-envoy to be reflected as a separate (API gateway) service in its own right, then possibly we need a different operation name that represents both the receiving of the request (ingress) and the routing of that request to the actual service (egress). But at the "application" level I think this front-envoy is just acting as a proxy for the user app.

Maybe the solution would be to move away from the ingress/egress terms and instead use more tracing terminology, i.e. "span_kind": "client" (or "server"), but there would need to be a transition period where the span kind would be inferred from the "operation_name".
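If that direction were taken, the config might look something like this. Purely hypothetical: "span_kind" is a proposed field in this discussion, not an existing Envoy option:

```json
{
  "tracing": {
    "span_kind": "server"
  }
}
```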
@rshriram Could you provide a concrete example of where a single listener would be used both for ingress and egress? As I think this would be a challenge for tracing with or without this change.
In examples/zipkin-tracing/front-envoy-zipkin.json:

From an end user perspective, front Envoy is as much a part of the mesh as the service envoys (same as kube ingress or an API gateway). So it receives requests from users (server receive). Agree that there are multiple perspectives, but what's the default one that a "non-expert" user expects?

I believe that generating a client send span from front Envoy is incorrect. It needs to be server receive, and your PR is the opportune moment to fix this, imo, i.e. keep this as ingress. Thoughts?

~shriram
I was viewing the situation from the perspective of what users would see if there was no service mesh and the application was directly instrumented. In this case, they would just see 'service1' and 'service2', so the first (root) span would be the server span for service1. I see the service mesh as an enabler, a way to remove the tracing from the application itself while still providing the same results. So possibly the change that needs to happen is not to include the tracing from the front-envoy, as it is effectively just an extension of the network carrying the request from the actual client application to the actual service?
Won't the annotations or tags help here?

A simple example is one where I deploy Envoy in all my VMs, listening on port 80. I use virtual hosts to route requests to a service in the local VM or services in other VMs. Inbound requests hit the VM on port 80 and get routed to the local app. Outbound requests from the app go to the local Envoy on port 80 (imagine HTTP_PROXY) and get routed appropriately.

What I am trying to ask is whether we can use a combination of the annotations and the operation names to generate the right span. In Envoy terms, this translates to using the annotations if the x-ot-.. header indicates cs/cr/ss/sr. If it's absent, derive the span type from the operation name (ingress or egress).

Will this be too complicated? There is always a workaround where we force people to use distinct listeners (and hence ports) for ingress and egress, as lyft is doing today. I am totally okay if you think this will result in a complicated implementation. :)

Extending this concept, you could potentially use the x-ot-... header as an optional hint to Envoy to generate the right type of span. If clients omit it, we derive based on the operation name or some sensible defaults.

@adriancole/@fabolive thoughts?

~shriram
+1 would prefer this distinct listener approach, as otherwise I think it does get too complicated.
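The distinct-listener workaround described above might be sketched as two listeners, each with an unambiguous operation name. The addresses, ports, and structure below are illustrative only (loosely following the v1 config style used in this PR's examples), not a tested configuration:

```json
{
  "listeners": [
    {
      "address": "tcp://0.0.0.0:80",
      "filters": [
        {
          "name": "http_connection_manager",
          "config": { "tracing": { "operation_name": "ingress" } }
        }
      ]
    },
    {
      "address": "tcp://127.0.0.1:9001",
      "filters": [
        {
          "name": "http_connection_manager",
          "config": { "tracing": { "operation_name": "egress" } }
        }
      ]
    }
  ]
}
```

With this split, each listener maps to exactly one span kind, so no annotation-based inference is needed.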
@rshriram Out of curiosity I changed the front-envoy back to ingress and ran a test on zipkin and jaeger, and here are the results:

[image: envoy-ingress-zipkin] https://user-images.githubusercontent.com/164562/29874718-8c398a22-8d8f-11e7-88f9-837c138c99e3.png

As you can see, there are two sets of server spans, and the timeline view shows front-envoy instead of service1. I guess the name is based on the first server annotation it finds.

[image: envoy-ingress-jaeger] https://user-images.githubusercontent.com/164562/29874721-901586a0-8d8f-11e7-8047-9c07a664bf15.png

The Jaeger UI misses out the service1 server span.

So I think the options currently are:

1. Within front-envoy, use "operation_name": "egress", which generates tracing emulating the client application (i.e. a "client span").
2. Within front-envoy, use "operation_name": "ingress", which essentially generates a double server span: one at the front-envoy and the other at the service.
3. Don't add tracing to front-envoy, so the tracing only begins (assuming tracing context is not passed in from the client) when the request is received at the service (proxy).

The above three options are possible now without any code change (only a config change), so the user could decide which strategy they want.

A variation of (2): instead of just creating a server span at the front-envoy, also create a client span for the request going from the front-envoy to the service proxy. However, this then introduces tracing of the infrastructure, which may or may not be what a user (who is primarily interested in their application) wants.

Yikes. If you change to egress, do you see a full cycle?

~shriram
Yes, when set to "egress" it acts as before, creating a "client span" for the front-envoy.
tracing: Use 'operation_name' to determine whether client or server span and remove annotations from propagated OtSpanContext

This is a purely internal change that should not affect any users. It primarily localises the decision about whether a client or server span should be created, based on the tracing configuration associated with the filter, i.e. ingress = server span, egress = client span.

Currently this decision is based on which annotations have previously been set and passed with the OtSpanContext header.