Skip to content

tracing: Use 'operation_name' to determine whether client or server s…#1565

Merged
mattklein123 merged 4 commits intoenvoyproxy:masterfrom
objectiser:options
Aug 30, 2017
Merged

tracing: Use 'operation_name' to determine whether client or server s…#1565
mattklein123 merged 4 commits intoenvoyproxy:masterfrom
objectiser:options

Conversation

@objectiser
Copy link
Copy Markdown
Contributor

…pan and remove annotations from propagated OtSpanContext

This is a purely internal change that should not affect any users. It is primarily to localise the decision regarding whether a client or server span should be created based on the tracing configuration associated with the filter - i.e. ingress = server span, egress = client span.

Currently this decision is based on what annotations have previously been set and passed with the OtSpanContext header.

…pan and remove annotations from propagated OtSpanContext
Copy link
Copy Markdown
Member

@RomanDzhabarov RomanDzhabarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we do not run Zipkin internally (only LightStep)

@fabolive @rshriram on any implications to Istio

* @param start_time initial start time for the operation captured by the child
*/
virtual SpanPtr spawnChild(const std::string& name, SystemTime start_time) PURE;
virtual SpanPtr spawnChild(const Config& config, const std::string& name,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment with the new @param

// Assumption: Span will have only one annotation when this method is called
SpanContext context(*this);
if (context.annotationSet().sr_ && !context.annotationSet().ss_) {
if (annotations_[0].value() == ZipkinCoreConstants::get().SERVER_RECV) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it guaranteed that annotations_ has at least one element?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code accesses the first element when setting the endpoint - and the comment at the top of the method suggests that a single annotation is expected.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that is safe.

"config": {
"tracing": {
"operation_name": "ingress"
"operation_name": "egress"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check Zipkin sample configuration

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ran the zipkin example - but would be good if someone else could run both examples to confirm.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the zipkin q.. should this be ingress? because front envoy is the server. what am I missing?

@fabolive
Copy link
Copy Markdown
Contributor

@rshriram Last time I chatted with you about this, you seemed to think that the Envoy configuration generated by Istio uses the operation_name correctly, that is, "ingress" and "egress" as expected. Are you sure?
I am little nervous about this change.

id_ = span.id();
parent_id_ = span.isSetParentId() ? span.parentId() : 0;

for (const Annotation& annotation : span.annotations()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are removing the annotations from the Zipkin span context. Are you still propagating the x-ot-span-context header? Even though this header should eventually go away for the Zipkin-related code as part of issue #1364, I am wondering how you are currently handling it. It is not entirely clear to me if you have changed all parts of the Zipkin code related to that. Could you, please, comment on that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not removing the annotations themselves - they are still associated with the spans. The only thing that has been removed is the flags that indicate whether a particular annotation was set of the previous context.

The x-ot-span-context header is still being propagated as before - just without the additional annotation flags. This should actually make it easier to do #1364, as otherwise you would have to find another way to transfer these annotation flags, as the standard B3 headers don't carry that information.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

timestamp_micro =
std::chrono::duration_cast<std::chrono::microseconds>(timestamp.time_since_epoch()).count();

if (previous_context.annotationSet().sr_ && !previous_context.annotationSet().cs_) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to disregard altogether the previous context when making this decision?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In normal application tracing, the previous (parent) span kind is not taken into account. Spans are created based on reflecting what they represent - i.e. either a client or server span. In the case of Envoy, what dictates this is whether the communication represents ingress or egress.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! This makes sense.


// Set the timestamp globally for the span
span_ptr->setTimestamp(timestamp_micro);
} else if (previous_context.annotationSet().cs_ && !previous_context.annotationSet().sr_) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to disregard altogether the previous context when making this decision?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@fabolive
Copy link
Copy Markdown
Contributor

@objectiser Please, see my questions above.

@fabolive
Copy link
Copy Markdown
Contributor

@RomanDzhabarov The changes look good to me, too. My only remaining concern is with the Istio-generated Envoy configuration. I need to check with @rshriram, as I am not yet convinced it is compatible with this change.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nits, otherwise LGTM. Will wait to hear from @rshriram on Istio.

* Create and start a child Span, with this Span as its parent in the trace.
* @param name operation name captured by the spawned child
* @param start_time initial start time for the operation captured by the child
* @param config the tracing configuration
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please order correctly. (config comes first)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - my bad :)

}

const std::string SpanContext::serializeToString() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to del newline

* @param start_time The time indicating the beginning of the span.
*/
SpanPtr startSpan(const std::string& span_name, SystemTime timestamp);
SpanPtr startSpan(const Tracing::Config&, const std::string& span_name, SystemTime timestamp);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update doc comments. Same below (and in potentially other places).

Copy link
Copy Markdown
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an open bug to correctly generate the operation name in Istio, but we will get that fixed before we get to this SHA.

few nits on ingress->egress.

One other question I had is that this PR is predicated on the fact that a listener is either ingress or egress but not both. What if a user uses the same listener for ingress and egress? In this case, the operation name really has little benefit.

"tracing": {
"operation_name": "ingress"
"operation_name": "egress"
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be ingress? This is for front envoy (user facing).. on port 80.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this change, the front-envoy generated a client span which indicates an outbound request - which is egress.

I guess I view the front-envoy as being a proxy for the user (i.e. an api gateway) - so in that respect it is representing the egress from that user app.

If we wanted the front-envoy to be reflected as a separate (api gateway) service in its own right, then possibly we need a different operation name that represents both the receiving of the request (ingress) and the routing of that request to the actual service (egress) - but from the "application" level I think this front-envoy is just acting as a 'proxy' for the user app.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the solution would be to move away from the ingress/egress terms, to instead use more tracing terminology - i.e. "span_kind": "client" (or "server") - but there would need to be a transition period where the span kind would need to be inferred from the "operation_name".

"config": {
"tracing": {
"operation_name": "ingress"
"operation_name": "egress"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the zipkin q.. should this be ingress? because front envoy is the server. what am I missing?

@objectiser
Copy link
Copy Markdown
Contributor Author

@rshriram Could you provide a concrete example of whether a single listener would be used both for ingress and egress? As I think this would be a challenge for tracing with or without this change.

@rshriram
Copy link
Copy Markdown
Member

rshriram commented Aug 30, 2017 via email

@objectiser
Copy link
Copy Markdown
Contributor Author

I was viewing the situation from the perspective of what users would see if there was no service mesh, and therefore the application was directly instrumented. In this case, they would just see 'service1' and 'service2' - so the first (root) span would be the server span for service1.

I see the service mesh as an enabler, a way to remove the tracing from the application itself, but still providing the same results. So possibly the change that needs to happen is not to include the tracing from the front-envoy, as it is only effectively an extension of the network carrying the request from the actual client application to the actual service?

@rshriram
Copy link
Copy Markdown
Member

rshriram commented Aug 30, 2017 via email

@objectiser
Copy link
Copy Markdown
Contributor Author

There is always a workaround where we force people to use distinct listeners (and hence ports) for ingress and egress, as lyft is doing today. I am totally okay if you think this will result in complicated implementation. :)

+1 would prefer this distinct listener approach - as otherwise I think it does get too complicated.

@objectiser
Copy link
Copy Markdown
Contributor Author

@rshriram Out of curiosity I changed the front-envoy back to ingress and ran a test on zipkin and jaeger and here is the results:

envoy-ingress-zipkin

So as you can see there are two sets of server spans - and the timeline view shows front-envoy instead of service1 - I guess the name is based on the first server annotation it finds.

envoy-ingress-jaeger

The Jaeger UI misses out the service1 server span.

So I think the options currently are:

  1. Within front-envoy, use the "operation_name" with "egress" which generates tracing emulating the client application (i.e. a "client span").

  2. Within front-envoy, use the "operation_name" with "ingress" which essentially generates a double server span - one at the front-envoy and the other at the service.

  3. Don't add tracing to front-envoy, so the tracing only begins (assuming that tracing context is not passed in from the client) when the request is received at the service (proxy).

The above three options are possible now without any code change - only config change - so the user could decide which strategy they want.

  1. Variation of (2) - instead of just creating server span at the front-envoy, also create a client span for the request going from the front-envoy to the service proxy. However this then introduces tracing of the infrastructure, which may or may not be what a user (who is primarily interested in their application) is interested in.

@rshriram
Copy link
Copy Markdown
Member

rshriram commented Aug 30, 2017 via email

@objectiser
Copy link
Copy Markdown
Contributor Author

Yes, when set to "egress" it acts as before, creating a "client span" for the front-envoy.

jpsim pushed a commit that referenced this pull request Nov 28, 2022
Description: Conclusion for two tests requiring more investigation
Risk Level: None
Testing: yes
Docs Changes: N/A
Release Notes: N/A
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Description: Conclusion for two tests requiring more investigation
Risk Level: None
Testing: yes
Docs Changes: N/A
Release Notes: N/A
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: JP Simard <jp@jpsim.com>
mathetake added a commit that referenced this pull request Mar 3, 2026
**Description**

This fixes the inconsistency for embeddings in the span usage in
translators.

**Related Issues/PRs (if applicable)**
Preparation for #90
Follow up on #1565

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake added a commit that referenced this pull request Mar 3, 2026
#1568)

**Description**

This fixes the inconsistency of span usage by image gen and rerank
translators

**Related Issues/PRs (if applicable)**
Preparation for #90 
Follow up on #1565 #1567

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants