Add AWS X-Ray into Envoy as one of tracing providers #5644

@luluzhao

AWS X-Ray Integration with Envoy proxy

Description:
Hello Envoy contributors, I am from the AWS X-Ray team, and we would like to contribute to Envoy so that customers can use X-Ray as a tracing option. Before sending out PRs, we would like to discuss one design question about sampling strategy with the community and get some feedback.

I found that Envoy provides its own sampling strategy, such as random sampling, where the customer simply configures the percentage of requests to be traced. This differs from our existing sampling strategy, in which customers define sampling rules through our console. By customizing sampling rules, customers can control the amount of data they record and modify sampling behavior on the fly without modifying or redeploying their code. For example, a customer can specify which URL paths are sampled, or which HTTP method (GET or POST) a request must use to be sampled. Each rule applies to a certain group of requests. In short, this gives customers more options for sampling requests, and they do not need to modify their code to change the sampling behavior.
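To make the rule-based idea concrete, here is a minimal sketch of how a request could be matched against rules keyed on HTTP method and URL path. It is written in Python for brevity and is not the X-Ray SDK or Envoy API: `SamplingRule`, `find_rule`, `should_sample`, the `priority` field, and the glob-style matching via `fnmatchcase` are all assumptions made for illustration, as is reusing the 5% default for requests that match no rule.

```python
import random
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SamplingRule:
    """Hypothetical rule shape; field names are illustrative only."""
    name: str
    priority: int      # assumption: lower number is evaluated first
    http_method: str   # "GET", "POST", or "*" for any method
    url_path: str      # glob pattern such as "/checkout/*"
    rate: float        # fraction of matching requests to trace, 0.0 to 1.0

    def matches(self, method: str, path: str) -> bool:
        # fnmatchcase gives case-sensitive glob matching ("*", "?", "[...]").
        return (fnmatchcase(method.upper(), self.http_method)
                and fnmatchcase(path, self.url_path))


def find_rule(rules: Sequence[SamplingRule], method: str,
              path: str) -> Optional[SamplingRule]:
    """Return the first matching rule in priority order, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(method, path):
            return rule
    return None


def should_sample(rules: Sequence[SamplingRule], method: str, path: str,
                  default_rate: float = 0.05,
                  rand: Callable[[], float] = random.random) -> bool:
    """Decide whether to trace a request.

    Assumption: requests that match no rule use default_rate.
    """
    rule = find_rule(rules, method, path)
    rate = rule.rate if rule is not None else default_rate
    return rand() < rate
```

With rules like `SamplingRule("checkout", 1, "POST", "/checkout/*", 1.0)`, every POST under `/checkout/` would be traced, while other traffic falls through to lower-priority rules or the default rate. Changing behavior means changing the rule list, not the code.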

Centralized sampling is implemented as follows: inside the Envoy proxy there will be two long-running processes, which fetch sampling rules and statistics and update the local rules stored in a cache. The cache is read very frequently, since each sampling decision requires the X-Ray tracer (inside Envoy) to read from it, so it uses granular locking to keep the performance impact minimal. Each rule has its own lock, which isolates retrieving a rule from the cache from actually modifying or using the rule. Customers may provide at most 25 sampling rules. If centralized sampling is not working for some reason, the tracer will fall back to the default 5% sampling rate for all incoming requests.
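The cache design described above could look roughly like the following sketch. It is in Python for brevity (a real Envoy tracer would be C++), and `RuleCache`, `CachedRule`, `update`, `decide`, and `drain_stats` are hypothetical names, not existing X-Ray or Envoy APIs. One lock guards only the rule map, each rule carries its own lock, and a lookup miss falls back to the 5% rate. Treating a missing rule as the "centralized sampling is not working" case is an assumption.

```python
import random
import threading
from typing import Callable, Dict, Optional, Tuple

MAX_RULES = 25        # limit stated in the proposal
FALLBACK_RATE = 0.05  # default rate when no centralized rule is available


class CachedRule:
    """One cached rule with its own lock and usage counters (illustrative)."""

    def __init__(self, name: str, rate: float) -> None:
        self.name = name
        self.rate = rate
        self.seen = 0
        self.sampled = 0
        self.lock = threading.Lock()  # per-rule lock


class RuleCache:
    def __init__(self) -> None:
        self._rules: Dict[str, CachedRule] = {}
        self._map_lock = threading.Lock()  # guards the dict only

    def update(self, fetched: Dict[str, float]) -> None:
        """Called by the hypothetical rule poller with name -> rate."""
        if len(fetched) > MAX_RULES:
            raise ValueError("at most %d sampling rules allowed" % MAX_RULES)
        with self._map_lock:
            fresh: Dict[str, CachedRule] = {}
            for name, rate in fetched.items():
                rule = self._rules.get(name)
                if rule is None:
                    rule = CachedRule(name, rate)
                else:
                    with rule.lock:  # keep counters, change only the rate
                        rule.rate = rate
                fresh[name] = rule
            self._rules = fresh

    def get(self, name: str) -> Optional[CachedRule]:
        with self._map_lock:  # held only for the lookup itself
            return self._rules.get(name)

    def decide(self, name: str,
               rand: Callable[[], float] = random.random) -> bool:
        """Sampling decision on the request path."""
        rule = self.get(name)
        if rule is None:
            return rand() < FALLBACK_RATE
        with rule.lock:  # contention is limited to this one rule
            rule.seen += 1
            hit = rand() < rule.rate
            if hit:
                rule.sampled += 1
            return hit

    def drain_stats(self) -> Dict[str, Tuple[int, int]]:
        """Called by the hypothetical statistics poller; resets counters."""
        with self._map_lock:
            rules = list(self._rules.values())
        stats = {}
        for rule in rules:
            with rule.lock:
                stats[rule.name] = (rule.seen, rule.sampled)
                rule.seen = rule.sampled = 0
        return stats
```

In this sketch the map lock is never held while a rule is being used, so a sampling decision for one rule does not block decisions for other rules, and the pollers only briefly contend with the request path.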

The above is our current design for the sampling strategy. One concern is whether the Envoy community would accept us providing another sampling strategy inside Envoy. If the community accepts the PR, how should we treat Envoy's original sampling strategy? Should we disable it? The other concern is the performance impact on Envoy of adopting our centralized sampling. Currently, centralized sampling is used in customers' web applications, where the performance impact can be small. Envoy, however, is a lightweight proxy, so the impact of adding centralized sampling could be large. Are there any suggestions on which specific performance tests we should focus on?

I am new to Envoy, and some of my understanding of it might be wrong. Feel free to point that out and discuss it with me. Thank you very much.

[Relevant Links:]
AWS X-Ray
AWS X-Ray Centralized Sampling

Labels: design proposal (needs design doc/proposal before implementation), no stalebot (disables stalebot from closing an issue)
