Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion docs/configuration/http_filters/router_filter.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@ redirection, the filter also handles retry, statistics, etc.
{
"name": "router",
"config": {
"dynamic_stats": "..."
"dynamic_stats": "...",
"start_child_span": "..."
}
}

Expand All @@ -22,6 +23,13 @@ dynamic_stats
<config_cluster_manager_cluster_stats_dynamic_http>`. Defaults to *true*. Can be disabled in high
performance scenarios.

.. _config_http_filters_router_start_child_span:

start_child_span
*(optional, boolean)* Whether to start a child :ref:`tracing <arch_overview_tracing>` span for
egress routed calls. This can be useful in scenarios where other filters (auth, ratelimit, etc.)
make outbound calls and have child spans rooted at the same ingress parent. Defaults to *false*.

.. _config_http_filters_router_headers:

HTTP headers
Expand Down
39 changes: 21 additions & 18 deletions docs/intro/arch_overview/tracing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,23 +33,26 @@ initiated:
* Randomly sampled via the :ref:`random_sampling <config_http_conn_man_runtime_random_sampling>`
runtime setting.

The router filter is also capable of creating a child span for egress calls via the
:ref:`start_child_span <config_http_filters_router_start_child_span>` option.

Trace context propagation
-------------------------
Envoy provides the capability for reporting tracing information regarding communications between
services in the mesh. However, to be able to correlate the pieces of tracing information generated by the
various proxies within a call flow, the services must propagate certain trace context between the inbound
and outbound requests.
services in the mesh. However, to be able to correlate the pieces of tracing information generated
by the various proxies within a call flow, the services must propagate certain trace context between
the inbound and outbound requests.

Whichever tracing provider is being used, the service should propagate the
:ref:`config_http_conn_man_headers_x-request-id` to enable logging across the invoked services
to be correlated.

The tracing providers also require additional context, to enable the parent/child relationships
between the spans (logical units of work) to be understood.
This can be achieved by using the LightStep (via OpenTracing API) or Zipkin tracer directly within the service itself,
to extract the trace context from the inbound request and inject it into any subsequent outbound requests.
This approach would also enable the service to create additional spans, describing work being done internally
within the service, that may be useful when examining the end-to-end trace.
between the spans (logical units of work) to be understood. This can be achieved by using the
LightStep (via OpenTracing API) or Zipkin tracer directly within the service itself, to extract the
trace context from the inbound request and inject it into any subsequent outbound requests. This
approach would also enable the service to create additional spans, describing work being done
internally within the service, that may be useful when examining the end-to-end trace.

Alternatively the trace context can be manually propagated by the service:

Expand All @@ -67,11 +70,10 @@ Alternatively the trace context can be manually propagated by the service:
also possible to just propagate the
:ref:`config_http_conn_man_headers_x-ot-span-context` HTTP header.

NOTE: Work is currently underway in the distributed tracing community to define a standard
for trace context propagation. Once a suitable approach has been adopted, the use of the non-standard
single header :ref:`config_http_conn_man_headers_x-ot-span-context` for propagating Zipkin trace
context will be replaced.

NOTE: Work is currently underway in the distributed tracing community to define a standard for trace
context propagation. Once a suitable approach has been adopted, the use of the non-standard single
header :ref:`config_http_conn_man_headers_x-ot-span-context` for propagating Zipkin trace context
will be replaced.

What data each trace contains
-----------------------------
Expand All @@ -89,14 +91,15 @@ associated with it. Each span generated by Envoy contains the following data:
* HTTP response code.
* Tracing system-specific metadata.

The span also includes a name (or operation) which by default is defined as the host of the invoked service.
However this can be customized using a :ref:`config_http_conn_man_route_table_decorator` on the route. The
name can also be overridden using the :ref:`config_http_filters_router_x-envoy-decorator-operation` header.
The span also includes a name (or operation) which by default is defined as the host of the invoked
service. However this can be customized using a :ref:`config_http_conn_man_route_table_decorator` on
the route. The name can also be overridden using the
:ref:`config_http_filters_router_x-envoy-decorator-operation` header.

Envoy automatically sends spans to tracing collectors. Depending on the tracing collector,
multiple spans are stitched together using common information such as the globally unique
request ID :ref:`config_http_conn_man_headers_x-request-id` (LightStep) or
the trace ID :ref:`configuration <config_tracing>` (Zipkin).

See tracing :ref:`configuration <config_tracing>` for more information on
how to setup tracing in Envoy.
See tracing :ref:`configuration <config_tracing>` for more information on how to setup tracing in
Envoy.
11 changes: 1 addition & 10 deletions source/common/grpc/async_client_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -260,14 +260,8 @@ class AsyncStreamImpl : public AsyncStream<RequestType>,
friend class AsyncClientImpl<RequestType, ResponseType>;
};

class AsyncClientTracingConfig : public Tracing::Config {
class AsyncClientTracingConfig : public Tracing::EgressConfigImpl {
public:
// Tracing::Config
Tracing::OperationName operationName() const override { return Tracing::OperationName::Egress; }
const std::vector<Http::LowerCaseString>& requestHeadersForTags() const override {
return request_headers_for_tags_;
}

struct {
const std::string STATUS = "status";
const std::string GRPC_STATUS = "grpc.status_code";
Expand All @@ -276,9 +270,6 @@ class AsyncClientTracingConfig : public Tracing::Config {
const std::string TRUE = "true";
const std::string UPSTREAM_CLUSTER_NAME = "upstream_cluster_name";
} TagStrings;

private:
const std::vector<Http::LowerCaseString> request_headers_for_tags_;
};

typedef ConstSingleton<AsyncClientTracingConfig> TracingConfig;
Expand Down
2 changes: 1 addition & 1 deletion source/common/http/async_client_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ AsyncClientImpl::AsyncClientImpl(const Upstream::ClusterInfo& cluster, Stats::St
Runtime::RandomGenerator& random,
Router::ShadowWriterPtr&& shadow_writer)
: cluster_(cluster), config_("http.async-client.", local_info, stats_store, cm, runtime, random,
std::move(shadow_writer), true),
std::move(shadow_writer), true, false),
dispatcher_(dispatcher) {}

AsyncClientImpl::~AsyncClientImpl() {
Expand Down
3 changes: 2 additions & 1 deletion source/common/json/config_schemas.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1081,7 +1081,8 @@ const std::string Json::Schema::ROUTER_HTTP_FILTER_SCHEMA(R"EOF(
"$schema": "http://json-schema.org/schema#",
"type" : "object",
"properties" : {
"dynamic_stats" : {"type" : "boolean"}
"dynamic_stats" : {"type" : "boolean"},
"start_child_span" : {"type" : "boolean"}
},
"additionalProperties" : false
}
Expand Down
1 change: 1 addition & 0 deletions source/common/router/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,7 @@ envoy_cc_library(
"//source/common/http:headers_lib",
"//source/common/http:message_lib",
"//source/common/http:utility_lib",
"//source/common/tracing:http_tracer_lib",
],
)

Expand Down
23 changes: 22 additions & 1 deletion source/common/router/router.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include "common/http/utility.h"
#include "common/router/config_impl.h"
#include "common/router/retry_state_impl.h"
#include "common/tracing/http_tracer_impl.h"

namespace Envoy {
namespace Router {
Expand Down Expand Up @@ -736,8 +737,24 @@ void Filter::doRetry() {
}
}

Filter::UpstreamRequest::UpstreamRequest(Filter& parent, Http::ConnectionPool::Instance& pool)
: parent_(parent), conn_pool_(pool), grpc_rq_success_deferred_(false),
calling_encode_headers_(false), upstream_canary_(false), encode_complete_(false),
encode_trailers_(false) {

if (parent_.config_.start_child_span_) {
span_ = parent_.callbacks_->activeSpan().spawnChild(
Tracing::EgressConfig::get(), "router " + parent.cluster_->name() + " egress",
ProdSystemTimeSource::instance_.currentTime());
}
}

Filter::UpstreamRequest::~UpstreamRequest() {
if (per_try_timeout_) {
if (span_ != nullptr) {
// TODO(mattklein123): Add tags based on what happened to this request (retries, reset, etc.).
span_->finishSpan();
}
if (per_try_timeout_ != nullptr) {
// Allows for testing.
per_try_timeout_->disableTimer();
}
Expand Down Expand Up @@ -876,6 +893,10 @@ void Filter::UpstreamRequest::onPoolReady(Http::StreamEncoder& request_encoder,
parent_.downstream_headers_->Host()->value(host->hostname());
}

if (span_ != nullptr) {
span_->injectContext(*parent_.downstream_headers_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have understood correctly, this new child span would be the parent of the client spans representing the actual call to the destination service? So if there are multiple retries, then there would be a client span per retry, but they would all be contained within this new child span?

If this is the case - then I don't think you need to inject the context - that is only relevant for client spans to propagate the context to the invoked service.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be mistaken, but I think we do need the inject context here. The idea is that Envoy already creates a parent span on HTTP connection manager ingress. In the current code, we always just forward that context through the router. However, if we make multiple calls out in the context of the original ingress span (e.g., auth, ratelimit, etc.) it's unclear how the calls line up. My goal here is to have an option where every routed call creates a new span. This way we see something like:

-> ingress
  -> ratelimit
  -> auth
  -> ratelimit
  -> router
    -> some service ingress

I assume in this case I need to inject context on each outbound call? In the case of retry with this code we would see something like:

-> ingress
  -> ratelimit
  -> auth
  -> ratelimit
  -> router
    -> some service ingress
  -> router
    -> some service ingress

Right now there are no tags that indicate retry, etc. but I will add those also in a follow up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok think I understand now.

Currently the default tracing shows spans for the inbound and outbound requests on behalf of the proxied service - but the additional tracing in your example shows internal spans representing the additional work performed by the proxy, including routing to the local service.

Was wondering whether, instead of specific config properties (e.g. start_child_span) whether a higher level config property could just enable all the internal tracing for the proxy?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering whether, instead of specific config properties (e.g. start_child_span) whether a higher level config property could just enable all the internal tracing for the proxy?

I thought about this, but I think it has to be per-router because some proxies are doing both ingress and egress, and I think having a double-span in some flows would be noisy. I think it's possible that we could trigger this off of the HTTP connection manager config based on ingress/egress, but even that is messy because for an edge/middle proxy we are doing both ingress/egress in the same listener. Thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the tracing would work if same listener used for both anyway.

But i think it would be better if could be managed by combination of tracing config ingress/egress property and another property indicating if proxy internals should also be traced.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the tracing would work if same listener used for both anyway.

It definitely works. This is standard on edge (middle) proxy nodes. The same listener is both ingress/egress. Let me think a bit more as to whether it makes sense to derive this from the parent HTTP connection manager ingress/egress setting.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@objectiser let's sync up when you are back from vacation and we also have the email thread going. I've come full circle on this and given the range of situations we need to support I now think that the granular config option on the router is the way to go. (So basically I would like to keep this approach).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright per offline discussion I think we are OK to merge this as is. @objectiser WDYT?

}

request_encoder.encodeHeaders(*parent_.downstream_headers_,
!buffered_request_body_ && encode_complete_ && !encode_trailers_);
calling_encode_headers_ = false;
Expand Down
13 changes: 6 additions & 7 deletions source/common/router/router.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,10 +85,11 @@ class FilterConfig {
FilterConfig(const std::string& stat_prefix, const LocalInfo::LocalInfo& local_info,
Stats::Scope& scope, Upstream::ClusterManager& cm, Runtime::Loader& runtime,
Runtime::RandomGenerator& random, ShadowWriterPtr&& shadow_writer,
bool emit_dynamic_stats)
bool emit_dynamic_stats, bool start_child_span)
: scope_(scope), local_info_(local_info), cm_(cm), runtime_(runtime),
random_(random), stats_{ALL_ROUTER_STATS(POOL_COUNTER_PREFIX(scope, stat_prefix))},
emit_dynamic_stats_(emit_dynamic_stats), shadow_writer_(std::move(shadow_writer)) {}
emit_dynamic_stats_(emit_dynamic_stats), start_child_span_(start_child_span),
shadow_writer_(std::move(shadow_writer)) {}

ShadowWriter& shadowWriter() { return *shadow_writer_; }

Expand All @@ -99,6 +100,7 @@ class FilterConfig {
Runtime::RandomGenerator& random_;
FilterStats stats_;
const bool emit_dynamic_stats_;
const bool start_child_span_;

private:
ShadowWriterPtr shadow_writer_;
Expand Down Expand Up @@ -181,11 +183,7 @@ class Filter : Logger::Loggable<Logger::Id::router>,
struct UpstreamRequest : public Http::StreamDecoder,
public Http::StreamCallbacks,
public Http::ConnectionPool::Callbacks {
UpstreamRequest(Filter& parent, Http::ConnectionPool::Instance& pool)
: parent_(parent), conn_pool_(pool), grpc_rq_success_deferred_(false),
calling_encode_headers_(false), upstream_canary_(false), encode_complete_(false),
encode_trailers_(false) {}

UpstreamRequest(Filter& parent, Http::ConnectionPool::Instance& pool);
~UpstreamRequest();

void encodeHeaders(bool end_stream);
Expand Down Expand Up @@ -250,6 +248,7 @@ class Filter : Logger::Loggable<Logger::Id::router>,
Buffer::WatermarkBufferPtr buffered_request_body_;
Upstream::HostDescriptionConstSharedPtr upstream_host_;
DownstreamWatermarkManager downstream_watermark_manager_{*this};
Tracing::SpanPtr span_;

bool calling_encode_headers_ : 1;
bool upstream_canary_ : 1;
Expand Down
14 changes: 14 additions & 0 deletions source/common/tracing/http_tracer_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,20 @@ class HttpTracerUtility {
static const std::string EGRESS_OPERATION;
};

class EgressConfigImpl : public Config {
public:
// Tracing::Config
Tracing::OperationName operationName() const override { return Tracing::OperationName::Egress; }
const std::vector<Http::LowerCaseString>& requestHeadersForTags() const override {
return request_headers_for_tags_;
}

private:
const std::vector<Http::LowerCaseString> request_headers_for_tags_;
};

typedef ConstSingleton<EgressConfigImpl> EgressConfig;

class NullSpan : public Span {
public:
static NullSpan& instance() {
Expand Down
3 changes: 2 additions & 1 deletion source/server/config/http/router.cc
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,8 @@ HttpFilterFactoryCb RouterFilterConfig::createFilterFactory(const Json::Object&
stat_prefix, context.localInfo(), context.scope(), context.clusterManager(),
context.runtime(), context.random(),
Router::ShadowWriterPtr{new Router::ShadowWriterImpl(context.clusterManager())},
json_config.getBoolean("dynamic_stats", true)));
json_config.getBoolean("dynamic_stats", true),
json_config.getBoolean("start_child_span", false)));

return [config](Http::FilterChainFactoryCallbacks& callbacks) -> void {
callbacks.addStreamDecoderFilter(std::make_shared<Router::ProdFilter>(*config));
Expand Down
76 changes: 73 additions & 3 deletions test/common/router/router_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -56,12 +56,12 @@ class TestFilter : public Filter {
MockRetryState* retry_state_{};
};

class RouterTest : public testing::Test {
class RouterTestBase : public testing::Test {
public:
RouterTest()
RouterTestBase(bool start_child_span)
: shadow_writer_(new MockShadowWriter()),
config_("test.", local_info_, stats_store_, cm_, runtime_, random_,
ShadowWriterPtr{shadow_writer_}, true),
ShadowWriterPtr{shadow_writer_}, true, start_child_span),
router_(config_) {
router_.setDecoderFilterCallbacks(callbacks_);
upstream_locality_.set_zone("to_az");
Expand Down Expand Up @@ -120,6 +120,11 @@ class RouterTest : public testing::Test {
Network::Utility::parseInternetAddressAndPort("1.2.3.4:80")};
};

class RouterTest : public RouterTestBase {
public:
RouterTest() : RouterTestBase(false) { EXPECT_CALL(callbacks_, activeSpan()).Times(0); }
};

TEST_F(RouterTest, RouteNotFound) {
EXPECT_CALL(callbacks_.request_info_,
setResponseFlag(Http::AccessLog::ResponseFlag::NoRouteFound));
Expand Down Expand Up @@ -1831,5 +1836,70 @@ TEST_F(WatermarkTest, RetryRequestNotComplete) {
encoder1.stream_.resetStream(Http::StreamResetReason::RemoteReset);
}

class RouterTestChildSpan : public RouterTestBase {
public:
RouterTestChildSpan() : RouterTestBase(true) {}
};

// Make sure child spans start/inject/finish with a normal flow.
TEST_F(RouterTestChildSpan, BasicFlow) {
EXPECT_CALL(callbacks_.route_->route_entry_, timeout())
.WillOnce(Return(std::chrono::milliseconds(0)));
EXPECT_CALL(callbacks_.dispatcher_, createTimer_(_)).Times(0);

NiceMock<Http::MockStreamEncoder> encoder;
Http::StreamDecoder* response_decoder = nullptr;
Tracing::MockSpan* child_span{new Tracing::MockSpan()};
EXPECT_CALL(cm_.conn_pool_, newStream(_, _))
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks)
-> Http::ConnectionPool::Cancellable* {
response_decoder = &decoder;
EXPECT_CALL(*child_span, injectContext(_));
callbacks.onPoolReady(encoder, cm_.conn_pool_.host_);
return nullptr;
}));

Http::TestHeaderMapImpl headers;
HttpTestUtility::addDefaultHeaders(headers);
EXPECT_CALL(callbacks_.active_span_, spawnChild_(_, "router fake_cluster egress", _))
.WillOnce(Return(child_span));
router_.decodeHeaders(headers, true);

Http::HeaderMapPtr response_headers(new Http::TestHeaderMapImpl{{":status", "200"}});
EXPECT_CALL(*child_span, finishSpan());
response_decoder->decodeHeaders(std::move(response_headers), true);
}

// Make sure child spans start/inject/finish with a reset flow.
TEST_F(RouterTestChildSpan, ResetFlow) {
EXPECT_CALL(callbacks_.route_->route_entry_, timeout())
.WillOnce(Return(std::chrono::milliseconds(0)));
EXPECT_CALL(callbacks_.dispatcher_, createTimer_(_)).Times(0);

NiceMock<Http::MockStreamEncoder> encoder;
Http::StreamDecoder* response_decoder = nullptr;
Tracing::MockSpan* child_span{new Tracing::MockSpan()};
EXPECT_CALL(cm_.conn_pool_, newStream(_, _))
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks)
-> Http::ConnectionPool::Cancellable* {
response_decoder = &decoder;
EXPECT_CALL(*child_span, injectContext(_));
callbacks.onPoolReady(encoder, cm_.conn_pool_.host_);
return nullptr;
}));

Http::TestHeaderMapImpl headers;
HttpTestUtility::addDefaultHeaders(headers);
EXPECT_CALL(callbacks_.active_span_, spawnChild_(_, "router fake_cluster egress", _))
.WillOnce(Return(child_span));
router_.decodeHeaders(headers, true);

Http::HeaderMapPtr response_headers(new Http::TestHeaderMapImpl{{":status", "200"}});
response_decoder->decodeHeaders(std::move(response_headers), false);

EXPECT_CALL(*child_span, finishSpan());
encoder.stream_.resetStream(Http::StreamResetReason::RemoteReset);
}

} // namespace Router
} // namespace Envoy
3 changes: 2 additions & 1 deletion test/server/config/http/config_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -196,7 +196,8 @@ TEST(HttpFilterConfigTest, BadHealthCheckFilterConfig) {
TEST(HttpFilterConfigTest, RouterFilter) {
std::string json_string = R"EOF(
{
"dynamic_stats" : true
"dynamic_stats" : true,
"start_child_span" : true
}
)EOF";

Expand Down