Instrument Elasticsearch with APM #84369
Closed
Labels
- :Core/Infra/Core (Core issues without another label)
- :Delivery/Tooling (Developer tooling and automation)
- >feature
- Team:Core/Infra (Meta label for core/infra team)
- Team:Delivery (Meta label for Delivery team)
NOTE: this issue will evolve as we scope out this work.
Description
"Why is Elasticsearch slow?" is a common question from users. We have tools to investigate certain aspects of this question already, for instance the search slowlog (good if the shard-level searches are slow) and the hot threads API (good if the slowness is an ongoing thing) but there are many gaps too. For instance, how would we discover that a Kibana dashboard triggers unreasonably many searches if each of those searches completes fairly quickly? How would we discover that requests are spending unexpectedly long in queues? How do we see if the slow steps all involve a particular node? What if that node is on a remote cluster? It's hard to take a structured approach to performance questions with the tools we have today.
Distributed tracing is a great way to answer questions of this nature. Elastic has a distributed tracing product, APM, which sits on top of Elasticsearch, but today Elasticsearch itself is opaque to APM: we cannot trace the execution of a request through Elasticsearch. Let's fix that.
This work will build on an existing exploratory project that instrumented a number of "tasks" in Elasticsearch. More types of tasks will be instrumented, as well as requests / responses at the REST level.
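As a rough illustration of why trace data helps with the questions above, here is a minimal sketch in Python. The `Span` shape, its field names, and the sample values are all hypothetical; this is not the APM data model or anything in Elasticsearch, just an assumption about what per-task timing records might look like once tasks are instrumented.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One timed unit of work in a trace (hypothetical shape, for illustration)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    name: str
    node: str
    queued_ms: float    # time spent waiting in a queue before running
    duration_ms: float  # time spent running


def searches_per_trace(spans):
    """Count search spans per trace, to spot a request that fans out into many searches."""
    return Counter(s.trace_id for s in spans if s.name == "search")


def time_per_node(spans):
    """Sum running time of child spans by node, to see whether slow steps share a node.

    Root spans are skipped because their duration already covers their children.
    """
    totals = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            totals[s.node] += s.duration_ms
    return dict(totals)


def queue_heavy(spans, ratio=1.0):
    """Return spans that waited in a queue longer than `ratio` times their running time."""
    return [s for s in spans if s.queued_ms > ratio * s.duration_ms]


# Made-up sample data: one dashboard request that triggers three searches.
SPANS = [
    Span("t1", "a", None, "dashboard", "node-1", 0.0, 95.0),
    Span("t1", "b", "a", "search", "node-1", 1.0, 10.0),
    Span("t1", "c", "a", "search", "node-2", 40.0, 12.0),
    Span("t1", "d", "a", "search", "node-2", 2.0, 60.0),
]
```

With records like these, `searches_per_trace(SPANS)` shows how many searches one request triggered, `time_per_node(SPANS)` shows where the running time went, and `queue_heavy(SPANS)` picks out work that mostly waited in a queue.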
Tasks
- Make sampling rate configurable: handled by APM agent
- More flexible configuration of connection to APM server (TLS features, proxy support, protocol selection, etc.): handled by APM agent
Out-of-scope
The focus of this work is instrumenting Elasticsearch for Elastic's own purposes. Making it available to users and licensing it for that purpose is not currently in scope.