[RFC] OTel telemetry unified source #5596
Overview
Today, Data Prepper has three different sources for logs, traces, and metrics. Each of these sources needs its own pipeline (source, buffer, optional processor, and sink). Maintaining different endpoints, ports, and pipelines can be difficult. For ease of initial setup and use, we propose an OTel telemetry source in Data Prepper that accepts logs, traces, and metrics through a single OTLP source, separated by path suffixes as per OTel standards.
Background
OTel logs source
Supports OTel logs ingested via exposed OTLP (gRPC) or HTTP endpoints. This source decodes the OTel logs format into a flattened structure that can be ingested into OpenSearch and other compatible sinks. Defaults to the /v1/logs HTTP path.
OTel traces source
Supports OTel traces ingested via exposed OTLP (gRPC) or HTTP endpoints. This source consumes OTel trace data and converts it into a format compatible with OpenSearch and other analysis platforms. The source decodes spans, span events, and other trace data for later enrichment and analysis. Defaults to the /v1/traces HTTP path.
OTel metrics source
Supports OTel metrics ingested via exposed OTLP (gRPC) or HTTP endpoints. This source accepts metrics data including counters, gauges, histograms, and summary statistics. It transforms the metrics into formats suitable for aggregation, visualization, and long-term storage in OpenSearch and other time-series databases. Defaults to the /v1/metrics HTTP path.
Existing OTel source signals setup
Below is the current setup that users rely on to send OTel-based logs, traces, and metrics to OpenSearch indexes. The three signals land in separate sources. The sources are differentiated by port on the same host, with different path suffixes based on the OTel exporter SDK. Each source ends up in a separate sink.
```yaml
otel-logs-pipeline:
  source:
    otel_logs_source:
      ssl: false
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: false
        index_type: custom
        index: ss4o_logs-%{yyyy.MM.dd}
        bulk_size: 4
otel-traces-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "traces-raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
traces-raw-pipeline:
  source:
    pipeline:
      name: "otel-traces-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: false
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "otel-traces-pipeline"
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: false
        index_type: trace-analytics-service-map
otel-metrics-pipeline:
  source:
    otel_metrics_source:
  processor:
    - otel_metrics:
        calculate_histogram_buckets: true
        calculate_exponential_histogram_buckets: true
        exponential_histogram_max_allowed_scale: 10
        flatten_attributes: false
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: false
        index_type: custom
        index: ss4o_metrics-otel-%{yyyy.MM.dd}
        bulk_size: 4
```
Proposal: unified source for logs, traces and metrics
We propose a single unified source for all OTel telemetry signals, starting with logs, traces, and metrics, while maintaining OTel SDK standards. The three signals would be separated by endpoint path suffix and would share the same host and port in Data Prepper. Routing to downstream pipelines would be based on metadata routing.
This source would also support both gRPC and HTTP endpoints, with an auth model and config options similar to the existing sources.
```yaml
# Main telemetry pipeline with unified source
otel-telemetry-pipeline:
  source:
    otel_telemetry_source:
      ssl: false
  route:
    - logs: "getMetadata(\"eventType\") == \"LOG\""
    - traces: "getMetadata(\"eventType\") == \"TRACE\""
    - metrics: "getMetadata(\"eventType\") == \"METRIC\""
  sink:
    - pipeline:
        name: "logs-pipeline"
        routes:
          - "logs"
    - pipeline:
        name: "traces-pipeline"
        routes:
          - "traces"
    - pipeline:
        name: "metrics-pipeline"
        routes:
          - "metrics"

# Logs pipeline
logs-pipeline:
  source:
    pipeline:
      name: "otel-telemetry-pipeline"
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: false
        index_type: custom
        index: ss4o_logs-%{yyyy.MM.dd}
        bulk_size: 4

# Traces pipeline
traces-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "otel-telemetry-pipeline"
  sink:
    - pipeline:
        name: "traces-raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"

# Traces raw processing pipeline
traces-raw-pipeline:
  source:
    pipeline:
      name: "traces-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: false
        index_type: trace-analytics-raw

# Service map processing pipeline
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "traces-pipeline"
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: false
        index_type: trace-analytics-service-map

# Metrics pipeline
metrics-pipeline:
  source:
    pipeline:
      name: "otel-telemetry-pipeline"
  processor:
    - otel_metrics:
        calculate_histogram_buckets: true
        calculate_exponential_histogram_buckets: true
        exponential_histogram_max_allowed_scale: 10
        flatten_attributes: false
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: false
        index_type: custom
        index: ss4o_metrics-otel-%{yyyy.MM.dd}
        bulk_size: 4
```
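With the unified source, a client such as the OpenTelemetry Collector only needs one endpoint for all three signals. Below is a minimal sketch of a Collector exporter configuration; the endpoint host `data-prepper` and port `21893` are illustrative assumptions, and the receivers side of each pipeline is omitted for brevity:

```yaml
exporters:
  otlp:
    # Single Data Prepper endpoint for all three signals; gRPC routes
    # each signal to its own Export service method path automatically.
    endpoint: "data-prepper:21893"
    tls:
      insecure: true
service:
  pipelines:
    logs:
      exporters: [otlp]
    traces:
      exporters: [otlp]
    metrics:
      exporters: [otlp]
```

Compared with the previous setup, nothing in the exporter has to change per signal: the same host and port serve logs, traces, and metrics.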
Architecture
The OTLP source unifies the three existing OTel logs, traces, and metrics sources. This provides the ease of a single endpoint for the three signals, separated by path. It reuses most of the existing APIs and services, including the latest http-common library for gRPC server creation.
Reusing & updating the common HTTP server
The recent PR #5653 introduced common artifacts for the auth and server codec model. The new OTLP source will build on the same common server model and update the codec resource usage.
Currently, createGRPCServer creates a single server endpoint with a unique codec for decoding the signal. A new createGRPCServer function will be added to override the current mechanism with support for multiple services with different codec formats. This update will not break the existing OTel sources and, moreover, will support spinning up the new multi-service gRPC server.
Config details
- The default port can be kept as 21893, which logically makes sense since 21890/21891/21892 are taken by logs, traces, and metrics. The paths are separated out in the config for each type of telemetry and are optional.
- logs_path (Optional) => A String representing the path for sending unframed HTTP requests for logs. It should start with / and its length should be at least 1. Default is /opentelemetry.proto.collector.logs.v1.LogsService/Export.
- metrics_path (Optional) => A String representing the path for sending unframed HTTP requests for metrics. It should start with / and its length should be at least 1. Default is /opentelemetry.proto.collector.metrics.v1.MetricsService/Export.
- traces_path (Optional) => A String representing the path for sending unframed HTTP requests for traces. It should start with / and its length should be at least 1. Default is /opentelemetry.proto.collector.trace.v1.TraceService/Export.
- request_timeout (Optional) => An int representing the request timeout in millis. Default is 10000.
- health_check_service (Optional) => A boolean that enables a gRPC health check service under grpc.health.v1.Health/Check. Default is false.
- proto_reflection_service (Optional) => A boolean that enables a reflection service for Protobuf services (see ProtoReflectionService and the gRPC reflection docs). Default is false.
- unframed_requests (Optional) => A boolean to enable requests not framed using the gRPC wire protocol. When health_check_service is true and unframed_requests is true, enables an HTTP health check service under /health.
- thread_count (Optional) => The number of threads to keep in the ScheduledThreadPool. Default is 200.
- max_connection_count (Optional) => The maximum allowed number of open connections. Default is 500.
- compression (Optional) => The compression type applied on the client request payload. Defaults to none. Supported values are: none (no compression) and gzip (apply GZip decompression on the incoming request).
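Putting the options above together, a sketch of a unified source configuration; the values are illustrative, not recommendations, and the `port` option name is assumed to follow the existing OTel sources:

```yaml
otel-telemetry-pipeline:
  source:
    otel_telemetry_source:
      port: 21893
      logs_path: "/v1/logs"
      metrics_path: "/v1/metrics"
      traces_path: "/v1/traces"
      request_timeout: 10000
      health_check_service: true
      unframed_requests: true
      compression: gzip
      ssl: false
```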
Codec and output format
- output_format (Optional) => Specifies the decoded output format for all signals (logs, metrics, traces) if the individual output format options are not set.
- logs_output_format (Optional) => Specifies the decoded output format for logs.
- metrics_output_format (Optional) => Specifies the decoded output format for metrics.
- traces_output_format (Optional) => Specifies the decoded output format for traces.

Supported values are: otel (OpenTelemetry format, default) and opensearch (OpenSearch format).
Source plugin's telemetry collected
The OTLP source plugin will follow the OpenSearch Data Prepper standards for collecting telemetry data. This telemetry is helpful in monitoring the plugin's performance, throughput, and errors.
Counter
- requestTimeouts: measures the total number of requests that time out.
- requestsReceived: measures the total number of requests received by the OTLP source.
- successRequests: measures the total number of requests successfully processed by the OTLP source plugin.
- badRequests: measures the total number of requests with an invalid format processed by the OTLP source plugin.
- requestsTooLarge: measures the total number of requests that exceed the maximum allowed size.
- internalServerError: measures the total number of requests processed by the OTLP source with a custom exception type.
Timer
requestProcessDuration: measures latency of requests processed by OTLP source plugin in seconds.
Distribution Summary
payloadSize: measures the distribution of incoming requests payload sizes in bytes.
Supporting a new getEventType function
While the unified OTLP source simplifies deployment and reduces the resource consumption of the older separate OTel sources, it also makes routing to downstream pipelines more difficult. To ease this difficulty in routing telemetry data based on signal type, we introduce the getEventType function.
Data Prepper has a solid event metadata structure that users can use to route between sub-pipelines, and its expression language already supports functions to help users route these events:
- getMetaData - Function to fetch the value of attribute keys from the event
- hasTags - Function to validate the existence of tags in an event
getEventType is the new function that will help the unified OTLP source easily route data to the three downstream telemetry pipelines based on the event type (log, trace, or metric) present in the underlying Jackson data structure.
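With this function, the unified pipeline's route block could be written without touching the eventType metadata attribute directly. A sketch, assuming the proposed no-argument syntax and the same LOG/TRACE/METRIC type values used for metadata routing:

```yaml
otel-telemetry-pipeline:
  source:
    otel_telemetry_source:
      ssl: false
  route:
    - logs: 'getEventType() == "LOG"'
    - traces: 'getEventType() == "TRACE"'
    - metrics: 'getEventType() == "METRIC"'
```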
- ANTLR update: The expression grammar needs to be updated so that we don't create regressions for the older expression functions while adding the new one. The main differentiation here is that all prior functions take a parameter, whereas this new function takes none.
```antlr
fragment
FunctionArgs
    : ((FunctionArg SPACE* COMMA SPACE*)* SPACE* FunctionArg)?
    ;
```
- Metadata function extension for expressions: This new function will use the existing getMetadata() getter, and ParseTreeCoercionService needs to be updated to support a no-argument function.
```java
public Object evaluate(final List<Object> args, final Event event, final Function<Object, Object> convertLiteralType) {
    if (!args.isEmpty()) {
        throw new RuntimeException("getEventType() does not take any arguments");
    }
    return event.getMetadata().getEventType();
}
```
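The argument check above can be exercised in isolation. Below is a minimal, self-contained sketch with stand-in Event and EventMetadata types; the real Data Prepper interfaces are richer, so this only illustrates the no-argument contract:

```java
import java.util.Collections;
import java.util.List;

public class GetEventTypeSketch {
    // Stand-in for Data Prepper's EventMetadata interface (hypothetical, minimal).
    interface EventMetadata {
        String getEventType();
    }

    // Stand-in for Data Prepper's Event interface (hypothetical, minimal).
    interface Event {
        EventMetadata getMetadata();
    }

    // Mirrors the proposed expression function: rejects any arguments,
    // then returns the event type stored in the event's metadata.
    static Object evaluate(final List<Object> args, final Event event) {
        if (!args.isEmpty()) {
            throw new RuntimeException("getEventType() does not take any arguments");
        }
        return event.getMetadata().getEventType();
    }

    public static void main(String[] args) {
        // A log event whose metadata reports type "LOG".
        final Event logEvent = () -> () -> "LOG";
        System.out.println(evaluate(Collections.emptyList(), logEvent)); // prints LOG

        // Passing any argument violates the no-argument contract.
        try {
            evaluate(List.of("eventType"), logEvent);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```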
Alternative approach: extend the getMetaData function
This approach was simpler, but it would interfere with data present in the actual telemetry itself. The implementation would check the getMetaData argument to see whether the user asked for the event type, like getMetaData(eventType). This approach wouldn't work if a customer has a field in the raw telemetry data itself called eventType.
```java
Object value = event.getMetadata().getAttribute(argStr);
if (value == null) {
    return null;
}
```
Auth and SSL support
- ssl (Optional) => A boolean that enables TLS/SSL. Default is true.
- sslKeyCertChainFile (Optional) => A String representing the SSL certificate chain file path or an AWS S3 path. S3 path example: s3://<bucketName>/<path>. Required if ssl is set to true.
- sslKeyFile (Optional) => A String representing the SSL key file path or an AWS S3 path. S3 path example: s3://<bucketName>/<path>. Required if ssl is set to true.
- useAcmCertForSSL (Optional) => A boolean that enables TLS/SSL using a certificate and private key from AWS Certificate Manager (ACM). Default is false.
- acmCertificateArn (Optional) => A String representing the ACM certificate ARN. The ACM certificate takes precedence over an S3 or local file system certificate. Required if useAcmCertForSSL is set to true.
- awsRegion (Optional) => A String representing the AWS region used for ACM or S3. Required if useAcmCertForSSL is set to true or if sslKeyCertChainFile and sslKeyFile are AWS S3 paths.
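A sketch of enabling TLS on the unified source with local certificate files; the file paths are illustrative placeholders:

```yaml
otel-telemetry-pipeline:
  source:
    otel_telemetry_source:
      ssl: true
      sslKeyCertChainFile: "/etc/data-prepper/certs/cert-chain.pem"
      sslKeyFile: "/etc/data-prepper/certs/private-key.pem"
```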
Benefits of this new approach
- Simplified Configuration - Data Prepper uses a single endpoint with standard OTel paths, making the config file simpler and easier to manage.
- Easier Setup - Users can get started with fewer pipelines and a smoother setup process.
- Consistent with Industry Standards - It follows the OTel spec, making Data Prepper plug-and-play with other OpenTelemetry tools.
- Reduced Resource Usage - One server handles all signals, saving memory and simplifying deployment.
- Better Integration - All telemetry types work together in one flexible pipeline model.
- Future Extensibility - The design makes it easy to add new telemetry types down the line.
Current PoC for OTel telemetry source