
See Everything. From Services to AI Agents.

Traces. Logs. Metrics. Dashboards. Service maps. AI agent tracing. Built-in ML. PPL query language. One open-source platform for full-stack observability. No license fees. No lock-in.

OpenTelemetry-native. Apache 2.0. Self-host anywhere. Zero lock-in.

Traces
Distributed tracing with auto-generated service maps and RED metrics
Quick Start
$ curl -fsSL https://raw.githubusercontent.com/opensearch-project/observability-stack/main/install.sh | bash

Docker, Kubernetes, or bare metal. Full stack in 5 minutes.

Full-Stack Observability, One Platform

From service health to AI agent performance — traces, logs, metrics, dashboards, built-in ML, and a powerful query language

APM & Distributed Tracing

End-to-end visibility across services with auto-generated service maps, latency breakdowns, and error tracking. OpenTelemetry-native with zero proprietary agents.

  • Service maps and dependency visualization
  • P50/P95/P99 latency, error rates, and throughput
  • OTel-native — works with any language or framework
APM Service Map
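The latency and error figures above reduce to simple aggregations over span data. A minimal sketch in Python, assuming hypothetical span fields (`service`, `durationMs`, `error`) rather than the platform's actual schema:

```python
from statistics import quantiles

# Hypothetical span records as an OTel pipeline might export them;
# field names here are illustrative, not the platform's real schema.
spans = [
    {"service": "checkout", "durationMs": 12.0, "error": False},
    {"service": "checkout", "durationMs": 48.0, "error": True},
    {"service": "checkout", "durationMs": 31.0, "error": False},
    {"service": "payments", "durationMs": 7.0,  "error": False},
]

def latency_summary(spans, service):
    """P50/P95/P99 latency and error rate for one service."""
    durations = sorted(s["durationMs"] for s in spans if s["service"] == service)
    # quantiles(n=100, method="inclusive") yields the 1st..99th percentile cut points
    q = quantiles(durations, n=100, method="inclusive")
    errors = sum(1 for s in spans if s["service"] == service and s["error"])
    return {
        "p50": q[49],
        "p95": q[94],
        "p99": q[98],
        "error_rate": errors / len(durations),
    }
```

In practice the platform computes these server-side across all spans; the sketch only shows what the numbers mean.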

Metrics & Dashboards

Prometheus-compatible metrics with PromQL support. Custom dashboards and RED metrics computed automatically from trace data. All signals in one platform.

  • Prometheus remote-write and native PromQL support
  • RED metrics (Rate, Errors, Duration) auto-computed from traces
  • Custom dashboards with real-time panels and alerting
Prometheus Metrics Dashboard
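As a rough sketch of how RED metrics fall out of trace data, the snippet below buckets hypothetical spans into one-minute windows and derives rate, error ratio, and average duration per bucket. The record shape is an assumption for illustration:

```python
from collections import defaultdict

WINDOW_S = 60  # one-minute buckets

# Illustrative spans with epoch-second timestamps; not the real schema.
spans = [
    {"service": "api", "ts": 5,  "durationMs": 20.0, "error": False},
    {"service": "api", "ts": 42, "durationMs": 35.0, "error": True},
    {"service": "api", "ts": 70, "durationMs": 15.0, "error": False},
]

def red_metrics(spans):
    """Rate, Errors, Duration per (service, window) bucket."""
    buckets = defaultdict(list)
    for s in spans:
        buckets[(s["service"], s["ts"] // WINDOW_S)].append(s)
    return {
        key: {
            "rate_per_s": len(group) / WINDOW_S,
            "error_rate": sum(s["error"] for s in group) / len(group),
            "avg_duration_ms": sum(s["durationMs"] for s in group) / len(group),
        }
        for key, group in buckets.items()
    }
```

The platform performs this aggregation automatically from ingested traces; no hand-written metric pipeline is required.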

Log Analytics with PPL

Full-text search meets a pipe-based query language. PPL lets you filter, transform, aggregate, and correlate logs with traces — all in one query. 50+ commands and 200+ built-in functions.

  • Complete query language — joins, subqueries, stats, and more
  • ML-powered log pattern clustering with zero regex
  • Log-to-trace correlation via traceId in one click
Log Analytics Interface
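Log-to-trace correlation is, at its core, a join on `traceId`. A minimal Python sketch with made-up log and span records, mirroring what a PPL join does server-side:

```python
# Hypothetical records; field names follow OTel conventions but the
# exact schema here is an assumption for illustration.
logs = [
    {"traceId": "abc", "severityText": "ERROR", "body": "timeout calling payments"},
    {"traceId": "def", "severityText": "INFO",  "body": "request ok"},
]
spans = [
    {"traceId": "abc", "serviceName": "checkout", "durationInNanos": 2_100_000_000},
    {"traceId": "def", "serviceName": "checkout", "durationInNanos": 40_000_000},
]

def correlate(logs, spans, severity="ERROR"):
    """Attach the matching span to every log line at the given severity."""
    by_trace = {s["traceId"]: s for s in spans}
    return [
        {**log, "span": by_trace.get(log["traceId"])}
        for log in logs
        if log["severityText"] == severity
    ]
```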

AI Agent & LLM Observability

Trace AI agent workflows end-to-end. Visualize execution graphs, monitor token usage, track tool calls, and debug agent behavior with OpenTelemetry GenAI semantic conventions.

  • Agent tracing with tool-call and reasoning step visualization
  • Token usage, cost tracking, and failure rate analysis
  • Python/JS SDKs + MCP support — works with any AI framework
Agent Execution Graph
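Token usage and cost tracking like the above amounts to summing per-call counters. A sketch with invented per-1K-token prices and call records (none of these numbers or fields come from the platform):

```python
# Illustrative prices per 1K tokens; real prices depend on your provider.
PRICE_PER_1K = {"prompt": 0.003, "completion": 0.006}

# Hypothetical per-call records captured by agent tracing.
calls = [
    {"agent": "assistant", "tool": "search", "prompt_tokens": 120, "completion_tokens": 0,   "ok": True},
    {"agent": "assistant", "tool": None,     "prompt_tokens": 300, "completion_tokens": 150, "ok": True},
    {"agent": "assistant", "tool": "search", "prompt_tokens": 80,  "completion_tokens": 0,   "ok": False},
]

def usage_report(calls):
    """Aggregate token counts, estimated cost, and failure rate."""
    prompt = sum(c["prompt_tokens"] for c in calls)
    completion = sum(c["completion_tokens"] for c in calls)
    cost = (prompt / 1000) * PRICE_PER_1K["prompt"] \
         + (completion / 1000) * PRICE_PER_1K["completion"]
    failures = sum(1 for c in calls if not c["ok"])
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "cost_usd": round(cost, 6),
        "failure_rate": failures / len(calls),
    }
```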

PPL Query Language

A pipe-based query language built for observability. Filter, transform, aggregate, join across indices, and run ML algorithms — all in a single query pipeline.

  • Cross-signal correlation: join logs with traces on traceId
  • Automatic error pattern clustering — no regex required
  • Anomaly detection and k-means clustering built into queries
PPL
source = logs-otel-v1*
| where severityText = 'ERROR'
| patterns body method=brain mode=aggregation
    by `resource.attributes.service.name`
| sort - pattern_count
| head 20

# Zero regex. ML-powered clustering.
# Try it in the Live Playground →
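For intuition only, here is a deliberately naive stand-in for `patterns`-style clustering: mask variable tokens so variants of the same message collapse to one template. The real command uses ML-based (`brain`) clustering, not regex masking; this sketch just shows the grouping idea:

```python
import re
from collections import Counter

def template(line):
    """Collapse numbers and hex-like ids into placeholders."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\b\d+\b", "<NUM>", line)
    return line

# Made-up log bodies for illustration.
logs = [
    "timeout after 5000 ms calling payments",
    "timeout after 3000 ms calling payments",
    "connection refused to 10.0.0.12",
]

counts = Counter(template(l) for l in logs)
# Most frequent template first, like `sort - pattern_count`
top = counts.most_common()
```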

Built-in Machine Learning

Anomaly detection and clustering run directly in your query pipeline — no separate ML service, no model management, no data science team required.

  • Random Cut Forest anomaly detection per service
  • K-means clustering for automatic service health tiers
  • Trendline and rolling window analytics built in
PPL
source = otel-v1-apm-span-*
| stats avg(durationInNanos) as avg_latency
    by span(startTime, 5m) as window,
       serviceName
| ml action='train' algorithm='rcf'
    time_field='window'
    category_field='serviceName'
| where anomaly_grade > 0
| sort - anomaly_grade

# Built-in ML. No external service.
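The `ml` command above runs Random Cut Forest inside the query. As a rough intuition for in-pipeline anomaly scoring, here is a rolling z-score sketch; it is not RCF, only an illustration of flagging windows that deviate from recent history:

```python
from statistics import mean, stdev

def zscore_anomalies(values, window=5, threshold=3.0):
    """Flag indices whose value deviates sharply from the rolling window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up per-window average latencies (ms); index 6 is a spike.
avg_latency = [100, 102, 98, 101, 99, 100, 480, 101]
```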

5 Minutes to Production Observability

From zero to full tracing in minutes. No complex configuration, no vendor lock-in.

main.py Python
from opensearch_genai_sdk_py import register, agent, tool

register(service_name="my-app")

@tool(name="search")
def search(query: str) -> dict:
    return search_api.query(query)

@agent(name="assistant")
def assistant(prompt):
    data = search(prompt)
    return llm.generate(prompt, context=data)

# Automatic OTEL traces captured
result = assistant("Hello AI")
Terminal Output
Instrumentation initialized
OTEL exporter configured
Trace captured: process_request
Spans exported: 3
Latency: 342ms
View dashboard at http://localhost:8000

Choose Your Integration Style

Three paths to production observability. Pick the one that fits your workflow.

GenAI SDK

One-line setup with automatic OpenTelemetry instrumentation. Decorators for agents, tools, and workflows.

example.py Python
from opensearch_genai_sdk_py import register, agent, tool

# One-line setup — configures OTEL pipeline automatically
register(service_name="my-app")

@tool(name="get_weather")
def get_weather(city: str) -> dict:
    return {"city": city, "temp": 22, "condition": "sunny"}

@agent(name="weather_assistant")
def assistant(query: str) -> str:
    data = get_weather("Paris")
    return f"{data['condition']}, {data['temp']}C"

# Automatic OTEL traces, metrics, and logs
result = assistant("What's the weather?")

Key Benefits

  • Zero configuration required
  • Automatic instrumentation of popular frameworks
  • Instant OTEL traces and metrics
  • Works with existing code
  • Production-ready in 5 minutes

Why OpenTelemetry Matters

The foundation for traces, metrics, and logs across services and AI. No compromises, no lock-in.

Industry Standard (CNCF)

OpenTelemetry is a CNCF graduated project, backed by major tech companies and trusted by thousands of organizations worldwide.


Your Data, Your Rules

Own your observability data completely. Export to any backend, store locally, or switch providers anytime without losing history.


Future-Proof Investment

Built on open standards that evolve with the industry. Your instrumentation code stays relevant as technology advances.


No Vendor Lock-In

Switch observability backends in minutes, not months. Your instrumentation is portable across any OTEL-compatible platform.


Language Agnostic

Consistent instrumentation across Python, JavaScript, Go, Java, and 10+ languages. One standard for your entire stack.


Community Driven

Benefit from contributions by thousands of developers. Active community, extensive documentation, and continuous improvements.
