
See Everything. From Services to AI Agents.

Traces. Logs. Metrics. Dashboards. Service maps. AI agent tracing. Built-in ML. PPL query language. One open-source platform for full-stack observability. No license fees. No lock-in.

OpenTelemetry-native. Apache 2.0. Self-host anywhere. Zero lock-in.

Traces
Distributed tracing with auto-generated service maps and RED metrics
Quick Start
$ curl -fsSL https://raw.githubusercontent.com/opensearch-project/observability-stack/main/install.sh | bash

Docker, Kubernetes, or bare metal. Full stack in 5 minutes.

Full-Stack Observability, One Platform

From service health to AI agent performance — traces, logs, metrics, dashboards, built-in ML, and a powerful query language

APM & Distributed Tracing

End-to-end visibility across services with auto-generated service maps, latency breakdowns, and error tracking. OpenTelemetry-native with zero proprietary agents.

  • Service maps and dependency visualization
  • P50/P95/P99 latency, error rates, and throughput
  • OTel-native — works with any language or framework
APM Service Map
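The latency and error figures above reduce to simple aggregations over span data. A minimal sketch in Python, assuming hypothetical span fields (`service`, `durationMs`, `error`) rather than the platform's actual schema:

```python
from statistics import quantiles

# Hypothetical span records as an OTel pipeline might export them;
# field names here are illustrative, not the platform's real schema.
spans = [
    {"service": "checkout", "durationMs": 12.0, "error": False},
    {"service": "checkout", "durationMs": 48.0, "error": True},
    {"service": "checkout", "durationMs": 31.0, "error": False},
    {"service": "payments", "durationMs": 7.0,  "error": False},
]

def latency_summary(spans, service):
    """P50/P95/P99 latency and error rate for one service."""
    durations = sorted(s["durationMs"] for s in spans if s["service"] == service)
    # quantiles(n=100, method="inclusive") yields the 1st..99th percentile cut points
    q = quantiles(durations, n=100, method="inclusive")
    errors = sum(1 for s in spans if s["service"] == service and s["error"])
    return {
        "p50": q[49],
        "p95": q[94],
        "p99": q[98],
        "error_rate": errors / len(durations),
    }
```

In practice the platform computes these server-side across all spans; the sketch only shows what the numbers mean.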

Metrics & Dashboards

Prometheus-compatible metrics with PromQL support. Custom dashboards and RED metrics computed automatically from trace data. All signals in one platform.

  • Prometheus remote-write and native PromQL support
  • RED metrics (Rate, Errors, Duration) auto-computed from traces
  • Custom dashboards with real-time panels and alerting
Prometheus Metrics Dashboard
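As a rough sketch of how RED metrics fall out of trace data, the snippet below buckets hypothetical spans into one-minute windows and derives rate, error ratio, and average duration per bucket. The record shape is an assumption for illustration:

```python
from collections import defaultdict

WINDOW_S = 60  # one-minute buckets

# Illustrative spans with epoch-second timestamps; not the real schema.
spans = [
    {"service": "api", "ts": 5,  "durationMs": 20.0, "error": False},
    {"service": "api", "ts": 42, "durationMs": 35.0, "error": True},
    {"service": "api", "ts": 70, "durationMs": 15.0, "error": False},
]

def red_metrics(spans):
    """Rate, Errors, Duration per (service, window) bucket."""
    buckets = defaultdict(list)
    for s in spans:
        buckets[(s["service"], s["ts"] // WINDOW_S)].append(s)
    return {
        key: {
            "rate_per_s": len(group) / WINDOW_S,
            "error_rate": sum(s["error"] for s in group) / len(group),
            "avg_duration_ms": sum(s["durationMs"] for s in group) / len(group),
        }
        for key, group in buckets.items()
    }
```

The platform performs this aggregation automatically from ingested traces; no hand-written metric pipeline is required.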

Log Analytics with PPL

Full-text search meets a pipe-based query language. PPL lets you filter, transform, aggregate, and correlate logs with traces — all in one query. 50+ commands and 200+ built-in functions.

  • Complete query language — joins, subqueries, stats, and more
  • ML-powered log pattern clustering with zero regex
  • Log-to-trace correlation via traceId in one click
Log Analytics Interface
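Log-to-trace correlation is, at its core, a join on `traceId`. A minimal Python sketch with made-up log and span records, mirroring what a PPL join does server-side:

```python
# Hypothetical records; field names follow OTel conventions but the
# exact schema here is an assumption for illustration.
logs = [
    {"traceId": "abc", "severityText": "ERROR", "body": "timeout calling payments"},
    {"traceId": "def", "severityText": "INFO",  "body": "request ok"},
]
spans = [
    {"traceId": "abc", "serviceName": "checkout", "durationInNanos": 2_100_000_000},
    {"traceId": "def", "serviceName": "checkout", "durationInNanos": 40_000_000},
]

def correlate(logs, spans, severity="ERROR"):
    """Attach the matching span to every log line at the given severity."""
    by_trace = {s["traceId"]: s for s in spans}
    return [
        {**log, "span": by_trace.get(log["traceId"])}
        for log in logs
        if log["severityText"] == severity
    ]
```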

AI Agent & LLM Observability

Trace AI agent workflows end-to-end. Visualize execution graphs, monitor token usage, track tool calls, and debug agent behavior with OpenTelemetry GenAI semantic conventions.

  • Agent tracing with tool-call and reasoning step visualization
  • Token usage, cost tracking, and failure rate analysis
  • Python/JS SDKs + MCP support — works with any AI framework
Agent Execution Graph
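Token usage and cost tracking like the above amounts to summing per-call counters. A sketch with invented per-1K-token prices and call records (none of these numbers or fields come from the platform):

```python
# Illustrative prices per 1K tokens; real prices depend on your provider.
PRICE_PER_1K = {"prompt": 0.003, "completion": 0.006}

# Hypothetical per-call records captured by agent tracing.
calls = [
    {"agent": "assistant", "tool": "search", "prompt_tokens": 120, "completion_tokens": 0,   "ok": True},
    {"agent": "assistant", "tool": None,     "prompt_tokens": 300, "completion_tokens": 150, "ok": True},
    {"agent": "assistant", "tool": "search", "prompt_tokens": 80,  "completion_tokens": 0,   "ok": False},
]

def usage_report(calls):
    """Aggregate token counts, estimated cost, and failure rate."""
    prompt = sum(c["prompt_tokens"] for c in calls)
    completion = sum(c["completion_tokens"] for c in calls)
    cost = (prompt / 1000) * PRICE_PER_1K["prompt"] \
         + (completion / 1000) * PRICE_PER_1K["completion"]
    failures = sum(1 for c in calls if not c["ok"])
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "cost_usd": round(cost, 6),
        "failure_rate": failures / len(calls),
    }
```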

PPL Query Language

A pipe-based query language built for observability. Filter, transform, aggregate, join across indices, and run ML algorithms — all in a single query pipeline.

  • Cross-signal correlation: join logs with traces on traceId
  • Automatic error pattern clustering — no regex required
  • Anomaly detection and k-means clustering built into queries
PPL
source = logs-otel-v1*
| where severityText = 'ERROR'
| patterns body method=brain mode=aggregation
    by `resource.attributes.service.name`
| sort - pattern_count
| head 20

# Zero regex. ML-powered clustering.
# Try it in the Live Playground →
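For intuition only, here is a deliberately naive stand-in for `patterns`-style clustering: mask variable tokens so variants of the same message collapse to one template. The real command uses ML-based (`brain`) clustering, not regex masking; this sketch just shows the grouping idea:

```python
import re
from collections import Counter

def template(line):
    """Collapse numbers and hex-like ids into placeholders."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\b\d+\b", "<NUM>", line)
    return line

# Made-up log bodies for illustration.
logs = [
    "timeout after 5000 ms calling payments",
    "timeout after 3000 ms calling payments",
    "connection refused to 10.0.0.12",
]

counts = Counter(template(l) for l in logs)
# Most frequent template first, like `sort - pattern_count`
top = counts.most_common()
```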

Built-in Machine Learning

Anomaly detection and clustering run directly in your query pipeline — no separate ML service, no model management, no data science team required.

  • Random Cut Forest anomaly detection per service
  • K-means clustering for automatic service health tiers
  • Trendline and rolling window analytics built in
PPL
source = otel-v1-apm-span-*
| stats avg(durationInNanos) as avg_latency
    by span(startTime, 5m) as window,
       serviceName
| ml action='train' algorithm='rcf'
    time_field='window'
    category_field='serviceName'
| where anomaly_grade > 0
| sort - anomaly_grade

# Built-in ML. No external service.
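The `ml` command above runs Random Cut Forest inside the query. As a rough intuition for in-pipeline anomaly scoring, here is a rolling z-score sketch; it is not RCF, only an illustration of flagging windows that deviate from recent history:

```python
from statistics import mean, stdev

def zscore_anomalies(values, window=5, threshold=3.0):
    """Flag indices whose value deviates sharply from the rolling window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up per-window average latencies (ms); index 6 is a spike.
avg_latency = [100, 102, 98, 101, 99, 100, 480, 101]
```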

5 Minutes to Production Observability

From zero to full tracing in minutes. No complex configuration, no vendor lock-in.

main.py Python
from opensearch_genai_sdk_py import register, agent, tool

register(service_name="my-app")

@tool(name="search")
def search(query: str) -> dict:
    return search_api.query(query)

@agent(name="assistant")
def assistant(prompt):
    data = search(prompt)
    return llm.generate(prompt, context=data)

# Automatic OTEL traces captured
result = assistant("Hello AI")
Terminal Output
Instrumentation initialized
OTEL exporter configured
Trace captured: process_request
Spans exported: 3
Latency: 342ms
View dashboard at http://localhost:8000

Choose Your Integration Style

Three paths to production observability. Pick the one that fits your workflow.

GenAI SDK

One-line setup with automatic OpenTelemetry instrumentation. Decorators for agents, tools, and workflows.

example.py Python
from opensearch_genai_sdk_py import register, agent, tool

# One-line setup — configures OTEL pipeline automatically
register(service_name="my-app")

@tool(name="get_weather")
def get_weather(city: str) -> dict:
    return {"city": city, "temp": 22, "condition": "sunny"}

@agent(name="weather_assistant")
def assistant(query: str) -> str:
    data = get_weather("Paris")
    return f"{data['condition']}, {data['temp']}C"

# Automatic OTEL traces, metrics, and logs
result = assistant("What's the weather?")

Key Benefits

  • Zero configuration required
  • Automatic instrumentation of popular frameworks
  • Instant OTEL traces and metrics
  • Works with existing code
  • Production-ready in 5 minutes

Why OpenTelemetry Matters

The foundation for traces, metrics, and logs across services and AI. No compromises, no lock-in.

Industry Standard (CNCF)

OpenTelemetry is a CNCF graduated project, backed by major tech companies and trusted by thousands of organizations worldwide.


Your Data, Your Rules

Own your observability data completely. Export to any backend, store locally, or switch providers anytime without losing history.


Future-Proof Investment

Built on open standards that evolve with the industry. Your instrumentation code stays relevant as technology advances.


No Vendor Lock-In

Switch observability backends in minutes, not months. Your instrumentation is portable across any OTEL-compatible platform.


Language Agnostic

Consistent instrumentation across Python, JavaScript, Go, Java, and 10+ languages. One standard for your entire stack.


Community Driven

Benefit from contributions by thousands of developers. Active community, extensive documentation, and continuous improvements.
