NthLayer

Shift-left reliability for platform teams.

Define reliability requirements as code. Validate SLOs against dependency chains. Detect drift before incidents. Gate deployments on real data.

TL;DR

pip install nthlayer

⚠️ The Problem

Reliability decisions happen too late. Teams set SLOs in isolation, deploy without checking error budgets, and discover missing metrics during incidents. Dashboards are inconsistent. Alerts are copy-pasted. Nobody validates whether a 99.99% target is even achievable given dependencies.

💡 The Solution

NthLayer moves reliability left:

service.yaml → validate → check-deploy → deploy
                  │            │
                  │            └── Error budget ok? Drift acceptable?
                  │
                  └── SLO feasible? Dependencies support it? Metrics exist?

⚡ Core Features

Drift Detection

Predict SLO exhaustion before it happens. Don't wait for the budget to hit zero.

$ nthlayer drift payment-api

payment-api: CRITICAL
  Current: 73.2% budget remaining
  Trend: -2.1%/day (gradual decline)
  Projection: Budget exhausts in 23 days

  Recommendation: Investigate error rate increase before next release
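
As a rough illustration of the idea behind a drift projection, the sketch below extrapolates the remaining budget along a constant daily trend. The helper name `days_until_exhaustion` is hypothetical and is not NthLayer's API; the tool's own projection model is not documented here and may weigh recent behaviour differently, which is why a straight-line estimate from the sample figures (about 35 days) does not match the 23 days reported above.

```python
import math


def days_until_exhaustion(budget_remaining_pct: float, trend_pct_per_day: float) -> float:
    """Linear estimate of days until the error budget reaches zero.

    Hypothetical helper for illustration only; assumes the trend stays constant.
    """
    if trend_pct_per_day >= 0:
        return math.inf  # budget is flat or recovering, so it never exhausts
    return budget_remaining_pct / -trend_pct_per_day


# Figures taken from the sample output above.
print(round(days_until_exhaustion(73.2, -2.1), 1))
```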

Dependency-Aware SLO Validation

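Your SLO ceiling is set by your dependency chain: the availabilities of serial dependencies multiply, so the ceiling sits below even the weakest link. NthLayer calculates it.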

$ nthlayer validate-slo payment-api

Target: 99.99% availability
Dependencies:
  → postgresql (99.95%)
  → redis (99.99%)
  → user-service (99.9%)

Serial availability: 99.84%
✗ INFEASIBLE: Target exceeds dependency ceiling by 0.15%

Recommendation: Reduce target to 99.8% or improve user-service SLO
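
The arithmetic behind the serial availability figure can be reproduced in a few lines. This is a minimal sketch, assuming every dependency is a hard, serial dependency, so the availabilities simply multiply; the function name `serial_availability` is illustrative and not part of the NthLayer CLI or library.

```python
from math import prod


def serial_availability(dependency_slos_pct):
    """Multiply availabilities (given in percent) of dependencies in series."""
    return prod(p / 100 for p in dependency_slos_pct) * 100


# Dependency SLOs from the sample output above.
deps = {"postgresql": 99.95, "redis": 99.99, "user-service": 99.9}
target = 99.99

ceiling = serial_availability(deps.values())
print(f"Serial availability: {ceiling:.2f}%")  # Serial availability: 99.84%

if target > ceiling:
    print(f"Infeasible: target exceeds ceiling by {target - ceiling:.2f} points")
else:
    print("Feasible")
```

Because the product can never exceed its smallest factor, adding another serial dependency can only lower the ceiling.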

Deployment Gates

Block deploys when error budget is exhausted or drift is critical.

$ nthlayer check-deploy payment-api

ERROR: Deployment blocked
  - Error budget: -47 minutes (exhausted)
  - Drift severity: critical
  - 3 P1 incidents in last 7 days

Exit code: 2 (BLOCKED)
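
The gate logic described above can be sketched as a small decision function. Only exit code 2 (blocked) comes from the sample output; the success code 0, the 30-day window, and the names `error_budget_minutes` and `gate` are assumptions made for illustration, and the real command may consider more signals (such as recent incidents).

```python
BLOCKED = 2  # exit code shown in the sample output above
ALLOWED = 0  # assumption: conventional success exit code


def error_budget_minutes(slo_pct: float, window_days: int = 30) -> float:
    """Total allowed downtime, in minutes, for an availability SLO over a window."""
    return (1 - slo_pct / 100) * window_days * 24 * 60


def gate(budget_remaining_minutes: float, drift_severity: str) -> int:
    """Block when the budget is exhausted or drift is critical."""
    if budget_remaining_minutes <= 0 or drift_severity == "critical":
        return BLOCKED
    return ALLOWED


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))

# Values from the sample output: budget overspent by 47 minutes, critical drift.
print(gate(budget_remaining_minutes=-47, drift_severity="critical"))
```

In a pipeline, the returned code would be passed to the process exit so that a non-zero result fails the step.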

Blast Radius Analysis

Understand impact before making changes.

$ nthlayer blast-radius payment-api

Direct dependents (3):
  • checkout-service (critical) - 847K req/day
  • order-service (critical) - 523K req/day
  • refund-worker (standard) - 12K req/day

Transitive impact: 12 services, 2.1M daily requests
Risk: HIGH - affects checkout flow
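
Transitive impact is essentially a graph traversal over "who depends on me" edges. The sketch below is one way to compute it with a breadth-first search; the graph contents beyond the three direct dependents listed above (`storefront`, `fulfilment`) are made up for the example, and `transitive_dependents` is a hypothetical helper rather than NthLayer's implementation.

```python
from collections import deque


def transitive_dependents(dependents, service):
    """Return every service that depends on `service`, directly or indirectly.

    `dependents` maps a service name to the services that call it.
    """
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller != service and caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


graph = {
    "payment-api": ["checkout-service", "order-service", "refund-worker"],
    "checkout-service": ["storefront"],             # hypothetical
    "order-service": ["storefront", "fulfilment"],  # hypothetical
}

print(sorted(transitive_dependents(graph, "payment-api")))
```

The `seen` set keeps shared callers from being counted twice and makes the traversal safe on cyclic graphs.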

Metric Recommendations

Enforce OpenTelemetry conventions. Know what's missing before production.

$ nthlayer recommend-metrics payment-api

Required (SLO-critical):
  ✓ http.server.request.duration    FOUND
  ✗ http.server.active_requests     MISSING

Run with --show-code for instrumentation examples.
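
Conceptually, this check compares a list of required metric names against what the service actually exposes. A minimal sketch, assuming the discovered metric names are already available as a list of strings; `check_metrics` is a hypothetical helper, and how NthLayer discovers metrics is not shown here.

```python
# Required metric names taken from the sample output above.
REQUIRED = ["http.server.request.duration", "http.server.active_requests"]


def check_metrics(required, discovered):
    """Map each required metric name to whether it was discovered."""
    found = set(discovered)
    return {name: name in found for name in required}


report = check_metrics(REQUIRED, ["http.server.request.duration"])
for name, ok in report.items():
    print("✓" if ok else "✗", name, "FOUND" if ok else "MISSING")
```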

Artifact Generation

Generate dashboards, alerts, and SLOs from a single spec.

$ nthlayer apply service.yaml

Generated:
  → dashboard.json (Grafana)
  → alerts.yaml (Prometheus)
  → recording-rules.yaml (Prometheus)
  → slos.yaml (OpenSLO)

🚀 Quick Start

# Install
pip install nthlayer

# Create a service spec
nthlayer init

# Validate and generate
nthlayer apply service.yaml

# Check deployment readiness
nthlayer check-deploy payment-api

Minimal `service.yaml`

name: payment-api
tier: critical
type: api
team: payments

dependencies:
  - postgresql
  - redis

NthLayer also supports the OpenSRM format (apiVersion: opensrm/v1) for contracts, deployment gates, and more. See the full spec reference for all options.

🔄 CI/CD Integration

# GitHub Actions
- name: Validate reliability
  run: |
    nthlayer validate-slo ${{ matrix.service }}
    nthlayer check-deploy ${{ matrix.service }}

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins

🎯 How It's Different

Traditional Approach	NthLayer
Set SLOs in isolation	Validate against dependency chains
Alert when budget exhausted	Predict exhaustion with drift detection
Discover missing metrics in incidents	Enforce before deployment
Manual dashboard creation	Generate from spec
"Is this ready?" = opinion	"Is this ready?" = deterministic check

📚 Documentation

Full Documentation - Comprehensive guides and reference.

Guide	Description
Quick Start	Get running in 5 minutes
Drift Detection	Predict SLO exhaustion
Dependency Discovery	Automatic dependency mapping
CI/CD Integration	Pipeline setup
CLI Reference	All commands

🗺️ Roadmap

Agentic Inference (Planned)

nthlayer infer will use a model to analyse a codebase and propose an OpenSRM manifest for it. The model examines the code, identifies services, infers appropriate SLO targets, and generates a draft service.reliability.yaml that NthLayer then validates and generates artifacts from.

This follows Zero Framework Cognition: the model provides judgment (what SLOs does this service need?), and NthLayer provides transport (validate the manifest, generate the monitoring artifacts). The result is a clean boundary between reasoning and deterministic transformation.

OpenSRM Ecosystem

NthLayer is one component in the OpenSRM ecosystem. Each component solves a complete problem independently, and they compose when used together through shared OpenSRM manifests and OTel telemetry conventions.

                        ┌─────────────────────────┐
                        │     OpenSRM Manifest     │
                        │  (the shared contract)   │
                        └────────────┬────────────┘
                                     │
                    reads            │           reads
               ┌─────────────┬──────┴──────┬─────────────┐
               ▼             ▼             ▼             ▼
         ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
         │ MEASURE  │ │>NTHLAYER<│ │CORRELATE │ │ RESPOND  │
         │          │ │          │ │          │ │          │
         │ quality  │ │ generate │ │correlate │ │ incident │
         │+govern   │ │monitoring│ │ signals  │ │ response │
         │+cost     │ │          │ │          │ │          │
         └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
              │             │             │             │
              └─────────────┴──────┬──────┴─────────────┘
                                   ▼
                     ┌──────────────────────────┐
                     │      Verdict Store       │
                     │  (shared data substrate) │
                     │ create · resolve · link  │
                     │ accuracy · gaming-check  │
                     └────────────┬─────────────┘
                                  │ OTel side-effects
                                  ▼
                     ┌──────────────────────────┐
                     │    OTel Collector /      │
                     │   Prometheus / Grafana   │
                     └──────────────────────────┘

              Learning loop (post-incident):
              nthlayer-respond findings → manifest updates
              → NthLayer regenerates → nthlayer-measure
              refines → nthlayer-correlate improves → OpenSRM

How NthLayer fits in:

- NthLayer reads OpenSRM manifests and generates the monitoring infrastructure (Prometheus rules, Grafana dashboards, PagerDuty config) that the rest of the ecosystem relies on.
- Verdict operations emit OTel side-effects (gen_ai.decision.*, gen_ai.override.*) that flow into Prometheus. NthLayer generates dashboards for these metrics alongside service dashboards; it reads from Prometheus, not from the Verdict Store directly.
- NthLayer exports service topology that nthlayer-correlate uses for topology-aware signal correlation.
- nthlayer-respond's post-incident findings feed back into NthLayer as rule refinements (alerts that should have fired earlier or didn't fire at all).

Each component works alone. Someone who just needs reliability-as-code adopts NthLayer without needing nthlayer-measure, nthlayer-correlate, or nthlayer-respond.

Component	What it does	Link
OpenSRM	Specification for declaring service reliability requirements	OpenSRM
nthlayer-learn	Data primitive for recording AI judgments and measuring correctness	nthlayer-learn
nthlayer-measure	Quality measurement and governance for AI agents	nthlayer-measure
NthLayer	Generate monitoring infrastructure from manifests (this repo)	nthlayer
nthlayer-correlate	Situational awareness through signal correlation	nthlayer-correlate
nthlayer-respond	Multi-agent incident response	nthlayer-respond

🤝 Contributing

# Install uv (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests

See CONTRIBUTING.md for details.

📄 License

MIT - See LICENSE.txt

🙏 Acknowledgments

Built on grafana-foundation-sdk, awesome-prometheus-alerts, pint, and OpenSLO. Inspired by Sloth and autograf.
