Inspiration

Every team has that Slack message they dread: "is prod on fire?" — sent at 2 AM, minutes after a deploy. We've all lived it. The gap between git push and knowing whether the release actually landed safely is where outages breed. You're refreshing three dashboards, grepping logs across services, and asking teammates "are you seeing this too?"

Splunk already has the data. Every error, every timeout, every 5xx — it's all there, indexed and searchable. The problem isn't data. The problem is that nobody runs the same three Splunk searches every single time they ship. It's tedious, inconsistent, and humans forget.

We built Release Preflight to make that check automatic, repeatable, and opinionated. One click, one risk score, one verdict. Before you merge to main, before you page on-call — run the preflight.

What it does

Release Preflight is a release-risk gate that answers one question: should this release ship?

Paste a GitHub repo, branch, or commit link. The app:

  1. Auto-resolves the service name, environment, repo, branch, commit SHA, and even the package.json version — no manual form-filling.
  2. Writes a deployment marker to Splunk via HEC so future queries can trace exactly when this release landed.
  3. Runs three SPL searches through the Splunk MCP server in parallel: a summary aggregation, an error breakdown by host, and a recent-event timeline.
  4. Scores the release on a 0–100 risk scale across four dimensions: event volume pressure, explicit errors (error/critical/fatal severity), failure-language signals (timeout, exception, panic, OOM, 5xx, connection refused), and blast radius (affected host count).
  5. Returns a verdict: Ship, Ship Guarded, Canary Only, or Hold Release — backed by specific evidence rows and remediation commands.
  6. Optionally calls Splunk AI Assistant through MCP for a natural-language release decision narrative.

All Splunk credentials stay server-side. The browser never sees a token.

How we built it

Stack: Next.js 15 (App Router) + TypeScript + Tailwind CSS, deployed on Vercel.

Architecture at a glance:

| Layer | What it does |
| --- | --- |
| POST /api/deployments | Ingests a deployment marker event into Splunk HEC with raw Node.js http/https, with no extra HTTP client dependency |
| POST /api/preflight | Orchestrates three parallel SPL queries through the Splunk MCP JSON-RPC client, optionally calls the AI Assistant tool, then assembles the risk report |
| mcp.ts | A hand-rolled MCP client that speaks JSON-RPC 2.0, handles streamable HTTP responses, and coerces tool results into typed rows |
| hec.ts | A raw HEC sender that constructs the /services/collector/event payload with deployment metadata |
| spl.ts | Builds three SPL searches scoped to service × environment × release with SPL value escaping, deployment-marker deduplication, and normalized severity extraction |
| report.ts | Receives Splunk rows, computes the multi-signal risk score, builds evidence cards and remediation actions |
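
To make the mcp.ts row concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope an MCP client sends for a tool call. The `tools/call` method and the `name`/`arguments` params follow the MCP specification; the helper name `buildToolCall`, the `query` argument key, and the sample search are assumptions for illustration, not the project's actual code.

```typescript
// Shape of a JSON-RPC 2.0 request carrying an MCP "tools/call" invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Monotonic request id so responses can be matched to requests.
let nextRequestId = 1;

// Build the request body for one tool call (hypothetical helper).
// The caller would serialize it with JSON.stringify and POST it to the MCP endpoint.
function buildToolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// "query" is an assumed argument name; the real tool schema may differ.
const summaryCall = buildToolCall("splunk_run_query", {
  query: "search index=main | head 5",
});
```

Because every search goes through the same envelope, adding a fourth query would only mean building another SPL string.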

Key engineering decisions:

  • No Splunk secrets on the client. API routes read from server env vars. Next.js API routes on Vercel make this zero-config.
  • Deterministic fallback. When RELEASE_PREFLIGHT_REQUIRE_AI=false, the narrative is built from the same Splunk rows — it never fabricates evidence.
  • SPL value escaping. All user-supplied values (service, environment, release ID) are escaped before injection into SPL, preventing query injection.
  • Deployment marker deduplication. Markers are identified by event_type="deployment" and deduplicated on a composite key so they don't inflate error counts.
  • Zod validation on every boundary. Request schemas, deployment events, evidence events, and the final report are all validated.
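
As an illustration of the SPL value escaping decision, a minimal sketch might look like the following. It assumes every user-supplied value is placed inside a double-quoted SPL string, so only backslashes and double quotes need escaping; the helper names `escapeSplValue` and `splFieldEquals` are hypothetical and the real spl.ts may do more.

```typescript
// Escape a value for use inside a double-quoted SPL string.
// Backslashes are doubled first so the quote escaping below is not itself re-escaped.
function escapeSplValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Build a field="value" term with the value safely quoted (hypothetical helper).
function splFieldEquals(field: string, value: string): string {
  return `${field}="${escapeSplValue(value)}"`;
}

// A hostile value stays inside the quotes instead of starting a new pipeline stage.
const term = splFieldEquals("service", 'payments" | delete');
```

Keeping the value quoted means a pipe character in user input is treated as literal text rather than as a new SPL command.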

Challenges we ran into

1. MCP streamable HTTP parsing. The Splunk MCP server returns streamable HTTP responses framed as Server-Sent Events (event:/data: lines), not plain JSON. We had to write a parser that handles both streaming and non-streaming responses, extracts the final data: line, and tolerates edge cases like empty payloads.
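
A simplified sketch of that parsing idea is shown below. It is not our actual parser: the function name `parseMcpBody` is hypothetical, and it assumes each JSON-RPC message fits on a single data: line (multi-line SSE data fields are not reassembled).

```typescript
// Parse an MCP HTTP response body that is either plain JSON or
// SSE-framed text made of "event:" / "data:" lines.
// Returns null for empty payloads or SSE streams with no data.
function parseMcpBody(body: string): unknown {
  const trimmed = body.trim();
  if (trimmed === "") return null; // empty payload

  const lines = trimmed.split(/\r?\n/);
  const looksLikeSse = lines.some(
    (line) => line.startsWith("data:") || line.startsWith("event:"),
  );
  if (!looksLikeSse) return JSON.parse(trimmed); // non-streaming response

  // Collect non-empty data payloads and keep only the final one.
  const payloads = lines
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim())
    .filter((payload) => payload !== "");
  const last = payloads.pop();
  return last === undefined ? null : JSON.parse(last);
}
```

Malformed JSON still throws from JSON.parse in this sketch, so a caller would wrap it in its own error handling.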

2. Deployment marker deduplication in SPL. If you send two markers for the same release (e.g., re-running the tool), they shouldn't count as errors. We built a composite dedup key that distinguishes markers from real events, using md5() on raw text for non-marker events and a stable marker:* key for markers. Getting this right in pure SPL — without a subsearch — took several iterations.
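
For readers who want the shape of that dedup key, here is a rough sketch of an SPL fragment built from TypeScript. The `if()`, `md5()`, and `dedup` pieces are standard SPL, and event_type="deployment" comes from our marker format; the field names `release_id` and `dedup_key` and the function name are assumptions, and the real composite key has more parts.

```typescript
// Return an SPL fragment that assigns one key per event and drops duplicates.
// Markers get a stable "marker:<release>" key; every other event is keyed on
// the md5 hash of its raw text. Field names here are hypothetical.
function buildDedupClause(): string {
  return [
    '| eval dedup_key=if(event_type="deployment", "marker:".release_id, md5(_raw))',
    "| dedup dedup_key",
  ].join(" ");
}

const dedupClause = buildDedupClause();
```

Because the key is computed with eval in the same pipeline, no subsearch is needed to tell markers apart from real events.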

3. Making the risk score meaningful, not just a number. Early versions produced scores that didn't correlate with real risk. We landed on a four-pressure model (volume, errors, signals, hosts) with clamped ceilings so no single dimension can dominate, and calibrated the thresholds against real deployment patterns.
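
The four-pressure idea can be sketched as follows. The weights and ceilings below are invented for illustration (they are not our calibrated values); the point is only that each pressure is clamped, so the ceilings sum to 100 and no single dimension can dominate.

```typescript
// Inputs to the score; names are hypothetical stand-ins for the Splunk summary row.
interface ReleaseSignals {
  eventCount: number;         // total events in the release window
  errorCount: number;         // error/critical/fatal severity events
  failureSignalCount: number; // timeout, exception, panic, OOM, 5xx, ...
  hostCount: number;          // affected hosts (blast radius)
}

// Clamp a pressure into [0, ceiling].
const clamp = (value: number, ceiling: number): number =>
  Math.max(0, Math.min(value, ceiling));

// Illustrative weights and ceilings only: 15 + 40 + 25 + 20 = 100.
function riskScore(s: ReleaseSignals): number {
  const volume = clamp(s.eventCount / 50, 15);
  const errors = clamp(s.errorCount * 2, 40);
  const signals = clamp(s.failureSignalCount * 1.5, 25);
  const hosts = clamp(s.hostCount * 4, 20);
  return Math.round(volume + errors + signals + hosts);
}

const quietRelease = riskScore({ eventCount: 40, errorCount: 3, failureSignalCount: 1, hostCount: 1 });
const noisyRelease = riskScore({ eventCount: 900, errorCount: 88, failureSignalCount: 30, hostCount: 6 });
```

With these made-up weights, the quiet release lands far below the noisy one, and even an extreme error count cannot push the errors pressure past its ceiling of 40.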

4. Vercel + Splunk connectivity. Trial Splunk Cloud stacks sometimes serve self-signed certificates. We added SPLUNK_HEC_ALLOW_SELF_SIGNED with a raw Node.js HTTPS agent override (rejectUnauthorized: false) — side-stepping the need for custom CA chains in a serverless environment.
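
As a sketch of how that override can be wired, the helper below builds request options that Node's https.request accepts, including the TLS option rejectUnauthorized. The "Splunk <token>" Authorization scheme and the /services/collector/event path are standard HEC; the function name, its parameters, and the way the flag is passed in are assumptions.

```typescript
// Options object compatible with Node's https.request(options, callback).
interface HecRequestOptions {
  hostname: string;
  port: number;
  path: string;
  method: "POST";
  headers: Record<string, string>;
  rejectUnauthorized: boolean;
}

// Build HEC request options (hypothetical helper).
// allowSelfSigned would come from SPLUNK_HEC_ALLOW_SELF_SIGNED on the server.
function buildHecRequestOptions(
  hostname: string,
  port: number,
  token: string,
  allowSelfSigned: boolean,
): HecRequestOptions {
  return {
    hostname,
    port,
    path: "/services/collector/event",
    method: "POST",
    headers: {
      Authorization: `Splunk ${token}`,
      "Content-Type": "application/json",
    },
    // Certificate verification stays on unless the operator explicitly opts out.
    rejectUnauthorized: !allowSelfSigned,
  };
}

const trialStackOptions = buildHecRequestOptions("splunk.example.com", 8088, "example-token", true);
```

Disabling verification removes protection against man-in-the-middle attacks, so the flag is best kept to trial stacks and left off in production.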

5. Keeping the form usable while packing in precision. We auto-resolve GitHub links to populate service, repo, branch, commit SHA, and read package.json for the version — all from a single paste. The parsing handles HTTPS URLs, SSH git URLs, commit links, and tree/branch links. Debugging the GitHub API call chain (repo → branch → commit → package.json) without overwhelming the user with loading states was a UX challenge.
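
Here is a minimal sketch of the link-parsing step, covering the four input shapes mentioned above. The type and function names are hypothetical, and the sketch only does string parsing: for a tree link such as tree/main/src it cannot tell where the branch name ends and the path begins, which is one reason the real flow follows up with GitHub API calls.

```typescript
// Result of parsing a pasted GitHub reference (hypothetical shape).
interface GitHubRef {
  owner: string;
  repo: string;
  kind: "repo" | "commit" | "tree";
  ref?: string; // commit SHA or branch (possibly followed by a path), when present
}

function parseGitHubLink(input: string): GitHubRef | null {
  const text = input.trim();

  // SSH form: git@github.com:owner/repo.git
  const ssh = /^git@github\.com:([^/\s]+)\/([^/\s]+?)(?:\.git)?$/.exec(text);
  if (ssh) {
    const [, owner, repo] = ssh;
    return owner && repo ? { owner, repo, kind: "repo" } : null;
  }

  // HTTPS form, optionally followed by /commit/<sha> or /tree/<branch>,
  // an optional trailing slash, and an ignored query string or fragment.
  const https =
    /^https?:\/\/github\.com\/([^/\s]+)\/([^/\s#?]+?)(?:\.git)?(?:\/(commit|tree)\/([^\s#?]+?))?\/?(?:[?#].*)?$/.exec(text);
  if (!https) return null;

  const [, owner, repo, kind, ref] = https;
  if (!owner || !repo) return null;
  if ((kind === "commit" || kind === "tree") && ref) {
    return { owner, repo, kind, ref };
  }
  return { owner, repo, kind: "repo" };
}
```

Returning null for anything unrecognized lets the form fall back to manual entry instead of guessing.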

Accomplishments that we're proud of

  • Zero-fabrication guarantee. Every piece of evidence in the report comes from Splunk rows. If AI is unavailable, the deterministic narrative still uses only returned data. The app will tell you "no evidence returned" rather than hallucinate.
  • The multi-pressure risk model. Four independent signals (volume, errors, failure language, blast radius) combining into one score feels obvious in hindsight, but getting the ceilings and weights right so it surfaces real risk without false positives was genuinely hard.
  • Raw Node.js HTTP for HEC. No axios, no node-fetch. Just http/https with proper timeout handling and self-signed cert support. The HEC client is ~120 lines and does exactly what it needs to.
  • The GitHub link parser. Paste anything — https://github.com/owner/repo, git@github.com:owner/repo.git, a commit URL, a tree URL — and the form fills itself. It even reads package.json from the GitHub API to pull the version as the release ID.
  • Full test coverage of the risk engine. The SPL builders are tested for injection safety. The report builder is tested with real-shaped Splunk rows. The tests verify that evidence is never fabricated.

What we learned

  • MCP is the right abstraction for Splunk. Instead of embedding Splunk SDKs or REST clients, we treat Splunk as a tool-calling server. The MCP JSON-RPC protocol gave us splunk_run_query and saia_ask_splunk_question as clean interfaces. Adding a new query is just building SPL — no API surface changes.
  • SPL is a programming language; treat it like one. Value escaping, dedup logic, and normalized severity extraction are all code. Keeping the SPL builders in a dedicated module (spl.ts) made them testable and reviewable.
  • Serverless + Splunk works if you keep it lean. Vercel's 60-second function timeout (extended with maxDuration) is plenty for three parallel SPL queries when you use Promise.all. The key is not blocking on sequential round-trips.
  • Risk scoring needs calibration against real data. Abstract formulas produce abstract numbers. We iterated the score thresholds (80/60/35) by running against scenarios with known outcomes — a release with 3 errors on 1 host vs. a release with 88 errors on 6 hosts should feel different.
  • Fallbacks are features, not afterthoughts. The AI narrative is nice, but the deterministic fallback means the tool is never broken. RELEASE_PREFLIGHT_REQUIRE_AI=false is the safe default.
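
To show how the 80/60/35 thresholds could translate into the four verdicts, here is a small sketch. The thresholds and verdict names come from this writeup; which threshold maps to which verdict, and whether the boundaries are inclusive, are assumptions made for the example.

```typescript
type Verdict = "Ship" | "Ship Guarded" | "Canary Only" | "Hold Release";

// Map a 0-100 risk score to a verdict.
// Assumed mapping: >= 80 holds, >= 60 is canary only, >= 35 ships guarded.
function verdictFor(score: number): Verdict {
  if (score >= 80) return "Hold Release";
  if (score >= 60) return "Canary Only";
  if (score >= 35) return "Ship Guarded";
  return "Ship";
}
```

Keeping the mapping in one small pure function makes recalibration a matter of changing three numbers and rerunning the known-outcome scenarios.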

What's next for Preflight

  • Splunk Enterprise Security integration. Wire up notable-event indexes as a dedicated security evidence source. If a release coincides with a security notable, that should spike the risk score independently.
  • Historical baseline comparison. Compare this release's metrics against the last N releases for the same service × environment. A release with 10 errors might be normal for this service — or it might be 10× the baseline.
  • Slack/Discord bot. @preflight ship owner/repo#main in chat → risk report posted to the channel. Make it the thing teams run without leaving their workflow.
  • Canary-aware scoring. If a release is deployed to a canary host first, Preflight should score the canary window separately and gate the full rollout.
  • GitHub Check Run integration. Post the risk report as a Check Run on the commit/PR so it blocks merges when the risk score exceeds a configurable threshold.
  • Multi-service release correlation. When a platform release touches 5 services, Preflight should correlate evidence across all of them and surface cross-service blast radius.
  • Splunk Observability Cloud data sources. Pull in APM metrics (latency, error rate, throughput) alongside log-based evidence for a richer risk picture.
