DEPLOY ANYWHERE. ONE ARTIFACT, EVERY ENVIRONMENT.
The same 50 MB Rust binary runs in Docker, Kubernetes, bare metal, or an air-gapped subnet, with PostgreSQL as the only runtime dependency and a profile YAML as the only environment delta.
Single-Artifact Perimeter
An ops team running governed Claude in production usually inherits a stack of seven services. A web tier, a session cache, a rate-limit store, a job runner, a search index, a log shipper, an identity service. Each one is a deploy path, an upgrade cadence, and an on-call surface. A CISO who approves that stack for one environment then has to re-approve each moving part for staging, for an air-gapped VM, and for a developer laptop, because the variance between deployments is where governance drift hides.
systemprompt.io collapses that stack into one process. The HTTP server, the job scheduler, the template engine, the JWT middleware, the tiered rate limiter, analytics, and cost tracking all link into the same Rust binary. PostgreSQL is the only runtime dependency. Session state lives in the database. Rate-limit state lives in process memory. No Redis, no Kafka, no Elasticsearch sits in the dependency graph, so the CISO's approval covers one artifact rather than seven services.
The entry point is nine lines. Two function calls bring up routes, schemas, jobs, health checks, and the CLI. A linker pin keeps extension registrations from being stripped under link-time optimisation, then the CLI dispatcher runs. A staff engineer can read the full bootstrap in under a minute and map every deploy target back to the same two calls. The full source of the entry point is listed in the references below.
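As a rough sketch of that shape, assuming a Tokio runtime and using hypothetical bootstrap and dispatch names in place of the real calls (only __force_extension_link is named in the references below), the entry point reads something like this:

```rust
// Illustrative only: `bootstrap` and `cli::dispatch` are stand-in names for the
// two calls described above, not the actual systemprompt.io API.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Linker pin: referencing the exported symbol keeps inventory-registered
    // extensions from being stripped under link-time optimisation.
    systemprompt::__force_extension_link();

    // One call wires routes, schemas, jobs, and health checks; the second
    // hands control to the CLI dispatcher.
    let app = systemprompt::bootstrap().await?;
    systemprompt::cli::dispatch(app).await
}
```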
- One Artifact, Approved Once — A single compiled Rust binary plus PostgreSQL is the whole stack. No Redis cluster to approve, no Kafka broker to scan, no Elasticsearch index to review. The same artifact that ships to production ships to staging and to a laptop, so an air-gap sign-off does not repeat per environment.
- No Sidecars In The Deployment — No service mesh, no proxy container, no init container, no log forwarder. The job scheduler runs inside the binary on an async cron runtime, so no separate scheduler process needs monitoring and no second binary can drift out of sync with the main one during an incident.
- In-Process Rate Limiter — The in-process rate limiter holds a keyed token bucket per auth tier. State lives in process memory, so adding a replica does not mean operating a Redis cluster for coordination, and a runaway agent on one node cannot drain a shared counter for every other node. A sketch of that pattern follows the references below.
- main.rs (nine-line entry point) The full application entry point. Linker pin then CLI dispatcher.
- lib.rs (__force_extension_link) Linker pin that keeps inventory-registered extensions alive under link-time optimisation.
- rate_limit.rs (TieredRateLimiter) In-process rate limiter backed by the governor crate, one keyed bucket per tier.
- crates/entry/api/ HTTP server and API surface, linked into the same binary as the CLI.
- crates/app/scheduler/ Cron-style job runner on the same Tokio runtime as the HTTP server.
- crates/domain/templates/ Template engine linked in, no external rendering service.
- crates/domain/analytics/ Analytics and session tracking written directly to PostgreSQL.
- crates/infra/security/ Authentication, JWT validation, and scanner detection, all in-binary.
- crates/infra/database/ PostgreSQL connection pool, the only runtime dependency at startup.
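The limiter pattern itself is small enough to sketch against the governor crate named above. The struct shape, tier names, and quotas below are placeholders for illustration; only the governor API calls are real:

```rust
// Illustrative tiered limiter: one keyed token bucket per auth tier, state held
// in process memory. Tier names and quotas are placeholders, not the shipped config.
use std::collections::HashMap;
use std::num::NonZeroU32;

use governor::{clock::DefaultClock, state::keyed::DefaultKeyedStateStore, Quota, RateLimiter};

type KeyedLimiter = RateLimiter<String, DefaultKeyedStateStore<String>, DefaultClock>;

struct TieredLimiter {
    tiers: HashMap<&'static str, KeyedLimiter>,
}

impl TieredLimiter {
    fn new(base_per_minute: u32, multipliers: &[(&'static str, u32)]) -> Self {
        let tiers = multipliers
            .iter()
            .map(|(tier, mult)| {
                let rate = NonZeroU32::new(base_per_minute * mult).expect("non-zero rate");
                (*tier, RateLimiter::keyed(Quota::per_minute(rate)))
            })
            .collect();
        Self { tiers }
    }

    /// True if this caller still has budget in its tier on this replica.
    fn allow(&self, tier: &str, caller: &str) -> bool {
        self.tiers
            .get(tier)
            .map(|limiter| limiter.check_key(&caller.to_string()).is_ok())
            .unwrap_or(false)
    }
}

fn main() {
    // Hypothetical tiers; in the real system the base rate and multipliers come from the profile.
    let limiter = TieredLimiter::new(60, &[("anonymous", 1), ("user", 5), ("admin", 20)]);
    assert!(limiter.allow("user", "agent-42"));
}
```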
Linker-Section Extensions
A team adding a custom capability to a typical AI stack writes a separate service, a Dockerfile, a deploy pipeline, and an interface contract, and then carries the contract drift forever. Every upgrade has to re-certify the host plus every plugin against a new ABI, and a production incident can land on either side of the service boundary. The build-vs-buy question for a CTO is whether they own that coordination or push it down to the linker.
systemprompt.io pushes it down to the linker. A custom capability is a crate that implements the extension trait and registers itself with a registration macro. The macro emits a factory into a linker section at compile time. At startup the registry walks that section once and instantiates each extension. No classpath scan, no dynamic loader, no host-plugin ABI to version. A staff engineer verifies the mechanism by opening the trait and the macro in the references below. The wire between host and extension is a compiled symbol, not a process boundary.
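A self-contained sketch of that mechanism can be written against the inventory crate the registry's collector is built on. The trait, macro, and type names here are simplified stand-ins for the real Extension trait and register_extension! macro:

```rust
// Self-contained sketch of the linker-section pattern, built on the inventory
// crate. Trait, macro, and type names are simplified stand-ins, not the real API.
// [dependencies] inventory = "0.3"

/// Simplified stand-in for the extension trait.
trait Extension {
    fn name(&self) -> &'static str;
}

/// A registration is a factory function emitted into a linker section at compile time.
struct Registration {
    factory: fn() -> Box<dyn Extension>,
}

inventory::collect!(Registration);

/// Stand-in for register_extension!: submit a factory for this extension type.
macro_rules! register_extension {
    ($ty:ty) => {
        inventory::submit! {
            Registration { factory: || Box::new(<$ty>::default()) }
        }
    };
}

#[derive(Default)]
struct WebExtension;

impl Extension for WebExtension {
    fn name(&self) -> &'static str {
        "web"
    }
}

register_extension!(WebExtension);

fn main() {
    // Discovery: walk the linker-section registrations once and instantiate each one.
    // No classpath scan, no dynamic loader, no runtime plugin path.
    for registration in inventory::iter::<Registration> {
        let extension = (registration.factory)();
        println!("discovered extension: {}", extension.name());
    }
}
```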
The same pattern ships in the box. The template links three extensions into the binary (web, marketplace, email), and the host provides the infrastructure extensions for database, logging, analytics, files, users, AI, MCP, OAuth, content, agents, and the scheduler. A custom extension follows the same recipe. Implement the trait, call the macro, recompile, redeploy the same artifact. The CISO approving the rebuild is approving one binary, not a host plus a plugin catalogue.
- Compile-Time Discovery — The registration macro writes a factory into a linker section, and the registry walks that section at startup. No classpath scan, no reflection, no runtime plugin loader that a supply-chain compromise could use to slip a malicious crate in after the build.
- Typed Extension Trait — The extension trait defines the contribution surfaces for schemas, jobs, routes, providers, renderers, and dependencies. Each extension implements only the surfaces it contributes, and the rest fall through to default implementations, so a one-purpose extension does not pay the cost of an empty method stub for every unused surface.
- Template Links Web, Marketplace, Email — The systemprompt-web template links the web, marketplace, and email library extensions into the same binary as the host. A custom extension follows the same recipe (implement the trait, call the macro, recompile), and the deploy target stays the same artifact on the same host. That recipe is sketched after the references below.
- Extension trait Trait every extension implements. Schemas, jobs, routes, providers, renderers, most with defaults.
- register_extension! macro Registration macro. Writes a factory into a linker section at compile time.
- registry/mod.rs Registration type and inventory collector the registry walks at startup.
- ExtensionRegistry::discover Discovery function. Iterates linker-section registrations once and instantiates each extension.
- WebExtension Web library extension linked into the template binary.
- MarketplaceExtension Marketplace library extension linked into the template binary.
- SystempromptExtension MCP library extension linked into the template binary.
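To make the "implement only what you contribute" point concrete, here is a sketch of the default-surface idea. The method names, return types, and the example extension are illustrative; the real trait's surfaces differ:

```rust
// Sketch of the default-surface idea: every contribution surface has a default,
// so a one-purpose extension overrides only what it contributes.
trait Extension {
    fn name(&self) -> &'static str;

    // Contribution surfaces, each defaulting to "contributes nothing".
    fn schemas(&self) -> Vec<&'static str> {
        Vec::new()
    }
    fn routes(&self) -> Vec<&'static str> {
        Vec::new()
    }
    fn jobs(&self) -> Vec<&'static str> {
        Vec::new()
    }
    fn renderers(&self) -> Vec<&'static str> {
        Vec::new()
    }
    fn dependencies(&self) -> Vec<&'static str> {
        Vec::new()
    }
}

/// A hypothetical one-purpose extension: it contributes a schema and a job,
/// and every other surface falls through to the empty default.
struct EmailExtension;

impl Extension for EmailExtension {
    fn name(&self) -> &'static str {
        "email"
    }
    fn schemas(&self) -> Vec<&'static str> {
        vec!["email_outbox"]
    }
    fn jobs(&self) -> Vec<&'static str> {
        vec!["email_outbox_flush"]
    }
}

fn main() {
    let extension = EmailExtension;
    println!("{} contributes {} routes", extension.name(), extension.routes().len());
}
```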
Git-Resident Profiles
Environment drift is the slow killer of self-hosted infrastructure. Local config diverges from staging, staging diverges from production, and a config-only change ships an outage nobody spots in review because the diff lives in a config server, not in the repo. A CTO who asks "what is different between prod and the air-gapped cluster" should get the answer from git log, not from a screenshot.
systemprompt.io binds the binary to a profile through one environment variable. At startup, the profile loader reads the chosen profile name, loads the matching YAML file, and stores the result in a global one-time cell. Every subsystem reads from that single value for its configuration. Rate-limit tiers, JWT issuer and audience, log level, database connection, and storage paths all resolve from the same profile. Switching environments means changing the variable and restarting the process. The binary does not change, so an air-gap approval granted against one SHA is still valid against the staging run of the same SHA.
A profile is a directory checked into version control. The local profile disables rate limits and turns logging up for developer ergonomics. The production profile sets tiered rate limits, JSON logging, and a JWT configuration with issuer, audiences, and expiration. Per-region or per-tenant profiles follow the same shape. A CISO asking "what changed in the air-gapped profile last quarter" reads a git diff, not a change-ticket trail.
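A sketch of that load path, assuming a hypothetical SYSTEMPROMPT_PROFILE variable, a profiles/&lt;name&gt;/profile.yaml layout, and a trimmed-down field set (the real ProfileBootstrap and YAML shape live in the references below):

```rust
// Sketch of the profile load path: one environment variable picks the YAML,
// the parsed result lands in a one-time global, every subsystem reads that value.
// Variable name, file layout, and fields are assumptions for illustration.
// [dependencies] serde = { version = "1", features = ["derive"] }, serde_yaml = "0.9"
use std::sync::OnceLock;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Profile {
    environment: String,
    log_level: String,
    jwt_issuer: String,
    rate_limit_base_per_minute: u32,
}

static PROFILE: OnceLock<Profile> = OnceLock::new();

fn profile() -> &'static Profile {
    PROFILE.get_or_init(|| {
        // e.g. SYSTEMPROMPT_PROFILE=production -> profiles/production/profile.yaml
        let name = std::env::var("SYSTEMPROMPT_PROFILE").unwrap_or_else(|_| "local".into());
        let path = format!("profiles/{name}/profile.yaml");
        let raw = std::fs::read_to_string(&path).expect("profile file must exist");
        serde_yaml::from_str(&raw).expect("profile YAML must match the Profile shape")
    })
}

fn main() {
    // Rate limiter, JWT middleware, and logging all read from the same value.
    let p = profile();
    println!("{} profile, log level {}", p.environment, p.log_level);
}
```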
- One Variable Selects The Configuration — A single environment variable selects the profile, and the profile loader stores it in a one-time global at startup, so every subsystem reads from the same value. No per-service config server, no risk that the JWT middleware and the rate limiter disagree about which environment they are in.
- Profiles In Git, Diffs In Review — A profile is a YAML file carrying JWT issuer, audiences, expiration, per-tier rate limits, log level, and environment. Changes are visible at review time, not at incident time, so a security review pulls the profile diff from the same pull request as the code diff.
- Per-Tenant, Per-Region, Same Binary — Each profile carries its own tenant_id, database connection, and rate-limit tiers. The same binary can serve a directory of tenants or regions, so a white-label deployment is a profile directory plus a shared binary, not a branded fork of the codebase.
- ProfileBootstrap::init Profile loader. Reads the env variable and stores the YAML in a one-time global.
- profile/mod.rs Profile model. Defines the YAML shape a profile directory must satisfy.
- rate_limits.rs Per-tier rate-limit configuration. Tier multipliers are YAML values, not code.
- services/config/config.yaml Concrete configuration checked into the template repo, showing the canonical YAML shape.
CLI And Server Share A Binary
Most stacks ship one binary for the server, another for the CLI, and a third dashboard image to operate it. Three things to package, three to version, three to keep in sync on upgrade. A staff engineer running an incident at 3am has to remember which binary on which host has which subcommands. systemprompt.io is one artifact playing all three roles, so the command a developer runs on a laptop is the command an operator runs in production.
The same binary parses subcommands and dispatches them. systemprompt services start --foreground brings up the HTTP server. systemprompt admin agents list runs against the database. systemprompt infra logs view queries the log store. The CLI is not a wrapper around the server. It is the server invoked with a different subcommand, so an admin task in production and a debug run on a laptop hit the same code paths and the same profile loader.
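The dispatch shape is ordinary subcommand routing. Sketched here with clap and a trimmed subcommand tree that mirrors the commands above; the real cli/lib.rs carries more commands and routes into the actual server and admin code:

```rust
// Sketch of the one-binary dispatch shape using clap. The tree is trimmed and
// illustrative, not the shipped CLI.
use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "systemprompt")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Long-running services, including the HTTP server.
    Services {
        #[command(subcommand)]
        action: ServicesAction,
    },
    /// Administrative operations that run against the database.
    Admin {
        #[command(subcommand)]
        action: AdminAction,
    },
}

#[derive(Subcommand)]
enum ServicesAction {
    /// systemprompt services start --foreground
    Start {
        #[arg(long)]
        foreground: bool,
    },
}

#[derive(Subcommand)]
enum AdminAction {
    /// systemprompt admin agents ...
    Agents {
        #[command(subcommand)]
        action: AgentsAction,
    },
}

#[derive(Subcommand)]
enum AgentsAction {
    /// systemprompt admin agents list
    List,
}

fn main() {
    // Same binary, same profile loader: the subcommand decides whether this process
    // becomes the HTTP server or runs an admin task and exits.
    match Cli::parse().command {
        Command::Services { action: ServicesAction::Start { foreground } } => {
            println!("starting HTTP server (foreground: {foreground})");
        }
        Command::Admin { action: AdminAction::Agents { action: AgentsAction::List } } => {
            println!("listing agents from the database");
        }
    }
}
```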
The production container reflects that. The image starts from debian:bookworm-slim, installs two system libraries (libpq5 for the PostgreSQL client and libssl3 for TLS), copies the pre-built binary into /app/bin/, sets a healthcheck hitting the health endpoint, and runs the entrypoint. No Rust toolchain in the image, no multi-stage compile. A CISO doing a supply-chain review sees a binary and two library packages, not a compiler and a build cache, which keeps the attack surface of the production image small.
- CLI And Server Share A Binary — The CLI dispatcher parses the subcommand and routes into either the API server or an admin operation. Same binary, same code paths, same profile loader, so an operator tightening a rate limit from the CLI hits the same configuration surface as the running server.
- Slim Production Image — The container is debian:bookworm-slim with libpq5 and libssl3 and the pre-built binary copied in. No Rust compiler, no build cache, no multi-stage layers an attacker could squat on. A supply-chain review covers two system libraries and one binary.
- Readiness Signal For Kubernetes — The server flips an atomic readiness flag and broadcasts when it is accepting connections, so a Kubernetes readiness probe does not have to time-box its guess. A deploy rolls forward as soon as the binary is actually serving, not when a sleep timer says it should be.
- cli/lib.rs CLI dispatcher. Routes a subcommand into the server or an admin operation.
- api/lib.rs API server startup. Shares the profile loader and connection pool with the CLI.
- readiness.rs Readiness signal. Flips an atomic flag when the listener accepts connections.
- Dockerfile Production image. debian:bookworm-slim, libpq5, libssl3, pre-built binary, healthcheck.
- demo/00-preflight.sh Startup preflight script. Runs the same binary with the chosen profile selected.
Shared-Nothing Replicas
Scaling a typical AI gateway means scaling the cache and the session store with it. Add a replica, add capacity to Redis, watch for hot keys. A CTO signing off on an in-cluster deployment ends up approving three distributed systems instead of one, and every one of them is a separate on-call surface. systemprompt.io takes those moving parts off the table by keeping request handling stateless and per-process state local, so horizontal scaling is N binaries behind a load balancer.
JWT validation runs inside the request. The JWT service constructs a local decoding key from the profile secret once at startup, and every request verifies signature and expiry against that key without touching I/O. No session store sits in front of the binary and no external auth service is called per request, so any replica can serve any request and a lost pod does not strand a session on a remote cache.
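A sketch of that validation path with the jsonwebtoken crate, assuming HS256 against the profile secret and illustrative claim fields; the real JwtService carries more claims and configuration:

```rust
// Sketch of per-request, no-I/O token validation: the decoding key is built once
// from the profile secret, and each request checks signature and expiry locally.
use jsonwebtoken::{decode, Algorithm, DecodingKey, Validation};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Claims {
    sub: String,
    exp: usize,
}

struct JwtValidator {
    key: DecodingKey,
    validation: Validation,
}

impl JwtValidator {
    /// Built once at startup from the profile's secret, issuer, and audience.
    fn new(secret: &[u8], issuer: &str, audience: &str) -> Self {
        let mut validation = Validation::new(Algorithm::HS256);
        validation.set_issuer(&[issuer]);
        validation.set_audience(&[audience]);
        Self { key: DecodingKey::from_secret(secret), validation }
    }

    /// Pure CPU work: signature and expiry checks, no database or network call.
    fn validate(&self, token: &str) -> Result<Claims, jsonwebtoken::errors::Error> {
        decode::<Claims>(token, &self.key, &self.validation).map(|data| data.claims)
    }
}
```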
Rate limiting is local too. Each replica meters its own share of traffic through the in-process tiered limiter, with separate token buckets per auth tier so a batch of MCP tool calls cannot crowd out an admin invocation at the same replica. A CTO who needs globally coordinated limits can still layer an upstream reverse proxy, but the binary does not require one to deploy safely.
Health and readiness ride on the same binary. The health endpoint verifies database connectivity for liveness probes. The readiness layer flips an atomic flag and broadcasts when the server starts accepting connections, and the database layer manages its own connection pool so no external pooler (PgBouncer, pgpool) has to be deployed and approved alongside.
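The readiness half of that is small enough to sketch outright: an atomic flag plus a broadcast channel, with names that are illustrative rather than the real readiness.rs (the liveness half is a database ping and is omitted here):

```rust
// Sketch of an in-binary readiness signal: an atomic flag plus a watch channel,
// flipped once the HTTP listener is actually accepting connections.
use std::sync::atomic::{AtomicBool, Ordering};
use tokio::sync::watch;

struct Readiness {
    ready: AtomicBool,
    tx: watch::Sender<bool>,
}

impl Readiness {
    fn new() -> Self {
        let (tx, _rx) = watch::channel(false);
        Self { ready: AtomicBool::new(false), tx }
    }

    /// Called once the HTTP listener is accepting connections.
    fn mark_ready(&self) {
        self.ready.store(true, Ordering::SeqCst);
        let _ = self.tx.send(true);
    }

    /// What a Kubernetes readiness probe handler reads.
    fn is_ready(&self) -> bool {
        self.ready.load(Ordering::SeqCst)
    }

    /// Other tasks can await readiness instead of polling.
    fn subscribe(&self) -> watch::Receiver<bool> {
        self.tx.subscribe()
    }
}

#[tokio::main]
async fn main() {
    let readiness = Readiness::new();
    let mut rx = readiness.subscribe();
    readiness.mark_ready();
    rx.wait_for(|ready| *ready).await.expect("sender alive");
    assert!(readiness.is_ready());
}
```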
- Stateless JWT Validation — The JWT service verifies signature and expiry against a local decoding key loaded once from the profile secret. No database lookup per request, no session store, no external auth call, so a request latency budget does not spend milliseconds talking to a token service before doing any work.
- Per-Tier Local Rate Limiting — The in-process tiered limiter holds one token bucket per auth tier, sized by the profile. Replicas meter independently, so a batch of MCP calls on one replica cannot drain an admin budget on another, and adding a replica does not also add pressure on a Redis cluster.
- Health And Readiness In-Binary — The liveness endpoint checks database connectivity. The readiness layer broadcasts when the HTTP listener is accepting connections. Probe wiring is one healthcheck line in the Dockerfile, not a sidecar that could be version-skewed against the server it is checking.
- JwtService::validate_token JWT validation against a local decoding key. Signature and expiry, no I/O.
- JwtService::generate_admin_token Self-issued admin tokens signed locally with HS256 so tokens verify in-process; no round-trip to an identity provider.
- TieredRateLimiter Per-tier limiter. One token bucket per auth tier, state local to the process.
- rate_limits.rs Tier rate limits from the profile. Base rates and multipliers as YAML values.
- health.rs Liveness endpoint. Verifies database connectivity so probe failure means real outage.
- readiness.rs Readiness layer. Atomic flag flipped when the listener accepts connections.
- postgres/mod.rs PostgreSQL connection pool managed in the binary. No external pooler.
Air-Gap As Configuration
A regulated team picking AI infrastructure asks one question early. Can this run somewhere we trust, including a network with no internet route? A vendor whose air-gapped build is a separate code branch forces the CISO to re-approve that branch against a different supply chain. systemprompt.io answers the question by being a binary, a database, and a profile, and nothing else. The same artifact runs in every environment, and air-gapping is a profile choice, not a fork.
The binary runs on any Linux x86_64 host with libpq5 and libssl3 available. The container image is the same binary on debian:bookworm-slim. No cloud-specific packaging, no environment-specific build path, so the CISO who approved the production SHA has approved the staging SHA and the air-gapped SHA as well.
Air-gapped deployment is a configuration, not a source-code fork. The binary is its own token issuer. Admin tokens are signed locally with HS256 using the profile secret, so tokens verify in-process without Auth0, Okta, or an external identity service in the loop. Logging writes to PostgreSQL. The only outbound network calls are to PostgreSQL and to whichever AI providers the profile explicitly configures, so an auditor can enumerate the perimeter of a deployment from one profile file, not from a network capture.
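A sketch of that self-issued path with the jsonwebtoken crate; the claim names, expiry, and helper shape are assumptions, not generate_admin_token itself:

```rust
// Sketch of self-issued admin tokens signed locally with HS256 from the profile
// secret, so nothing in the token path leaves the air-gapped network.
use jsonwebtoken::{encode, get_current_timestamp, EncodingKey, Header};
use serde::Serialize;

#[derive(Serialize)]
struct AdminClaims {
    sub: String,
    iss: String,
    exp: u64,
}

fn generate_admin_token(secret: &[u8], issuer: &str, admin: &str) -> jsonwebtoken::errors::Result<String> {
    let claims = AdminClaims {
        sub: admin.to_string(),
        iss: issuer.to_string(),
        // One-hour expiry here; the real expiration comes from the profile.
        exp: get_current_timestamp() + 3600,
    };
    // HS256 with the profile secret: the same binary that issues the token
    // verifies it in-process, with no identity provider in the loop.
    encode(&Header::default(), &claims, &EncodingKey::from_secret(secret))
}

fn main() {
    let token = generate_admin_token(b"profile-secret", "systemprompt", "ops-admin")
        .expect("signing with a valid HS256 secret");
    println!("{token}");
}
```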
- Docker And Kubernetes — The debian:bookworm-slim image ships with a healthcheck hitting the health endpoint. No init containers, no service mesh, no sidecar, so a Kubernetes deployment manifest is one container image and a PostgreSQL connection string.
- Bare Metal And VMs — Copy the binary to a Linux host with libpq5 and libssl3 installed, and run it in the foreground. No container runtime required, so a regulated environment that forbids Docker can run the same artifact the cloud deployment runs.
- Air-Gapped Networks — Admin tokens are signed locally with the profile secret, so no external identity provider is required. Logging goes to PostgreSQL, and outbound network calls are scoped to PostgreSQL plus whichever AI providers the profile explicitly configures. An auditor enumerates the perimeter from one file.
- Dockerfile Production image. debian:bookworm-slim, libpq5, libssl3, binary, healthcheck.
- JwtService::generate_admin_token Self-issued admin tokens signed locally. No external IdP required.
- postgres/mod.rs PostgreSQL connection entry point. Binary talks to the database directly.
- session.rs Session storage. Sessions are rows in PostgreSQL, not a cache.
- Cargo.toml Dependency manifest. Auditable list of every crate linked into the binary.
Founder-led. Self-service first.
No sales team. No demo theatre. The template is free to evaluate — if it solves your problem, we talk.
Who we are
One founder, one binary, full IP ownership. Every line of Rust, every governance rule, every MCP integration — written in-house. Two years of building AI governance infrastructure from first principles. No venture capital dictating roadmap. No advisory board approving features.
How to engage
Evaluate
Clone the template from GitHub. Run it locally with Docker or compile from source. Full governance pipeline.
Talk
Once you have seen the governance pipeline running, book a meeting to discuss your specific requirements — technical implementation, enterprise licensing, or custom integrations.
Deploy
The binary and extension code run on your infrastructure. Perpetual licence, source-available under BSL-1.1, with support and update agreements tailored to your compliance requirements.
One binary. One database. Your infrastructure.
Clone the template, link your extensions into the same binary, and deploy the same artifact to every environment.