
feat(telemetry): opt-in PostHog usage analytics for binary installs #980

@Wirasm


Context

Archon ships as a binary via Homebrew, curl install, and (eventually) archon serve with a bundled web UI per #978. Once users install it, we have zero visibility into how they actually use it. As a result, prioritization decisions ("which workflows are popular?", "which commands fail most often?", "is anyone using the Codex provider?") are guesses rather than informed choices.

This issue proposes adding opt-in anonymous usage analytics via PostHog so we can answer those questions with data.

Why PostHog

  • Open-source (own deployment possible if cloud is ever a concern)
  • Free tier covers ~1M events/month, plenty for archon's expected scale
  • Official Node.js SDK (posthog-node) — small, batched, async, handles offline gracefully
  • Reverse proxy support for users behind corporate firewalls / ad blockers
  • Per-event property model matches our use case (events with structured data, not just page views)

Privacy posture (the load-bearing part)

This is a developer tool. The users are technical. They will (correctly) be skeptical of telemetry. The implementation must be:

  1. Opt-in only: telemetry is off by default and sends nothing until the user explicitly turns it on
  2. Transparent: archon telemetry inspect shows the literal JSON of what would be sent
  3. Easy to disable: multiple opt-out paths (CLI command, config flag, env var, and the widely honored DO_NOT_TRACK=1 convention)
  4. Whitelist data, not blacklist — only explicitly allowed properties get sent; everything else is silently dropped
  5. Anonymous by design — no account, no email, no GitHub username, no machine fingerprint beyond a locally-generated random UUID
  6. Documented in plain language — a docs/telemetry.md page that lists every event and every property in plain English

If we cannot meet all six, we should not ship telemetry at all.

Architecture

Library

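posthog-node: the official PostHog Node.js SDK. capture() only queues an event; batches are sent asynchronously in the background, so tracking never blocks command execution. Calling shutdown() flushes the queue so pending events are not lost on exit.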

Build-time API key embedding

Composes with the build-time constants refactor in #979. Add BUNDLED_POSTHOG_KEY and BUNDLED_POSTHOG_HOST to packages/paths/src/bundled-build.ts:

```typescript
export const BUNDLED_IS_BINARY = false;
export const BUNDLED_VERSION = 'dev';
export const BUNDLED_GIT_COMMIT = 'unknown';
export const BUNDLED_POSTHOG_KEY = '';                 // empty in dev = telemetry impossible
export const BUNDLED_POSTHOG_HOST = 'https://eu.posthog.com';
```

The build script writes the real key from a GitHub Actions secret (POSTHOG_PUBLIC_KEY) when building release binaries:

```bash
cat > packages/paths/src/bundled-build.ts << EOF
export const BUNDLED_IS_BINARY = true;
export const BUNDLED_VERSION = '${VERSION}';
export const BUNDLED_GIT_COMMIT = '${GIT_COMMIT}';
export const BUNDLED_POSTHOG_KEY = '${POSTHOG_PUBLIC_KEY:-}';
export const BUNDLED_POSTHOG_HOST = '${POSTHOG_HOST:-https://eu.posthog.com}';
EOF
```

Critical: dev mode has an empty key. Telemetry can never be sent from a dev clone, even if a developer accidentally enables it. This protects internal usage from polluting production data.

The PostHog public/ingest key (phc_...) is safe to embed in client code — it is write-only ingestion, not the read API key. PostHog explicitly documents this pattern.

Anonymous identity

A UUID v4 generated once per machine and stored at ~/.archon/telemetry-id with mode 0600. Generated only when the user opts in — if telemetry is off, the file does not exist.

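```typescript
// packages/telemetry/src/identity.ts
import { randomUUID } from 'crypto';
import { existsSync, mkdirSync, readFileSync, unlinkSync, writeFileSync } from 'fs';
import { join } from 'path';
import { getArchonHome } from '@archon/paths';

function getIdPath(): string {
  return join(getArchonHome(), 'telemetry-id');
}

// Only call after consent is confirmed: this creates the ID file on first use.
export function getOrCreateAnonymousId(): string {
  const idPath = getIdPath();
  if (existsSync(idPath)) {
    const existing = readFileSync(idPath, 'utf8').trim();
    if (existing) return existing; // an empty/corrupt file falls through and is regenerated
  }
  const id = randomUUID();
  // Make sure ~/.archon exists; mode 0o600 keeps the file readable by the owner only.
  mkdirSync(getArchonHome(), { recursive: true });
  writeFileSync(idPath, id, { mode: 0o600 });
  return id;
}

export function clearAnonymousId(): void {
  const idPath = getIdPath();
  if (existsSync(idPath)) {
    unlinkSync(idPath);
  }
}
```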

The ID is not tied to anything personal. Wiping the file resets your identity (archon telemetry clear).

Consent resolution

Multiple opt-in/out signals, resolved in priority order:

```typescript
// packages/telemetry/src/consent.ts
import { BUNDLED_POSTHOG_KEY } from '@archon/paths';
// loadArchonConfig is the existing config loader; import it from wherever it lives in the codebase.

export function isTelemetryEnabled(): boolean {
  // 1. Universal DO_NOT_TRACK opt-out: always wins
  if (process.env.DO_NOT_TRACK === '1') return false;

  // 2. Explicit env var override (per-invocation)
  if (process.env.ARCHON_NO_TELEMETRY === '1') return false;
  if (process.env.ARCHON_TELEMETRY === '0') return false;
  if (process.env.ARCHON_TELEMETRY === '1') {
    return Boolean(BUNDLED_POSTHOG_KEY); // false in dev mode: key not embedded
  }

  // 3. Config file. Fail closed: an unreadable config must never enable telemetry or throw.
  let enabled: boolean | undefined;
  try {
    enabled = loadArchonConfig().telemetry?.enabled;
  } catch {
    return false;
  }
  if (enabled === true) return Boolean(BUNDLED_POSTHOG_KEY);

  // 4. Default (and explicit telemetry.enabled = false): off
  return false;
}
```

Precedence: DO_NOT_TRACK > ARCHON_* env vars > config file > default off.

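DO_NOT_TRACK=1 is the de facto universal opt-out convention for command-line tools (popularized by the Console Do Not Track proposal); it is a community convention rather than a W3C standard. Many privacy-conscious users set it in their shell rc to opt out of telemetry from every tool that honors it. Honoring it is table stakes.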
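
Because archon telemetry status reports the source of the decision, it may help to express the same precedence as a pure function that also returns a reason. This is a minimal sketch under assumptions: resolveConsent, optIn, and ConsentDecision are hypothetical names, and the config is assumed to be already parsed into a `{ telemetry?: { enabled?: boolean } }` object. Passing the environment, config, and embedded key explicitly keeps the precedence unit-testable without touching process.env or disk.

```typescript
// Hypothetical pure variant of isTelemetryEnabled() that also reports the deciding source.
interface ArchonTelemetryConfig {
  telemetry?: { enabled?: boolean };
}

export interface ConsentDecision {
  enabled: boolean;
  source: string; // human-readable reason, e.g. for `archon telemetry status`
}

// An opt-in only counts when a PostHog key was embedded at build time.
function optIn(source: string, embeddedKey: string): ConsentDecision {
  return embeddedKey
    ? { enabled: true, source }
    : { enabled: false, source: `${source} (ignored: no PostHog key embedded)` };
}

export function resolveConsent(
  env: Record<string, string | undefined>,
  config: ArchonTelemetryConfig,
  embeddedKey: string,
): ConsentDecision {
  // 1. Universal opt-out always wins.
  if (env.DO_NOT_TRACK === '1') return { enabled: false, source: 'DO_NOT_TRACK=1' };

  // 2. Per-invocation environment variables.
  if (env.ARCHON_NO_TELEMETRY === '1') return { enabled: false, source: 'ARCHON_NO_TELEMETRY=1' };
  if (env.ARCHON_TELEMETRY === '0') return { enabled: false, source: 'ARCHON_TELEMETRY=0' };
  if (env.ARCHON_TELEMETRY === '1') return optIn('ARCHON_TELEMETRY=1', embeddedKey);

  // 3. Config file.
  if (config.telemetry?.enabled === false) {
    return { enabled: false, source: 'config: telemetry.enabled = false' };
  }
  if (config.telemetry?.enabled === true) {
    return optIn('config: telemetry.enabled = true', embeddedKey);
  }

  // 4. Default: off.
  return { enabled: false, source: 'default (off)' };
}
```

In the real package this would presumably be called as resolveConsent(process.env, loadArchonConfig(), BUNDLED_POSTHOG_KEY), with isTelemetryEnabled() returning only `.enabled`.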

What to capture (events)

| Event | When | Properties |
|---|---|---|
| binary.startup | Compiled binary boots | (base properties only) |
| cli.command_invoked | Start of any CLI command | command, subcommand (no args/values) |
| cli.command_completed | End of any CLI command | command, subcommand, exit_code, duration_ms |
| workflow.run_started | Workflow execution begins | workflow_name (sanitized), node_count, provider, is_cli_invoked |
| workflow.run_completed | Workflow ends successfully | workflow_name, duration_ms, node_count, nodes_completed |
| workflow.run_failed | Workflow fails | workflow_name, error_class, failed_node_type, duration_ms |
| workflow.node_failed | Individual node fails (even if the workflow continues) | workflow_name, node_type, error_classification (transient/fatal/credit_exhausted), provider |
| error.uncaught | Top-level uncaught exception | error_class, where (function name only, no path) |
| setup.completed | First-run setup finishes | install_method (homebrew/curl/source), default_assistant |

Common properties on every event (auto-included)

```typescript
{
  archon_version: BUNDLED_VERSION,
  archon_git_commit: BUNDLED_GIT_COMMIT,
  archon_is_binary: BUNDLED_IS_BINARY,
  platform: process.platform,           // 'darwin' | 'linux' | 'win32'
  arch: process.arch,                   // 'arm64' | 'x64'
  bun_version: process.versions.bun ?? 'unknown',
  node_version: process.versions.node ?? 'unknown',
}
```

What NEVER gets sent

Whitelist approach. Only the explicitly allowed properties get sent. Everything else is silently dropped.

| Category | Examples | Why excluded |
|---|---|---|
| File paths | cwd, --cwd /home/me/secret-repo | Identifies the user's filesystem layout |
| File names | workflow file names, command file names | Could be sensitive |
| User-defined workflow names | my-internal-deploy | Could leak company project names |
| Branch names | feature/customer-x-integration | Could leak feature plans |
| Commit hashes (other than archon's own) | repo HEAD, PR head SHA | Same as branch names |
| Prompts / messages / chat content | Anything the user typed | Privacy disaster |
| Error messages (only error classes are sent) | "Failed to connect to internal-vpn.acme.com" | Could leak hostnames, internal infra |
| Env var values | All of them | Could leak secrets |
| GitHub repo names | acme/proprietary-thing | Privacy |
| Owner names | acme-corp | Privacy |
| Workflow definitions | YAML content | Could leak company processes |
| AI tokens, API keys, OAuth tokens | All | Catastrophic if leaked |
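
The whitelist can be enforced at a single choke point just before capture() is called. A minimal sketch under assumptions: ALLOWED_PROPERTIES and pickAllowedProperties are hypothetical names, the per-event lists simply mirror the event table above, and restricting values to primitives is an extra safeguard not stated in this issue (a nested object could otherwise smuggle arbitrary data under an allowed key). A Map is used instead of a plain object so an event name like "constructor" cannot hit an inherited prototype property.

```typescript
// Hypothetical per-event allowlist mirroring the event table.
// Base properties (archon_version, platform, ...) are added separately by the client.
const ALLOWED_PROPERTIES = new Map<string, ReadonlySet<string>>([
  ['binary.startup', new Set<string>()],
  ['cli.command_invoked', new Set(['command', 'subcommand'])],
  ['cli.command_completed', new Set(['command', 'subcommand', 'exit_code', 'duration_ms'])],
  ['workflow.run_started', new Set(['workflow_name', 'node_count', 'provider', 'is_cli_invoked'])],
  ['workflow.run_completed', new Set(['workflow_name', 'duration_ms', 'node_count', 'nodes_completed'])],
  ['workflow.run_failed', new Set(['workflow_name', 'error_class', 'failed_node_type', 'duration_ms'])],
  ['workflow.node_failed', new Set(['workflow_name', 'node_type', 'error_classification', 'provider'])],
  ['error.uncaught', new Set(['error_class', 'where'])],
  ['setup.completed', new Set(['install_method', 'default_assistant'])],
]);

type Primitive = string | number | boolean;

// Keep only allowlisted keys with primitive values; everything else is dropped silently.
export function pickAllowedProperties(
  event: string,
  properties: Record<string, unknown>,
): Record<string, Primitive> {
  const allowed = ALLOWED_PROPERTIES.get(event);
  const result: Record<string, Primitive> = {};
  if (!allowed) return result; // unknown event: send no custom properties at all
  for (const [key, value] of Object.entries(properties)) {
    if (!allowed.has(key)) continue;
    if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
      result[key] = value;
    }
  }
  return result;
}
```

In trackEvent, the custom properties would then be spread as `...pickAllowedProperties(event, properties)` instead of `...properties`, so a forgotten `cwd` in some command handler never leaves the machine.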

Sanitization helpers

User-defined workflow names get bucketed as 'user-defined' so we can see usage volume without leaking names:

```typescript
const ALLOWED_WORKFLOW_NAMES = new Set([
  'archon-assist',
  'archon-fix-github-issue',
  'archon-comprehensive-pr-review',
  // ... all bundled defaults from packages/workflows/src/defaults/
]);

export function sanitizeWorkflowName(name: string): string {
  return ALLOWED_WORKFLOW_NAMES.has(name) ? name : 'user-defined';
}
```

Same approach for command names — bucket non-bundled into 'user-defined'.
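
Since workflow and command names follow the same rule, both sanitizers could come from one factory. A minimal sketch: makeNameSanitizer is a hypothetical helper, the workflow names are the three listed above (the full list is elided in this issue), and no real command names are assumed here, so the command sanitizer is left as a comment.

```typescript
// Hypothetical shared factory: names in the bundled list pass through unchanged,
// everything else collapses into a single 'user-defined' bucket.
export function makeNameSanitizer(bundledNames: Iterable<string>): (name: string) => string {
  const allowed = new Set(bundledNames);
  return (name: string) => (allowed.has(name) ? name : 'user-defined');
}

// In practice these lists would be generated from the bundled defaults directories.
export const sanitizeWorkflowName = makeNameSanitizer([
  'archon-assist',
  'archon-fix-github-issue',
  'archon-comprehensive-pr-review',
]);

// export const sanitizeCommandName = makeNameSanitizer(BUNDLED_COMMAND_NAMES);
```

Generating the lists at build time from the defaults directories would keep them from drifting when a new bundled workflow is added.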

Implementation skeleton

New package: packages/telemetry/

```text
packages/telemetry/
├── package.json
└── src/
    ├── index.ts          (public API: trackEvent, shutdownTelemetry)
    ├── client.ts         (PostHog client wrapper, lazy init)
    ├── consent.ts        (opt-in/opt-out resolution)
    ├── identity.ts       (UUID file management)
    ├── events.ts         (typed event helpers)
    ├── sanitize.ts       (whitelist filters for workflow names, commands)
    ├── consent.test.ts
    ├── sanitize.test.ts
    └── identity.test.ts
```

Public API

```typescript
// packages/telemetry/src/index.ts
export { trackEvent, shutdownTelemetry, isTelemetryEnabled } from './client';
export { TELEMETRY_EVENT_NAMES } from './events';
```

Client wrapper

```typescript
// packages/telemetry/src/client.ts
import { PostHog } from 'posthog-node';
import { BUNDLED_POSTHOG_KEY, BUNDLED_POSTHOG_HOST, BUNDLED_VERSION, BUNDLED_GIT_COMMIT, BUNDLED_IS_BINARY } from '@archon/paths';
import { getOrCreateAnonymousId } from './identity';
import { isTelemetryEnabled } from './consent';

let client: PostHog | null = null;

function getClient(): PostHog | null {
  if (!isTelemetryEnabled()) return null;
  if (!BUNDLED_POSTHOG_KEY) return null;
  if (client) return client;

  client = new PostHog(BUNDLED_POSTHOG_KEY, {
    host: BUNDLED_POSTHOG_HOST,
    flushAt: 20,            // batch up to 20 events
    flushInterval: 10000,   // or every 10s
    requestTimeout: 5000,   // give up on any single HTTP request after 5s
  });
  return client;
}

export function trackEvent(event: string, properties: Record<string, unknown> = {}): void {
  try {
    // getClient() reads env vars and config, so it must also be inside the try.
    const ph = getClient();
    if (!ph) return;

    ph.capture({
      distinctId: getOrCreateAnonymousId(),
      event,
      properties: {
        ...buildBaseProperties(),
        ...properties,
      },
    });
  } catch {
    // Telemetry must NEVER throw or block. Silently swallow errors.
  }
}

function buildBaseProperties(): Record<string, unknown> {
  return {
    archon_version: BUNDLED_VERSION,
    archon_git_commit: BUNDLED_GIT_COMMIT,
    archon_is_binary: BUNDLED_IS_BINARY,
    platform: process.platform,
    arch: process.arch,
    bun_version: process.versions.bun ?? 'unknown',
    node_version: process.versions.node ?? 'unknown',
  };
}

export async function shutdownTelemetry(): Promise<void> {
  if (!client) return;
  try {
    await client.shutdown();
  } catch {
    // never throw on shutdown
  }
  client = null;
}

export { isTelemetryEnabled };
```

Wiring into the CLI

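```typescript
// packages/cli/src/cli.ts (after dotenv setup)
import { trackEvent, shutdownTelemetry } from '@archon/telemetry';

trackEvent('binary.startup');

// 'exit' listeners run synchronously, so an async flush started there never completes.
// 'beforeExit' fires when the event loop drains on its own (not after an explicit
// process.exit()), so code paths that call process.exit() should await shutdownTelemetry() first.
process.once('beforeExit', () => { void shutdownTelemetry(); });
process.once('SIGINT', () => { void shutdownTelemetry().then(() => process.exit(130)); });
process.once('SIGTERM', () => { void shutdownTelemetry().then(() => process.exit(143)); });
```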

Then individual command handlers add their own trackEvent calls. Example:

```typescript
// packages/cli/src/commands/workflow.ts (run subcommand)
const startTime = Date.now();
trackEvent('cli.command_invoked', { command: 'workflow', subcommand: 'run' });

try {
  await runWorkflow(...);
  trackEvent('cli.command_completed', {
    command: 'workflow',
    subcommand: 'run',
    exit_code: 0,
    duration_ms: Date.now() - startTime,
  });
} catch (err) {
  trackEvent('cli.command_completed', {
    command: 'workflow',
    subcommand: 'run',
    exit_code: 1,
    duration_ms: Date.now() - startTime,
  });
  trackEvent('error.uncaught', {
    // `err` is `unknown` in strict TypeScript; send only the class name, never the message.
    error_class: err instanceof Error ? err.constructor.name : typeof err,
    where: 'workflow.run',
  });
  throw err;
}
```
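
Because every handler repeats the same invoked/completed bookkeeping, a small wrapper could centralize it. A sketch under assumptions: withCommandTelemetry is a hypothetical helper, and its `track` parameter stands in for trackEvent so the example runs on its own. The `finally` block guarantees exactly one cli.command_completed per invocation, and the original error is rethrown so CLI behavior does not change.

```typescript
type Track = (event: string, properties?: Record<string, unknown>) => void;

// Hypothetical wrapper: emits invoked/completed (plus the error class on failure)
// around any async command handler.
export async function withCommandTelemetry<T>(
  track: Track,
  command: string,
  subcommand: string,
  handler: () => Promise<T>,
): Promise<T> {
  const startTime = Date.now();
  let exitCode = 1; // stays 1 unless the handler resolves
  track('cli.command_invoked', { command, subcommand });
  try {
    const result = await handler();
    exitCode = 0;
    return result;
  } catch (err) {
    track('error.uncaught', {
      // Only the class name is sent, never err.message (see the exclusion list).
      error_class: err instanceof Error ? err.constructor.name : typeof err,
      where: `${command}.${subcommand}`,
    });
    throw err;
  } finally {
    track('cli.command_completed', {
      command,
      subcommand,
      exit_code: exitCode,
      duration_ms: Date.now() - startTime,
    });
  }
}
```

A handler would then read roughly `await withCommandTelemetry(trackEvent, 'workflow', 'run', () => runWorkflow(...))`.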

User-facing CLI commands

Six new subcommands under archon telemetry:

```bash
archon telemetry on                # opt in (writes config + creates anonymous ID)
archon telemetry off               # opt out (writes config, leaves anonymous ID alone)
archon telemetry status            # show current state, source of decision, anonymous ID
archon telemetry inspect           # show last 10 events that WOULD be sent (in JSON, never actually sends)
archon telemetry clear             # delete anonymous ID file (resets identity, doesn't change opt-in state)
archon telemetry policy            # print the link to docs/telemetry.md
```

status example output:

```text
Telemetry: ENABLED
  Source: ~/.archon/config.yaml (telemetry.enabled = true)
  Anonymous ID: 7f3b2a8c-... (created 2026-04-08)
  PostHog host: https://eu.posthog.com
  PostHog key: phc_xxx... (embedded at build time)
  Last event: cli.command_completed (3 minutes ago)

To disable: archon telemetry off
To inspect what is being sent: archon telemetry inspect
Full policy: https://github.com/coleam00/Archon/blob/main/docs/telemetry.md
```

inspect example output:

```json
{
  "events": [
    {
      "event": "cli.command_invoked",
      "distinctId": "7f3b2a8c-...",
      "properties": {
        "command": "workflow",
        "subcommand": "run",
        "archon_version": "0.2.13",
        "archon_git_commit": "abc1234",
        "archon_is_binary": true,
        "platform": "darwin",
        "arch": "arm64",
        "bun_version": "1.3.11",
        "node_version": "20.x"
      }
    },
    ...
  ]
}
```
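
Each CLI invocation is a separate process, so inspect needs the recent payloads persisted between runs. A minimal sketch under assumptions: the capped JSON Lines file, the cap constant, and both function names are hypothetical, and the caller is assumed to pass a path under ~/.archon. Recording the exact object handed to capture() is what would let inspect show the same data PostHog receives.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

// Hypothetical cap matching "last 10 events" in the command list.
const MAX_RECENT_EVENTS = 10;

export interface CapturedEvent {
  event: string;
  distinctId: string;
  properties: Record<string, unknown>;
}

function readLines(logPath: string): string[] {
  if (!existsSync(logPath)) return [];
  return readFileSync(logPath, 'utf8').split('\n').filter((line) => line.length > 0);
}

// Record one payload exactly as it is handed to capture(), keeping only the newest entries.
export function recordForInspect(logPath: string, payload: CapturedEvent): void {
  mkdirSync(dirname(logPath), { recursive: true });
  const lines = [...readLines(logPath), JSON.stringify(payload)].slice(-MAX_RECENT_EVENTS);
  writeFileSync(logPath, lines.join('\n') + '\n', { mode: 0o600 });
}

// What `archon telemetry inspect` would print, wrapped as { events: [...] }.
export function readForInspect(logPath: string): { events: CapturedEvent[] } {
  return { events: readLines(logPath).map((line) => JSON.parse(line) as CapturedEvent) };
}
```

inspect would print `JSON.stringify(readForInspect(path), null, 2)`. Rewriting the whole file is cheap at ten lines; two archon processes racing could drop an entry, which seems acceptable for a debugging view.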

First-run notice

When a user runs any archon command for the first time after install (detected by absence of ~/.archon/.first-run-completed), print this notice once:

```text
─────────────────────────────────────────────────────────────────
Archon collects no usage data by default.

If you'd like to share anonymous usage events to help improve archon:
  archon telemetry on

Review what would be sent at any time:
  archon telemetry inspect

Full telemetry policy:
  https://github.com/coleam00/Archon/blob/main/docs/telemetry.md
─────────────────────────────────────────────────────────────────
```

Then write ~/.archon/.first-run-completed so the notice never repeats.

Do NOT prompt interactively — that breaks scripts and CI. The notice is informational only; users have to explicitly run telemetry on to enable.
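
A minimal sketch of the once-only notice, assuming the caller passes the resolved ~/.archon directory (for example from getArchonHome()); maybeShowFirstRunNotice is a hypothetical name. Writing to stderr is a design choice made here, not stated in the issue: it keeps the notice out of piped or redirected command output. Every failure is swallowed so a read-only home directory cannot break a command, and nothing ever waits for input.

```typescript
import { existsSync, mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const NOTICE = [
  'Archon collects no usage data by default.',
  '',
  "If you'd like to share anonymous usage events to help improve archon:",
  '  archon telemetry on',
  '',
  'Review what would be sent at any time:',
  '  archon telemetry inspect',
  '',
  'Full telemetry policy:',
  '  https://github.com/coleam00/Archon/blob/main/docs/telemetry.md',
].join('\n');

// Hypothetical helper: prints the notice once, never prompts, never throws.
// Returns true only when the notice was printed and the marker file written.
export function maybeShowFirstRunNotice(
  archonHome: string,
  write: (text: string) => void = (text) => process.stderr.write(text),
): boolean {
  const marker = join(archonHome, '.first-run-completed');
  try {
    if (existsSync(marker)) return false;
    write(NOTICE + '\n');
    mkdirSync(archonHome, { recursive: true });
    writeFileSync(marker, new Date().toISOString());
    return true;
  } catch {
    return false;
  }
}
```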

Reverse proxy (recommended for production deployment)

PostHog cloud lives at app.posthog.com / eu.posthog.com. These domains are commonly blocked by:

  • Many corporate firewalls
  • Ad blockers (uBlock Origin, Pi-hole)
  • Privacy-focused DNS resolvers (NextDNS, AdGuard)
  • DNS-based VPN profiles

If we want telemetry to actually arrive from real users, set up a reverse proxy:

  • Cloudflare Worker (recommended): ~10 lines of JS, free tier, custom subdomain like telemetry.archon.dev
  • Self-hosted nginx/Caddy: more control, more maintenance
  • Vercel edge function: if you already have Vercel infrastructure

PostHog officially documents this pattern: https://posthog.com/docs/advanced/proxy
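
The Cloudflare Worker option might look roughly like this in TypeScript (module syntax). It is a sketch, not PostHog's official snippet: the upstream host eu.i.posthog.com is the EU ingestion host listed in PostHog's proxy guide at the time of writing and should be re-checked there, and toUpstreamUrl is a hypothetical helper split out so the URL rewrite can be unit-tested. Only API traffic is forwarded, since posthog-node loads no static assets.

```typescript
// Assumption: confirm the current EU ingestion host in PostHog's proxy documentation.
const UPSTREAM_HOST = 'eu.i.posthog.com';

// Keep path and query string; force https and swap the host (drops any custom port).
export function toUpstreamUrl(incoming: string): string {
  const url = new URL(incoming);
  url.protocol = 'https:';
  url.hostname = UPSTREAM_HOST;
  url.port = '';
  return url.toString();
}

export default {
  async fetch(request: Request): Promise<Response> {
    // new Request(url, request) copies method, headers, and body from the original.
    const upstream = new Request(toUpstreamUrl(request.url), request);
    upstream.headers.delete('cookie'); // defensive: never forward cookies to PostHog
    return fetch(upstream);
  },
};
```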

The BUNDLED_POSTHOG_HOST constant in bundled-build.ts gets set to the proxy URL at release time:

```bash
POSTHOG_HOST=https://telemetry.archon.dev bash scripts/build-binaries.sh
```

Documentation: docs/telemetry.md

A plain-language page covering:

  • What telemetry is and why we want it
  • The exact list of events and properties (autogenerated from events.ts?)
  • The exact list of things that are NEVER sent (the whitelist exclusion list)
  • How to opt in and out
  • How to inspect what is being sent
  • How to wipe your anonymous ID
  • The reverse proxy URL
  • Where the data lives (PostHog cloud EU region)
  • Data retention policy (PostHog default: 7 years; we should set lower if possible)
  • How to request data deletion (email + anonymous ID)

This page is the load-bearing trust artifact. Skimping on it undermines the whole feature.

Coupling with other issues

#979 (build-time constants)

Composes naturally. This issue extends bundled-build.ts with two more constants (BUNDLED_POSTHOG_KEY, BUNDLED_POSTHOG_HOST). Both issues touch the same file. Recommended sequence: land #979 first, then this on top.

#978 (web UI distribution via Option E)

Not a hard blocker. Three orthogonal layers:

| Telemetry layer | Depends on |
|---|---|
| CLI events (workflow run, binary startup, errors) | Just the CLI binary (exists today) |
| Server events (API requests, workflow runs from web/Slack/etc.) | A running server; works in a dev clone today and would also work in #978's binary |
| Web UI events (page views, button clicks, feature usage) | posthog-js in packages/web; completely independent of how the server is shipped |

You could instrument all three TODAY without #978, by adding posthog-js to the dev-clone web UI. Server telemetry already works wherever the server runs. Only "binary users get web UI telemetry" is gated on #978.

Recommendation: ship CLI telemetry first (this issue, narrow scope). Add server telemetry independently when there is bandwidth. Add web UI telemetry as a third orthogonal piece. Do not gate this issue on #978 — that delays data collection by weeks.

CI / release workflow

The release workflow (.github/workflows/release.yml) needs:

  • A new POSTHOG_PUBLIC_KEY secret in the repo (set by maintainer)
  • Optionally POSTHOG_HOST if using a reverse proxy
  • The build script wrapper to pass them as env vars

Each of these is roughly a two-line addition to the workflow.

Phased plan

Phase 1 — Infrastructure (1 day)

  • Add posthog-node dependency to a new packages/telemetry/ package
  • Implement consent.ts, identity.ts, client.ts, sanitize.ts
  • Unit tests for consent resolution, identity management, sanitization
  • Public API exported via index.ts

Phase 2 — CLI integration (0.5 day)

  • Add archon telemetry on/off/status/inspect/clear/policy commands
  • Wire trackEvent calls into CLI startup, command handlers, error catchers
  • Add binary.startup, cli.command_invoked, cli.command_completed, error.uncaught events
  • Add first-run notice

Phase 3 — Workflow events (0.5 day)

  • Wire workflow events from packages/workflows/src/executor.ts and dag-executor.ts
  • Add workflow.run_started, workflow.run_completed, workflow.run_failed, workflow.node_failed
  • Sanitize workflow names against the bundled-defaults whitelist

Phase 4 — Build script + CI (0.5 day)

  • Update scripts/build-binaries.sh to write PostHog constants from env vars
  • Update .github/workflows/release.yml to pass POSTHOG_PUBLIC_KEY from secrets
  • Add a smoke test that verifies the binary correctly disables telemetry when no key is embedded

Phase 5 — Reverse proxy + docs (0.5 day)

  • Set up Cloudflare Worker reverse proxy (or whichever proxy backend is preferred)
  • Write docs/telemetry.md
  • Update README with link to telemetry policy
  • Add to release notes

Phase 6 — Validation (0.5 day)

  • Build a binary with the PostHog key embedded, opt in, run a few commands, verify events arrive in PostHog
  • Build a binary with the PostHog key embedded, opt out, verify no events arrive
  • Build a binary with the PostHog key embedded, set DO_NOT_TRACK=1, verify no events arrive
  • Verify the inspect command shows the same data PostHog receives
  • Review every event property against the whitelist exclusion list

Total: ~3.5 days of focused work.

Files to change

| File | Action |
|---|---|
| packages/paths/src/bundled-build.ts | extend (add BUNDLED_POSTHOG_KEY, BUNDLED_POSTHOG_HOST) |
| packages/telemetry/ | new package |
| packages/cli/src/commands/telemetry.ts | new command |
| packages/cli/src/commands/setup.ts | add first-run notice |
| packages/cli/src/cli.ts | wire binary.startup, shutdown handlers |
| packages/cli/src/commands/workflow.ts | wire workflow CLI events |
| packages/workflows/src/executor.ts | wire workflow run events |
| packages/workflows/src/dag-executor.ts | wire node failure events |
| scripts/build-binaries.sh | pass PostHog key/host from env |
| .github/workflows/release.yml | secret env var injection |
| docs/telemetry.md | new policy doc |
| packages/docs-web/src/content/docs/reference/telemetry.md | docs site version |
| README.md | link to telemetry policy |
| CHANGELOG.md | new feature entry |

Open questions for discussion

  1. EU vs US PostHog hosting — recommend EU for GDPR alignment. Confirm.
  2. Reverse proxy backend — Cloudflare Worker is cheapest and fastest. Do we want it?
  3. Should telemetry-on be the default? Strongly recommend NO. Opt-in is the only ethical default for a developer tool.
  4. Should we capture telemetry from archon-stable (released binary) AND archon (bun link dev binary)? Recommend NO for dev — empty key in dev mode prevents this anyway.
  5. Data retention — PostHog default is 7 years. Recommend setting to 1 year max.
  6. Do we want server-side telemetry events from packages/server/src/? The same @archon/telemetry package can be imported. Recommend YES once the server is in scope (the binary depends on #978, feat(distribution): one-command web UI install via lazy-fetch from release tarball; the dev clone can ship immediately).
  7. Web UI telemetry via posthog-js? Recommend YES, separate sub-issue. Independent of CLI.

Out of scope (deferred to follow-ups)

  • Server-side telemetry from packages/server/ — separate sub-issue once consensus on this one
  • Web UI telemetry via posthog-js — separate sub-issue, independent of CLI binary
  • Custom dashboards in PostHog — operational concern, not a feature
  • Automatic anomaly detection — too speculative
  • A/B testing infrastructure — different feature entirely
  • Funnel analysis on user journeys — data first, analysis second

Success criteria

  • Default install of the next archon release sends ZERO telemetry events
  • After archon telemetry on, the next CLI invocation shows up in PostHog within 30 seconds
  • archon telemetry inspect output matches what PostHog actually receives (verified by spot-checking 10 events)
  • Setting DO_NOT_TRACK=1 in the shell rc disables all telemetry without any other config
  • A user who never runs archon telemetry on has no ~/.archon/telemetry-id file (no anonymous identity created)
  • The privacy whitelist is enforced — adding a new event property fails the lint/build if the property is not on the allowed list (stretch goal: make this a TypeScript type constraint)
  • The reverse proxy is documented and the binary points at it by default
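
The TypeScript stretch goal above could be met with an interface keyed by event name. A sketch under assumptions: TelemetryEvents, EVENT_NAME_MAP, and trackTypedEvent are hypothetical names; TELEMETRY_EVENT_NAMES is the export named in the public API, but its shape here is a guess. One caveat: excess property checking only applies to fresh object literals, so a property bag built elsewhere and passed as a variable can still carry extra keys, and a runtime filter remains worthwhile as a second layer.

```typescript
// Hypothetical events.ts: one interface is the single source of truth for which
// properties each event may carry. Base properties are added by the client.
export interface TelemetryEvents {
  'binary.startup': Record<string, never>;
  'cli.command_invoked': { command: string; subcommand?: string };
  'cli.command_completed': { command: string; subcommand?: string; exit_code: number; duration_ms: number };
  'workflow.run_started': { workflow_name: string; node_count: number; provider: string; is_cli_invoked: boolean };
  'workflow.run_completed': { workflow_name: string; duration_ms: number; node_count: number; nodes_completed: number };
  'workflow.run_failed': { workflow_name: string; error_class: string; failed_node_type: string; duration_ms: number };
  'workflow.node_failed': {
    workflow_name: string;
    node_type: string;
    error_classification: 'transient' | 'fatal' | 'credit_exhausted';
    provider: string;
  };
  'error.uncaught': { error_class: string; where: string };
  'setup.completed': { install_method: 'homebrew' | 'curl' | 'source'; default_assistant: string };
}

export type TelemetryEventName = keyof TelemetryEvents;

// Record<TelemetryEventName, true> forces this map to list every event exactly once.
const EVENT_NAME_MAP: Record<TelemetryEventName, true> = {
  'binary.startup': true,
  'cli.command_invoked': true,
  'cli.command_completed': true,
  'workflow.run_started': true,
  'workflow.run_completed': true,
  'workflow.run_failed': true,
  'workflow.node_failed': true,
  'error.uncaught': true,
  'setup.completed': true,
};

export const TELEMETRY_EVENT_NAMES = Object.keys(EVENT_NAME_MAP) as TelemetryEventName[];

// Typed front door for trackEvent: unknown event names and, for object literals,
// unknown property names become compile-time errors.
export function trackTypedEvent<E extends TelemetryEventName>(
  send: (event: string, properties: Record<string, unknown>) => void,
  event: E,
  properties: TelemetryEvents[E],
): void {
  send(event, properties as Record<string, unknown>);
}

// Does not compile: 'cwd' is not an allowed property of cli.command_invoked.
// trackTypedEvent(trackEvent, 'cli.command_invoked', { command: 'workflow', cwd: '/home/me' });
```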

Related

Labels: P2 (Medium priority - Backlog, when time permits), architecture (Architectural changes and design), area: infra (Docker, deployment, CI/CD), effort/medium (Few files, one domain or module, some coordination needed), feature (New functionality (planned))
