
Fix logs #362

Merged: breardon2011 merged 1 commit into main from fix/worker-env-axiom-token on Jun 10, 2026

@breardon2011

Contributor

Restore Axiom + Key Vault env vars in worker provisioning

What broke

Sandbox logs stopped showing up in the dashboard and oc logs — the Logs panel
shows "No logs yet" for every recent sandbox. Reported by a customer; confirmed
platform-wide. Axiom ingest into oc-sandbox-logs collapsed from millions of
events/day in May to ~25/day after June 2.

Worker platform logs (Vector → oc-platform-logs) died at the same time, for
the same reason.

Root cause

A parallel-branch semantic merge conflict, invisible to git:

  • Apr 30 (02b3e65): the WorkerSpec refactor introduced
    compute.WorkerSpec + BuildWorkerEnv() and switched the Azure pool's
    cloud-init to build worker.env from it. Axiom logship didn't exist yet, so
    the spec didn't know about it. The old inline env template in
    cmd/server/main.go was left in place — no longer consumed.
  • May 5 (d2df72d): logship shipped on main and added
    AXIOM_INGEST_TOKEN / AXIOM_DATASET to that inline template — which was
    still the live path on main.
  • May 14 (36ce1c9): the branches merged with no textual conflict. The
    merge silently flipped provisioning to BuildWorkerEnv (no Axiom vars),
    leaving the AXIOM lines in main.go as dead code. Even the
    "WARNING: empty AXIOM_INGEST_TOKEN" guard is in the dead path.

The bug stayed latent because worker env is baked at VM creation: existing
workers kept their old (correct) env files. Each fleet rotation converted more
of the fleet — partial drop May 27, cliff June 2, finished June 8. Every
current prod worker now logs "ConfigureLogship skipped: no axiom ingest token set on worker" for every sandbox.

Three vars were dropped, three casualties:

| Dropped var | Casualty |
| --- | --- |
| AXIOM_INGEST_TOKEN / AXIOM_DATASET | Sandbox session logs (dashboard Logs panel, oc logs) |
| OPENSANDBOX_AZURE_KEY_VAULT_NAME | populate-vector-env.service can't fetch tokens → Vector platform logs/metrics empty |
| (SECRETS_VAULT_NAME, same value) | Worker-side KV self-load (LoadSecretsFromKeyVault) is a no-op: the entire worker-*/shared-* half of kvMapping is dead on workers |

Fix

Add AxiomIngestToken, AxiomDataset, KeyVaultName to WorkerSpec;
populate them from cfg in cmd/server/main.go; emit AXIOM_INGEST_TOKEN,
AXIOM_DATASET, OPENSANDBOX_AZURE_KEY_VAULT_NAME, and SECRETS_VAULT_NAME
in BuildWorkerEnv. 18 lines.
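
For readers without the diff open, the shape of the change can be sketched roughly as below. This is an illustrative stand-in, not the code in this PR: the real compute.WorkerSpec has more fields, the real BuildWorkerEnv may take different arguments, and skipping empty values is an assumption made only for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkerSpec is a hypothetical, trimmed-down stand-in for compute.WorkerSpec.
// Only the three fields named in the Fix section are shown.
type WorkerSpec struct {
	AxiomIngestToken string
	AxiomDataset     string
	KeyVaultName     string
}

// BuildWorkerEnv renders env-file lines for the fields above.
// Assumption for this sketch: empty values are skipped rather than
// written as "KEY=".
func BuildWorkerEnv(s WorkerSpec) string {
	var b strings.Builder
	emit := func(key, value string) {
		if value != "" {
			fmt.Fprintf(&b, "%s=%s\n", key, value)
		}
	}
	emit("AXIOM_INGEST_TOKEN", s.AxiomIngestToken)
	emit("AXIOM_DATASET", s.AxiomDataset)
	// Both variables carry the same vault name, as the description states.
	emit("OPENSANDBOX_AZURE_KEY_VAULT_NAME", s.KeyVaultName)
	emit("SECRETS_VAULT_NAME", s.KeyVaultName)
	return b.String()
}

func main() {
	// Placeholder values only; none of these are real credentials.
	fmt.Print(BuildWorkerEnv(WorkerSpec{
		AxiomIngestToken: "example-token",
		AxiomDataset:     "oc-sandbox-logs",
		KeyVaultName:     "example-vault",
	}))
}
```

Routing every variable through the spec keeps a single source of truth for worker.env, which is what the dead inline template violated.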

Restoring SECRETS_VAULT_NAME re-activates the worker's KV self-load, which is
the architecture keyvault.go documents (env file = bootstrap pointers only;
KV = cell config). Env-file values win over KV, so baked vars are unaffected;
the net-new loads on prod are worker-global-blob-* (makes the missing-rootfs
fallback functional — verified all hot archive/rebase paths use the cell-local
checkpointStore and were never broken) and worker-sentry-dsn.
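
The "env-file values win over KV" rule can be pictured with a small merge function. This is a minimal sketch of the precedence only, assuming both sources are already flattened to env-var names; it is not how LoadSecretsFromKeyVault or kvMapping are actually written, and SENTRY_DSN is a hypothetical variable name used for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeEnv returns the effective environment: every baked env-file value is
// kept, and Key Vault values only fill in names the env file did not set.
func mergeEnv(baked, kv map[string]string) map[string]string {
	out := make(map[string]string, len(baked)+len(kv))
	for k, v := range kv {
		out[k] = v
	}
	for k, v := range baked {
		out[k] = v // env-file value overwrites any KV value
	}
	return out
}

func main() {
	baked := map[string]string{"AXIOM_DATASET": "oc-sandbox-logs"}
	kv := map[string]string{
		"AXIOM_DATASET": "value-from-kv", // ignored: already baked
		"SENTRY_DSN":    "example-dsn",   // hypothetical net-new load
	}
	env := mergeEnv(baked, kv)

	// Sort keys so the printed order is stable.
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```

Whether a baked but empty value should still win is a policy choice this sketch does not settle; here any key present in the baked map takes precedence.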

Verification (dev)

  • Deployed CP, deleted a worker, scaler provisioned replacements
  • New worker.env contains all 4 vars; worker logs show KV in use at runtime
  • Created a sandbox, ran a command: exec_stdout, exec_stderr, and var_log
    events all landed in Axiom and render via oc logs / the dashboard endpoint
  • internal/compute tests pass; touched packages build clean

Rollout

  1. Deploy CP; confirm the startup line "workers spawned by this server will ship sandbox session logs to Axiom (dataset=oc-sandbox-logs)"
  2. Cycle the 3 prod workers one at a time (env is baked at VM create — existing
    workers don't heal). On each replacement verify worker.env has the AXIOM
    lines, no "ConfigureLogship skipped" in the journal, and vector.env tokens are
    non-empty
  3. Watch oc-sandbox-logs daily ingest — should return to ~10⁵–10⁶ events/day
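
The per-worker check in step 2 could be scripted along these lines. This is a hedged sketch, not tooling that exists in the repo: missingKeys and requiredKeys are hypothetical names, the env-file path is passed on the command line because its location is not stated here, and the parser only handles plain KEY=value lines.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// requiredKeys lists the four variables this PR restores.
var requiredKeys = []string{
	"AXIOM_INGEST_TOKEN",
	"AXIOM_DATASET",
	"OPENSANDBOX_AZURE_KEY_VAULT_NAME",
	"SECRETS_VAULT_NAME",
}

// missingKeys parses KEY=value lines and returns, in the order given, the
// required keys that are absent or empty. Blank lines and "#" comments are
// ignored; surrounding quotes are stripped so KEY="" counts as empty.
func missingKeys(env string, required []string) []string {
	values := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(env))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, "="); ok {
			values[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"'`)
		}
	}
	var missing []string
	for _, k := range required {
		if values[k] == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checkenv <path-to-env-file>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(2)
	}
	if missing := missingKeys(string(data), requiredKeys); len(missing) > 0 {
		fmt.Println("missing or empty:", strings.Join(missing, ", "))
		os.Exit(1)
	}
	fmt.Println("all required vars present")
}
```

The same function could be pointed at vector.env with a different key list; the token names there are not given in this description, so they are left out.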

Follow-ups (separate PRs)

  • Delete the dead inline env template + unused WorkerEnvBase64 from
    cmd/server/main.go / internal/compute/azure.go — the two-templates trap
    is what made this possible
  • Seed the 7 shared-axiom-* secrets into Infisical /shared before the
    Infisical→KV sync flips to push-authoritative, or this recurs at the flip

@breardon2011 breardon2011 changed the title restore Axiom + Key Vault env vars dropped from worker provisioning b… Fix logs Jun 9, 2026
@breardon2011 breardon2011 marked this pull request as ready for review June 9, 2026 20:08
@breardon2011 breardon2011 merged commit 44c4cde into main Jun 10, 2026
3 checks passed

2 participants