Fix logs #362
Merged
…y WorkerSpec refactor
motatoes
approved these changes
Jun 9, 2026
Restore Axiom + Key Vault env vars in worker provisioning
What broke
Sandbox logs stopped showing up in the dashboard and oc logs: the Logs panel shows "No logs yet" for every recent sandbox. Reported by a customer; confirmed platform-wide. Axiom ingest into oc-sandbox-logs collapsed from millions of events/day in May to ~25/day after June 2.
Worker platform logs (Vector → oc-platform-logs) died at the same time, for the same reason.
Root cause
A parallel-branch semantic merge conflict, invisible to git:
02b3e65: the WorkerSpec refactor introduced compute.WorkerSpec + BuildWorkerEnv() and switched the Azure pool's cloud-init to build worker.env from it. Axiom logship didn't exist yet, so the spec didn't know about it. The old inline env template in cmd/server/main.go was left in place, no longer consumed.
d2df72d: logship shipped on main and added AXIOM_INGEST_TOKEN / AXIOM_DATASET to that inline template, which was still the live path on main.
36ce1c9: the branches merged with no textual conflict. The merge silently flipped provisioning to BuildWorkerEnv (no Axiom vars), leaving the AXIOM lines in main.go as dead code. Even the "WARNING: empty AXIOM_INGEST_TOKEN" guard is in the dead path.
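A divergence like this can be surfaced mechanically by comparing the variable names two env templates emit. The following is a minimal sketch, not code from this repository: envKeys, missingKeys, the WORKER_ID variable, and the sample env bodies are all hypothetical, and the real templates contain more variables.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// envKeys extracts the KEY part of every KEY=VALUE line in an env-file body.
// Blank lines and lines starting with '#' are ignored.
func envKeys(body string) map[string]bool {
	keys := make(map[string]bool)
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if key, _, ok := strings.Cut(line, "="); ok {
			keys[strings.TrimSpace(key)] = true
		}
	}
	return keys
}

// missingKeys returns, in sorted order, the keys that the legacy template
// sets but the live template does not.
func missingKeys(legacy, live string) []string {
	liveKeys := envKeys(live)
	var missing []string
	for k := range envKeys(legacy) {
		if !liveKeys[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical bodies: the legacy inline template gained the Axiom vars,
	// while the spec-driven builder never learned about them.
	legacy := "WORKER_ID=w1\nAXIOM_INGEST_TOKEN=tok\nAXIOM_DATASET=oc-sandbox-logs\n"
	live := "WORKER_ID=w1\n"
	fmt.Println(missingKeys(legacy, live))
}
```

A check of this shape only helps while both templates exist; it compares names, not values, so an empty token would still pass.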
The bug stayed latent because worker env is baked at VM creation: existing
workers kept their old (correct) env files. Each fleet rotation converted more
of the fleet — partial drop May 27, cliff June 2, finished June 8. Every
current prod worker now logs "ConfigureLogship skipped: no axiom ingest token set on worker" for every sandbox.
Three vars were dropped, three casualties:
AXIOM_INGEST_TOKEN / AXIOM_DATASET: sandbox logs missing (dashboard, oc logs)
OPENSANDBOX_AZURE_KEY_VAULT_NAME: populate-vector-env.service can't fetch tokens → Vector platform logs/metrics empty
SECRETS_VAULT_NAME (…, same value): the worker's KV self-load (LoadSecretsFromKeyVault) is a no-op; the entire worker-* / shared-* half of kvMapping is dead on workers
Fix
Add AxiomIngestToken, AxiomDataset, KeyVaultName to WorkerSpec; populate them from cfg in cmd/server/main.go; emit AXIOM_INGEST_TOKEN, AXIOM_DATASET, OPENSANDBOX_AZURE_KEY_VAULT_NAME, and SECRETS_VAULT_NAME in BuildWorkerEnv. 18 lines.
Restoring SECRETS_VAULT_NAME re-activates the worker's KV self-load, which is the architecture keyvault.go documents (env file = bootstrap pointers only; KV = cell config). Env-file values win over KV, so baked vars are unaffected; the net-new loads on prod are worker-global-blob-* (makes the missing-rootfs fallback functional; verified all hot archive/rebase paths use the cell-local checkpointStore and were never broken) and worker-sentry-dsn.
Verification (dev)
exec_stdout, exec_stderr, and var_log events all landed in Axiom and render via oc logs / the dashboard endpoint
internal/compute tests pass; touched packages build clean
Rollout
workers spawned by this server will ship sandbox session logs to Axiom (dataset=oc-sandbox-logs)
…workers don't heal). On each replacement verify worker.env has the AXIOM lines, no "ConfigureLogship skipped" in journal, and vector.env tokens are non-empty
oc-sandbox-logs daily ingest: should return to ~10⁵–10⁶ events/day
Follow-ups (separate PRs)
… WorkerEnvBase64 from cmd/server/main.go / internal/compute/azure.go; the two-templates trap is what made this possible
… shared-axiom-* secrets into Infisical /shared before the Infisical→KV sync flips to push-authoritative, or this recurs at the flip
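For reviewers who want a concrete picture of the Fix and of the per-worker rollout check, here is a rough sketch under stated assumptions. workerSpecSketch and buildWorkerEnvSketch are hypothetical, trimmed-down stand-ins for WorkerSpec and BuildWorkerEnv (whose real fields and signature are not shown in this PR description), and emptyVars is a hypothetical helper. Only the three field names and the four env var names come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// workerSpecSketch is a hypothetical stand-in for WorkerSpec that carries
// only the three fields this PR adds.
type workerSpecSketch struct {
	AxiomIngestToken string
	AxiomDataset     string
	KeyVaultName     string
}

// buildWorkerEnvSketch renders the four restored variables as an env-file
// body. Both vault variables are written from the same KeyVaultName value.
func buildWorkerEnvSketch(s workerSpecSketch) string {
	var b strings.Builder
	fmt.Fprintf(&b, "AXIOM_INGEST_TOKEN=%s\n", s.AxiomIngestToken)
	fmt.Fprintf(&b, "AXIOM_DATASET=%s\n", s.AxiomDataset)
	fmt.Fprintf(&b, "OPENSANDBOX_AZURE_KEY_VAULT_NAME=%s\n", s.KeyVaultName)
	fmt.Fprintf(&b, "SECRETS_VAULT_NAME=%s\n", s.KeyVaultName)
	return b.String()
}

// emptyVars returns the required variable names that are absent from the
// env body or present with an empty value, in the order given.
func emptyVars(env string, required []string) []string {
	values := make(map[string]string)
	for _, line := range strings.Split(env, "\n") {
		if k, v, ok := strings.Cut(line, "="); ok {
			values[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	var bad []string
	for _, name := range required {
		if values[name] == "" {
			bad = append(bad, name)
		}
	}
	return bad
}

func main() {
	required := []string{
		"AXIOM_INGEST_TOKEN",
		"AXIOM_DATASET",
		"OPENSANDBOX_AZURE_KEY_VAULT_NAME",
		"SECRETS_VAULT_NAME",
	}
	// Placeholder values, not real configuration.
	env := buildWorkerEnvSketch(workerSpecSketch{
		AxiomIngestToken: "example-token",
		AxiomDataset:     "oc-sandbox-logs",
		KeyVaultName:     "example-vault",
	})
	fmt.Print(env)
	fmt.Println("empty or missing:", emptyVars(env, required))
}
```

Run against a spec with no fields set, emptyVars reports all four names, which is the condition the dead "WARNING: empty AXIOM_INGEST_TOKEN" guard was meant to catch. A unit test asserting an empty result for a fully populated spec is one way to keep the builder and its required variables from drifting apart again.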