feat(infra): install bwrap AppArmor profile + bubblewrap on runner EC2 #725

Draft
G4614 wants to merge 1 commit into boxlite-ai:main from G4614:feat/infra-runner-bwrap-apparmor

G4614 (Contributor) commented Jun 10, 2026
Bug

The runner EC2 AMI is Ubuntu 24.04 (noble). Ubuntu 24.04 ships kernel.apparmor_restrict_unprivileged_userns=1 but does NOT ship the bwrap-userns-restrict AppArmor profile that Ubuntu 25.04+ includes. Without that profile, bwrap (used by BoxLite's jailer for sandbox isolation — SecurityOptions::strict) is DENIED the userns capability and every box fails:

Timeout waiting for guest ready (30s)
VM subprocess exited before guest became ready
dmesg: apparmor="DENIED" ... comm="bwrap" capability=8

So BoxLite's strict security option (SecurityOptions::strict, the production default) is unusable on every freshly provisioned runner: boxes start only if you switch to SecurityOptions::development(), which disables the jailer, or manually patch the host.

Fix

apps/infra/sst.config.ts user-data — two additions to buildRunnerUserData():

  1. apt-get install -y bubblewrap so /usr/bin/bwrap exists. The AppArmor profile is scoped to that exact path; the runner's fallback bundled-bwrap (from bubblewrap-sys) extracts to an arbitrary cache path that the profile can't match.
  2. Write /etc/apparmor.d/bwrap-userns-restrict with the same profile Ubuntu 25.04+ ships (bwrap + unpriv_bwrap), then apparmor_parser -r to load it.
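
The two additions can be sketched as a small helper that emits the shell fragment. This is a minimal sketch, not the PR's actual diff: `buildBwrapAppArmorBlock`, `BWRAP_PROFILE_PATH`, and the heredoc delimiter are hypothetical names, and the profile body is passed in as a parameter rather than reproduced, because the profile text does not appear in this description.

```typescript
// Hypothetical helper; the real buildRunnerUserData() is not shown in this PR
// description, so the names and layout here are assumptions.
const BWRAP_PROFILE_PATH = "/etc/apparmor.d/bwrap-userns-restrict";

// Returns the shell lines for the two additions. `profileText` is the AppArmor
// profile body (the one Ubuntu 25.04+ ships), supplied by the caller.
function buildBwrapAppArmorBlock(profileText: string): string {
  const delimiter = "BWRAP_APPARMOR_EOF";
  if (profileText.indexOf(delimiter) !== -1) {
    throw new Error("profile text must not contain the heredoc delimiter");
  }
  const lines = [
    "# Install bubblewrap so /usr/bin/bwrap exists (the profile is scoped to that path).",
    "apt-get install -y bubblewrap",
    "# Write the profile, then load it (-r replaces an already-loaded copy).",
    // Quoted delimiter: the shell writes the body verbatim, with no expansion.
    "tee " + BWRAP_PROFILE_PATH + " > /dev/null <<'" + delimiter + "'",
    profileText.replace(/\s+$/, ""),
    delimiter,
    "apparmor_parser -r " + BWRAP_PROFILE_PATH,
  ];
  return lines.join("\n") + "\n";
}
```

Passing the profile in keeps the sketch from guessing at its contents, and installing bubblewrap first means the path the profile attaches to exists before the profile is loaded.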

The global kernel restriction (apparmor_restrict_unprivileged_userns=1) stays on — only /usr/bin/bwrap gets the userns + capability allowances it needs. This is docs/faq.md "Ubuntu 24.04: Timeout waiting for guest ready" §Fix A (the targeted, recommended option).

Why not the alternatives:

| Option | Why not |
| --- | --- |
| sysctl -w kernel.apparmor_restrict_unprivileged_userns=0 (FAQ Fix B) | Disables userns AppArmor protection host-wide, a worse posture than the targeted profile that only allows it for /usr/bin/bwrap. |
| SecurityOptions::development() (FAQ Fix C) | Turns OFF the jailer. Acceptable for local dev, not for the production runner this user-data provisions. |
| Bake into the AMI | This stack pins a generic Canonical-owned AMI by name pattern; baking a custom AMI couples infra to an image-build pipeline. The fix is one cloud-init block. |

Test plan — manual deploy verification

Can't unit-test cloud-init from the repo. Verify on next pulumi up:

  • pulumi up on a stack with RUNNERS=1 (or more) brings up Runner EC2(s) with the new user-data.
  • On the runner (over ssh), cat /var/log/runner-setup.log shows the AppArmor profile write and apparmor_parser -r exiting 0.
  • aa-status | grep bwrap shows bwrap (the new profile) loaded.
  • Create a box via the API (any default-strict security profile). It reaches "guest ready" without the 30s timeout.
  • dmesg | grep apparmor | grep bwrap no longer shows DENIED lines for comm="bwrap" capability=8.
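
The last check can also be scripted against captured dmesg output. The following is a minimal sketch: `findBwrapDenials` is a hypothetical helper, and it matches only the two fields quoted in the bug description (`apparmor="DENIED"` and `comm="bwrap"`), not the full audit line format.

```typescript
// Hypothetical helper: returns the lines of captured dmesg output that record
// an AppArmor denial for a process named bwrap.
function findBwrapDenials(dmesgOutput: string): string[] {
  return dmesgOutput.split("\n").filter(function (line) {
    return (
      line.indexOf('apparmor="DENIED"') !== -1 &&
      line.indexOf('comm="bwrap"') !== -1
    );
  });
}

// Shortened, illustrative input modelled on the snippet in the bug description;
// real kernel audit lines carry more fields.
const sampleDmesg = [
  'audit: apparmor="DENIED" comm="bwrap" capability=8',
  'audit: apparmor="STATUS" operation="profile_replace" name="bwrap"',
].join("\n");

const denials = findBwrapDenials(sampleDmesg);
```

On a runner provisioned with the new user-data, the expectation is that this list is empty for any boot after the profile was loaded.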

The user-data only re-runs on instance replacement; the Runner's ignoreChanges: ["userDataBase64"] is intentional (in-place upgrades go through scripts/deploy/runner-update-binary.sh). So this change lands on the NEXT runner replacement — for existing runners, the same AppArmor profile + bwrap install must be applied via SSM Run Command. Suggested follow-up: a one-shot SSM doc that idempotently applies the same tee + apparmor_parser block on existing runners.
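
One possible shape for the suggested follow-up, as a sketch only: a function that builds an SSM Command document (schema version 2.2, `aws:runShellScript`) carrying the same steps. The names `SsmCommandDocument`, `shellQuote`, `buildBwrapSsmDocument`, and the step name are hypothetical; the profile text is again a parameter, and writing it with `printf` piped into `tee` instead of a heredoc is a choice made for this sketch, not something the PR specifies.

```typescript
// Minimal type for the parts of an SSM Command document used here.
interface SsmCommandDocument {
  schemaVersion: "2.2";
  description: string;
  mainSteps: Array<{
    action: "aws:runShellScript";
    name: string;
    inputs: { runCommand: string[] };
  }>;
}

// Quote a string for POSIX sh: wrap it in single quotes and escape any
// embedded single quote as '\''.
function shellQuote(s: string): string {
  return "'" + s.split("'").join("'\\''") + "'";
}

// Hypothetical builder. Every command is safe to re-run: apt-get leaves an
// installed package alone, tee overwrites the file, and apparmor_parser -r
// replaces an already-loaded profile.
function buildBwrapSsmDocument(profileText: string): SsmCommandDocument {
  const profilePath = "/etc/apparmor.d/bwrap-userns-restrict";
  const quotedLines = profileText.replace(/\s+$/, "").split("\n").map(shellQuote);
  return {
    schemaVersion: "2.2",
    description: "Install bubblewrap and load the bwrap-userns-restrict AppArmor profile",
    mainSteps: [
      {
        action: "aws:runShellScript",
        name: "applyBwrapAppArmorProfile",
        inputs: {
          runCommand: [
            "set -eu",
            "apt-get install -y bubblewrap",
            "printf '%s\\n' " + quotedLines.join(" ") + " | tee " + profilePath + " > /dev/null",
            "apparmor_parser -r " + profilePath,
          ],
        },
      },
    ],
  };
}
```

Serializing the result with `JSON.stringify` gives the document content to register; how it is registered and targeted at existing runners is left open here.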

🤖 Generated with Claude Code

The runner AMI is Ubuntu 24.04 (noble), which ships
kernel.apparmor_restrict_unprivileged_userns=1 but does NOT ship the
`bwrap-userns-restrict` AppArmor profile that Ubuntu 25.04+ includes.
Without that profile bwrap — used by BoxLite's jailer for sandbox
isolation (the SecurityOptions::strict path) — is DENIED the userns
capability and every box fails with "Timeout waiting for guest ready /
VM subprocess exited before guest became ready". The runner user-data
now does two things to support BoxLite's security option on this host:

1. apt-get install bubblewrap so /usr/bin/bwrap exists (the AppArmor
   profile is scoped to that path; bundled bwrap from bubblewrap-sys
   would land at an arbitrary cache path that the profile can't match).
2. Write /etc/apparmor.d/bwrap-userns-restrict with the same profile
   Ubuntu 25.04+ ships, then apparmor_parser -r to load it. The kernel
   restriction stays on globally — only /usr/bin/bwrap gets the
   userns + capability allowances it needs.

Reference: docs/faq.md "Ubuntu 24.04: Timeout waiting for guest ready"
§Fix A (Option A — targeted, recommended).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Jun 10, 2026

Review skipped: draft detected.

Comment thread apps/infra/sst.config.ts
apt-get install -y /tmp/mount-s3.deb
rm -f /tmp/mount-s3.deb

# Ubuntu 24.04 ships kernel.apparmor_restrict_unprivileged_userns=1 but does NOT

Member

do we need to install this profile when we install bubblewrap from apt?

DorianZheng added a commit that referenced this pull request Jun 10, 2026
…follow-up) (#726)

## What

Regenerates the committed API clients against the post-#715 (merged A2 +
MVP) API surface — the follow-up that #715's merge commit explicitly
deferred:

> ⚠️ **CI will be red until generated clients are regenerated** against
the merged API surface … **Generated clients now carry ZERO diff in this
PR** (reset to main in `f9ea0730`) — regenerate upstream against the
merged API surface.

Since that merge, the **API client drift** check fails on every PR
touching `apps/**` (e.g. #725's run 8 minutes after the merge). This PR
turns it green again.

## Content

**Commit 2 — the regen (`apps/libs/api-client`, `apps/api-client-go`,
231 files).** Pure `openapi-generator` 7.23.0 output, zero hand edits,
produced with the exact `api-client-drift.yml` recipe (pinned generator
via `openapitools.json`, NestJS spec boot with local Redis, GNU sed for
the postprocess script). `analytics-api-client` and `toolbox-api-client`
regenerated to **zero diff** (already current since #721/#723).

Surface delta (mirrors the A2 + MVP API changes):
- **removed:** snapshots / docker-registry / build / backup /
archive-lifecycle / quota / usage-overview endpoints and models;
`BoxState` build states (`pending_build`, `build_failed`, …);
`write:snapshots` + `delete:snapshots` permission values;
`listBoxesPaginated`'s `snapshots` filter param
- **added:** `SystemRole`, `UpdateOrganizationName` (+ `PATCH
/organizations/{organizationId}/name`), admin overview/observability
models

**Commit 1 — prek lint unblock (34 deleted lines, dashboard).** The
Sandbox→Box rename left `LEGACY_*` route enum members byte-identical to
the canonical ones — 4 pre-existing
`@typescript-eslint/no-duplicate-enum-values` errors at HEAD that fail
the repo's prek pre-commit hook (`make lint:fix`) for *every* local
commit. The legacy routes are unreachable (identical paths, canonical
registrations precede them), so this deletes them plus the orphaned
`LegacyBoxRedirect`. No behavior change. Included here because nothing
can be committed locally until it lands.

## Verification

- `go build ./...` passes in `apps/api-client-go` (standalone),
`apps/common-go`, `apps/otel-collector/exporter`.
- The **API client drift** check on this PR is the canonical
byte-for-byte proof.

## Known follow-up (intentionally split)

Per review preference, this PR is generated code only. Three consumers
still reference removed APIs and will not compile against the new
clients until the prepared follow-up PR lands (branched on top of this
one):

- `apps/cli` — Dockerfile-build flow (`CreateBuildInfo`,
`BOXSTATE_BUILD_FAILED`/`PENDING_BUILD`, `--dockerfile`/`--context`, MCP
`buildInfo` arg, `pkg/minio`)
- `apps/dashboard` — Registries page + registry hooks, usage-overview
wiring in Spending/Limits, `templates` filter arg
- `apps/libs/sdk-typescript` —
`Box.buildInfo`/`backupState`/`backupCreatedAt`, `getBuildLogsUrl`

No CI workflow compiles these consumers on PR today (the drift check is
the only `apps/**` gate), so this PR is green-mergeable; the follow-up
restores local builds. Note `apps/runner` has a **pre-existing**
unrelated compile failure on main (`boxlite.WithPort` undefined in
`pkg/boxlite`) — out of scope here.