Epic: Brev CI launchables — pre-baked environments for E2E test reliability #1326

@jyaunches

Summary

Create purpose-built Brev launchables for CI/CD E2E testing that move as much setup as possible into the pre-baked VM image. This reduces per-run setup time and eliminates the timeouts and flakiness that come from installing Docker, pulling images, and building dependencies on every CI run.

Motivation

The current E2E Brev workflow (e2e-brev.yaml + brev-e2e.test.js) has been brittle because every run bootstraps a bare VM from scratch: installing Docker, Node.js, and the OpenShell CLI, cloning repos, pulling multi-GB Docker images, and building the sandbox. Most CI failures occur during this 10-15 minute setup window (apt mirror timeouts, Docker pull rate limits, npm registry hiccups).

We previously had a Brev launchable (launch-nemoclaw.sh from OpenShell-Community) that pre-installed the foundational dependencies, but it was designed for interactive development (code-server + VS Code theming) rather than CI. We removed it because its readiness detection was unreliable, not realizing that the pre-baked setup was saving significant time.

Approach

Create CI-focused launchables (private, under the Nemoclaw CI/CD org) with startup scripts that:

  1. Pre-install all system deps (Docker, Node.js, OpenShell CLI)
  2. Pre-pull Docker images (sandbox-base, openshell/cluster, node:22-slim)
  3. Pre-install npm dependencies and build the TypeScript plugin
  4. Use a reliable sentinel file for readiness detection
  5. Skip code-server and other interactive-only tooling
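
Step 4 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual brev-e2e.test.js code: the sentinel path, the `READY`/`PENDING` convention, and the `runRemote` callback (standing in for however the test executes a command on the VM) are all hypothetical names introduced here.

```typescript
// Hypothetical sentinel path: the issue does not say where the startup
// script would write its "setup finished" marker.
const SENTINEL_PATH = "/var/run/ci-launchable-ready";

// Builds the shell command to run on the VM. It prints READY only when
// the sentinel file exists. Assumes the path contains no single quotes.
function sentinelCheckCommand(path: string): string {
  return `test -f '${path}' && echo READY || echo PENDING`;
}

// Stand-in for whatever mechanism runs a command on the VM (assumption).
type RunRemote = (command: string) => Promise<string>;

// Polls until the sentinel appears or the timeout elapses.
// Resolves with the number of attempts that were needed.
async function waitForReady(
  runRemote: RunRemote,
  timeoutMs: number,
  intervalMs: number,
  path: string = SENTINEL_PATH,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    let output = "";
    try {
      output = await runRemote(sentinelCheckCommand(path));
    } catch {
      // Transport errors (for example, SSH not accepting connections yet)
      // are treated the same as "not ready".
    }
    if (output.trim() === "READY") {
      return attempts;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(
        `launchable not ready after ${timeoutMs} ms (${attempts} attempts)`,
      );
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The idea is that the startup script writes the sentinel as its very last action, so a single file check covers every earlier step. Failing with an explicit timeout error, rather than hanging, keeps a stuck VM from consuming the whole CI job budget.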

Flavors

  • Flavor 1: CI-Ready CPU — For PR E2E tests (credential-sanitization, telegram-injection, full E2E). Saves ~5-10 min/run.
  • Flavor 2: CI-Ready GPU — For Ollama/local inference tests. Adds NVIDIA Container Toolkit + Ollama.
  • Flavor 3: Full-Sandbox-Ready — For nightly runs. Pre-builds the sandbox Docker image.
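
One way to picture how the flavors relate is as a small lookup table. The sketch below assumes that every flavor includes the shared setup from the Approach list and only adds extras on top; the issue does not state this explicitly, and the identifiers (`ci-ready-cpu` and so on) are hypothetical, not real launchable names.

```typescript
// Hypothetical identifiers for the three flavors described above.
type Flavor = "ci-ready-cpu" | "ci-ready-gpu" | "full-sandbox-ready";

interface FlavorSpec {
  purpose: string;
  extras: string[]; // components baked in on top of the shared setup
}

// Shared setup, taken from the Approach list (assumed common to all flavors).
const BASE_SETUP: string[] = [
  "Docker",
  "Node.js",
  "OpenShell CLI",
  "pre-pulled Docker images",
  "npm dependencies and built TypeScript plugin",
];

const FLAVORS: Record<Flavor, FlavorSpec> = {
  "ci-ready-cpu": { purpose: "PR E2E tests", extras: [] },
  "ci-ready-gpu": {
    purpose: "Ollama/local inference tests",
    extras: ["NVIDIA Container Toolkit", "Ollama"],
  },
  "full-sandbox-ready": {
    purpose: "nightly runs",
    extras: ["pre-built sandbox Docker image"],
  },
};

// Lists everything a given flavor would have baked into its image.
function bakedComponents(flavor: Flavor): string[] {
  return [...BASE_SETUP, ...FLAVORS[flavor].extras];
}
```

For example, `bakedComponents("ci-ready-gpu")` returns the five shared items followed by the two GPU-specific ones.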

Success Criteria

  • Flavor 1 launchable created and validated with e2e-brev.yaml workflow
  • brev-e2e.test.js updated to use launchable with reliable readiness detection
  • Average E2E bootstrap time reduced from ~12 min to ~4 min
  • Flavor 2 launchable created for GPU tests
  • Flavor 3 launchable created for nightly runs
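
The bootstrap-time criterion could be checked with a small helper like the one below. It is a sketch only: how bootstrap durations are recorded is not specified in this issue, so the input is assumed to be a plain list of per-run durations in minutes, and both function names are hypothetical.

```typescript
// Mean bootstrap time across runs, in minutes.
function meanMinutes(durationsMin: number[]): number {
  if (durationsMin.length === 0) {
    throw new Error("no runs recorded");
  }
  const total = durationsMin.reduce((sum, d) => sum + d, 0);
  return total / durationsMin.length;
}

// True when the average bootstrap time is at or below the target.
function meetsBootstrapTarget(durationsMin: number[], targetMin: number): boolean {
  return meanMinutes(durationsMin) <= targetMin;
}
```

With a 4-minute target, runs of 3.5, 4, and 4.5 minutes would pass, while runs of 11, 12, and 13 minutes would not.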

Labels

  • area: ci (CI workflows, checks, release automation, or GitHub Actions)
  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • needs: unblock (Blocked item needs dependency or decision resolved)
