Self-hosted VM provisioning portal for Proxmox VE clusters, with optional public SSH tunnels and a GPU job plane.
A user signs in, fills out one form (hostname, tier, OS, SSH key) and ~60 seconds later has a freshly-provisioned Linux VM with a static IP, SSH credentials, and — optionally — a public SSH endpoint exposed via Gopher. Like launching an EC2 instance, but on hardware you control.
A single Go binary embeds the React SPA, talks to Proxmox over its REST API, and stores everything it needs in one local SQLite file. No Postgres, Redis, or other external infrastructure — "one binary, one SQLite file" is an architectural invariant we defend on purpose.
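A minimal sketch of that shape, assuming the conventional `embed` + `http.FileServer` pattern (identifiers here are illustrative, not the actual cmd/server code):

```go
package main

import (
	"embed"
	"io/fs"
	"net/http"
)

// The built SPA ships inside the binary; the real directive lives in cmd/server
// and points at the Vite build output.
//
//go:embed frontend/dist
var spa embed.FS

func main() {
	dist, err := fs.Sub(spa, "frontend/dist")
	if err != nil {
		panic(err)
	}
	http.Handle("/", http.FileServer(http.FS(dist)))
	// API handlers (backed by the single SQLite file) would mount under /api here.
	http.ListenAndServe(":8080", nil)
}
```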
Originally built for ACM@UCLA's internal cluster. A static UI mockup of the full multi-phase product is at nimbusx.html; the longer design doc is nimbus-design-v03.docx.
| Phase | Status | What you get |
|---|---|---|
| 1 — VM provisioning | shipping | Web portal, 4 OS templates, atomic IP allocation, live cluster scoring, BYO or generated SSH keys |
| Auth | shipping | Email + password, GitHub OAuth, Google OAuth, access-code gate, admin/member roles |
| 2 — Public SSH tunnels | shipping | Optional Gopher integration — checkbox at provision time exposes the VM at <host>:<port> |
| 3 — Per-user S3 | planned | MinIO-backed object storage |
| 4 — GPU plane | shipping | Single GX10 box runs vLLM + a FIFO job worker; every VM gets OPENAI_BASE_URL and a gx10 CLI helper |
VM deletion is the notable gap — it's not yet a first-class action in the UI.
Two services run on a single GX10 (or any aarch64 NVIDIA box on the LAN):
- vLLM inference server — always-on, OpenAI-compatible. Every Nimbus VM receives `OPENAI_BASE_URL` + a `gx10` CLI helper at provision time.
- Job worker — pulls queued training jobs from Nimbus, runs each in `docker run --gpus all`, streams logs back. One job at a time, FIFO.
No GPU is ever attached to a VM. The GX10 is not a Proxmox node. It's a peer service Nimbus hands out via env vars + an HTTP API.
- Provision a GX10 with Docker + the NVIDIA Container Toolkit.
- In Nimbus: Settings → GX10 GPU → click Add GX10, copy the one-line `curl ... | sudo bash` command.
- SSH to the GX10, paste the curl. Two systemd units come up: `nimbus-vllm` and `nimbus-gpu-worker`.
- From any Nimbus VM:

      curl $OPENAI_BASE_URL/v1/models                          # inference
      gx10 submit pytorch/pytorch:latest -- python train.py    # training
Production builds run make build, which depends on make gx10-bundle —
that target cross-compiles the ARM64 worker and stages it (plus the
install scripts and demo) into cmd/server/gx10-assets/ for //go:embed,
making the deployed nimbus binary self-contained.
One-liner for any Linux host (Proxmox node or external VM). Downloads the latest release binary to /usr/local/bin/nimbus and verifies its SHA256:
curl -fsSL https://raw.githubusercontent.com/smalex-z/nimbus/main/scripts/quickinstall.sh | bash
# While only pre-releases (alphas/betas) exist, add --prerelease:
curl -fsSL https://raw.githubusercontent.com/smalex-z/nimbus/main/scripts/quickinstall.sh | bash -s -- --prerelease
# Then install as a systemd service (re-execs with sudo if needed):
sudo nimbus install
# Add --upgrade to swap the binary in place without touching config or the systemd unit

Supported: Linux amd64, Linux arm64.
For development builds or unreleased changes:
./scripts/install-deps.sh # first-time only: installs Node.js + Go via apt if missing
./scripts/build.sh # cross-compiles for the host arch
sudo ./nimbus install      # writes systemd unit + sudoers rule, starts the service

install-deps.sh is apt-only (Ubuntu/Debian); on other distros install Node.js 18+ and Go 1.25+ manually.
The wizard auto-detects when run on a Proxmox host and defaults PROXMOX_HOST=https://localhost:8006. On a guest VM, the web wizard prompts for the cluster's address.
After install, open http://<host-ip>:8080 and complete the web setup wizard — it collects Proxmox credentials, IP pool range, gateway, and (optionally) Gopher tunnel credentials, then offers to bootstrap templates on every online node before your first provision.
git clone https://github.com/smalex-z/nimbus.git
cd nimbus
# Installs Node.js + Go via apt if missing or below minimum. Idempotent.
# Apt-based distros only (Ubuntu/Debian).
./scripts/install-deps.sh
cp .env.example .env
$EDITOR .env
make dev
# Frontend hot reload: http://localhost:5173
# Backend API:          http://localhost:8080

Requirements: Go 1.25+ (matches go.mod), Node.js 18+. install-deps.sh only handles those two — install golangci-lint separately if you want to run make lint (`go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4`).
| Command | Description |
|---|---|
| `sudo nimbus install --upgrade` | Replace binary, restart service. Config and systemd unit untouched. |
| `./scripts/reinstall.sh` | Local-source equivalent: build + hot-swap. Add `--clean` to wipe DB and config first. |
| `./scripts/uninstall.sh` | Remove systemd unit, binary, sudoers rule, and service user. Prompts before deleting DB and config (or pass `--yes`). |
Nimbus requires sign-in. On a fresh install:
- The first user to register is auto-promoted to admin (a startup migration also re-checks this on every boot — it's safe if you delete the admin user).
- Optional: enable an access code in Settings → Access Code. When enabled, every non-admin user must enter the rotating code on the Verify page before any protected route works. Bumping the version invalidates everyone in one click — useful for end-of-quarter rotations.
- Optional: enable OAuth providers in Settings → Authentication:
- GitHub — set Client ID + Secret + (optionally) restrict to specific orgs.
- Google — set Client ID + Secret + (optionally) restrict to specific authorized domains.
OAuth credentials live in the DB, not env vars; you can rotate them from the UI without restarting. Sessions are cookie-based.
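For reference, cookie-session gating in a chi-style stack boils down to middleware like the sketch below (cookie name, store interface, and context key are assumptions for illustration, not Nimbus's actual code):

```go
package api

import (
	"context"
	"net/http"
)

// SessionStore is an illustrative lookup from session-cookie value to user ID.
type SessionStore interface {
	Lookup(token string) (userID string, ok bool)
}

type userKey struct{}

// RequireSession rejects requests without a valid session cookie and stashes
// the resolved user ID on the request context for downstream handlers.
func RequireSession(s SessionStore) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			c, err := r.Cookie("nimbus_session") // cookie name is a placeholder
			if err != nil {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			uid, ok := s.Lookup(c.Value)
			if !ok {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), userKey{}, uid)))
		})
	}
}
```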
Nimbus needs cloud-image templates on the cluster to provision VMs (the standard "clone an image, customize via cloud-init" pattern). The setup wizard handles this automatically — you don't have to SSH into nodes or run qm commands.
After the wizard configures and starts the service, it offers:
Bootstrap templates now? [Y/n]:
Pressing Enter downloads the cloud images for all four OSes on every online cluster node and converts them to Proxmox templates. ~2 GB per node, parallel across nodes; ~10–20 min total on a typical home lab.
Templates created:
| OS | VMID |
|---|---|
| Ubuntu 24.04 LTS | 9000 |
| Ubuntu 22.04 LTS | 9001 |
| Debian 12 | 9002 |
| Debian 11 | 9003 |
Each template has cloud-init pre-installed, qemu-guest-agent ready, a cloud-init drive attached, and is marked immutable. Re-running is idempotent — already-existing templates are skipped.
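The "parallel across nodes" behavior amounts to roughly the sketch below (a simplification under assumed helper signatures, not the internal/bootstrap code):

```go
package bootstrap

import (
	"fmt"
	"sync"
)

// bootstrapAll fans out one goroutine per online node and creates templates
// sequentially within each node; create is expected to skip templates that
// already exist, which is what makes re-runs idempotent.
func bootstrapAll(nodes, oses []string, create func(node, osName string) error) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(nodes))
	for _, node := range nodes {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			for _, osName := range oses {
				if err := create(node, osName); err != nil {
					errs <- fmt.Errorf("%s/%s: %w", node, osName, err)
					return
				}
			}
		}(node)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		return err // surface the first failure, if any
	}
	return nil
}
```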
Manual bootstrap (re-runs, adding a new node, etc.) — same code path either way:
# CLI
./nimbus bootstrap # all 4 OSes, all online nodes
./nimbus bootstrap --os ubuntu-24.04 # one OS, all online nodes
./nimbus bootstrap --node motik7 # all OSes, one node
./nimbus bootstrap --force # re-create even if exists
# HTTP (admin-only; same args)
curl -X POST http://localhost:8080/api/admin/bootstrap-templates -d '{}'

In the Proxmox UI: Datacenter → Permissions → API Tokens → Add:
- User: `root@pam` (or a dedicated service account)
- Token ID: `nimbus`
- Privilege Separation: uncheck this — otherwise the token has no permissions
- Expire: never (for development)

Copy the secret value immediately — Proxmox shows it only once. The combined token ID for the env file is `root@pam!nimbus`.
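With that token, every Proxmox API call is just a header. A minimal Go sketch (the `PVEAPIToken` Authorization scheme and `/api2/json/nodes` path are standard Proxmox; the relaxed TLS mirrors the self-signed-cert note elsewhere in this README; the rest is illustrative):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	host := os.Getenv("PROXMOX_HOST")           // e.g. https://localhost:8006
	tokenID := os.Getenv("PROXMOX_TOKEN_ID")    // user@realm!tokenname, e.g. root@pam!nimbus
	secret := os.Getenv("PROXMOX_TOKEN_SECRET") // UUID shown once at token creation

	// Home-lab Proxmox clusters typically run self-signed TLS.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, _ := http.NewRequest("GET", host+"/api2/json/nodes", nil)
	req.Header.Set("Authorization", fmt.Sprintf("PVEAPIToken=%s=%s", tokenID, secret))

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON list of cluster nodes
}
```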
Gopher is ACM@UCLA's reverse-tunnel gateway (rathole + Caddy). When Gopher credentials are configured, the Provision form gains an Expose SSH publicly checkbox; tick it to register a Gopher machine alongside the VM and bootstrap the rathole client at provision time. The result screen and the My machines page then show both LAN and WAN SSH commands.
Configure under Settings → Authentication → Gopher tunnels (admin-only). The integration is fully optional — without credentials, every tunnel code path silently no-ops and the checkbox is disabled.
Two services run on a single GX10 (or any aarch64 NVIDIA box):
- vLLM inference server — always-on, OpenAI-compatible. Every Nimbus VM gets `OPENAI_BASE_URL` and a `gx10` CLI helper injected at provision time.
- Job worker — pulls queued training jobs from Nimbus, runs each in `docker run --gpus all`, streams logs back. One job at a time, FIFO.
No GPU is ever attached to a VM. The GX10 is not a Proxmox node — it's a peer service Nimbus hands out via env vars + an HTTP API.
- Provision a GX10 with Docker + the NVIDIA Container Toolkit. Verify with `docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi`.
- In Nimbus: Settings → GPU → Add GX10. Set base URL + model. The page generates a one-line `curl` command containing a freshly minted pairing token.
- SSH to the GX10 and paste the curl. Two systemd units come up: `nimbus-vllm` and `nimbus-gpu-worker`.
- From any provisioned VM:

      curl $OPENAI_BASE_URL/v1/models                          # inference
      gx10 submit pytorch/pytorch:latest -- python train.py    # training
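The injected `OPENAI_BASE_URL` works with any OpenAI-compatible client. A bare-bones Go sketch (the model name is a placeholder; use whatever `GET /v1/models` reports):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	base := os.Getenv("OPENAI_BASE_URL") // injected into every Nimbus VM at provision time

	// Standard OpenAI-style chat completion against the GX10's vLLM server.
	payload := []byte(`{"model": "MODEL_NAME", "messages": [{"role": "user", "content": "hello"}]}`)
	resp, err := http.Post(base+"/v1/chat/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```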
Build the worker binary before tagging a release: `make gx10-worker` cross-compiles the ARM64 worker into `scripts/gx10/gx10-worker`, which is where the install script fetches it from. See `scripts/gx10/README.md` for more.
Set via environment variables — typically loaded from /etc/nimbus/nimbus.env (production) or ./.env (development). See .env.example.
OAuth, Gopher, and GPU credentials can be set via env vars for first-boot seeding, but the Settings UI is the authoritative editor afterwards — changes there take effect without restart.
| Variable | Description |
|---|---|
| `PROXMOX_HOST` | Proxmox API base URL, e.g. `https://localhost:8006` |
| `PROXMOX_TOKEN_ID` | Format: `user@realm!tokenname` |
| `PROXMOX_TOKEN_SECRET` | UUID shown once when creating the token |
| `IP_POOL_START` | First IP in the VM pool |
| `IP_POOL_END` | Last IP in the VM pool |
| `GATEWAY_IP` | LAN gateway IP for cloud-init network config |
| Variable | Default | Description |
|---|---|---|
| `PROXMOX_TEMPLATE_BASE_VMID` | `9000` | Base template VMID; +0..+3 per OS |
| `NIMBUS_EXCLUDED_NODES` | — | Comma-separated nodes to skip in scoring |
| `VM_CPU_TYPE` | `x86-64-v3` | CPU type passed to Proxmox at clone time |
| `NAMESERVER` | `1.1.1.1 8.8.8.8` | Cloud-init DNS servers (space-separated) |
| `SEARCH_DOMAIN` | `local` | Cloud-init DNS search domain |
| `NIMBUS_VM_DISK_STORAGE` | `local-lvm` | Storage pool the disk gate checks for free space; empty disables the gate |
| `NIMBUS_MEM_BUFFER_MIB` | `256` | RAM headroom required above the tier's request |
| `NIMBUS_CPU_LOAD_FACTOR` | `0.5` | Share of a fresh VM's vCPUs the soft score assumes consumed (range 0.25–1.0) |
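For intuition, a node's soft score combines these knobs roughly as in the sketch below (the 60/40 memory/CPU weighting comes from internal/nodescore's description; how the tier request and load factor are folded in is an illustrative guess):

```go
package nodescore

// NodeStats is an illustrative snapshot of a node's live resources.
type NodeStats struct {
	MemFreeMiB, MemTotalMiB     float64
	CPUFreeCores, CPUTotalCores float64
}

// SoftScore ranks a node for placement: 60% weight on remaining memory,
// 40% on remaining CPU, after pretending the new VM already consumes its
// request. cpuLoadFactor is the NIMBUS_CPU_LOAD_FACTOR share of the new
// VM's vCPUs assumed busy (0.25–1.0). Higher score = better placement.
func SoftScore(n NodeStats, reqMemMiB, reqVCPUs, cpuLoadFactor float64) float64 {
	memFree := (n.MemFreeMiB - reqMemMiB) / n.MemTotalMiB
	cpuFree := (n.CPUFreeCores - reqVCPUs*cpuLoadFactor) / n.CPUTotalCores
	if memFree < 0 {
		memFree = 0
	}
	if cpuFree < 0 {
		cpuFree = 0
	}
	return 0.6*memFree + 0.4*cpuFree
}
```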
| Variable | Default | Description |
|---|---|---|
| `RECONCILE_INTERVAL_SECONDS` | `60` | Background reconcile cadence; 0 disables |
| `RESERVATION_TTL_SECONDS` | `600` | Stale-reservation cutoff (10 min) |
| `VERIFY_CACHE_TTL_SECONDS` | `5` | How long the ListClusterIPs snapshot is reused |
| `VACATE_MISS_THRESHOLD` | `3` | Consecutive reconcile misses before auto-vacating |
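These settings drive a loop shaped roughly like the sketch below (interface and names are illustrative, not the internal/ippool implementation):

```go
package ippool

import "time"

// ClusterView abstracts what reconciliation needs: the IPs Nimbus has handed
// out versus the IPs Proxmox actually reports (the latter cached per
// VERIFY_CACHE_TTL_SECONDS).
type ClusterView interface {
	Allocated() []string
	ClusterIPs() map[string]bool
	Vacate(ip string)
}

// ReconcileLoop runs every interval (RECONCILE_INTERVAL_SECONDS). An allocated
// IP that is absent from the cluster for missThreshold consecutive sweeps
// (VACATE_MISS_THRESHOLD) gets vacated.
func ReconcileLoop(v ClusterView, interval time.Duration, missThreshold int) {
	misses := map[string]int{}
	for range time.Tick(interval) {
		seen := v.ClusterIPs()
		for _, ip := range v.Allocated() {
			if seen[ip] {
				delete(misses, ip)
				continue
			}
			if misses[ip]++; misses[ip] >= missThreshold {
				v.Vacate(ip)
				delete(misses, ip)
			}
		}
	}
}
```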
| Variable | Default | Description |
|---|---|---|
| `NIMBUS_NETSCAN_MODE` | `both` | `off` / `tcp` / `both` |
| `NIMBUS_NETSCAN_INTERVAL_SECONDS` | `300` | Sweep cadence; 0 disables |
| `NIMBUS_NETSCAN_TIMEOUT_MS` | `200` | Per-port TCP dial timeout |
| `NIMBUS_NETSCAN_CONCURRENCY` | `50` | Parallel probes per sweep |
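In TCP mode a sweep reduces to bounded-concurrency dial attempts; a hedged sketch (port list, names, and structure are illustrative, not the internal/netscan code):

```go
package netscan

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// probe reports whether any of the given TCP ports answers on ip within timeout
// (NIMBUS_NETSCAN_TIMEOUT_MS per port).
func probe(ip string, ports []int, timeout time.Duration) bool {
	for _, p := range ports {
		conn, err := net.DialTimeout("tcp", fmt.Sprintf("%s:%d", ip, p), timeout)
		if err == nil {
			conn.Close()
			return true
		}
	}
	return false
}

// sweep probes every IP in the pool with at most `concurrency` probes in flight
// (NIMBUS_NETSCAN_CONCURRENCY) and returns the set of responding hosts.
func sweep(ips []string, ports []int, timeout time.Duration, concurrency int) map[string]bool {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		sem   = make(chan struct{}, concurrency)
		alive = make(map[string]bool, len(ips))
	)
	for _, ip := range ips {
		wg.Add(1)
		sem <- struct{}{}
		go func(ip string) {
			defer wg.Done()
			defer func() { <-sem }()
			if probe(ip, ports, timeout) {
				mu.Lock()
				alive[ip] = true
				mu.Unlock()
			}
		}(ip)
	}
	wg.Wait()
	return alive
}
```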
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | HTTP server port |
| `DB_PATH` | `./nimbus.db` | SQLite database path |
| `APP_URL` | `http://localhost:5173` | Public URL — used as the OAuth redirect base |
| `CORS_ORIGIN` | `*` | CORS allowed origin |
| `APP_ENV` | `production` | Environment label |
| `GITHUB_CLIENT_ID` / `GITHUB_CLIENT_SECRET` | — | First-boot seed for GitHub OAuth (DB takes over) |
| `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` | — | First-boot seed for Google OAuth |
| `GOPHER_API_URL` / `GOPHER_API_KEY` | — | First-boot seed for Gopher tunnels |
Routes are mounted under /api. The full list lives in internal/api/router.go; the highlights:
| Group | Endpoints |
|---|---|
| Health | GET /health, GET /tunnels/info |
| Auth | POST /auth/{register,login,logout}, GET /auth/{github,google}{,/callback}, GET /auth/providers, GET /me |
| Access code | GET /access-code/status, POST /access-code/verify |
| VMs | GET/POST /vms, GET /vms/{id}, GET /vms/{id}/private-key, GET/POST /vms/{id}/tunnels, DELETE /vms/{id}/tunnels/{tid} |
| SSH keys | GET/POST /keys, GET/DELETE /keys/{id}, GET/POST /keys/{id}/private-key, POST /keys/{id}/default |
| GPU (user) | GET /gpu/inference, GET/POST /gpu/jobs, GET /gpu/jobs/{id}, POST /gpu/jobs/{id}/cancel |
| GPU (worker, bearer-token) | POST /gpu/worker/{claim, jobs/{id}/logs, jobs/{id}/status} |
| Admin observability | GET /nodes, GET /ips, GET /cluster/{vms,stats}, POST /ips/reconcile |
| Admin templates | GET /admin/bootstrap-status, POST /admin/bootstrap-templates |
| Admin settings | GET/PUT /settings/{oauth,gopher,gpu,access-code,google-domains,github-orgs} |
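As a concrete example, a provisioning call from a script could look like the Go sketch below (it assumes an already-authenticated session cookie, whose name here is a placeholder; request and response shapes match the JSON shown next):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ProvisionRequest mirrors the provision request body documented below.
type ProvisionRequest struct {
	Hostname     string `json:"hostname"`
	Tier         string `json:"tier"`
	OSTemplate   string `json:"os_template"`
	GenerateKey  bool   `json:"generate_key"`
	PublicTunnel bool   `json:"public_tunnel"`
}

func main() {
	body, _ := json.Marshal(ProvisionRequest{
		Hostname:    "my-project",
		Tier:        "medium",
		OSTemplate:  "ubuntu-24.04",
		GenerateKey: true,
	})

	req, _ := http.NewRequest("POST", "http://localhost:8080/api/vms", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	// Reuse the session cookie from POST /api/auth/login; the cookie name is illustrative.
	req.AddCookie(&http.Cookie{Name: "session", Value: "<session-cookie-value>"})

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Success bool `json:"success"`
		Data    struct {
			VMID int    `json:"vmid"`
			IP   string `json:"ip"`
		} `json:"data"`
	}
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Printf("vmid=%d ip=%s\n", out.Data.VMID, out.Data.IP)
}
```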
Provision request:
{
"hostname": "my-project",
"tier": "medium",
"os_template": "ubuntu-24.04",
"generate_key": true,
"public_tunnel": false
}

Provision response (200):
{
"success": true,
"data": {
"vmid": 342,
"hostname": "my-project",
"ip": "192.168.0.142",
"username": "ubuntu",
"os": "ubuntu-24.04",
"tier": "medium",
"node": "motik7",
"ssh_private_key": "-----BEGIN OPENSSH PRIVATE KEY-----\n...",
"key_name": "nimbus-my-project"
}
}

nimbus/
├── cmd/
│ ├── server/ Go entry point — embeds frontend/dist
│ └── gx10-worker/ ARM64 GX10 worker daemon (Phase 4)
├── frontend/ React 18 + TS + Vite + Tailwind SPA
├── internal/
│ ├── api/ Chi router, middleware, handlers
│ ├── bootstrap/ Cloud-image download + template creation
│ ├── config/ Env-based config + .env loader
│ ├── ctxutil/ Request-context helpers (current user, …)
│ ├── db/ GORM models, SQLite single-writer setup
│ ├── errors/ Typed error sentinels (ValidationError, ConflictError, …)
│ ├── gpu/ GX10 job queue + log storage (Phase 4)
│ ├── install/ Installer + setup-wizard mode
│ ├── ippool/ Atomic IP allocation + Proxmox reconciliation
│ ├── netscan/ LAN-host probe — fills the IP-pool with non-VM holders
│ ├── nodescore/ Pure node scoring (60% mem free, 40% cpu free)
│ ├── oauth/ GitHub + Google OAuth flows
│ ├── provision/ 9-step VM lifecycle orchestrator
│ ├── proxmox/ Proxmox REST client (form-encoded, self-signed TLS)
│ ├── secrets/ AES-GCM encryption helpers (vault-stored keys)
│ ├── service/ Auth service (sessions, password hashing)
│ ├── sshkeys/ Per-user SSH key vault
│ └── tunnel/ Gopher reverse-tunnel HTTP client (Phase 2)
├── scripts/ build / dev / install / quickinstall / reinstall / uninstall
│ └── gx10/ Inference + worker installers (Phase 4)
└── .github/workflows/ build, test, lint, release
| Command | Description |
|---|---|
| `make dev` | Backend (:8080) + frontend dev server (:5173) with hot reload |
| `make build` | Production binary with embedded frontend |
| `make test` | `go test ./...` + `npm run type-check` |
| `make lint` | `golangci-lint run` + `npm run lint` |
| `make gx10-worker` | Cross-compile the GX10 worker for linux/arm64 into `scripts/gx10/` |
| `make clean` | Remove build artifacts |
make gx10-worker # don't forget — install scripts fetch this binary
git tag v0.1.0-alpha.1
git push --tags

GitHub Actions builds Linux amd64/arm64 + macOS amd64 binaries and creates a release.