Tags: cerc-io/stack-orchestrator

v1.1.0-3c6d2f7-202606100934

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Native WebSocket support for k8s http-proxy routes (#758)

- websocket: true on an http-proxy route now works: same public URL serves HTTP and WebSocket, split by Upgrade header — the Solana RPC convention (wss:// at the same address as https://)
- Since the k8s Ingress API can't express header-based routing, SO generates a small Caddy mux ({deployment}-ws-mux) per deployment that needs it, and points the Ingress at it. Image overridable via a ws-mux-image spec key
- Conflicting routes (two plain routes on one host+path) now fail deploy create loudly instead of silently emitting duplicate Ingress paths
- Ingresses without TLS (kind) get disable-ssl-redirect — previously every HTTP route 308'd into a cert-less HTTPS endpoint
- Verified end-to-end on kind, over both HTTP and TLS: 101 handshake + echo round-trip through controller → mux, lowercase-header clients, controller restarts, redeploys, and down cleanup. No protocol forcing needed — this replaces the in-memory admin-API patch that broke on every Caddy restart
- 27 new unit tests; fixture stack (test-websocket) included for re-verification
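
The header-based split described above can be sketched in a few lines. This is only an illustration of the routing decision, not the generated Caddy configuration; the function names and the backend labels are hypothetical.

```python
def is_websocket_upgrade(headers: dict[str, str]) -> bool:
    """Return True when the request asks to be upgraded to WebSocket."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Connection may carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
    connection_tokens = [
        token.strip().lower() for token in lowered.get("connection", "").split(",")
    ]
    upgrade_target = lowered.get("upgrade", "").strip().lower()
    return "upgrade" in connection_tokens and upgrade_target == "websocket"


def pick_backend(headers: dict[str, str], http_backend: str, ws_backend: str) -> str:
    """Choose the upstream for one request arriving on the shared public URL."""
    return ws_backend if is_websocket_upgrade(headers) else http_backend
```

Normalising header names first is what lets lowercase-header clients take the same path as clients that send `Upgrade: websocket` in canonical case.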

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

v1.1.0-a181281-202606050713

Verified

Support a label to allow re-running completed k8s Jobs on deploy (#757)

- Adds support for a `laconic.recreate-job` compose label. Any service carrying it gets the same annotation stamped onto its generated k8s Job
- On deploy, a Job marked for recreate is deleted (cascading to its pods) and recreated instead of erroring because the Job already exists
- The delete blocks until the old Job is fully gone (bounded by a timeout) before recreating, so each run starts clean
- Adds unit tests for the label -> annotation propagation and the delete-then-recreate path
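
A minimal sketch of the delete-then-recreate flow, assuming a simplified job API. `FakeJobApi` is a hypothetical in-memory stand-in so the example runs without a cluster; the timeout and poll values are illustrative, not the ones SO uses.

```python
import time


class FakeJobApi:
    """Hypothetical in-memory stand-in for the cluster's Job API."""

    def __init__(self, polls_until_gone: int = 2):
        self.jobs: set[str] = set()
        self._deleting: dict[str, int] = {}
        self._polls_until_gone = polls_until_gone

    def create(self, name: str) -> None:
        if name in self.jobs:
            raise RuntimeError(f"job {name} already exists")
        self.jobs.add(name)

    def delete(self, name: str) -> None:
        # Deletion is asynchronous: the job lingers for a few more polls.
        self._deleting[name] = self._polls_until_gone

    def exists(self, name: str) -> bool:
        if name in self._deleting:
            self._deleting[name] -= 1
            if self._deleting[name] <= 0:
                self.jobs.discard(name)
                del self._deleting[name]
        return name in self.jobs


def deploy_job(api, name: str, recreate: bool,
               timeout_s: float = 5.0, poll_s: float = 0.01) -> None:
    """Create a job; if it exists and is marked for recreate, replace it."""
    if api.exists(name):
        if not recreate:
            raise RuntimeError(f"job {name} already exists")
        api.delete(name)
        deadline = time.monotonic() + timeout_s
        # Block until the old job is fully gone, bounded by the timeout.
        while api.exists(name):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"job {name} was not deleted within {timeout_s}s")
            time.sleep(poll_s)
    api.create(name)
```

Waiting for the old Job to disappear before calling `create` avoids racing the cluster's own cleanup, which is why the delete is blocking rather than fire-and-forget.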

Co-authored-by: pranav <pranav@deepstacksoft.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

v1.1.0-fe77540-202605271117

Verified

fix(_copy_hooks): copy entire deploy/ dir to hooks/, not just commands.py (#755)

- `_copy_hooks` was hardcoded to only copy `commands.py` into the deployment's `hooks/` dir
- Any sibling file read via `Path(__file__).parent` at `start()` time would silently not exist
- Now copies all files from `deploy/`; the multi-plugin `commands_N.py` indexing is preserved
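
The change can be pictured with the sketch below, which copies every regular file from a `deploy/` directory into `hooks/`. The function name `copy_deploy_dir` is hypothetical, and the multi-plugin `commands_N.py` renaming is deliberately left out; this is not the actual `_copy_hooks` code.

```python
import shutil
from pathlib import Path


def copy_deploy_dir(deploy_dir: Path, hooks_dir: Path) -> list[str]:
    """Copy all regular files from deploy_dir into hooks_dir; return their names."""
    hooks_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(deploy_dir.iterdir()):
        # Subdirectories are skipped in this sketch; only top-level files are copied.
        if src.is_file():
            shutil.copy2(src, hooks_dir / src.name)
            copied.append(src.name)
    return copied
```

With sibling files copied alongside `commands.py`, a hook that reads `Path(__file__).parent / "some-template"` finds the file in the deployment's `hooks/` directory as well.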

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

v1.1.0-55b2e8e-202605251108

host-metrics: emit ZFS fields as uint64 to avoid int64 overflow

The telegraf zfs input was parsing kstat fields as int64. Several ZFS
counters (notably fm.erpt-*, dmu_tx_* and certain pool io_* values)
are native uint64 and routinely exceed 2^63, producing repeated
errors like:

  E! [inputs.zfs] Error in plugin: strconv.ParseInt: parsing
     "18446740553841588499": value out of range

at every collection interval, with no ZFS metrics published. Observed
on gorbagana with telegraf 1.36.2 against OpenZFS on Linux.

Telegraf added useNativeTypes in PR #17617 (1.36.3) specifically to
fix this class of bug by emitting uint64-typed fields per ZFS's
definition. The InfluxDB 1.x output plugin accepts uint64 via the
line-protocol u-suffix, so the existing monitoring stack consumes
these unchanged.

1.36.3 also introduced a regression panic in processProcFile (issue
#17952), so pin to 1.36.4 which fixes it via PR #17953.
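
The overflow can be checked with plain integer arithmetic. The sketch below uses the value from the error message above; the helper names are illustrative, and the `u` suffix follows the line-protocol convention mentioned in this commit message.

```python
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

# The counter value quoted in the telegraf error above.
value = int("18446740553841588499")


def fits_int64(n: int) -> bool:
    """True when n is representable as a signed 64-bit integer."""
    return -(2**63) <= n <= INT64_MAX


def fits_uint64(n: int) -> bool:
    """True when n is representable as an unsigned 64-bit integer."""
    return 0 <= n <= UINT64_MAX


def as_unsigned_field(n: int) -> str:
    """Render n as an unsigned line-protocol field value (u suffix)."""
    if not fits_uint64(n):
        raise ValueError(f"{n} does not fit in uint64")
    return f"{n}u"
```

Here `fits_int64(value)` is false while `fits_uint64(value)` is true, which is exactly the gap between a signed parse and the native unsigned type.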

Changes:
- stack_orchestrator/data/config/host-metrics/scripts/telegraf-entrypoint.sh:
  emit `useNativeTypes = true` alongside `poolMetrics = true` in the
  rendered [[inputs.zfs]] block when COLLECT_ZFS=true.
- stack_orchestrator/data/compose/docker-compose-host-metrics.yml:
  pin telegraf image to 1.36.4 (previously the floating 1.36 tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v1.1.0-ccbfde0-202605220903

Verified

feat(secrets): create user-declared k8s Secrets from spec env/file sources (#754)

- Extends `spec.secrets` to accept a keyed-dict form where each entry declares an env or file source; SO resolves and creates one Opaque k8s Secret per entry at deploy start
- Legacy list form (operator-managed, reference-only) is preserved unchanged
- Unit tests cover env/file happy paths, missing-source errors, legacy list skip, and 409 idempotency
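
A rough sketch of how the two forms might be told apart and resolved. The entry shape used here (`{"env": "VAR"}` or `{"file": "path"}`) is an assumption made for illustration, since the commit message does not spell out the schema; `resolve_secrets` is a hypothetical name, and creating the actual k8s Secret is out of scope.

```python
import os
from pathlib import Path


def resolve_secrets(spec_secrets, environ=os.environ) -> dict[str, bytes]:
    """Resolve keyed-dict secret entries to raw bytes, keyed by secret name."""
    # Legacy list form: operator-managed and reference-only, so nothing to create.
    if isinstance(spec_secrets, list):
        return {}
    resolved = {}
    for name, source in spec_secrets.items():
        if "env" in source:  # assumed key name
            variable = source["env"]
            if variable not in environ:
                raise KeyError(f"secret {name}: env var {variable} is not set")
            resolved[name] = environ[variable].encode()
        elif "file" in source:  # assumed key name
            path = Path(source["file"])
            if not path.is_file():
                raise FileNotFoundError(f"secret {name}: {path} does not exist")
            resolved[name] = path.read_bytes()
        else:
            raise ValueError(f"secret {name}: expected an env or file source")
    return resolved
```

Failing on a missing source before anything is created mirrors the missing-source errors the unit tests cover.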

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v1.1.0-b3e9366-202605111309

host-metrics: rename telegraf service to host-telegraf

gorchain-monitoring also ships a `telegraf` service (synthetic RPC +
HTTP probes); running both on the same host produced two containers
named `laconic-<hash>-telegraf-1`, which made `docker ps` and
`laconic-so deployment ... logs telegraf` confusing.

Rename the host-metrics service to `host-telegraf` -- the descriptive
name fits a host-system-metrics collector and is unambiguous next to
the probe-side telegraf in gorchain-monitoring.

v1.1.0-3d70370-202605111308

host-metrics: rename telegraf service to host-telegraf

gorchain-monitoring also ships a `telegraf` service (synthetic RPC +
HTTP probes); running both on the same host produced two containers
named `laconic-<hash>-telegraf-1`, which made `docker ps` and
`laconic-so deployment ... logs telegraf` confusing.

Rename the host-metrics service to `host-telegraf` -- the descriptive
name fits a host-system-metrics collector and is unambiguous next to
the probe-side telegraf in gorchain-monitoring.

v1.1.0-2ff7e5e-202605061003

Verified

deploy: restart now force-recreates compose containers (#752)

Operator-reported: editing source files mounted into a service via
bind volumes (alert rules, dashboards, scripts, templates, telegraf
config) and running 'laconic-so deployment ... restart' did not
take effect. Operator had to fall back to 'stop && start' to pick
up changes.

Root cause: 'restart' calls up_operation, which translates to
'docker compose up -d'. Compose's up only recreates a container
when the *service definition* itself (image, env, ports, volume
declarations) changes. Bind-mount target file content is not part
of that hash, so the running container kept its old in-memory
state (e.g. Grafana's pre-edit provisioning).

Add force_recreate kwarg through the deployer interface and have
restart pass force_recreate=True. compose path threads through to
python_on_whales' compose.up(force_recreate=...). k8s path accepts
the kwarg but is a no-op for now (rolling update on
unchanged-spec needs a separate fix that stamps the
kubectl.kubernetes.io/restartedAt annotation on managed
Deployments; tracked in a follow-up).
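
The kwarg threading described above might look roughly like this. All class and function names are hypothetical, and `RecordingCompose` is a stand-in that records calls instead of invoking Docker, so the sketch runs anywhere; it is not the project's deployer code.

```python
class RecordingCompose:
    """Stand-in for a compose client; records the arguments of each up() call."""

    def __init__(self):
        self.calls = []

    def up(self, detach=False, force_recreate=False):
        self.calls.append({"detach": detach, "force_recreate": force_recreate})


class ComposeDeployer:
    def __init__(self, compose):
        self.compose = compose

    def up(self, force_recreate=False):
        # Thread the flag through to the compose client unchanged.
        self.compose.up(detach=True, force_recreate=force_recreate)


class K8sDeployer:
    def __init__(self):
        self.up_calls = 0

    def up(self, force_recreate=False):
        # Accepted for interface parity; force_recreate is ignored for now.
        self.up_calls += 1


def start(deployer):
    deployer.up()


def restart(deployer):
    # restart must pick up bind-mounted file edits, so always recreate.
    deployer.up(force_recreate=True)
```

Keeping the default at `False` leaves `start` with the usual `up -d` behaviour, so only `restart` pays the cost of recreating containers.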

v1.1.0-cf0e230-202605050445

Verified

bug-fix: fix image-overrides usage to load locally built images into kind cluster (#751)

- Cluster setup only considered images from the containers list in `stack.yml` when kind-loading into the cluster, so images from `image_overrides` in the spec were not loaded
- As a side effect, laconic-so sometimes attempted to kind-load images that were not present locally
- Fix: union the `image_overrides` values (user-specified local images) with the images from the containers list, then filter to those actually present on the docker host
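
The fix amounts to a set union followed by an intersection. A minimal sketch, with a hypothetical function name and made-up image names:

```python
def images_to_load(stack_images, image_overrides, local_images):
    """Return the images to kind-load: stack and override images present locally."""
    # Union of images named by the stack and the user-specified override values.
    candidates = set(stack_images) | set(image_overrides.values())
    # Keep only images that actually exist on the docker host.
    return sorted(candidates & set(local_images))
```

For example, calling it with stack images `["cerc/app:local"]`, overrides `{"app": "me/app:dev"}`, and a host that only has `me/app:dev` yields just `["me/app:dev"]`, so nothing missing is handed to kind.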

v1.1.0-7c65d39-202604281204

Verified

Make deployments self-sufficient and add E2E restart test (#750)

- `deploy create` now copies each pod's `commands.py` into `<deployment>/hooks/`. `call_stack_deploy_start` loads from there, so `deployment start` / `restart` no longer need the live stack source on disk to run the `start()` hook
- Only the `start()` hook is affected. `init`, `setup`, and `create` still load from the live source — they only run at `deploy create` time, when the source is guaranteed to be present
- Multi-repo stacks produce `hooks/commands_0.py`, `hooks/commands_1.py`, …; `call_stack_deploy_start` loads them all in sorted order
- Adds `tests/k8s-deploy/run-restart-test.sh` covering the full single-repo restart cycle (v1 -> mutate working tree -> `restart` re-copies and re-executes v2) and the multi-repo file-naming + multi-hook invocation. Wired into the existing **K8s Deploy Test** workflow
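
Loading hooks from the deployment directory in sorted order can be sketched with `importlib`. The function name `run_start_hooks` and the single `context` argument are assumptions for illustration; the real `call_stack_deploy_start` signature is not shown in this message.

```python
import importlib.util
from pathlib import Path


def run_start_hooks(hooks_dir: Path, context):
    """Load every commands*.py in hooks_dir in sorted order and call its start()."""
    results = []
    for path in sorted(hooks_dir.glob("commands*.py")):
        # Import the file directly from disk, independent of the stack source tree.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        start = getattr(module, "start", None)
        if callable(start):
            results.append((path.name, start(context)))
    return results
```

Note that `sorted` here is lexical, so `commands_10.py` would sort before `commands_2.py`; whether the real loader handles indices above 9 differently is not stated in the source.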