Operator follow-up: finish HERMES_HOME/profile isolation smoke quirks #4671

@taeonu

Summary

Backfill smoke still found a small set of non-blocking operator-facing inconsistencies around profile/HERMES_HOME isolation. This is not a release blocker, but it should be tracked as one bounded cleanup lane so operators can trust profile-scoped behavior.

Adjacent issues already exist for broader gateway/profile isolation problems (#4402, #4426, #4587). This follow-up is for the remaining smoke quirks below.

Scope

  1. hermes gateway status and global process visibility can still report or process data from outside the active HERMES_HOME, even when smoke is run against isolated homes.
  2. Some profile path behavior still appears HOME-based rather than HERMES_HOME-based, which makes profile-local expectations ambiguous.
  3. cron remove <missing-job> prints a failure-style message but exits 0, which is awkward for scripts/operators that rely on exit status.
  4. Tick-related tests are materially more stable in serial mode than under xdist, suggesting leftover shared-state or timing coupling in the test/runtime path.

Why this matters

These do not currently block backfill smoke, but they weaken operator confidence in:

  • profile isolation
  • scriptable CLI semantics
  • parallel-test reliability as a signal for regressions

Desired outcome

Treat this as a bounded cleanup/polish pass:

  • make profile/HERMES_HOME scoping rules explicit and consistent for status/path resolution
  • make cron remove return a clearly actionable nonzero exit code when the target job does not exist (or intentionally document/standardize a different contract)
  • either make tick tests xdist-safe again or explicitly mark/serialize the cases that still require serial execution
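
For discussion, here is a minimal sketch of what the first two outcomes could look like. It is not taken from the Hermes codebase: resolve_hermes_home, remove_job, the ~/.hermes default, and the exit-code value 3 are all hypothetical names and choices used only to illustrate the contract.

```python
import os
import sys
from pathlib import Path

# Hypothetical exit codes; the real CLI may standardize different values.
EXIT_OK = 0
EXIT_JOB_NOT_FOUND = 3


def resolve_hermes_home(env=None) -> Path:
    """Return the active home, preferring HERMES_HOME over HOME.

    The "~/.hermes" fallback is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    override = env.get("HERMES_HOME")
    if override:
        # An explicit HERMES_HOME always wins, so profile-local paths
        # never silently fall back to HOME.
        return Path(override).expanduser()
    return Path(env.get("HOME", str(Path.home()))) / ".hermes"


def remove_job(jobs: dict, job_id: str, err=sys.stderr) -> int:
    """Remove a job from an in-memory table and return a process exit code."""
    if job_id not in jobs:
        # Failure message and nonzero status agree with each other.
        print(f"cron remove: no such job: {job_id}", file=err)
        return EXIT_JOB_NOT_FOUND
    del jobs[job_id]
    return EXIT_OK
```

A CLI entry point would pass the return value of remove_job to sys.exit, so that scripts can branch on the status instead of parsing the message.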

Verification expectations

A fix here should include targeted coverage showing:

  • status/process inspection only reports the active HERMES_HOME scope
  • path resolution under profiles matches documented HERMES_HOME semantics
  • removing a missing cron job has deterministic message + exit-code behavior
  • tick-related coverage passes reliably under the intended test mode (xdist-safe, or intentionally serialized with rationale)
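
As one possible shape for the first expectation, the sketch below filters process records down to the active HERMES_HOME and checks the result against two isolated temporary homes. ProcessRecord, scoped_status, and check_status_is_scoped are hypothetical names; how the real gateway tracks which home a process belongs to is not specified here.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical record of a running gateway process."""
    pid: int
    hermes_home: Path  # the HERMES_HOME the process was started with


def in_active_scope(record: ProcessRecord, active_home: Path) -> bool:
    # Compare resolved paths so symlinks or trailing separators
    # cannot make a foreign home look like the active one (or vice versa).
    return record.hermes_home.resolve() == active_home.resolve()


def scoped_status(records, active_home: Path):
    """Return only the records that belong to the active home."""
    return [r for r in records if in_active_scope(r, active_home)]


def check_status_is_scoped() -> list[int]:
    """Build two isolated homes and return the pids visible from the first."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        home_a, home_b = Path(a), Path(b)
        records = [
            ProcessRecord(pid=101, hermes_home=home_a),
            ProcessRecord(pid=202, hermes_home=home_b),
            # Same home as the first record, spelled with a trailing separator.
            ProcessRecord(pid=303, hermes_home=Path(str(home_a) + os.sep)),
        ]
        return [r.pid for r in scoped_status(records, home_a)]
```

A targeted test would assert that the pid registered under the second home never appears in the result for the first home.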

Priority

Non-blocking follow-up from backfill smoke. Worth cleaning up soon, but lower priority than operator-breaking isolation regressions or data-loss bugs.

Metadata

Assignees

No one assigned

Labels

  • P3: Low (cosmetic, nice to have)
  • area/config: Config system, migrations, profiles
  • comp/cli: CLI entry point, hermes_cli/, setup wizard
  • comp/cron: Cron scheduler and job management
  • comp/gateway: Gateway runner, session dispatch, delivery
  • type/refactor: Code restructuring, no behavior change
