fix(cron): defer desktop ticker to gateway scheduler owner #44049
Open
AIalliAI wants to merge 1 commit into
The Desktop dashboard cron ticker shared only cron/.tick.lock with the launchd gateway ticker, which gives at-most-once execution but not deterministic execution provenance: whichever ticker won the lock ran the job. On macOS, TCC / Full Disk Access depends on process ancestry, so jobs that won under the dashboard backend lost access to protected local data. tick() now accepts defer_to_gateway_owner; the desktop ticker passes True and skips execution while a gateway holds the per-profile runtime lock (gateway.status.is_gateway_runtime_lock_active). With no gateway running, desktop-only setups keep firing jobs as before. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Requesting maintainer review; this is ready to land from my side. Standalone fork CI is pending first-run approval here. The rollup branch in #44061, which carries this session's batch, is fully green on upstream CI (all test shards, typecheck, e2e).
Fixes #43965
Problem
The Desktop dashboard backend (HERMES_DESKTOP=1) runs its own cron ticker, sharing only cron/.tick.lock with the launchd gateway ticker. That lock gives at-most-once execution but not deterministic ownership: whichever process wins the lock runs the job. On macOS, TCC / Full Disk Access depends on process ancestry, so a job that fired under the dashboard ancestry lost access to protected local data that the same job could read when run by the launchd gateway chain.
Observed with a launchd-managed gateway (ai.hermes.gateway-<profile>) and Desktop-spawned dashboard backends alive for the same profile: the dashboard ticker sometimes won cron/.tick.lock and executed an hourly job without the gateway's FDA provenance. After terminating the dashboard backends, the same job succeeded from the gateway ancestry.
Related: #25737 (durable server-side cron runner), #40684 (added the desktop ticker, relying on the tick lock), #43652 (tick-lock release semantics).
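The ownership problem described above can be illustrated with a toy model. This is a minimal sketch, not the project's scheduler: `SharedTickState`, `tick_as`, and the `last_slot` bookkeeping are hypothetical names, and an in-process `threading.Lock` stands in for cron/.tick.lock. It only shows that a shared lock keeps each slot to one execution while leaving the executing process up to arrival order.

```python
import threading
from typing import List, Optional, Tuple


class SharedTickState:
    """Toy stand-in for the state both tickers share (assumed, not the project's code)."""

    def __init__(self) -> None:
        self.lock = threading.Lock()          # plays the role of cron/.tick.lock
        self.last_slot: Optional[str] = None  # hypothetical "already ran" bookkeeping
        self.executions: List[Tuple[str, str]] = []


def tick_as(owner: str, slot: str, state: SharedTickState) -> bool:
    """Try to run the job for `slot`; return True only if this caller ran it."""
    if not state.lock.acquire(blocking=False):
        return False  # another ticker is mid-tick
    try:
        if state.last_slot == slot:
            return False  # the slot was already handled by whoever got here first
        state.last_slot = slot
        state.executions.append((slot, owner))
        return True
    finally:
        state.lock.release()


state = SharedTickState()
# Same two tickers, different arrival order per slot.
tick_as("gateway", "09:00", state)
tick_as("dashboard", "09:00", state)
tick_as("dashboard", "10:00", state)
tick_as("gateway", "10:00", state)

print(state.executions)
# [('09:00', 'gateway'), ('10:00', 'dashboard')]
```

Each slot runs exactly once, but the 10:00 run lands on the dashboard side, which is the case where the job would lose the gateway's FDA provenance.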
Steps to reproduce
1. hermes --profile <p> gateway run --replace
2. Desktop-spawned dashboard backends (HERMES_DESKTOP=1) are alive for the same profile
Fix
Use the gateway's per-profile runtime lock (gateway.status.is_gateway_runtime_lock_active, held for the live gateway's lifetime and released by the OS on death, so never stale) as the scheduler-ownership signal:
- cron/scheduler.py: new _gateway_scheduler_owner_active() helper (lazy import, fails open if the signal is unavailable) and a keyword-only defer_to_gateway_owner=False parameter on tick(). When set and a gateway owns the scheduler, the tick returns 0 before taking the tick lock or inspecting jobs.
- hermes_cli/web_server.py: the desktop ticker passes defer_to_gateway_owner=True, and the docstring no longer claims the tick lock alone makes it cross-process safe.
- The gateway ticker (gateway/run.py) keeps the default False, so it always runs.
Net effect: with a gateway running, only the gateway executes jobs (correct provenance). With no gateway (desktop-only setup), the dashboard ticker keeps firing jobs exactly as before.
Tests
New tests/cron/test_scheduler_ownership.py (5 tests): deferral when a gateway owner is active (no job inspection, no lock taken), normal execution when no owner, gateway ticker unaffected by the guard, helper delegation to gateway.status, fail-open on error.
Ran via scripts/run_tests.sh on macOS: new file plus tests/cron/test_scheduler.py, tests/cron/test_parallel_pool.py, tests/test_web_server.py, tests/gateway/test_status.py, 207 tests in total, all passing. Platforms tested: macOS (the bug is macOS-specific; the guard itself is platform-neutral and a no-op when no gateway lock is held).
🤖 Generated with Claude Code
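For the Tests section, the "no job inspection, no lock taken" assertions can be expressed with mocks along these lines. This is an illustrative sketch, not the contents of tests/cron/test_scheduler_ownership.py: the stand-in `tick` and its `take_lock`, `list_due_jobs`, and `owner_active` parameters are hypothetical and only mirror the behavior the description states.

```python
from unittest import mock


def tick(take_lock, list_due_jobs, *, defer_to_gateway_owner=False,
         owner_active=lambda: False):
    """Stand-in for the scheduler tick (hypothetical signature, not the project's)."""
    if defer_to_gateway_owner and owner_active():
        return 0  # deferred: neither the lock nor the job list is touched
    take_lock()
    jobs = list_due_jobs()
    for job in jobs:
        job()
    return len(jobs)


def test_defers_when_gateway_owner_active():
    take_lock = mock.Mock()
    list_due_jobs = mock.Mock(return_value=[])
    result = tick(take_lock, list_due_jobs,
                  defer_to_gateway_owner=True, owner_active=lambda: True)
    assert result == 0
    take_lock.assert_not_called()      # no lock taken
    list_due_jobs.assert_not_called()  # no job inspection


def test_gateway_ticker_unaffected_by_guard():
    job = mock.Mock()
    take_lock = mock.Mock()
    list_due_jobs = mock.Mock(return_value=[job])
    # Default defer_to_gateway_owner=False: the owner signal is ignored.
    assert tick(take_lock, list_due_jobs, owner_active=lambda: True) == 1
    take_lock.assert_called_once()
    job.assert_called_once()
```

Both functions follow pytest naming, so they would be collected as-is if saved in a test module.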