Skip to content

fix(cron): pull Bitwarden Secrets Manager secrets in run_job#33479

Open
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/cron-bsm-secrets-33465
Open

fix(cron): pull Bitwarden Secrets Manager secrets in run_job#33479
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/cron-bsm-secrets-33465

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

What does this PR do?

The cron scheduler called bare dotenv.load_dotenv() and never invoked the Bitwarden Secrets Manager (BSM) apply path. Any cron job that needed a BSM-managed credential (Discord token, provider API key, etc.) saw the .env placeholder string instead of the real value and failed with HTTP 401, while the gateway and the CLI happily resolved the same secrets via load_hermes_dotenv().

This PR adds a follow-up call to _apply_external_secret_sources(_get_hermes_home()) immediately after the existing dotenv reload in _run_job_impl, so cron picks up the same secrets the gateway does. The _APPLIED_HOMES process-level dedup added in #32271 (de76f4d) makes calling this on every tick effectively free after the first successful pull. A failing BSM backend must never block job execution, so the call is wrapped in a debug-logged try/except.

Why additive instead of switching to load_hermes_dotenv()?

The issue's suggested fix is to replace from dotenv import load_dotenv; load_dotenv(...) with load_hermes_dotenv(...). I chose the minimal additive change instead for two reasons:

  • 19 existing cron tests patch dotenv.load_dotenv directly (see tests/cron/test_scheduler.py, tests/cron/test_cron_profile.py, tests/cron/test_cron_workdir.py). load_hermes_dotenv binds load_dotenv at module-import time inside hermes_cli/env_loader.py, so those patches would no longer intercept and the test suite would need a coordinated rewrite.
  • load_hermes_dotenv also loads a project_env (<repo>/.env) when passed one. The scheduler has never loaded a project env, so switching to it would silently broaden the load surface beyond what the bug requires.

The bug is specifically the missing BSM apply step. Calling _apply_external_secret_sources directly delivers exactly that, leaves the existing dotenv flow (and its test contract) untouched, and is the same step load_hermes_dotenv itself invokes after its dotenv calls. _apply_external_secret_sources is already a stable, directly-tested API (see tests/test_env_loader_secret_sources.py and tests/test_bitwarden_secrets.py).

Related Issue

Fixes #33465

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • cron/scheduler.py — after the dotenv reload in _run_job_impl, lazily import _apply_external_secret_sources and call it with the cron job's HERMES_HOME so BSM secrets land in os.environ before AIAgent is constructed; swallow any exception with a debug log so a failing backend can't block job execution.
  • tests/cron/test_scheduler.py — add TestRunJobBSMSecretResolution with two cases: (1) BSM-enabled config makes the apply path put ANTHROPIC_API_KEY into os.environ before AIAgent init; (2) a raising BSM backend does not block run_job. An autouse fixture imports run_agent eagerly so its module-level load_hermes_dotenv runs once against the ambient HERMES_HOME and doesn't mask whether the scheduler itself is invoking the apply path.

How to Test

  1. Reproduce the bug on origin/main (before this PR): configure ~/.hermes/config.yaml with secrets.bitwarden.enabled: true and a valid BSM project, set the provider API key value only in BSM (not in .env), and observe that any cron-scheduled job receives HTTP 401 on first tick while the gateway resolves the same key cleanly.
  2. With this PR applied, the same cron job picks up the BSM-managed key and runs without 401.
  3. uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/cron/test_scheduler.py::TestRunJobBSMSecretResolution -v — both new cases pass.
  4. uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/cron/ tests/test_env_loader_secret_sources.py tests/test_bitwarden_secrets.py -v — 215 existing + new tests all pass; no regressions.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run focused tests for the touched code and all pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A (inline comments cover the why)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (no new config keys)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) — N/A (the apply path is platform-agnostic; behaves identically on Windows, macOS, and Linux)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Audited siblings: confirmed the only other in-process entrypoints that need BSM applied (cli.py, gateway/run.py, trajectory_compressor.py, hermes_cli/main.py, acp_adapter/entry.py, tui_gateway/server.py, tools/mcp_tool.py, gateway/platforms/feishu_comment_rules.py, hermes_cli/doctor.py, hermes_cli/dump.py, scripts/discord-voice-doctor.py) all already call load_hermes_dotenv() at module-import time and therefore already invoke _apply_external_secret_sources. The cron scheduler was the lone gap. No widening needed.

Screenshots / Logs

N/A.

Copilot AI review requested due to automatic review settings May 27, 2026 21:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR fixes a regression where the cron scheduler did not load Bitwarden Secrets Manager (BSM) secrets, causing cron jobs that depend on BSM-managed credentials to fail with HTTP 401. The fix invokes _apply_external_secret_sources during _run_job_impl so secrets land in os.environ before the agent is constructed.

Changes:

  • Added BSM secret resolution step in _run_job_impl after the dotenv load, with errors logged and swallowed.
  • Added a regression test verifying BSM secrets reach os.environ before AIAgent construction.
  • Added a test verifying that BSM apply failures do not block job execution.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
cron/scheduler.py Calls _apply_external_secret_sources during job execution to populate BSM secrets into env.
tests/cron/test_scheduler.py Adds TestRunJobBSMSecretResolution covering successful BSM apply and failure tolerance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cron/scheduler.py Outdated
Comment on lines +1484 to +1487
from hermes_cli.env_loader import _apply_external_secret_sources
_apply_external_secret_sources(_get_hermes_home())
except Exception:
logger.debug("BSM secret apply failed", exc_info=True)
Comment thread tests/cron/test_scheduler.py Outdated
Comment on lines +2619 to +2628
def test_run_job_applies_bitwarden_secrets(self, tmp_path, monkeypatch):
"""BSM apply must run during run_job so the secret lands in os.environ
before AIAgent is constructed."""
from agent.secret_sources.bitwarden import FetchResult
from hermes_cli import env_loader

# Clean per-process dedup state so the apply path actually runs.
env_loader._SECRET_SOURCES.clear()
env_loader.reset_secret_source_cache()

Comment thread tests/cron/test_scheduler.py Outdated
Comment on lines +2691 to +2692
env_loader._SECRET_SOURCES.clear()
env_loader.reset_secret_source_cache()
Comment thread tests/cron/test_scheduler.py Outdated
Comment on lines +2626 to +2627
env_loader._SECRET_SOURCES.clear()
env_loader.reset_secret_source_cache()

with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("dotenv.load_dotenv"), \
Comment on lines +2667 to +2668
import agent.secret_sources.bitwarden as bw_module
monkeypatch.setattr(bw_module, "apply_bitwarden_secrets", _fake_apply)
@alt-glitch alt-glitch added type/bug Something isn't working comp/cron Cron scheduler and job management area/auth Authentication, OAuth, credential pools P2 Medium — degraded but workaround exists labels May 27, 2026
@briandevans

Copy link
Copy Markdown
Contributor Author

@copilot All findings addressed in commit bdcdd1f73:

  • scheduler.py:1487 (private API) — Promoted _apply_external_secret_sources to a public apply_external_secret_sources in hermes_cli/env_loader.py and switched the scheduler call site to it. Kept a backward-compatible alias on the underscore name so existing tests (tests/test_bitwarden_secrets.py, tests/test_env_loader_secret_sources.py) continue to work unchanged.
  • test_scheduler.py:2628/2692 (cleanup leak on assertion failure) — Replaced the inline cleanup at the end of both new tests with an autouse setup+teardown fixture (_isolate_secret_source_state) that clears _SECRET_SOURCES and _APPLIED_HOMES both before and after each test in a try/finally. State now resets even when an assertion fails partway through.
  • test_scheduler.py:2627 (private _SECRET_SOURCES access) — Kept the direct mutation in the new autouse fixture (only used by tests in this class). Folding _SECRET_SOURCES clearing into reset_secret_source_cache() would change semantics for non-test callers (the source-label cache outlives the dedup set on purpose, so get_secret_source() still returns "bitwarden" after a re-pull), so I left that out of scope.
  • test_scheduler.py:2668/2674 (patch site vs import site) — Both patches are correct here because the production imports are function-local: from dotenv import load_dotenv lives inside _run_job_impl (cron/scheduler.py:1470) and from agent.secret_sources.bitwarden import apply_bitwarden_secrets lives inside _apply_external_secret_sources (hermes_cli/env_loader.py:282), so both lookups happen at call time and see the monkeypatched values. Verified by passing tests on both the original commit and this fix-up.

@briandevans

Copy link
Copy Markdown
Contributor Author

CI audit — both test failures are pre-existing baselines on clean origin/main (2d5dcfa). Zero failures are in touched code (cron/scheduler, env_loader, cron tests).

Test Symptom Root cause on main
tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id assert [] == ["t_a6365ccd"]detect_crashed_workers returns [] for a fresh worker _resolve_crash_grace_seconds (30-s grace, default value) was added in hermes_cli/kanban_db.py recently; the test does not advance time past the grace window, so a freshly-claimed worker is intentionally skipped
tests/tools/test_windows_native_support.py::TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop os.waitpid(-1, os.WNOHANG) must sit behind an os.name != "nt" guard reap_worker_zombies() in kanban_db.py uses the equivalent early-return if os.name == "nt": return [] pattern; the source-inspection assertion looks for the literal != "nt" string

Reproduced both on git checkout 2d5dcfabc with uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/tools/test_kanban_tools.py::test_worker_complete_rejects_stale_run_id tests/tools/test_windows_native_support.py::TestKanbanWaitpidWindowsGuard::test_source_gates_waitpid_loop -v → 2 failed (identical error text).

The cron scheduler called bare `dotenv.load_dotenv()` and never invoked
the BSM apply path. Any cron job that needed a BSM-managed credential
(Discord token, provider API key, etc.) saw the .env placeholder string
instead of the real value and failed with HTTP 401, while the gateway
and the CLI happily resolved the same secrets via `load_hermes_dotenv()`.

Add a follow-up call to `_apply_external_secret_sources(_get_hermes_home())`
after the dotenv reload so cron picks up the same secrets the gateway
does. The `_APPLIED_HOMES` process-level dedup added in NousResearch#32271 (de76f4d)
makes calling this on every tick effectively free after the first pull.
A failing BSM backend must never block job execution, so the call is
wrapped in a debug-logged try/except.

This is the minimal additive change rather than the full substitution
of `load_hermes_dotenv()`: existing tests patch `dotenv.load_dotenv`
directly via 19 sites, and replacing the dotenv call would change
path-loading semantics (project_env loading) the scheduler has never
exercised. The bug is specifically missing BSM resolution; this fix
covers exactly that.

Fixes NousResearch#33465
Three follow-ups on PR NousResearch#33479:

1. Promote `_apply_external_secret_sources` to a public
   `apply_external_secret_sources` in `hermes_cli.env_loader`, with a
   backward-compatible alias for existing tests that imported the
   underscore name. Cron now imports the public symbol instead of
   reaching into a name-mangled private attribute.

2. Replace the inline cleanup at the end of the two new BSM tests with
   an autouse setup+teardown fixture so `_SECRET_SOURCES` / the
   `_APPLIED_HOMES` dedup set are reset around every test even when an
   assertion fails partway through. `test_scheduler.py` is large and
   shares enough state that a leak would manifest in sibling tests.

3. Minor docstring fix in `reset_secret_source_cache` to track the new
   public name.

Patching of `dotenv.load_dotenv` and `agent.secret_sources.bitwarden.apply_bitwarden_secrets`
remains correct because both imports inside the production code are
function-local (deferred to call time), so the monkeypatch on the
source module is honoured. Confirmed by passing tests under the
original PR as well as this follow-up.
@briandevans briandevans force-pushed the fix/cron-bsm-secrets-33465 branch from a43af2d to b42d2a5 Compare June 2, 2026 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/auth Authentication, OAuth, credential pools comp/cron Cron scheduler and job management P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: cron scheduler missing BSM resolution

3 participants