fix(continuous-learning-v2): observer background process crashes immediately#312
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughRuns the observer as a fully detached nohup background process with PID-file lifecycle and verification, interruptible sleep loop, USR1-triggered on-demand analysis, timestamped archival of observations, and stricter Claude analysis output requiring YAML instincts frontmatter. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Starter as "Starter (invoker)"
participant Observer as "start-observer.sh (detached)"
participant FS as "Filesystem (obs file / archive)"
participant Claude as "Claude (analysis)"
Starter->>Observer: nohup bash -c (export envs + start detached block)
Observer->>Observer: write PID_FILE, setup traps & cleanup
Observer->>Observer: spawn interruptible background sleep (sleep + wait)
alt USR1 received
Observer->>Observer: cancel sleep, set USR1 flag, run on_usr1
Observer->>FS: read `OBSERVATIONS_FILE`
Observer->>Claude: send analysis request (strict YAML frontmatter)
Claude-->>Observer: return instincts YAML
Observer->>FS: move processed file to archive (timestamp) & recreate observations file
else periodic timer triggers
Observer->>FS: read `OBSERVATIONS_FILE`
Observer->>Claude: send scheduled analysis request
Claude-->>Observer: return instincts YAML
Observer->>FS: archive processed observations
end
Observer->>Starter: print startup success (PID & log) or failure if process died
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
skills/continuous-learning-v2/agents/start-observer.sh (1)
77-80: Variable injection can break on paths with single quotes.The
'"$VAR"'pattern will produce malformed bash if any path contains a single quote (e.g.,/home/user's-dir/.claude/...). While uncommon, this can cause silent failures.A safer alternative is to use
printf '%q'to escape values or export the variables so the child inherits them.Alternative: export variables before nohup
+ # Export for child process + export CONFIG_DIR PID_FILE LOG_FILE OBSERVATIONS_FILE + # The observer loop — fully detached with nohup, IO redirected to log nohup /bin/bash -c ' set +e unset CLAUDECODE - - CONFIG_DIR="'"$CONFIG_DIR"'" - PID_FILE="'"$PID_FILE"'" - LOG_FILE="'"$LOG_FILE"'" - OBSERVATIONS_FILE="'"$OBSERVATIONS_FILE"'" + # Variables inherited from parent via export trap "rm -f \"$PID_FILE\"; exit 0" TERM INT🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/continuous-learning-v2/agents/start-observer.sh` around lines 77 - 80, The current inline assignment using the pattern '"$CONFIG_DIR"','"$PID_FILE"','"$LOG_FILE"','"$OBSERVATIONS_FILE"' can break when values contain single quotes; update start-observer.sh to either (a) escape each variable before embedding by using printf '%q' for CONFIG_DIR, PID_FILE, LOG_FILE and OBSERVATIONS_FILE so the generated shell fragment is safe, or (b) simpler: export those variables in the parent shell (export CONFIG_DIR PID_FILE LOG_FILE OBSERVATIONS_FILE) and invoke nohup/child without embedding quoted values so the child inherits them; modify the code that writes/echoes those lines to use one of these approaches and ensure the unique symbols CONFIG_DIR, PID_FILE, LOG_FILE and OBSERVATIONS_FILE are handled accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/continuous-learning-v2/agents/start-observer.sh`:
- Around line 147-150: The loop currently runs sleep 300 & wait $! then always
calls analyze_observations, which causes a duplicate run when the USR1 trap
already invoked analyze_observations; fix this by introducing a skip flag (e.g.,
SKIP_SCHEDULED_ANALYSIS) that the USR1 trap sets to true before calling
analyze_observations, and in the while loop (the sleep/wait block that calls
analyze_observations) check that flag: if set, clear it and skip the scheduled
analyze_observations invocation; reference the existing analyze_observations
function and the while true; do sleep 300 & wait $! loop to locate where to add
the flag guard and where the trap should set the flag.
---
Nitpick comments:
In `@skills/continuous-learning-v2/agents/start-observer.sh`:
- Around line 77-80: The current inline assignment using the pattern
'"$CONFIG_DIR"','"$PID_FILE"','"$LOG_FILE"','"$OBSERVATIONS_FILE"' can break
when values contain single quotes; update start-observer.sh to either (a) escape
each variable before embedding by using printf '%q' for CONFIG_DIR, PID_FILE,
LOG_FILE and OBSERVATIONS_FILE so the generated shell fragment is safe, or (b)
simpler: export those variables in the parent shell (export CONFIG_DIR PID_FILE
LOG_FILE OBSERVATIONS_FILE) and invoke nohup/child without embedding quoted
values so the child inherits them; modify the code that writes/echoes those
lines to use one of these approaches and ensure the unique symbols CONFIG_DIR,
PID_FILE, LOG_FILE and OBSERVATIONS_FILE are handled accordingly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 561f9eb7bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Fixes the continuous-learning-v2 observer background process so it stays running reliably when launched from interactive Claude Code sessions, and ensures generated instinct files conform to instinct-cli.py’s expected frontmatter format.
Changes:
- Detach the observer loop using
nohup /bin/bash -c ...and redirect output to the observer log. - Prevent immediate loop termination by disabling inherited
set -ebehavior and unsettingCLAUDECODEin the background process. - Update the Haiku prompt to include the full instinct file format inline (YAML frontmatter + required
id).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
skills/continuous-learning-v2/agents/start-observer.sh (2)
181-183: Replace fixed startup delay with bounded readiness polling.Line 182’s fixed
sleep 2can still false-fail on slower machines/filesystems. A short polling loop with timeout is more reliable.⏱️ Suggested refactor
- # Wait for PID file - sleep 2 + # Wait up to 10s for PID file + for _ in {1..40}; do + [ -f "$PID_FILE" ] && break + sleep 0.25 + done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/continuous-learning-v2/agents/start-observer.sh` around lines 181 - 183, The fixed sleep (sleep 2) used to "Wait for PID file" can false-fail on slow systems; replace it with a bounded polling loop that repeatedly checks for the PID file's existence (e.g., the variable holding the PID path such as OBSERVER_PID_FILE or PID_FILE used earlier in start-observer.sh) with a short sleep (e.g., 0.2s) and an overall timeout (e.g., 10s); if the file appears break and continue, otherwise after timeout emit an error and exit non-zero—implement the loop where the current sleep 2 sits and preserve existing logging/exit behavior.
26-31: Add explicit PID validation for defense-in-depth.Lines 27, 45, 65, and 185 use PID values from the PID file with
kill. Whilekill -0provides a safety check, explicit validation before reading the PID improves code clarity and robustness. Consider adding a validation function to ensure the PID file contains only positive integers before use, rejecting malformed values (e.g., negative numbers, special values like-1).Suggested validation function
+is_valid_pid() { + [[ "$1" =~ ^[1-9][0-9]*$ ]] +} + case "${1:-start}" in stop) if [ -f "$PID_FILE" ]; then pid=$(cat "$PID_FILE") + if ! is_valid_pid "$pid"; then + echo "Observer not running (invalid PID file)." + rm -f "$PID_FILE" + exit 0 + fi if kill -0 "$pid" 2>/dev/null; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/continuous-learning-v2/agents/start-observer.sh` around lines 26 - 31, Add a defensive PID validation routine (e.g., is_valid_pid or validate_pid) that reads the PID string from PID_FILE and ensures it is a non-zero positive integer (reject negative numbers, zero, empty strings, and values containing non-digits or special prefixes like "-1"); call this validator wherever the script reads pid from PID_FILE (references: PID_FILE variable and the blocks that call kill and kill -0) and if validation fails log an error, remove or ignore the malformed PID_FILE, and avoid calling kill with the bad value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@skills/continuous-learning-v2/agents/start-observer.sh`:
- Around line 181-183: The fixed sleep (sleep 2) used to "Wait for PID file" can
false-fail on slow systems; replace it with a bounded polling loop that
repeatedly checks for the PID file's existence (e.g., the variable holding the
PID path such as OBSERVER_PID_FILE or PID_FILE used earlier in
start-observer.sh) with a short sleep (e.g., 0.2s) and an overall timeout (e.g.,
10s); if the file appears break and continue, otherwise after timeout emit an
error and exit non-zero—implement the loop where the current sleep 2 sits and
preserve existing logging/exit behavior.
- Around line 26-31: Add a defensive PID validation routine (e.g., is_valid_pid
or validate_pid) that reads the PID string from PID_FILE and ensures it is a
non-zero positive integer (reject negative numbers, zero, empty strings, and
values containing non-digits or special prefixes like "-1"); call this validator
wherever the script reads pid from PID_FILE (references: PID_FILE variable and
the blocks that call kill and kill -0) and if validation fails log an error,
remove or ignore the malformed PID_FILE, and avoid calling kill with the bad
value.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
affaan-m
left a comment
There was a problem hiding this comment.
Automated review: this PR has merge conflicts. Please rebase or resolve.
…diately Three bugs prevent the observer from running: 1. Nested session detection: When launched from a Claude Code session, the child process inherits CLAUDECODE env var, causing `claude` CLI to refuse with "cannot be launched inside another session". Fix: unset CLAUDECODE in the background process. 2. set -e kills the loop: The parent script's `set -e` is inherited by the subshell. When `claude` exits non-zero (e.g. max turns reached), the entire observer loop dies. Fix: `set +e` in the background process. 3. Subshell dies when parent exits: `( ... ) & disown` loses IO handles when the parent shell exits, killing the background process. Fix: use `nohup /bin/bash -c '...'` for full detachment, and `sleep & wait` to allow SIGUSR1 to interrupt sleep without killing the process. Additionally, the prompt for Haiku now includes the exact instinct file format inline (YAML frontmatter with id/trigger/confidence/domain/source fields), since the previous prompt referenced "the observer agent spec" which Haiku could not actually read, resulting in instinct files that the CLI parser could not parse.
- Use `env` to pass variables to child process instead of quote-splicing, avoiding shell injection risk from special chars in paths - Add USR1_FIRED flag to prevent double analysis when SIGUSR1 interrupts the sleep/wait cycle - Track SLEEP_PID and kill it in both TERM trap and USR1 handler to prevent orphaned sleep processes from accumulating - Consolidate cleanup logic into a dedicated cleanup() function
29c413e to
2f80af3
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
skills/continuous-learning-v2/agents/start-observer.sh (1)
259-260: Prefer a short poll loop over a fixed 2s startup wait.A fixed delay can cause flaky “failed to start” reports on slower machines.
💡 Proposed refactor
- # Wait for PID file - sleep 2 + # Wait up to 10s for PID file + for _ in {1..50}; do + [ -f "$PID_FILE" ] && break + sleep 0.2 + done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/continuous-learning-v2/agents/start-observer.sh` around lines 259 - 260, Replace the fixed "sleep 2" startup wait with a short poll loop that checks for the observer PID file existence (the PID file referenced where "sleep 2" currently sits) at a small interval (e.g., 100–250ms), stop polling once the PID file appears, and enforce a reasonable overall timeout after which the script logs a clear failure and exits non‑zero; update the block around the existing "sleep 2" call to perform this polling, use the same PID file variable name used elsewhere in the script (e.g., PID_FILE or PIDFILE), and ensure to log success/failure so callers don’t get flaky “failed to start” reports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/continuous-learning-v2/agents/start-observer.sh`:
- Around line 158-161: The cleanup function currently unconditionally removes
PID_FILE which can delete a newer observer's PID during restart races; modify
cleanup (the cleanup function that uses SLEEP_PID and PID_FILE) to first read
the PID stored in PID_FILE and only rm -f "$PID_FILE" if that stored PID equals
the current process ID ($$), otherwise leave the file untouched; keep the
existing kill of SLEEP_PID and the exit behavior unchanged.
---
Nitpick comments:
In `@skills/continuous-learning-v2/agents/start-observer.sh`:
- Around line 259-260: Replace the fixed "sleep 2" startup wait with a short
poll loop that checks for the observer PID file existence (the PID file
referenced where "sleep 2" currently sits) at a small interval (e.g.,
100–250ms), stop polling once the PID file appears, and enforce a reasonable
overall timeout after which the script logs a clear failure and exits non‑zero;
update the block around the existing "sleep 2" call to perform this polling, use
the same PID file variable name used elsewhere in the script (e.g., PID_FILE or
PIDFILE), and ensure to log success/failure so callers don’t get flaky “failed
to start” reports.
Only remove PID file in cleanup trap if it still belongs to the current process, preventing a restarted observer from losing its PID file when the old process exits.
…diately (affaan-m#312) * fix(continuous-learning-v2): observer background process crashes immediately Three bugs prevent the observer from running: 1. Nested session detection: When launched from a Claude Code session, the child process inherits CLAUDECODE env var, causing `claude` CLI to refuse with "cannot be launched inside another session". Fix: unset CLAUDECODE in the background process. 2. set -e kills the loop: The parent script's `set -e` is inherited by the subshell. When `claude` exits non-zero (e.g. max turns reached), the entire observer loop dies. Fix: `set +e` in the background process. 3. Subshell dies when parent exits: `( ... ) & disown` loses IO handles when the parent shell exits, killing the background process. Fix: use `nohup /bin/bash -c '...'` for full detachment, and `sleep & wait` to allow SIGUSR1 to interrupt sleep without killing the process. Additionally, the prompt for Haiku now includes the exact instinct file format inline (YAML frontmatter with id/trigger/confidence/domain/source fields), since the previous prompt referenced "the observer agent spec" which Haiku could not actually read, resulting in instinct files that the CLI parser could not parse. * fix: address review feedback on observer process management - Use `env` to pass variables to child process instead of quote-splicing, avoiding shell injection risk from special chars in paths - Add USR1_FIRED flag to prevent double analysis when SIGUSR1 interrupts the sleep/wait cycle - Track SLEEP_PID and kill it in both TERM trap and USR1 handler to prevent orphaned sleep processes from accumulating - Consolidate cleanup logic into a dedicated cleanup() function * fix: guard PID file cleanup against race condition on restart Only remove PID file in cleanup trap if it still belongs to the current process, preventing a restarted observer from losing its PID file when the old process exits.
…diately (affaan-m#312) * fix(continuous-learning-v2): observer background process crashes immediately Three bugs prevent the observer from running: 1. Nested session detection: When launched from a Claude Code session, the child process inherits CLAUDECODE env var, causing `claude` CLI to refuse with "cannot be launched inside another session". Fix: unset CLAUDECODE in the background process. 2. set -e kills the loop: The parent script's `set -e` is inherited by the subshell. When `claude` exits non-zero (e.g. max turns reached), the entire observer loop dies. Fix: `set +e` in the background process. 3. Subshell dies when parent exits: `( ... ) & disown` loses IO handles when the parent shell exits, killing the background process. Fix: use `nohup /bin/bash -c '...'` for full detachment, and `sleep & wait` to allow SIGUSR1 to interrupt sleep without killing the process. Additionally, the prompt for Haiku now includes the exact instinct file format inline (YAML frontmatter with id/trigger/confidence/domain/source fields), since the previous prompt referenced "the observer agent spec" which Haiku could not actually read, resulting in instinct files that the CLI parser could not parse. * fix: address review feedback on observer process management - Use `env` to pass variables to child process instead of quote-splicing, avoiding shell injection risk from special chars in paths - Add USR1_FIRED flag to prevent double analysis when SIGUSR1 interrupts the sleep/wait cycle - Track SLEEP_PID and kill it in both TERM trap and USR1 handler to prevent orphaned sleep processes from accumulating - Consolidate cleanup logic into a dedicated cleanup() function * fix: guard PID file cleanup against race condition on restart Only remove PID file in cleanup trap if it still belongs to the current process, preventing a restarted observer from losing its PID file when the old process exits.
Summary
The observer background process (
start-observer.sh) crashes immediately after starting due to three independent bugs. This PR fixes all three and also fixes the Haiku prompt so that generated instinct files match the format expected byinstinct-cli.py.Bug 1: Nested session detection
When launched from a Claude Code session, the child process inherits the
CLAUDECODEenv var, causingclaudeCLI to refuse with "cannot be launched inside another Claude Code session".Fix:
unset CLAUDECODEin the background process.Bug 2:
set -ekills the loopThe parent script's
set -eis inherited by the subshell. Whenclaudeexits non-zero (e.g. max turns reached), the entire observer loop dies silently.Fix:
set +ein the background process.Bug 3: Subshell dies when parent exits
( ... ) & disownloses IO handles when the parent shell exits, killing the background process within seconds.Fix: Use
nohup /bin/bash -c '...'for full detachment, andsleep & waitpattern to allow SIGUSR1 to interrupt sleep without killing the process.Bonus: Inline instinct format in prompt
The previous prompt told Haiku to follow "the format in the observer agent spec", but Haiku has no way to read
observer.md. This resulted in free-form Markdown files thatinstinct-cli.pycould not parse (missing YAML frontmatter /idfield). The prompt now includes the exact format inline.Test plan
start-observer.sh start→ process stays alive after 10+ secondsstart-observer.sh status→ correctly reports running PIDstart-observer.sh stop→ cleanly stops the processclaudeCLI failures (non-zero exit)instinct-cli.py statuscorrectly parses generated filesRelated
Summary by cubic
Fixes observer background process crashes and makes the service robust and signal-safe. Instinct generation now uses a strict YAML format and archives processed observations to prevent duplicates and reprocessing.
Bug Fixes
New Features
Written for commit e5fd15f. Summary will update on new commits.
Summary by CodeRabbit
Improvements
New Features