Skip to content

fix(continuous-learning-v2): add lazy-start observer logic#508

Merged
affaan-m merged 5 commits into
affaan-m:mainfrom
albertlieyingadrian:feat/continuous-learning-v2-debug-lazy-start
Mar 16, 2026
Merged

fix(continuous-learning-v2): add lazy-start observer logic#508
affaan-m merged 5 commits into
affaan-m:mainfrom
albertlieyingadrian:feat/continuous-learning-v2-debug-lazy-start

Conversation

@albertlieyingadrian

@albertlieyingadrian albertlieyingadrian commented Mar 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add lazy-start logic: auto-starts observer when observer.enabled: true in config and no .observer.pid exists

Test plan

  • Set observer.enabled: true in config.json
  • Remove any existing .observer.pid file
  • Restart a Claude session - verify observer auto-starts

Summary by CodeRabbit

  • Improvements

    • Observer now lazily auto-starts when enabled in the config, with support for a configurable override and improved cross-platform reliability.
    • Safer startup checks reduce race conditions and ensure the observer is only started when truly needed.
  • Bug Fixes

    • More robust running-status validation and stale-process cleanup.
    • Deduplicated signaling to prevent duplicate notifications in a single run.

@coderabbitai

coderabbitai Bot commented Mar 16, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds config-driven lazy-start and race-free startup to the observer: when enabled but not running, acquire an OS lock (flock or macOS lockfile), revalidate PID files, start the observer if still absent, and deduplicate USR1 signaling to avoid signaling the same PID multiple times.

Changes

Cohort / File(s) Summary
Observer lifecycle
skills/continuous-learning-v2/hooks/observe.sh
Replaces unconditional signaling with a config-driven lazy-start and race-free startup: supports CLV2_CONFIG override and effective config path, checks observer.enabled, uses flock (Linux) with macOS lockfile fallback, performs atomic lock+double-check of PID files (project/global) with stale-PID cleanup, invokes start-observer.sh when needed, and deduplicates USR1 signaling per pass. No exported/public API changes.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Cron as Cron/Invoker
    participant Observe as observe.sh
    participant Config as Config (file / CLV2_CONFIG)
    participant Lock as OS Lock (flock / lockfile)
    participant Starter as start-observer.sh
    participant PID as PID files (project/global)
    participant Observer as Observer process

    Cron->>Observe: run
    Observe->>Config: read effective config (is observer.enabled?)
    Observe->>PID: check project/global PID files (is-running / stale cleanup)
    alt PID present
        Observe->>Observer: send USR1 (dedupe per PID)
    else enabled && no PID
        Observe->>Lock: acquire (flock or lockfile)
        Lock->>Observe: granted
        Observe->>PID: re-check PID files (double-check / cleanup)
        Observe->>Config: re-validate enabled
        alt still missing & enabled
            Observe->>Starter: invoke start-observer.sh
            Starter->>Observer: start, write PID files
        end
        Observe->>Lock: release
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I nibble locks and peer inside,
I prune the ghosts where PIDs might hide,
I tap the sleepers — once, not twice,
Or whisper growth to wake device,
Hop, observe — the learning's wide.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(continuous-learning-v2): add lazy-start observer logic' accurately describes the main change in the pull request, which introduces lazy-start logic for the observer component.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Mar 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a lazy-start mechanism to observe.sh so that the background observer process is automatically spawned on the first hook invocation when observer.enabled: true is set in config, without requiring a manual start-observer.sh start call. It also improves robustness with stale-PID cleanup, PID validation, cross-platform locking (flock on Linux, lockfile fallback on macOS), and deduplication of USR1 signals.

Key changes and concerns:

  • CLV2_CONFIG not forwarded to start-observer.sh (new issue, see inline comment): observe.sh correctly reads observer.enabled from the custom config path in CLV2_CONFIG, but the invocation of start-observer.sh start on lines 342 and 355 does not propagate CLV2_CONFIG. Since start-observer.sh hardcodes CONFIG_FILE="${SKILL_ROOT}/config.json", it will check the default config for observer.enabled. If the default config has observer.enabled: false (or is absent), start-observer.sh exits 1 silently, and the lazy-start produces no observer — with no feedback to the user.
  • flock subshell is synchronous: the outer hook blocks until the subshell exits, though this is brief since start-observer.sh is now backgrounded with nohup ... &, mitigating the previous sleep 2 stall concern.
  • PID-write race still present: the lock is released before observer-loop.sh writes the PID file, so a rapid second invocation can still pass the double-check and launch a duplicate observer — a pre-existing concern not fully closed by this PR.
  • Deduplication of USR1 signals via the signaled_pids string pattern is a clean, portable improvement.

Confidence Score: 2/5

  • Not safe to merge as-is: the lazy-start will silently fail for users relying on CLV2_CONFIG, and several race-condition and platform-portability issues flagged in prior reviews remain open.
  • Score of 2 reflects that the core intent of the PR is sound and the deduplication and stale-PID cleanup are genuine improvements, but a newly identified bug (CLV2_CONFIG not forwarded to start-observer.sh) will cause the lazy-start to silently do nothing for users with a custom config. Several critical issues from the previous review round (PID-write race, flock not available on macOS by default, stale lockfile on macOS) are partially addressed but not fully resolved.
  • skills/continuous-learning-v2/hooks/observe.sh — specifically the nohup start-observer.sh start invocations on lines 342 and 355.

Important Files Changed

Filename Overview
skills/continuous-learning-v2/hooks/observe.sh Adds lazy-start observer logic with flock/lockfile locking and stale-PID recovery, but start-observer.sh does not receive CLV2_CONFIG and will silently fail to start the observer when a custom config enables it but the default config.json does not.

Sequence Diagram

sequenceDiagram
    participant C as Claude Code Hook
    participant O as observe.sh
    participant L as Lock (flock/lockfile)
    participant S as start-observer.sh
    participant P as observer-loop.sh

    C->>O: invoke (PreToolUse / PostToolUse)
    O->>O: read stdin, check disabled, check guards
    O->>O: parse input, write to observations.jsonl
    O->>O: read EFFECTIVE_CONFIG → OBSERVER_ENABLED?

    alt OBSERVER_ENABLED=true and no PID file
        O->>L: acquire lock (flock or lockfile)
        L-->>O: lock acquired
        O->>O: double-check PID files (_CHECK_OBSERVER_RUNNING)
        O->>S: nohup start-observer.sh start (backgrounded)
        note over S: reads ${SKILL_ROOT}/config.json<br/>(ignores CLV2_CONFIG!)
        S->>P: nohup observer-loop.sh (backgrounded)
        P-->>S: writes .observer.pid (async, ~2s)
        O->>L: release lock
    end

    alt Observer PID file exists
        O->>O: validate PID, deduplicate
        O->>P: kill -USR1 $observer_pid
    end
Loading

Last reviewed commit: 0f3927c

Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
skills/continuous-learning-v2/hooks/observe.sh (1)

266-275: ⚠️ Potential issue | 🟡 Minor

Same observer can be signaled twice when both PID files point to the same process.

If project and global pid files contain the same PID, Line 272 runs twice, causing duplicate analysis triggers.

Proposed fix (dedupe signaled PIDs)
+signaled_pids=" "
 for pid_file in "${PROJECT_DIR}/.observer.pid" "${CONFIG_DIR}/.observer.pid"; do
   if [ -f "$pid_file" ]; then
     observer_pid=$(cat "$pid_file")
+    case "$signaled_pids" in
+      *" $observer_pid "*) continue ;;
+    esac
     echo "From observe.sh: Observer PID found: $observer_pid" >> ~/claude-hooks.log
     if kill -0 "$observer_pid" 2>/dev/null; then
       echo "date: $(date) From observe.sh: Sending USR1 signal to observer: $observer_pid" >> ~/claude-hooks.log
       kill -USR1 "$observer_pid" 2>/dev/null || true
+      signaled_pids="${signaled_pids}${observer_pid} "
     fi
   fi
 done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 266 - 275, The
loop in observe.sh reads .observer.pid from both "${PROJECT_DIR}" and
"${CONFIG_DIR}" and may call kill -USR1 twice when both files contain the same
PID (observer_pid), causing duplicate triggers; fix by deduplicating PIDs before
signaling: maintain a small in-memory set/list of already-signaled observer_pid
values (e.g., signaled_pids) and check membership before executing the kill
-USR1 on observer_pid, only sending the signal once per unique PID and still
logging as currently done.
🧹 Nitpick comments (1)
skills/continuous-learning-v2/hooks/observe.sh (1)

17-17: Unconditional debug logging on every hook call will create noisy, unbounded log growth.

These lines run on the hot path and append every invocation. Gate this behind a debug flag (and ideally rotate/size-cap logs).

Proposed fix
+DEBUG_HOOK_LOG="${CLV2_DEBUG_HOOK_LOG:-0}"
+HOOK_LOG_FILE="${HOME}/claude-hooks.log"
+log_debug() {
+  [ "$DEBUG_HOOK_LOG" = "1" ] || return 0
+  printf '%s\n' "$1" >> "$HOOK_LOG_FILE"
+}
-echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log
+log_debug "From observe.sh: Hook fired $(date)"
@@
-echo "From observe.sh: CONFIG_DIR: $CONFIG_DIR" >> ~/claude-hooks.log
-echo "From observe.sh: OBSERVATIONS_FILE: $OBSERVATIONS_FILE" >> ~/claude-hooks.log
-echo "From observe.sh: CLAUDE_PROJECT_DIR: $CLAUDE_PROJECT_DIR" >> ~/claude-hooks.log
-echo "From observe.sh: PYTHON_CMD: $PYTHON_CMD" >> ~/claude-hooks.log
+log_debug "From observe.sh: CONFIG_DIR: $CONFIG_DIR"
+log_debug "From observe.sh: OBSERVATIONS_FILE: $OBSERVATIONS_FILE"
+log_debug "From observe.sh: CLAUDE_PROJECT_DIR: $CLAUDE_PROJECT_DIR"
+log_debug "From observe.sh: PYTHON_CMD: $PYTHON_CMD"
@@
-echo "date: $(date) From observe.sh: PROJECT_DIR: $PROJECT_DIR" >> ~/claude-hooks.log
-echo "date: $(date) From observe.sh: CONFIG_DIR: $CONFIG_DIR" >> ~/claude-hooks.log
+log_debug "date: $(date) From observe.sh: PROJECT_DIR: $PROJECT_DIR"
+log_debug "date: $(date) From observe.sh: CONFIG_DIR: $CONFIG_DIR"

Also applies to: 95-99, 240-242

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` at line 17, The script
currently appends an unconditional debug line (echo "From observe.sh: Hook fired
$(date)" >> ~/claude-hooks.log) on every invocation, causing unbounded log
growth; wrap this and the other similar occurrences (around lines 95-99 and
240-242) in a conditional that checks a debug flag (e.g. CLAUDE_HOOKS_DEBUG or
DEBUG_HOOKS) and only writes the log when that env var is set/truthy, and
consider switching to a rotating/size-limited logging mechanism (or use system
logger) instead of blind appends; update the echo statements in observe.sh so
they are executed only when the debug flag is enabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 92-103: The disable-check currently only tests
"${CONFIG_DIR}/disabled" where CONFIG_DIR is hardcoded to
"${HOME}/.claude/homunculus", but the script later loads configuration from
CLV2_CONFIG, so the disable guard can be bypassed; update the logic so
CONFIG_DIR is resolved from CLV2_CONFIG when present (or check both locations)
before testing for a "disabled" file—i.e., use the CLV2_CONFIG value to compute
the effective config directory (or explicitly test "${CLV2_CONFIG}/disabled" in
addition to "${CONFIG_DIR}/disabled") so that the early guard in the observe.sh
startup path respects the same config source used by the lazy-start code.
- Around line 243-263: The lazy-start race is caused by checking only for
${PROJECT_DIR}/.observer.pid before launching start-observer.sh; wrap the
check+start in an atomic lock (e.g., use mkdir-based lock or flock on a
lockfile) inside observe.sh so only one hook instance performs the startup
check, and within that critical section validate any existing .observer.pid by
reading the PID and confirming the process is running (if not running, remove
the stale .observer.pid) before calling "${SKILL_ROOT}/agents/start-observer.sh"
start; ensure the lock is reliably released (trap cleanup) so concurrent hooks
don't spawn duplicate observers and stale PID files are cleaned up.

---

Outside diff comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 266-275: The loop in observe.sh reads .observer.pid from both
"${PROJECT_DIR}" and "${CONFIG_DIR}" and may call kill -USR1 twice when both
files contain the same PID (observer_pid), causing duplicate triggers; fix by
deduplicating PIDs before signaling: maintain a small in-memory set/list of
already-signaled observer_pid values (e.g., signaled_pids) and check membership
before executing the kill -USR1 on observer_pid, only sending the signal once
per unique PID and still logging as currently done.

---

Nitpick comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Line 17: The script currently appends an unconditional debug line (echo "From
observe.sh: Hook fired $(date)" >> ~/claude-hooks.log) on every invocation,
causing unbounded log growth; wrap this and the other similar occurrences
(around lines 95-99 and 240-242) in a conditional that checks a debug flag (e.g.
CLAUDE_HOOKS_DEBUG or DEBUG_HOOKS) and only writes the log when that env var is
set/truthy, and consider switching to a rotating/size-limited logging mechanism
(or use system logger) instead of blind appends; update the echo statements in
observe.sh so they are executed only when the debug flag is enabled.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0c319b52-7c7c-44e1-93fe-3ff635ef5d98

📥 Commits

Reviewing files that changed from the base of the PR and between c53bba9 and d9c3391.

📒 Files selected for processing (2)
  • skills/continuous-learning-v2/hooks/observe.sh
  • skills/continuous-learning-v2/scripts/detect-project.sh
💤 Files with no reviewable changes (1)
  • skills/continuous-learning-v2/scripts/detect-project.sh

Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/continuous-learning-v2/hooks/observe.sh">

<violation number="1" location="skills/continuous-learning-v2/hooks/observe.sh:17">
P2: Add error handling (`2>/dev/null || true`) to debug logging statements to prevent hook termination when `~/claude-hooks.log` cannot be written to. Currently unguarded redirects with `set -e` may cause immediate exit if HOME is unavailable, read-only, or disk is full, breaking the observation hook's primary functionality.</violation>

<violation number="2" location="skills/continuous-learning-v2/hooks/observe.sh:17">
P1: Removal of non-human session guards can cause observer self-looping and metadata pollution. The 5-layer guard system (entrypoint check, hook profile check, skip env var, subagent detection, and path exclusions) that prevented the observer from observing its own sessions has been completely removed without being relocated. This allows automated/subagent/observer sessions to write observations and trigger lazy-start, potentially causing the observer to ingest its own activity and pollute project-scoped metadata.</violation>

<violation number="3" location="skills/continuous-learning-v2/hooks/observe.sh:17">
P2: This unconditional hook logging appends to `~/claude-hooks.log` on every invocation and can lead to unbounded log growth; guard these writes behind a debug flag or remove them for normal runs.</violation>

<violation number="4" location="skills/continuous-learning-v2/hooks/observe.sh:244">
P2: Lazy-start observer has race condition: concurrent hooks can start multiple observers</violation>

<violation number="5" location="skills/continuous-learning-v2/hooks/observe.sh:244">
P1: The lazy-start guard should also consider the global `${CONFIG_DIR}/.observer.pid` (and verify PID liveness), otherwise a second observer can be started while a global observer is already running.</violation>

<violation number="6" location="skills/continuous-learning-v2/hooks/observe.sh:251">
P2: Avoid interpolating `$CONFIG_FILE` directly into the Python `-c` source; a path containing quotes can break parsing and make this config read brittle. Pass the fallback path via an environment variable and read it from `os.environ` inside Python.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

# Hook phase from CLI argument: "pre" (PreToolUse) or "post" (PostToolUse)
HOOK_PHASE="${1:-post}"

echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log

@cubic-dev-ai cubic-dev-ai Bot Mar 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Removal of non-human session guards can cause observer self-looping and metadata pollution. The 5-layer guard system (entrypoint check, hook profile check, skip env var, subagent detection, and path exclusions) that prevented the observer from observing its own sessions has been completely removed without being relocated. This allows automated/subagent/observer sessions to write observations and trigger lazy-start, potentially causing the observer to ingest its own activity and pollute project-scoped metadata.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At skills/continuous-learning-v2/hooks/observe.sh, line 17:

<comment>Removal of non-human session guards can cause observer self-looping and metadata pollution. The 5-layer guard system (entrypoint check, hook profile check, skip env var, subagent detection, and path exclusions) that prevented the observer from observing its own sessions has been completely removed without being relocated. This allows automated/subagent/observer sessions to write observations and trigger lazy-start, potentially causing the observer to ingest its own activity and pollute project-scoped metadata.</comment>

<file context>
@@ -14,7 +14,7 @@ set -e
 # Hook phase from CLI argument: "pre" (PreToolUse) or "post" (PostToolUse)
 HOOK_PHASE="${1:-post}"
-
+echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log
 # ─────────────────────────────────────────────
 # Read stdin first (before project detection)
</file context>
Fix with Cubic

Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
# Hook phase from CLI argument: "pre" (PreToolUse) or "post" (PostToolUse)
HOOK_PHASE="${1:-post}"

echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log

@cubic-dev-ai cubic-dev-ai Bot Mar 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Add error handling (2>/dev/null || true) to debug logging statements to prevent hook termination when ~/claude-hooks.log cannot be written to. Currently unguarded redirects with set -e may cause immediate exit if HOME is unavailable, read-only, or disk is full, breaking the observation hook's primary functionality.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At skills/continuous-learning-v2/hooks/observe.sh, line 17:

<comment>Add error handling (`2>/dev/null || true`) to debug logging statements to prevent hook termination when `~/claude-hooks.log` cannot be written to. Currently unguarded redirects with `set -e` may cause immediate exit if HOME is unavailable, read-only, or disk is full, breaking the observation hook's primary functionality.</comment>

<file context>
@@ -14,7 +14,7 @@ set -e
 # Hook phase from CLI argument: "pre" (PreToolUse) or "post" (PostToolUse)
 HOOK_PHASE="${1:-post}"
-
+echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log
 # ─────────────────────────────────────────────
 # Read stdin first (before project detection)
</file context>
Suggested change
echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log
echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log 2>/dev/null || true
Fix with Cubic

# Hook phase from CLI argument: "pre" (PreToolUse) or "post" (PostToolUse)
HOOK_PHASE="${1:-post}"

echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log

@cubic-dev-ai cubic-dev-ai Bot Mar 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: This unconditional hook logging appends to ~/claude-hooks.log on every invocation and can lead to unbounded log growth; guard these writes behind a debug flag or remove them for normal runs.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At skills/continuous-learning-v2/hooks/observe.sh, line 17:

<comment>This unconditional hook logging appends to `~/claude-hooks.log` on every invocation and can lead to unbounded log growth; guard these writes behind a debug flag or remove them for normal runs.</comment>

<file context>
@@ -14,7 +14,7 @@ set -e
 # Hook phase from CLI argument: "pre" (PreToolUse) or "post" (PostToolUse)
 HOOK_PHASE="${1:-post}"
-
+echo "From observe.sh: Hook fired $(date)" >> ~/claude-hooks.log
 # ─────────────────────────────────────────────
 # Read stdin first (before project detection)
</file context>
Fix with Cubic

Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
@albertlieyingadrian albertlieyingadrian force-pushed the feat/continuous-learning-v2-debug-lazy-start branch from d9c3391 to c53bba9 Compare March 16, 2026 02:58
Auto-starts observer when observer.enabled: true in config and no .observer.pid exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@albertlieyingadrian albertlieyingadrian changed the title feat(continuous-learning-v2): add debug logging, lazy-start observer, and session guards feat(continuous-learning-v2): add lazy-start observer logic Mar 16, 2026
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
skills/continuous-learning-v2/hooks/observe.sh (1)

279-284: Config file existence check doesn't cover CLV2_CONFIG override.

Line 279 verifies $CONFIG_FILE (${SKILL_ROOT}/config.json) exists, but line 282 reads from CLV2_CONFIG if set—which might be a different file that was never validated. Also, open() without a context manager is inconsistent with the pattern used in start-observer.sh:79-92.

Suggested fix
   CONFIG_FILE="${SKILL_ROOT}/config.json"
-  if [ -f "$CONFIG_FILE" ] && [ -n "$PYTHON_CMD" ]; then
+  EFFECTIVE_CONFIG="${CLV2_CONFIG:-$CONFIG_FILE}"
+  if [ -f "$EFFECTIVE_CONFIG" ] && [ -n "$PYTHON_CMD" ]; then
     _enabled=$("$PYTHON_CMD" -c "
 import json, os
-cfg = json.load(open(os.environ.get('CLV2_CONFIG', '$CONFIG_FILE')))
+with open('$EFFECTIVE_CONFIG') as f:
+    cfg = json.load(f)
 print(str(cfg.get('observer', {}).get('enabled', False)).lower())
 " 2>/dev/null || echo "false")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 279 - 284, The
existence check currently only verifies CONFIG_FILE but the Python snippet reads
from CLV2_CONFIG when set, risking reading an unvalidated file; update the logic
that sets _enabled to first determine the actual config path (use
${CLV2_CONFIG:-$CONFIG_FILE}) and verify that file exists before invoking
PYTHON_CMD, and modify the Python one-liner to open the file using a context
manager (with open(... ) as f) and load JSON from that file; keep the existing
fallback to "false" and references to PYTHON_CMD and _enabled unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 279-284: The existence check currently only verifies CONFIG_FILE
but the Python snippet reads from CLV2_CONFIG when set, risking reading an
unvalidated file; update the logic that sets _enabled to first determine the
actual config path (use ${CLV2_CONFIG:-$CONFIG_FILE}) and verify that file
exists before invoking PYTHON_CMD, and modify the Python one-liner to open the
file using a context manager (with open(... ) as f) and load JSON from that
file; keep the existing fallback to "false" and references to PYTHON_CMD and
_enabled unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66a48808-7b54-491b-bab2-1522251571ba

📥 Commits

Reviewing files that changed from the base of the PR and between d9c3391 and c1f6a03.

📒 Files selected for processing (1)
  • skills/continuous-learning-v2/hooks/observe.sh

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/continuous-learning-v2/hooks/observe.sh">

<violation number="1" location="skills/continuous-learning-v2/hooks/observe.sh:275">
P2: Guard the lazy-start check/start with a lock (e.g., flock) so concurrent hook invocations can’t all observe a missing PID and launch duplicate observers.</violation>

<violation number="2" location="skills/continuous-learning-v2/hooks/observe.sh:279">
P2: CLV2_CONFIG environment variable is unable to override the config path when the default config.json doesn't exist. The file existence check uses CONFIG_FILE (always the default) instead of checking the actual config path that will be used by the Python code.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
- Use flock for atomic check-then-act to prevent race conditions
- Check both project-scoped AND global PID files before starting
- Support CLV2_CONFIG override for config file path
- Check disabled file in lazy-start logic
- Use double-check pattern after acquiring lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
skills/continuous-learning-v2/hooks/observe.sh (1)

298-307: ⚠️ Potential issue | 🟡 Minor

Race condition is fixed, but stale PID files still prevent observer restart.

The flock implementation properly addresses the concurrency race. However, lines 299 and 303 only check for PID file existence, not whether the process is actually running. If an observer crashes without cleaning up its PID file, lazy-start will be skipped forever.

Consider adding stale PID cleanup inside the critical section:

🛡️ Proposed fix to clean stale PIDs before starting
 if [ "$OBSERVER_ENABLED" = "true" ] && [ ! -f "${PROJECT_DIR}/.observer.pid" ] && [ ! -f "${CONFIG_DIR}/.observer.pid" ]; then
   (
     flock -n 9 || exit 0
     # Double-check PID files after acquiring lock
-    if [ ! -f "${PROJECT_DIR}/.observer.pid" ] && [ ! -f "${CONFIG_DIR}/.observer.pid" ]; then
+    # Clean up stale PID files first
+    for _pid_file in "${PROJECT_DIR}/.observer.pid" "${CONFIG_DIR}/.observer.pid"; do
+      if [ -f "$_pid_file" ]; then
+        _existing_pid="$(cat "$_pid_file" 2>/dev/null || true)"
+        if [ -n "$_existing_pid" ] && ! kill -0 "$_existing_pid" 2>/dev/null; then
+          rm -f "$_pid_file"
+        fi
+      fi
+    done
+    # Now check if we should start
+    if [ ! -f "${PROJECT_DIR}/.observer.pid" ] && [ ! -f "${CONFIG_DIR}/.observer.pid" ]; then
       "${SKILL_ROOT}/agents/start-observer.sh" start 2>/dev/null || true
     fi
   ) 9>"$LAZY_START_LOCK"
 fi

Note: The initial check at line 299 must remain as-is (checking file existence) since it determines whether to attempt lock acquisition at all. The stale cleanup happens only after acquiring the lock.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 298 - 307, The
current lazy-start uses flock but only checks for PID file existence; inside the
critical section (the block using 9>"$LAZY_START_LOCK") add logic to detect and
remove stale PID files before launching the observer: after acquiring the lock
and before calling "${SKILL_ROOT}/agents/start-observer.sh", read PID(s) from
"${PROJECT_DIR}/.observer.pid" and "${CONFIG_DIR}/.observer.pid" (if present)
and use a safe liveness check (e.g., kill -0 $PID) to determine if the PID is
dead; remove the stale PID file(s) if the process is not running, then re-check
existence and only call start-observer.sh if no live PID remains—leave the outer
pre-check that gates lock acquisition unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 298-307: The current lazy-start uses flock but only checks for PID
file existence; inside the critical section (the block using
9>"$LAZY_START_LOCK") add logic to detect and remove stale PID files before
launching the observer: after acquiring the lock and before calling
"${SKILL_ROOT}/agents/start-observer.sh", read PID(s) from
"${PROJECT_DIR}/.observer.pid" and "${CONFIG_DIR}/.observer.pid" (if present)
and use a safe liveness check (e.g., kill -0 $PID) to determine if the PID is
dead; remove the stale PID file(s) if the process is not running, then re-check
existence and only call start-observer.sh if no live PID remains—leave the outer
pre-check that gates lock acquisition unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a6bea15-67c0-4054-8a56-e963c07e8619

📥 Commits

Reviewing files that changed from the base of the PR and between c1f6a03 and ccadd88.

📒 Files selected for processing (1)
  • skills/continuous-learning-v2/hooks/observe.sh

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/continuous-learning-v2/hooks/observe.sh">

<violation number="1" location="skills/continuous-learning-v2/hooks/observe.sh:299">
P1: Checking only PID file existence can block recovery after crashes. Validate that the PID in each file is actually alive (for example with `kill -0`) before deciding to skip lazy-start.</violation>

<violation number="2" location="skills/continuous-learning-v2/hooks/observe.sh:301">
P1: Lazy-start observer logic depends on `flock` which is not available on macOS/BSD by default, causing the auto-start feature to silently fail on those platforms. Consider adding a `command -v flock` guard with a fallback to the previous non-atomic check behavior.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
- Add stale PID cleanup via _CHECK_OBSERVER_RUNNING function
- Add macOS fallback using lockfile when flock unavailable
- Fix CLV2_CONFIG override: use EFFECTIVE_CONFIG for both check and read
- Use proper Python context manager (with open() as f)
- Deduplicate signaled PIDs to avoid duplicate USR1 signals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread skills/continuous-learning-v2/hooks/observe.sh
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
skills/continuous-learning-v2/hooks/observe.sh (2)

292-303: ⚠️ Potential issue | 🟠 Major

disabled sentinel check still diverges from CLV2_CONFIG path resolution.

In this block, EFFECTIVE_CONFIG is used for enabled-state reading, but the disabled check is still tied to ${CONFIG_DIR}/disabled (hardcoded base dir). If CLV2_CONFIG points elsewhere, disable behavior can still diverge from config behavior.

Proposed fix
-if [ -f "${CONFIG_DIR}/disabled" ]; then
+EFFECTIVE_CONFIG="${CLV2_CONFIG:-${SKILL_ROOT}/config.json}"
+EFFECTIVE_CONFIG_DIR="$(dirname "$EFFECTIVE_CONFIG")"
+if [ -f "${EFFECTIVE_CONFIG_DIR}/disabled" ]; then
   OBSERVER_ENABLED=false
 else
   OBSERVER_ENABLED=false
-  CONFIG_FILE="${SKILL_ROOT}/config.json"
-  # Allow CLV2_CONFIG override
-  if [ -n "${CLV2_CONFIG:-}" ]; then
-    CONFIG_FILE="$CLV2_CONFIG"
-  fi
-  # Use effective config path for both existence check and reading
-  EFFECTIVE_CONFIG="$CONFIG_FILE"
+  # Use effective config path for both existence check and reading
   if [ -f "$EFFECTIVE_CONFIG" ] && [ -n "$PYTHON_CMD" ]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 292 - 303, The
current disabled-sentinel check uses ${CONFIG_DIR}/disabled while the script
resolves CONFIG_FILE/EFFECTIVE_CONFIG (including CLV2_CONFIG override) later,
causing divergence; move the EFFECTIVE_CONFIG/CONFIG_FILE resolution above the
disabled check and then test for a sentinel in the same directory as
EFFECTIVE_CONFIG (e.g., check "$(dirname "$EFFECTIVE_CONFIG")/disabled"), and
set OBSERVER_ENABLED=false if that sentinel exists; update references to
OBSERVER_ENABLED, CONFIG_FILE, CLV2_CONFIG, and EFFECTIVE_CONFIG accordingly so
the disable behavior follows the resolved config path.

331-333: ⚠️ Potential issue | 🟠 Major

Lock is released before observer startup is confirmed, so duplicate starts remain possible.

Line 332 and Line 342 run start-observer.sh in background inside the lock. The lock can be released before .observer.pid is created, allowing a concurrent hook to start another observer.

Proposed fix
-          nohup "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &
+          "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 || true
@@
-          nohup "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &
+          "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 || true

Also applies to: 341-343

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 331 - 333, The
lock in observe.sh is released before the observer process startup is confirmed,
allowing a race that can spawn duplicate observers; modify the block that
currently runs nohup "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null
2>&1 & so that either (a) start-observer.sh is invoked synchronously while
holding the lock until it creates .observer.pid, or (b) after launching the
background process you poll for the creation of ${PROJECT_DIR}/.observer.pid and
${CONFIG_DIR}/.observer.pid with a short timeout before releasing the lock;
ensure the lock is only released once the pid file exists and reference the
exact invocation of start-observer.sh and the .observer.pid files to locate and
update the code path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 278-289: The _CHECK_OBSERVER_RUNNING function currently trusts PID
file contents; validate the PID before any kill call by ensuring the read value
is a non-empty, all-digits, positive integer (e.g. use a shell regex like [[
"$pid" =~ ^[0-9]+$ ]] and check pid -ne 0) and only call kill -0 when that check
passes; if the PID is missing or invalid (non-numeric or zero) treat it as a
stale PID file, remove the file (rm -f "$pid_file") and return 1. Also apply the
same numeric validation wherever the code reads PID files around the observe
start/stop logic (the other PID-read sites referenced) to avoid passing
malformed values like -1 or abc to kill.

---

Duplicate comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 292-303: The current disabled-sentinel check uses
${CONFIG_DIR}/disabled while the script resolves CONFIG_FILE/EFFECTIVE_CONFIG
(including CLV2_CONFIG override) later, causing divergence; move the
EFFECTIVE_CONFIG/CONFIG_FILE resolution above the disabled check and then test
for a sentinel in the same directory as EFFECTIVE_CONFIG (e.g., check "$(dirname
"$EFFECTIVE_CONFIG")/disabled"), and set OBSERVER_ENABLED=false if that sentinel
exists; update references to OBSERVER_ENABLED, CONFIG_FILE, CLV2_CONFIG, and
EFFECTIVE_CONFIG accordingly so the disable behavior follows the resolved config
path.
- Around line 331-333: The lock in observe.sh is released before the observer
process startup is confirmed, allowing a race that can spawn duplicate
observers; modify the block that currently runs nohup
"${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 & so that either
(a) start-observer.sh is invoked synchronously while holding the lock until it
creates .observer.pid, or (b) after launching the background process you poll
for the creation of ${PROJECT_DIR}/.observer.pid and ${CONFIG_DIR}/.observer.pid
with a short timeout before releasing the lock; ensure the lock is only released
once the pid file exists and reference the exact invocation of start-observer.sh
and the .observer.pid files to locate and update the code path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba712604-867b-48b2-80a5-3786da5da5ce

📥 Commits

Reviewing files that changed from the base of the PR and between ccadd88 and e2be804.

📒 Files selected for processing (1)
  • skills/continuous-learning-v2/hooks/observe.sh

Comment thread skills/continuous-learning-v2/hooks/observe.sh

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/continuous-learning-v2/hooks/observe.sh">

<violation number="1" location="skills/continuous-learning-v2/hooks/observe.sh:332">
P1:</violation>

<violation number="2" location="skills/continuous-learning-v2/hooks/observe.sh:338">
P2: Avoid `exit 0` in the parent shell here; on lock contention it terminates the whole hook and skips the observer signal pass for this invocation.</violation>

<violation number="3" location="skills/continuous-learning-v2/hooks/observe.sh:338">
P1: Harden the macOS `lockfile` path against stale locks. If the script exits between acquisition and cleanup, `.observer-start.lock` can be left behind and prevent future lazy-start attempts.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread skills/continuous-learning-v2/hooks/observe.sh
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
Comment thread skills/continuous-learning-v2/hooks/observe.sh Outdated
- Wrap lockfile block in subshell so exit 0 only terminates that block
- Add trap for EXIT to clean up lock file on script interruption
- Add -l 30 (30 second expiry) to prevent permanent lock file stuck

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/continuous-learning-v2/hooks/observe.sh">

<violation number="1" location="skills/continuous-learning-v2/hooks/observe.sh:340">
P0: Race condition: The EXIT trap is set before attempting to acquire the lockfile, so if lockfile fails (because another process holds the lock), the trap will delete another process's active lock when the subshell exits. This breaks the mutual-exclusion guarantee and can cause duplicate observer starts.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

if command -v lockfile >/dev/null 2>&1; then
# Use subshell to isolate exit and add trap for cleanup
(
trap 'rm -f "$LAZY_START_LOCK" 2>/dev/null || true' EXIT

@cubic-dev-ai cubic-dev-ai Bot Mar 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0: Race condition: The EXIT trap is set before attempting to acquire the lockfile, so if lockfile fails (because another process holds the lock), the trap will delete another process's active lock when the subshell exits. This breaks the mutual-exclusion guarantee and can cause duplicate observer starts.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At skills/continuous-learning-v2/hooks/observe.sh, line 340:

<comment>Race condition: The EXIT trap is set before attempting to acquire the lockfile, so if lockfile fails (because another process holds the lock), the trap will delete another process's active lock when the subshell exits. This breaks the mutual-exclusion guarantee and can cause duplicate observer starts.</comment>

<file context>
@@ -335,13 +335,17 @@ if [ "$OBSERVER_ENABLED" = "true" ]; then
-        rm -f "$LAZY_START_LOCK" 2>/dev/null || true
+        # Use subshell to isolate exit and add trap for cleanup
+        (
+          trap 'rm -f "$LAZY_START_LOCK" 2>/dev/null || true' EXIT
+          lockfile -r 1 -l 30 "$LAZY_START_LOCK" 2>/dev/null || exit 0
+          _CHECK_OBSERVER_RUNNING "${PROJECT_DIR}/.observer.pid" || true
</file context>
Fix with Cubic

Comment on lines +326 to +334
(
flock -n 9 || exit 0
# Double-check PID files after acquiring lock
_CHECK_OBSERVER_RUNNING "${PROJECT_DIR}/.observer.pid" || true
_CHECK_OBSERVER_RUNNING "${CONFIG_DIR}/.observer.pid" || true
if [ ! -f "${PROJECT_DIR}/.observer.pid" ] && [ ! -f "${CONFIG_DIR}/.observer.pid" ]; then
nohup "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &
fi
) 9>"$LAZY_START_LOCK"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flock released before PID file is written — observer window race remains

The flock subshell exits (releasing the lock on line 334) immediately after backgrounding start-observer.sh with nohup ... &. At that point, start-observer.sh is still initializing — it launches observer-loop.sh in the background and only then sleep 2s before checking for the PID file. The PID file is written by observer-loop.sh asynchronously.

This creates a window of potentially 1–2+ seconds where:

  1. The flock is released (flock -n 9 is gone)
  2. No PID file exists yet
  3. The next hook invocation acquires the lock, passes the double-check inside the subshell, and launches a second start-observer.sh start

Because start-observer.sh only guards against duplicates by checking $PID_FILE (line 175 of start-observer.sh) — and the PID file doesn't yet exist — each observer instance in this window starts a new observer-loop.sh subprocess.

A robust fix is to write a "starting" sentinel inside the lock before backgrounding, and check it alongside the PID files:

LAZY_STARTING_SENTINEL="${PROJECT_DIR}/.observer-starting"
...
(
  flock -n 9 || exit 0
  _CHECK_OBSERVER_RUNNING "${PROJECT_DIR}/.observer.pid" || true
  _CHECK_OBSERVER_RUNNING "${CONFIG_DIR}/.observer.pid" || true
  if [ ! -f "${PROJECT_DIR}/.observer.pid" ] && \
     [ ! -f "${CONFIG_DIR}/.observer.pid" ] && \
     [ ! -f "$LAZY_STARTING_SENTINEL" ]; then
    touch "$LAZY_STARTING_SENTINEL"
    nohup bash -c '"'"'
      "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1
      rm -f "$LAZY_STARTING_SENTINEL"
    '"'"' >/dev/null 2>&1 &
  fi
) 9>"$LAZY_START_LOCK"

The sentinel is checked (inside the lock) by all concurrent invocations and prevents a second launch; once start-observer.sh finishes and the PID file is stable the sentinel is removed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
skills/continuous-learning-v2/hooks/observe.sh (3)

292-303: ⚠️ Potential issue | 🟠 Major

Propagate CLV2_CONFIG consistently across disable checks, PID lookups, and startup.

Lines 296-302 only switch the observer.enabled read. CONFIG_DIR still stays on ${HOME}/.claude/homunculus, so the disabled sentinel and global PID checks keep using the default directory, and skills/continuous-learning-v2/agents/start-observer.sh still resolves its own default config file. With a custom CLV2_CONFIG, lazy-start can read one config but signal/start a different observer.

♻️ Suggested direction
-  CONFIG_FILE="${SKILL_ROOT}/config.json"
-  # Allow CLV2_CONFIG override
-  if [ -n "${CLV2_CONFIG:-}" ]; then
-    CONFIG_FILE="$CLV2_CONFIG"
-  fi
-  # Use effective config path for both existence check and reading
-  EFFECTIVE_CONFIG="$CONFIG_FILE"
+  DEFAULT_CONFIG_FILE="${SKILL_ROOT}/config.json"
+  EFFECTIVE_CONFIG="${CLV2_CONFIG:-$DEFAULT_CONFIG_FILE}"
+  CONFIG_DIR="$(dirname "$EFFECTIVE_CONFIG")"

Apply the same effective config resolution to the earlier disabled guard and to skills/continuous-learning-v2/agents/start-observer.sh.

Also applies to: 316-323, 332-345

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 292 - 303, The
script computes EFFECTIVE_CONFIG too late so checks for the disabled sentinel
and PID/start logic still use the default CONFIG_DIR and CONFIG_FILE; move or
duplicate the CLV2_CONFIG override and EFFECTIVE_CONFIG resolution earlier and
use EFFECTIVE_CONFIG (not CONFIG_DIR or the original CONFIG_FILE) for the
disabled file check, PID lookups, OBSERVER_ENABLED decisions, and when invoking
skills/continuous-learning-v2/agents/start-observer.sh; also update
start-observer.sh to accept/resolve CLV2_CONFIG or an explicit config path so
the started observer uses the same EFFECTIVE_CONFIG value.

331-345: ⚠️ Potential issue | 🟠 Major

Keep the startup lock until the observer PID is visible.

Line 332 and Line 345 background start-observer.sh, but skills/continuous-learning-v2/agents/start-observer.sh:195-211 already daemonizes and returns before skills/continuous-learning-v2/agents/observer-loop.sh:137-138 writes the PID file. That leaves a window where the lock is gone and the next hook can start a second observer.

🔒 Suggested fix
-          nohup "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &
+          "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 || true
+          for _ in 1 2 3 4 5; do
+            _CHECK_OBSERVER_RUNNING "${PROJECT_DIR}/.observer.pid" && break
+            _CHECK_OBSERVER_RUNNING "${CONFIG_DIR}/.observer.pid" && break
+            sleep 1
+          done

Apply the same change in the macOS fallback branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 331 - 345, The
macOS fallback branch starts start-observer.sh in background and releases
LAZY_START_LOCK too early; modify the lockfile branch (inside the subshell where
lockfile -r 1 -l 30 "$LAZY_START_LOCK" is used) to mirror the Linux branch
behavior: after launching nohup "${SKILL_ROOT}/agents/start-observer.sh" start
>/dev/null 2>&1 &, do a short polling loop (with a reasonable timeout) that
checks for either "${PROJECT_DIR}/.observer.pid" or
"${CONFIG_DIR}/.observer.pid" (reuse _CHECK_OBSERVER_RUNNING or plain -f checks)
and only allow the subshell to exit (and thus remove the LAZY_START_LOCK via the
existing EXIT trap) once one of those PID files is present or the timeout
elapses; keep the existing trap and ensure the lock is still held during the
wait so a second observer cannot be started.

278-289: ⚠️ Potential issue | 🔴 Critical

Reject unreadable or malformed PID files before calling kill.

Line 282 and Line 358 can fail under set -e if another hook removes the file after the -f check, and the current [ -n "$pid" ] gate still lets values like -1, 0, or abc reach kill. Treat unreadable or non-numeric PIDs as stale in both paths.

🛡️ Suggested fix
 _CHECK_OBSERVER_RUNNING() {
   local pid_file="$1"
   if [ -f "$pid_file" ]; then
     local pid
-    pid=$(cat "$pid_file" 2>/dev/null)
-    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
+    pid=$(cat "$pid_file" 2>/dev/null || true)
+    case "$pid" in
+      ''|*[!0-9]*|0|1)
+        rm -f "$pid_file" 2>/dev/null || true
+        return 1
+        ;;
+    esac
+    if kill -0 "$pid" 2>/dev/null; then
       return 0  # Process is alive
     fi
     # Stale PID file - remove it
     rm -f "$pid_file" 2>/dev/null || true
   fi
@@
 for pid_file in "${PROJECT_DIR}/.observer.pid" "${CONFIG_DIR}/.observer.pid"; do
   if [ -f "$pid_file" ]; then
-    observer_pid=$(cat "$pid_file")
+    observer_pid=$(cat "$pid_file" 2>/dev/null || true)
+    case "$observer_pid" in
+      ''|*[!0-9]*|0|1)
+        rm -f "$pid_file" 2>/dev/null || true
+        continue
+        ;;
+    esac
     # Deduplicate: skip if already signaled this pass
     case "$signaled_pids" in
       *" $observer_pid "*) continue ;;
     esac

Also applies to: 356-365

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/continuous-learning-v2/hooks/observe.sh` around lines 278 - 289, The
PID handling in _CHECK_OBSERVER_RUNNING allows unreadable or malformed values to
reach kill and can fail under set -e if the file is removed between the -f check
and reading; replace the current cat/kill flow with a safe read and numeric
validation: use read -r pid < "$pid_file" 2>/dev/null (or a subshell read) and
if the read fails treat the PID file as stale and rm -f it, then validate pid
with a regex like ^[0-9]+$ and ensure pid is greater than 0 before calling kill
-0; if validation fails remove the pid file and return 1. Apply the same robust
read+numeric check logic to the other similar block referenced (the logic around
lines handling PID file at the other path / function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@skills/continuous-learning-v2/hooks/observe.sh`:
- Around line 292-303: The script computes EFFECTIVE_CONFIG too late so checks
for the disabled sentinel and PID/start logic still use the default CONFIG_DIR
and CONFIG_FILE; move or duplicate the CLV2_CONFIG override and EFFECTIVE_CONFIG
resolution earlier and use EFFECTIVE_CONFIG (not CONFIG_DIR or the original
CONFIG_FILE) for the disabled file check, PID lookups, OBSERVER_ENABLED
decisions, and when invoking
skills/continuous-learning-v2/agents/start-observer.sh; also update
start-observer.sh to accept/resolve CLV2_CONFIG or an explicit config path so
the started observer uses the same EFFECTIVE_CONFIG value.
- Around line 331-345: The macOS fallback branch starts start-observer.sh in
background and releases LAZY_START_LOCK too early; modify the lockfile branch
(inside the subshell where lockfile -r 1 -l 30 "$LAZY_START_LOCK" is used) to
mirror the Linux branch behavior: after launching nohup
"${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &, do a short
polling loop (with a reasonable timeout) that checks for either
"${PROJECT_DIR}/.observer.pid" or "${CONFIG_DIR}/.observer.pid" (reuse
_CHECK_OBSERVER_RUNNING or plain -f checks) and only allow the subshell to exit
(and thus remove the LAZY_START_LOCK via the existing EXIT trap) once one of
those PID files is present or the timeout elapses; keep the existing trap and
ensure the lock is still held during the wait so a second observer cannot be
started.
- Around line 278-289: The PID handling in _CHECK_OBSERVER_RUNNING allows
unreadable or malformed values to reach kill and can fail under set -e if the
file is removed between the -f check and reading; replace the current cat/kill
flow with a safe read and numeric validation: use read -r pid < "$pid_file"
2>/dev/null (or a subshell read) and if the read fails treat the PID file as
stale and rm -f it, then validate pid with a regex like ^[0-9]+$ and ensure pid
is greater than 0 before calling kill -0; if validation fails remove the pid
file and return 1. Apply the same robust read+numeric check logic to the other
similar block referenced (the logic around lines handling PID file at the other
path / function).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2b42327f-5973-4aef-8944-1f492c95c6e5

📥 Commits

Reviewing files that changed from the base of the PR and between e2be804 and bced23c.

📒 Files selected for processing (1)
  • skills/continuous-learning-v2/hooks/observe.sh

- Validate PID is a positive integer before kill calls to prevent
  signaling invalid targets (e.g. -1 could signal all processes)
- Pass config path via env var instead of interpolating shell variable
  into Python -c string to prevent injection/breakage on special paths
- Check CLV2_CONFIG-derived directory for disabled file so disable
  guard respects the same config source as lazy-start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_CHECK_OBSERVER_RUNNING "${PROJECT_DIR}/.observer.pid" || true
_CHECK_OBSERVER_RUNNING "${CONFIG_DIR}/.observer.pid" || true
if [ ! -f "${PROJECT_DIR}/.observer.pid" ] && [ ! -f "${CONFIG_DIR}/.observer.pid" ]; then
nohup "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLV2_CONFIG not forwarded to start-observer.sh

When a custom config is active via CLV2_CONFIG, observe.sh correctly reads observer.enabled: true from it and sets OBSERVER_ENABLED=true. However, when it calls start-observer.sh start (line 342), it does NOT forward CLV2_CONFIG in any effective way — start-observer.sh hardcodes CONFIG_FILE="${SKILL_ROOT}/config.json" (line 39 of start-observer.sh) and never inspects CLV2_CONFIG when choosing which config file to load.

If the default config.json has observer.enabled: false (or is missing), start-observer.sh will reach its guard on line 168:

if [ "$OBSERVER_ENABLED" != "true" ]; then
  echo "Observer is disabled in config.json..."
  exit 1
fi

and exit with code 1 — but because stdout and stderr are both redirected to /dev/null, this failure is completely silent. The observer never starts, and there is no indication of why.

The fix requires either passing the config path explicitly to start-observer.sh, or making start-observer.sh aware of CLV2_CONFIG. A minimal approach:

nohup env CLV2_CONFIG="${EFFECTIVE_CONFIG}" "${SKILL_ROOT}/agents/start-observer.sh" start >/dev/null 2>&1 &

The same fix is needed on line 355 in the lockfile path.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/continuous-learning-v2/hooks/observe.sh">

<violation number="1" location="skills/continuous-learning-v2/hooks/observe.sh:293">
P1: PID reuse vulnerability: The observer lifecycle only validates PID format, not process identity. A stale PID file combined with PID reuse can cause SIGUSR1 to be sent to unrelated processes, potentially killing them and leaving the real observer unstarted.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

return 1
;;
esac
if kill -0 "$pid" 2>/dev/null; then

@cubic-dev-ai cubic-dev-ai Bot Mar 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: PID reuse vulnerability: The observer lifecycle only validates PID format, not process identity. A stale PID file combined with PID reuse can cause SIGUSR1 to be sent to unrelated processes, potentially killing them and leaving the real observer unstarted.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At skills/continuous-learning-v2/hooks/observe.sh, line 293:

<comment>PID reuse vulnerability: The observer lifecycle only validates PID format, not process identity. A stale PID file combined with PID reuse can cause SIGUSR1 to be sent to unrelated processes, potentially killing them and leaving the real observer unstarted.</comment>

<file context>
@@ -280,7 +283,14 @@ _CHECK_OBSERVER_RUNNING() {
+        return 1
+        ;;
+    esac
+    if kill -0 "$pid" 2>/dev/null; then
       return 0  # Process is alive
     fi
</file context>
Fix with Cubic

@albertlieyingadrian albertlieyingadrian changed the title feat(continuous-learning-v2): add lazy-start observer logic fix(continuous-learning-v2): add lazy-start observer logic Mar 16, 2026
@affaan-m affaan-m merged commit b57b573 into affaan-m:main Mar 16, 2026
4 checks passed
@albertlieyingadrian albertlieyingadrian deleted the feat/continuous-learning-v2-debug-lazy-start branch March 19, 2026 18:22
jacky99714 added a commit to jacky99714/everything-claude-code that referenced this pull request Mar 20, 2026
Merge upstream/main which includes:
- fix: observer memory explosion with throttling and tail sampling (affaan-m#536)
- fix: observer hooks hardening, secret scrubbing (affaan-m#348)
- fix: lazy-start observer logic (affaan-m#508)
- fix: read tool_response field in observe.sh (affaan-m#377)
- feat: add C++, Java, Rust, PyTorch language support
- feat: SQLite state store, skill evolution, session adapters
- feat: orchestration harness and selective install
- feat: DevFleet multi-agent orchestration skill

Conflict resolution:
- observe.sh: kept our hook_event_name + tool_response fixes,
  merged upstream's $PYTHON_CMD, secret scrubbing, lazy-start,
  throttled signaling, and session guards
- package.json: accepted upstream's new scripts and dependencies
FrancescoRosciano pushed a commit to FRosciano-Mambo/everything-claude-code that referenced this pull request Jun 1, 2026
)

* feat(continuous-learning-v2): add lazy-start observer logic

Auto-starts observer when observer.enabled: true in config and no .observer.pid exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(continuous-learning-v2): address PR review concerns

- Use flock for atomic check-then-act to prevent race conditions
- Check both project-scoped AND global PID files before starting
- Support CLV2_CONFIG override for config file path
- Check disabled file in lazy-start logic
- Use double-check pattern after acquiring lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(observe.sh): address PR review comments

- Add stale PID cleanup via _CHECK_OBSERVER_RUNNING function
- Add macOS fallback using lockfile when flock unavailable
- Fix CLV2_CONFIG override: use EFFECTIVE_CONFIG for both check and read
- Use proper Python context manager (with open() as f)
- Deduplicate signaled PIDs to avoid duplicate USR1 signals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(observe.sh): wrap macOS lockfile fallback in subshell with trap

- Wrap lockfile block in subshell so exit 0 only terminates that block
- Add trap for EXIT to clean up lock file on script interruption
- Add -l 30 (30 second expiry) to prevent permanent lock file stuck

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(observe.sh): address remaining PR review comments

- Validate PID is a positive integer before kill calls to prevent
  signaling invalid targets (e.g. -1 could signal all processes)
- Pass config path via env var instead of interpolating shell variable
  into Python -c string to prevent injection/breakage on special paths
- Check CLV2_CONFIG-derived directory for disabled file so disable
  guard respects the same config source as lazy-start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants