Skip to content

feat(tools): add tool reliability layer - CircuitBreaker, IdempotencyGuard, RetryPolicy#132

Merged
dgarson merged 1 commit intofeat/horizon-ui-completefrom
stephan/tool-reliability-layer
Feb 24, 2026
Merged

feat(tools): add tool reliability layer - CircuitBreaker, IdempotencyGuard, RetryPolicy#132
dgarson merged 1 commit intofeat/horizon-ui-completefrom
stephan/tool-reliability-layer

Conversation

@dgarson
Copy link
Owner

@dgarson dgarson commented Feb 24, 2026

Summary

Adds a tool reliability layer with three primitives:

  • CircuitBreaker: Prevents repeated tool calls from spamming when a service is down. Implements CLOSED/OPEN/HALF_OPEN state machine.
  • IdempotencyGuard: Deduplicates repeated tool calls within a TTL window.
  • RetryPolicy: Exponential backoff with jitter for resilient retry logic.

Changes

  • src/utils/circuit-breaker.ts - Standalone circuit breaker primitive
  • src/utils/idempotency.ts - In-memory idempotency store with TTL
  • src/utils/retry-policy.ts - Retry policy with exponential backoff
  • src/agents/tool-reliability.ts - Agent-level wrappers for tool reliability
  • Tests for all primitives

Related

  • bs-tim-5

@dgarson
Copy link
Owner Author

dgarson commented Feb 24, 2026

Tim this needs to be rebased on dgarson/fork

@dgarson
Copy link
Owner Author

dgarson commented Feb 24, 2026

@clawdbot

@dgarson dgarson force-pushed the stephan/tool-reliability-layer branch from d793b78 to ce7a2b6 Compare February 24, 2026 05:36
Copy link
Owner Author

@dgarson dgarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Overall assessment: Needs refinement before merge

Summary

This PR introduces a useful reliability foundation:

  • with pending/completed/failed states
  • with exponential backoff + jitter
  • log fixes in /
  • unit tests for idempotency + retry behavior

The direction is strong and the primitives are generally clean and readable.

Concerns found

  1. Scope contamination in this PR

    • is unrelated to tool reliability.
    • This makes the PR harder to reason about and review.
  2. Architectural split/duplication risk

    • Reliability logic is now spread across and primitives.
    • There is no clear integration point yet (who is source of truth, and which layer should be used by tools).
  3. Test coverage gaps

    • Missing tests for failed-entry TTL expiry/re-execution in idempotency store.
    • Missing tests for jitter bounds and max-delay clamping behavior in retry policy.
    • No tests for state transitions in this PR.

Suggestions

  • Split out the topology spec file into its own PR.
  • Add/clarify a single integration layer that all tool calls use (to avoid parallel implementations).
  • Extend tests for failure/expiry and boundary cases.

Blocking issues before merge

  • Remove unrelated topology spec file from this PR (or move to separate PR).
  • Clarify integration ownership between and new utils before rollout.

Copy link
Owner Author

@dgarson dgarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Overall assessment: Needs refinement before merge

Summary

This PR introduces a useful reliability foundation:

  • InMemoryIdempotencyStore with pending/completed/failed states
  • RetryPolicy with exponential backoff + jitter
  • log fixes in CircuitBreaker / IdempotencyGuard
  • unit tests for idempotency + retry behavior

The direction is strong and the primitives are generally clean and readable.

Concerns found

  1. Scope contamination in this PR

    • apps/web/ux-opus-design/AGENT-TOPOLOGY-SPEC.md is unrelated to tool reliability.
    • This makes the PR harder to reason about and review.
  2. Architectural split/duplication risk

    • Reliability logic is now spread across src/agents/tool-reliability.ts and src/utils/* primitives.
    • There is no clear integration point yet (who is source of truth, and which layer should be used by tools).
  3. Test coverage gaps

    • Missing tests for failed-entry TTL expiry/re-execution in idempotency store.
    • Missing tests for jitter bounds and max-delay clamping behavior in retry policy.
    • No tests for CircuitBreaker state transitions in this PR.

Suggestions

  • Split out the topology spec file into its own PR.
  • Add/clarify a single integration layer that all tool calls use (to avoid parallel implementations).
  • Extend tests for failure/expiry and boundary cases.

Blocking issues before merge

  • Remove unrelated topology spec file from this PR (or move to separate PR).
  • Clarify integration ownership between src/agents/tool-reliability.ts and new utils before rollout.

@dgarson dgarson changed the base branch from dgarson/fork to feat/horizon-ui-complete February 24, 2026 14:37
@dgarson dgarson merged commit e0cabe4 into feat/horizon-ui-complete Feb 24, 2026
2 of 6 checks passed
@dgarson dgarson deleted the stephan/tool-reliability-layer branch February 24, 2026 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant