feat(zeph-llm): EndpointPool round-robin with fail-skip #3610

@bug-ops

Description

Part of epic #3602.

Scope

Implement a small round-robin pool of gonka node endpoints with a fail-skip cooldown. No tower-balance, no health-checking daemon — those are explicit non-goals.

Files to create

  • crates/zeph-llm/src/gonka/endpoints.rs:
    use std::sync::atomic::{AtomicU64, AtomicUsize};
    use std::time::Duration;

    pub struct GonkaEndpoint { pub base_url: String, pub address: String }
    // failed_until[i] holds the cooldown deadline for nodes[i]
    pub struct EndpointPool { nodes: Vec<GonkaEndpoint>, cursor: AtomicUsize, failed_until: Vec<AtomicU64> }
    impl EndpointPool {
        pub fn new(nodes: Vec<GonkaEndpoint>) -> Result<Self, LlmError>;  // empty = error
        pub fn next(&self) -> &GonkaEndpoint;                              // round-robin, skips nodes in cooldown
        pub fn mark_failed(&self, idx: usize, cooldown: Duration);        // idx = position in nodes
        pub fn len(&self) -> usize;                                        // number of configured nodes
    }

Behaviour

  • next() returns the next non-failed endpoint via AtomicUsize cursor.
  • mark_failed(idx, cooldown) stores now + cooldown in failed_until[idx]; next() skips entries whose cooldown has not expired.
  • The default cooldown is 30 s.
  • If every endpoint is in cooldown, next() falls back to the least-recently-failed one instead of blocking or returning an error.
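
A minimal sketch of the behaviour listed above, using only std and atomics. Several details are assumptions that the issue does not specify: deadlines are stored as milliseconds since a per-pool `epoch: Instant` (an extra field not in the proposed struct), `LlmError::NoEndpoints` is a stand-in for the crate's real error type, "least-recently-failed" is read as "earliest cooldown deadline", and `next()` returns the index alongside the endpoint so the caller has something to pass to `mark_failed`.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// Stand-in for the crate's error type (hypothetical variant name).
#[derive(Debug, PartialEq)]
pub enum LlmError {
    NoEndpoints,
}

#[derive(Debug)]
pub struct GonkaEndpoint {
    pub base_url: String,
    pub address: String,
}

pub struct EndpointPool {
    nodes: Vec<GonkaEndpoint>,
    cursor: AtomicUsize,
    /// Cooldown deadline per node in ms since `epoch`; 0 means never failed.
    failed_until: Vec<AtomicU64>,
    /// Assumption: monotonic reference point, not part of the issue's struct.
    epoch: Instant,
}

impl EndpointPool {
    pub fn new(nodes: Vec<GonkaEndpoint>) -> Result<Self, LlmError> {
        if nodes.is_empty() {
            return Err(LlmError::NoEndpoints);
        }
        let failed_until = nodes.iter().map(|_| AtomicU64::new(0)).collect();
        Ok(Self { nodes, cursor: AtomicUsize::new(0), failed_until, epoch: Instant::now() })
    }

    fn now_ms(&self) -> u64 {
        self.epoch.elapsed().as_millis() as u64
    }

    /// Assumption: returns (index, endpoint) rather than just the endpoint.
    pub fn next(&self) -> (usize, &GonkaEndpoint) {
        let n = self.nodes.len();
        // fetch_add wraps on overflow, so the cursor never panics.
        let start = self.cursor.fetch_add(1, Ordering::Relaxed);
        let now = self.now_ms();
        // Probe each node once, starting at the cursor position.
        for offset in 0..n {
            let idx = start.wrapping_add(offset) % n;
            if self.failed_until[idx].load(Ordering::Relaxed) <= now {
                return (idx, &self.nodes[idx]);
            }
        }
        // Every node is cooling down: take the earliest deadline.
        let idx = (0..n)
            .min_by_key(|&i| self.failed_until[i].load(Ordering::Relaxed))
            .unwrap_or(0);
        (idx, &self.nodes[idx])
    }

    pub fn mark_failed(&self, idx: usize, cooldown: Duration) {
        // Out-of-range indices are ignored in this sketch.
        if let Some(slot) = self.failed_until.get(idx) {
            let until = self.now_ms().saturating_add(cooldown.as_millis() as u64);
            slot.store(until, Ordering::Relaxed);
        }
    }

    pub fn len(&self) -> usize {
        self.nodes.len()
    }
}
```

One trade-off of this probing scheme: while a node is in cooldown, the node right after it receives its share of requests as well, so load is not perfectly even during an outage. That seems acceptable given that balancing is a stated non-goal.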

Acceptance

  • Inline unit tests in endpoints.rs cover:
    • Round-robin order over 3 nodes.
    • Failed node skipped during cooldown, restored after.
    • All-failed fallback returns a valid endpoint.
    • Empty constructor errors.
  • cargo nextest run -p zeph-llm -E 'test(gonka_endpoint)' green.
  • No mutex usage (per await discipline rules); only atomics.
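
To illustrate the atomics-only requirement, here is a small standalone sketch of a round-robin cursor shared across threads. `Cursor` and `spread` are hypothetical names used only for this example; they are not part of the proposed API. Since `fetch_add` hands every caller a distinct counter value, the slots are hit evenly with no lock, and there is no guard that could be held across an `.await`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hands out slot indices 0..len in rotation without taking a lock.
pub struct Cursor {
    next: AtomicUsize,
    len: usize,
}

impl Cursor {
    pub fn new(len: usize) -> Self {
        assert!(len > 0, "cursor needs at least one slot");
        Self { next: AtomicUsize::new(0), len }
    }

    /// Each call observes a unique counter value, then maps it onto a slot.
    pub fn advance(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.len
    }
}

/// Spawns `threads` workers that each draw `per_thread` indices
/// and returns the total number of hits per slot.
pub fn spread(threads: usize, per_thread: usize, len: usize) -> Vec<usize> {
    let cursor = Arc::new(Cursor::new(len));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cursor = Arc::clone(&cursor);
            thread::spawn(move || {
                let mut hits = vec![0usize; len];
                for _ in 0..per_thread {
                    hits[cursor.advance()] += 1;
                }
                hits
            })
        })
        .collect();

    let mut total = vec![0usize; len];
    for handle in handles {
        let hits = handle.join().expect("worker panicked");
        for (slot, n) in hits.into_iter().enumerate() {
            total[slot] += n;
        }
    }
    total
}
```

`Ordering::Relaxed` is enough here because the counter is the only shared state and no other memory is published through it.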

Depends on

None.

Size

S (~2h)

Labels

  • P3 (Research, medium-high complexity)
  • enhancement (New feature or request)
  • feature (New functionality)
  • llm (zeph-llm crate (Ollama, Claude))
  • size/S (Small PR (11-50 lines))
