memory.graph: quality_gate fires on JSON extraction responses (false positive) #3601

@bug-ops

Description


[llm.router] quality_gate = 0.75 applies globally to all LLM calls routed through the main provider, including graph entity extraction. The quality gate measures cosine similarity between the query embedding and the response embedding. For graph extraction tasks (JSON input → JSON output with entities/edges), the structural dissimilarity between the extraction prompt and the structured JSON response causes systematic false positives (score ~0.55–0.70), below the 0.75 threshold.
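The false-positive mechanics can be illustrated with a minimal sketch (function and variable names here are illustrative, not the actual router code): cosine similarity rewards embeddings that point in the same direction, so a structured JSON payload scores low against a conversational query even when the extraction is correct.

```rust
// Minimal sketch of a cosine-similarity quality gate and why it rejects
// structured output. Names are illustrative, not the actual router code.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn passes_quality_gate(query_emb: &[f32], response_emb: &[f32], threshold: f32) -> bool {
    cosine_similarity(query_emb, response_emb) >= threshold
}

fn main() {
    let query = [0.8_f32, 0.6, 0.0];
    // A conversational answer tends to point in roughly the same direction
    // as the query embedding...
    let chat_response = [0.7_f32, 0.7, 0.1];
    // ...while a structured JSON response drifts toward unrelated dimensions.
    let json_response = [0.2_f32, 0.3, 0.9];
    assert!(passes_quality_gate(&query, &chat_response, 0.75));
    assert!(!passes_quality_gate(&query, &json_response, 0.75));
}
```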

This causes all graph extraction LLM calls to fall back to the next provider on every turn, adding latency and unnecessary provider cycling even when the extraction result is correct.

Reproduction Steps

  1. Configure [llm.router] quality_gate = 0.75 (default in testing.toml)
  2. Run a multi-turn session with graph memory enabled ([memory.graph] enabled = true)
  3. Observe logs:
INFO memory.graph_extract: thompson_quality_fallback provider="openai" score=0.56 threshold=0.75
INFO memory.graph_extract: thompson_quality_fallback provider="openai" score=0.58 threshold=0.75
INFO memory.graph_extract: thompson_quality_fallback provider="openai" score=0.57 threshold=0.75

All extraction calls fail the gate regardless of provider. The pattern repeats across every turn.

Expected Behavior

Graph extraction calls should bypass the quality gate, or the gate should only apply to conversational LLM calls. The quality gate is designed for coherence between user queries and assistant responses — not for structured JSON extraction tasks.

Actual Behavior

Every graph extraction LLM call logs thompson_quality_fallback with score ~0.55–0.70, below the 0.75 threshold. Since all providers fail the gate, the router returns the best-seen response on exhaustion (M2 path), adding unnecessary latency (all provider calls are made before returning).
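The exhaustion behavior can be sketched as follows (a simplified model; type and field names are assumptions, and "M2 path" is the issue's term for returning the best-seen response once every provider fails the gate):

```rust
// Simplified model of the provider-cycling behavior described above.
// Type and field names are assumptions; "M2 path" refers to returning
// the best-seen response after all providers fail the gate.
#[derive(Clone)]
struct Candidate {
    score: f32,
    response: String,
}

/// Returns the chosen response and how many provider calls were made.
fn route_with_gate(candidates: &[Candidate], threshold: f32) -> (String, usize) {
    let mut best: Option<&Candidate> = None;
    for (calls, c) in candidates.iter().enumerate() {
        if c.score >= threshold {
            // Gate passed: return early without calling further providers.
            return (c.response.clone(), calls + 1);
        }
        // Gate failed: remember the best-seen response and cycle onward.
        if best.map_or(true, |b| c.score > b.score) {
            best = Some(c);
        }
    }
    // Exhaustion (M2 path): every provider was called and every one failed.
    let best = best.expect("at least one provider");
    (best.response.clone(), candidates.len())
}

fn main() {
    // Scores mirror the logged extraction scores: all below the 0.75 gate.
    let candidates = [
        Candidate { score: 0.56, response: "{\"entities\":[]}".to_string() },
        Candidate { score: 0.58, response: "{\"entities\":[]}".to_string() },
        Candidate { score: 0.57, response: "{\"entities\":[]}".to_string() },
    ];
    let (_, calls) = route_with_gate(&candidates, 0.75);
    // All three providers were called before the best-seen response returned.
    assert_eq!(calls, 3);
}
```

Because extraction scores cluster well under the threshold, the early-return branch is never taken, so every turn pays for the full provider cycle.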

Root Cause

spawn_graph_extraction uses self.provider.clone() — the main SemanticMemory provider. apply_routing_signals() in src/bootstrap/provider.rs:184 applies quality_gate globally to this provider. GraphConfig does not expose a separate extract_provider: ProviderName field (unlike ReasoningConfig and CompressionConfig which do), so there is no way to configure a provider without the quality gate for graph extraction.

Suggested Fix

  1. Add extract_provider: ProviderName to GraphConfig (matching the pattern already used by [memory.reasoning] and [memory.compression])
  2. Build this provider without quality_gate for graph extraction calls (since JSON extraction coherence is not measurable by response/query embedding similarity)
  3. Alternatively, add a per-call context label so the quality gate can be skipped for task-specific (non-conversational) LLM calls
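Suggestions 1 and 2 could be sketched as below. The struct shapes are simplified; only GraphConfig, ProviderName, extract_provider, and quality_gate come from the issue, and everything else is an assumption:

```rust
// Simplified sketch of suggestions 1 and 2. Only GraphConfig, ProviderName,
// extract_provider, and quality_gate are identifiers from the issue;
// everything else is an assumption.
type ProviderName = String;

struct GraphConfig {
    enabled: bool,
    /// Suggestion 1: dedicated extraction provider, mirroring the
    /// extract_provider fields on ReasoningConfig and CompressionConfig.
    extract_provider: Option<ProviderName>,
}

struct RoutingSignals {
    quality_gate: Option<f32>,
}

/// Suggestion 2: resolve the provider for graph extraction with the
/// quality gate disabled, since query/response embedding similarity is
/// not meaningful for JSON-to-JSON extraction.
fn extraction_routing(cfg: &GraphConfig, main_provider: &ProviderName) -> (ProviderName, RoutingSignals) {
    let name = cfg
        .extract_provider
        .clone()
        .unwrap_or_else(|| main_provider.clone());
    (name, RoutingSignals { quality_gate: None })
}

fn main() {
    // No extract_provider configured: fall back to the main provider name,
    // but still with the gate disabled.
    let cfg = GraphConfig { enabled: true, extract_provider: None };
    let (name, signals) = extraction_routing(&cfg, &"openai".to_string());
    assert_eq!(name, "openai");
    assert!(signals.quality_gate.is_none());
    let _ = cfg.enabled;
}
```

With this shape, existing configs keep working unchanged, and a dedicated extraction provider becomes an opt-in field rather than a behavior change.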

Environment

  • Version: 0.20.1 (a030b2a)
  • Config: .local/config/testing.toml
  • Features: full
  • Observed: CI-668

Logs / Evidence

All graph_extract calls during a 2-turn session:

score=0.5765, score=0.5877, score=0.5695, score=0.7035, score=0.6081, score=0.6620, score=0.5601, score=0.5653

All below threshold 0.75. No single call passes the gate.

Metadata


Labels

  • P3: Research — medium-high complexity
  • bug: Something isn't working
  • memory: zeph-memory crate (SQLite)
