Description
[llm.router] quality_gate = 0.75 applies globally to all LLM calls routed through the main provider, including graph entity extraction. The quality gate measures cosine similarity between the query embedding and the response embedding. For graph extraction tasks (JSON input → JSON output with entities/edges), the structural dissimilarity between the extraction prompt and the structured JSON response produces systematically low scores (~0.55–0.70, below the 0.75 threshold), so the gate fires even when the extraction output is correct.
This causes all graph extraction LLM calls to fall back to the next provider on every turn, adding latency and unnecessary provider cycling even when the extraction result is correct.
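For illustration, here is a minimal, self-contained sketch of the kind of embedding-similarity check described above (all names and values are illustrative, not the router's actual internals):

```rust
// Illustrative sketch only; not the project's actual router code.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

fn main() {
    // A prose prompt and a structured JSON response embed far apart,
    // so a 0.75 gate rejects the call even when extraction is correct.
    let prompt_emb = vec![0.9_f32, 0.1, 0.0];
    let json_response_emb = vec![0.4_f32, 0.6, 0.7];
    let score = cosine_similarity(&prompt_emb, &json_response_emb);
    println!("score={score:.2} passes={}", score >= 0.75);
}
```

The point is that a gate failure here reflects prompt/response shape, not extraction quality.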
Reproduction Steps
- Configure [llm.router] quality_gate = 0.75 (default in testing.toml)
- Run a multi-turn session with graph memory enabled ([memory.graph] enabled = true)
- Observe logs:
INFO memory.graph_extract: thompson_quality_fallback provider="openai" score=0.56 threshold=0.75
INFO memory.graph_extract: thompson_quality_fallback provider="openai" score=0.58 threshold=0.75
INFO memory.graph_extract: thompson_quality_fallback provider="openai" score=0.57 threshold=0.75
All extraction calls fail the gate regardless of provider. The pattern repeats across every turn.
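For reference, the relevant configuration excerpt (assumed shape, reconstructed from the keys quoted above; surrounding keys omitted):

```toml
# .local/config/testing.toml (relevant excerpt)
[llm.router]
quality_gate = 0.75   # applies globally, including graph extraction calls

[memory.graph]
enabled = true
```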
Expected Behavior
Graph extraction calls should bypass the quality gate, or the gate should only apply to conversational LLM calls. The quality gate is designed for coherence between user queries and assistant responses — not for structured JSON extraction tasks.
Actual Behavior
Every graph extraction LLM call logs thompson_quality_fallback with score ~0.55–0.70, below the 0.75 threshold. Since all providers fail the gate, the router returns the best-seen response on exhaustion (M2 path), adding unnecessary latency (all provider calls are made before returning).
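To make the latency cost concrete, here is a hedged, self-contained sketch of an exhaustion path of this shape (illustrative only, not the actual M2 implementation): every provider is called, each fails the gate, and only then is the best-seen response returned.

```rust
// Illustrative sketch of gate-driven exhaustion; names are assumptions.
fn route(providers: &[&str], score_of: impl Fn(&str) -> f32, gate: f32) -> (String, f32) {
    let mut best: Option<(String, f32)> = None;
    for p in providers {
        let score = score_of(p); // one full LLM call per provider
        if score >= gate {
            return (p.to_string(), score); // never reached for extraction calls
        }
        if best.as_ref().map_or(true, |(_, s)| score > *s) {
            best = Some((p.to_string(), score));
        }
    }
    best.expect("at least one provider")
}

fn main() {
    let scores = [("openai", 0.58_f32), ("anthropic", 0.62)];
    let lookup = |name: &str| scores.iter().find(|(n, _)| *n == name).unwrap().1;
    // All provider calls are made before the best-seen response is returned.
    let (provider, score) = route(&["openai", "anthropic"], lookup, 0.75);
    println!("best-seen: {provider} score={score}");
}
```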
Root Cause
spawn_graph_extraction uses self.provider.clone() — the main SemanticMemory provider. apply_routing_signals() in src/bootstrap/provider.rs:184 applies quality_gate globally to this provider. GraphConfig does not expose a separate extract_provider: ProviderName field (unlike ReasoningConfig and CompressionConfig which do), so there is no way to configure a provider without the quality gate for graph extraction.
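A minimal, self-contained sketch of the shape of the problem (all type and field names are illustrative stand-ins for the project's actual ones):

```rust
// Illustrative stand-ins; the real types live in the project.
#[derive(Clone)]
struct Provider {
    quality_gate: Option<f32>, // set globally by apply_routing_signals()
}

struct SemanticMemory {
    provider: Provider,
}

impl SemanticMemory {
    // Graph extraction clones the main provider, inheriting its gate;
    // GraphConfig offers no extract_provider to use instead.
    fn spawn_graph_extraction(&self) -> Provider {
        self.provider.clone()
    }
}

fn main() {
    let mem = SemanticMemory {
        provider: Provider { quality_gate: Some(0.75) },
    };
    let extract_provider = mem.spawn_graph_extraction();
    // The extraction provider carries the conversational gate with it.
    assert_eq!(extract_provider.quality_gate, Some(0.75));
}
```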
Suggested Fix
- Add extract_provider: ProviderName to GraphConfig (matching the pattern already used by [memory.reasoning] and [memory.compression]); see the sketch after this list
- Build this provider without quality_gate for graph extraction calls (since JSON extraction coherence is not measurable by response/query embedding similarity)
- Alternatively, add a per-call context label so the quality gate can be skipped for task-specific (non-conversational) LLM calls
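A hedged sketch of the first two bullets; the field name follows the ReasoningConfig/CompressionConfig pattern noted above, but everything else is an assumption, not the actual implementation:

```rust
// Hypothetical sketch; the real GraphConfig/ProviderName live in the project.
type ProviderName = String; // stand-in for the project's actual type

#[derive(Debug, Clone)]
struct GraphConfig {
    enabled: bool,
    // Proposed: dedicated provider for extraction calls, mirroring
    // ReasoningConfig/CompressionConfig, so the conversational quality
    // gate never applies to structured JSON output.
    extract_provider: Option<ProviderName>,
}

#[derive(Debug, Clone)]
struct Provider {
    name: ProviderName,
    quality_gate: Option<f32>,
}

// Build the extraction provider without a quality gate; fall back to the
// main provider's name when no extract_provider is configured.
fn build_extract_provider(cfg: &GraphConfig, main: &Provider) -> Provider {
    Provider {
        name: cfg.extract_provider.clone().unwrap_or_else(|| main.name.clone()),
        quality_gate: None, // never gate structured JSON extraction
    }
}

fn main() {
    let main = Provider { name: "openai".into(), quality_gate: Some(0.75) };
    let cfg = GraphConfig { enabled: true, extract_provider: None };
    let extract = build_extract_provider(&cfg, &main);
    assert_eq!(extract.quality_gate, None);
    println!("extract provider: {} gate={:?}", extract.name, extract.quality_gate);
}
```

In testing.toml this would presumably read as extract_provider = "..." under [memory.graph], mirroring the reasoning and compression sections.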
Environment
- Version: 0.20.1 (a030b2a)
- Config: .local/config/testing.toml
- Features: full
- Observed: CI-668
Logs / Evidence
All graph_extract calls during a 2-turn session:
score=0.5765, score=0.5877, score=0.5695, score=0.7035, score=0.6081, score=0.6620, score=0.5601, score=0.5653
All below threshold 0.75. No single call passes the gate.