[Bug] High latency and intermittent suggestions due to Espresso inference plan destruction on every request

## Bug Report

### Description
Cotabby exhibits high latency and intermittent autocomplete behavior on both the Open Source (llama.cpp in-process) and Apple Intelligence backends. Inspecting the system log with `log stream` revealed the root cause: the app uses Apple's Espresso framework but destroys and recreates all inference plans on every autocomplete request instead of keeping them alive between consecutive inferences. This adds avoidable setup and teardown overhead to every keystroke trigger.

Additionally, user-configured Rules appear to have no effect when using the Apple Intelligence backend.

### Steps to Reproduce
1. Enable any backend (Open Source or Apple Intelligence).
2. Start typing in any app with Cotabby active.
3. Run `log stream --predicate 'process == "Cotabby"' --level debug 2>/dev/null` in Terminal and observe the repeating pattern per suggestion: Loaded network → Creating plan → [inference] → Destroying plan (5 plans created and destroyed per request).
4. Optionally, run `while true; do ps aux | grep -i "cotabby" | grep -v grep; sleep 1; done` and observe RSS memory oscillating between ~1.1 GB and ~1.24 GB during active use.
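
To put a number on the create/destroy churn from step 3, the log output can be saved to a file and tallied. The sketch below is only an illustration: the marker phrases are copied from the pattern described above, the exact wording of the real log lines is an assumption, and `count_plan_events` is a hypothetical helper name.

```python
# Marker phrases copied from the pattern in this report. The exact wording
# of the real log lines is an assumption and may need adjusting.
MARKERS = ("Loaded network", "Creating plan", "Destroying plan")


def count_plan_events(lines):
    """Return a dict mapping each marker phrase to the number of lines containing it."""
    counts = {marker: 0 for marker in MARKERS}
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Only runs when a log file path is passed on the command line.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
            for marker, total in count_plan_events(log_file).items():
                print(f"{marker}: {total}")
```

If the script is saved under a name such as `count_plans.py` (hypothetical) and given a captured log file, roughly equal "Creating plan" and "Destroying plan" totals that grow with every suggestion would be consistent with the behavior reported here.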

### Expected Behavior
Inference plans should persist in memory between consecutive autocomplete requests (keep-alive pattern), only being destroyed after a configurable idle timeout. This would eliminate the per-request overhead and significantly reduce latency.
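
A minimal sketch of the keep-alive pattern described above, written in Python purely for illustration. It does not use any Espresso or Core ML API and is not Cotabby's code: `KeepAlivePlanCache`, `build`, `destroy`, and `idle_timeout_s` are hypothetical names, and the 30-second default is an arbitrary placeholder.

```python
import time


class KeepAlivePlanCache:
    """Hold one expensive object and release it only after an idle timeout."""

    def __init__(self, build, destroy, idle_timeout_s=30.0, clock=time.monotonic):
        self._build = build              # callable that creates the plan (expensive)
        self._destroy = destroy          # callable that releases the plan
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock              # injectable so the timeout can be tested
        self._plan = None
        self._last_used = None

    def acquire(self):
        """Return the cached plan, building it only if none is alive."""
        if self._plan is None:
            self._plan = self._build()
        self._last_used = self._clock()
        return self._plan

    def evict_if_idle(self):
        """Destroy the plan if it has been unused for at least the timeout.

        Intended to be called from a periodic timer. Returns True if a plan
        was destroyed, False otherwise.
        """
        if self._plan is None:
            return False
        if self._clock() - self._last_used < self._idle_timeout_s:
            return False
        self._destroy(self._plan)
        self._plan = None
        return True
```

With this shape, consecutive requests call `acquire()` and reuse the same plan, while a periodic timer calls `evict_if_idle()` to free memory once typing stops. The sketch is single-threaded; a real implementation would need synchronization if requests and the timer run on different threads.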

### Environment
- **tabby version:** 0.1.1-beta (30)
- **macOS version:** macOS Tahoe 26.5

---
*Submitted via tabby feedback form*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] High latency and intermittent suggestions due to Espresso inference plan destruction on every request #292
