
[Bug] High latency and intermittent suggestions due to Espresso inference plan destruction on every request #292

@FuJacob

Bug Report

Description

Cotabby shows high latency and intermittent autocomplete suggestions with both the Open Source (in-process llama.cpp) and Apple Intelligence backends. Inspecting the system log with `log stream` points to the root cause: the app uses Apple's Espresso framework, but it destroys and recreates all of its inference plans on every autocomplete request instead of keeping them alive between consecutive inferences. This adds avoidable setup overhead to every keystroke-triggered request.
Additionally, user-configured Rules appear to have no effect when using the Apple Intelligence backend.

Steps to Reproduce

1. Enable either backend (Open Source or Apple Intelligence).

2. Start typing in any app with Cotabby active.

3. While typing, keep `log stream --predicate 'process == "Cotabby"' --level debug 2>/dev/null` running in Terminal.
Observe the pattern repeated for every suggestion: Loaded network → Creating plan → [inference] → Destroying plan (five plans are created and destroyed per request).

4. Optionally, run `while true; do ps aux | grep -i "cotabby" | grep -v grep; sleep 1; done` and observe the RSS oscillating between ~1.1 GB and ~1.24 GB during active use.
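To put numbers on steps 3 and 4, the captured output can be summarized with a couple of small shell helpers. This is a sketch, not part of Cotabby: `plan_summary` and `rss_stats` are hypothetical helper names, and the sketch assumes the relevant log lines contain the literal phrases `Loaded network`, `Creating plan` and `Destroying plan` quoted in step 3 (the real Espresso messages may be worded differently). It also assumes the process is named `Cotabby`, as in the `log stream` predicate.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for quantifying the behaviour in steps 3 and 4.
# Assumption: log lines contain the literal phrases quoted in step 3.

# Count plan lifecycle events in a captured log (file arguments or stdin).
plan_summary() {
  awk '
    /Loaded network/  { loads++ }
    /Creating plan/   { creates++ }
    /Destroying plan/ { destroys++ }
    END {
      printf "network loads: %d\n", loads
      printf "plans created: %d\n", creates
      printf "plans destroyed: %d\n", destroys
      if (loads > 0) printf "plans created per load: %.1f\n", creates / loads
    }
  ' "$@"
}

# Report the sample count and min/max of RSS values read from stdin
# (KiB, one per line, as printed by `ps -o rss=` on macOS); blank lines are skipped.
rss_stats() {
  awk '
    NF {
      v = $1 + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      n++
    }
    END { if (n > 0) printf "samples: %d  min RSS: %d KiB  max RSS: %d KiB\n", n, min, max }
  '
}

# Example usage (assumes the process is named "Cotabby"):
#   log stream --predicate 'process == "Cotabby"' --level debug 2>/dev/null > cotabby.log
#   # ...type in another app for a while, then press Ctrl-C...
#   plan_summary cotabby.log
#   for i in $(seq 60); do ps -o rss= -p "$(pgrep -x Cotabby)"; sleep 1; done | rss_stats
```

Sampling a fixed number of times (rather than `while true`) lets `awk` reach its `END` block and print the range.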

Expected Behavior

Inference plans should persist in memory between consecutive autocomplete requests (a keep-alive pattern) and only be destroyed after a configurable idle timeout. This would eliminate the per-request setup overhead and should significantly reduce latency.
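One way to check a fix against this expectation is to count how often plans are rebuilt after being torn down. The helper below is a hypothetical sketch (`plan_rebuilds` is not an existing tool), and it assumes the log lines contain the literal phrases `Creating plan` and `Destroying plan` from the pattern in the steps to reproduce.

```shell
#!/usr/bin/env bash
# Hypothetical regression check for the keep-alive behaviour.
# Assumption: log lines contain the literal phrases "Creating plan" and
# "Destroying plan", as in the pattern quoted in the steps to reproduce.

# Count destroy -> create cycles, i.e. how many times plans were rebuilt
# after being torn down. Reads file arguments or stdin.
plan_rebuilds() {
  awk '
    /Destroying plan/            { torn_down = 1 }
    /Creating plan/ && torn_down { rebuilds++; torn_down = 0 }
    END { printf "plan rebuilds: %d\n", rebuilds }
  ' "$@"
}

# Example: capture a burst of typing with log stream, then run
#   plan_rebuilds cotabby.log
# With per-request teardown the count grows with each suggestion after the
# first; with keep-alive it should stay at 0 until the idle timeout expires.
```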

Environment

  • tabby version: 0.1.1-beta (30)
  • macOS version: macOS Tahoe 26.5

Submitted via tabby feedback form

Metadata

Assignees: No one assigned
Labels: area:perf (Crashes, hangs, CPU pinning, slow response), bug (Something isn't working)
Projects: Status: Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
