Bug Report
Description
Cotabby exhibits high latency and intermittent autocomplete behavior on both Open Source (llama.cpp in-process) and Apple Intelligence backends. System log investigation via log stream revealed the root cause: the app uses Apple's Espresso framework but destroys and recreates all inference plans on every single autocomplete request instead of keeping them alive between consecutive inferences. This creates unnecessary overhead on each keystroke trigger.
Additionally, user-configured Rules appear to have no effect when using the Apple Intelligence backend.
Steps to Reproduce
1 - Enable any backend (Open Source or Apple Intelligence).
2 - Start typing in any app with Cotabby active.
3 - Run log stream --predicate 'process == "Cotabby"' --level debug 2>/dev/null in Terminal and observe the repeating pattern per suggestion: Loaded network → Creating plan → [inference] → Destroying plan (5 plans created and destroyed per request).
4 - Optionally run while true; do ps aux | grep -i "cotabby" | grep -v grep; sleep 1; done and observe RSS memory oscillating between ~1.1 GB and ~1.24 GB during active use.
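To put numbers on the pattern from the log stream step, the captured log lines can be tallied with a small script. This is only a sketch: it assumes the phrases "Loaded network", "Creating plan" and "Destroying plan" appear verbatim in the log lines, as in the pattern above, and the name count_markers is made up for illustration.

```python
from collections import Counter

# Phrases taken from the pattern described in this report; the exact log
# wording is an assumption and may need adjusting.
MARKERS = ("Loaded network", "Creating plan", "Destroying plan")


def count_markers(lines):
    """Count how many lines contain each marker phrase."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

Passing the lines of a saved log capture (for example, an open file object) to count_markers gives one total per phrase. If the "Creating plan" and "Destroying plan" totals grow together with every suggestion, that would match the create/destroy cycle described above.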
Expected Behavior
Inference plans should persist in memory between consecutive autocomplete requests (keep-alive pattern), only being destroyed after a configurable idle timeout. This would eliminate the per-request overhead and significantly reduce latency.
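The keep-alive behavior requested here could look roughly like the sketch below. It is not Cotabby's code and does not use any Espresso API: KeepAlive, create, destroy and idle_timeout_s are hypothetical names, and Python is used only to illustrate the pattern.

```python
import time


class KeepAlive:
    """Hold an expensive resource and release it only after an idle timeout.

    Hypothetical illustration: `create` stands in for building the inference
    plans and `destroy` for tearing them down.
    """

    def __init__(self, create, destroy, idle_timeout_s, clock=time.monotonic):
        self._create = create
        self._destroy = destroy
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._resource = None
        self._loaded = False
        self._last_used = 0.0

    def acquire(self):
        """Return the live resource, creating it only if none is loaded."""
        self.evict_if_idle()
        if not self._loaded:
            self._resource = self._create()
            self._loaded = True
        self._last_used = self._clock()
        return self._resource

    def evict_if_idle(self):
        """Destroy the resource if it has been unused for the full timeout."""
        if not self._loaded:
            return False
        if self._clock() - self._last_used < self._idle_timeout_s:
            return False
        self._destroy(self._resource)
        self._resource = None
        self._loaded = False
        return True
```

With this shape, consecutive autocomplete requests call acquire() and reuse the same plans, while a periodic timer calling evict_if_idle() would free the memory once typing stops for longer than the configured timeout.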
Environment
- tabby version: 0.1.1-beta (30)
- macOS version: macOS Tahoe 26.5
Submitted via tabby feedback form