edit_prediction: Add Ollama as inline completion provider #1
Merged
akhil-p-git merged 6 commits into main on Dec 13, 2025
Conversation
Adds support for using Ollama as an edit prediction provider, enabling
users to get code completions from locally-running LLMs without requiring
external API keys.
Implementation:
- New `ollama_completion` crate implementing `EditPredictionProvider` trait
- Uses Fill-In-Middle (FIM) prompt format for context-aware completions
- Configured with 75ms debounce, 256 max tokens, 0.2 temperature
- Default model: qwen2.5-coder:7b at localhost:11434
To use:
1. Install Ollama and run: `ollama pull qwen2.5-coder:7b`
2. Add to settings: `{ "features": { "edit_prediction_provider": "ollama" } }`
Closes zed-industries#15968
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allows users to configure Ollama edit prediction settings:
- `api_url`: Custom Ollama server URL (default: http://localhost:11434)
- `model`: Model to use for completions (default: qwen2.5-coder:7b)
Settings can be configured in settings.json:
```json
{
"edit_predictions": {
"ollama": {
"api_url": "http://localhost:11434",
"model": "qwen2.5-coder:7b"
}
}
}
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add global OllamaConnectionStatus tracking (Unknown, Connected, Error)
- Show error indicator (red dot) on status bar button when connection fails
- Display helpful error messages in context menu (e.g., "Ollama server is not running")
- Show success indicator (green "Connected") when Ollama is responding
- Add 7 unit tests covering helper functions, status tracking, and serialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…for Ollama

This commit adds three enhancements to the Ollama completion provider:

**Model Health Check**
- Validates Ollama server is running on first completion request
- Verifies configured model is available via /api/tags endpoint
- Provides actionable error messages (e.g., lists available models)
- Only runs once per session to avoid repeated network calls

**Context Optimization**
- Limits prefix to 4KB and suffix to 1KB for efficient context windows
- Smart line-boundary detection avoids cutting code mid-line
- Prevents malformed prompts that could degrade completion quality

**Completion Caching**
- LRU cache (max 50 entries) stores recent completions
- Hash-based lookup for O(1) cache hits
- Eliminates redundant API calls for identical prompts
- Automatic eviction of oldest entries when cache is full

Adds 6 new unit tests (13 total):
- test_prompt_hashing
- test_completion_cache
- test_cache_lru_eviction
- test_tags_response_deserialization
- test_tags_response_empty
- test_provider_initial_state

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
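The line-boundary truncation described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; the function name and the demo values are assumptions, while the 4KB/1KB budgets come from the commit message:

```rust
/// Trim `prefix` to at most `max_bytes`, keeping whole lines so the
/// prompt never starts mid-line. Illustrative; the PR uses a 4096-byte
/// prefix budget and a 1024-byte suffix budget.
fn truncate_prefix(prefix: &str, max_bytes: usize) -> &str {
    if prefix.len() <= max_bytes {
        return prefix;
    }
    let mut start = prefix.len() - max_bytes;
    // Nudge forward onto a UTF-8 character boundary.
    while !prefix.is_char_boundary(start) {
        start += 1;
    }
    // Then skip to the start of the next full line, if there is one.
    match prefix[start..].find('\n') {
        Some(nl) => &prefix[start + nl + 1..],
        None => &prefix[start..], // one long line: fall back to a hard cut
    }
}

fn main() {
    let text = "fn a() {}\nfn b() {}\nfn c() {}";
    // A 12-byte budget forces a cut; only whole trailing lines survive.
    println!("{:?}", truncate_prefix(text, 12)); // prints "fn c() {}"
}
```

The suffix side would be the mirror image: cut from the end and stop at the last newline inside the budget.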
**Streaming Completions**
- Replaced non-streaming API with streaming for lower perceived latency
- Processes line-delimited JSON responses incrementally
- Tracks time-to-first-token for performance monitoring
- Accumulates tokens as they arrive from Ollama

**Telemetry Integration**
- Reports completion events: model, token count, latency metrics
- Tracks cache hits separately (zero latency, zero tokens)
- Reports health check success/failure events
- Reports completion errors with error messages
- Uses standard telemetry::event! macro for consistency

**Metrics Captured**
- Model name used for completion
- Token count from streaming response
- Total completion time (ms)
- Time to first token (ms) - key latency metric
- Cache hit status

Adds 4 new unit tests (17 total):
- test_streaming_response_deserialization
- test_streaming_request_serialization
- test_completion_metrics
- test_completion_metrics_cached

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
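The streaming loop above consumes Ollama's line-delimited JSON, where each line carries a `response` token and `done` marks the end of the stream. A real implementation would deserialize with serde_json; the tiny extractor below is a stand-in so the sketch stays dependency-free, and all names are illustrative:

```rust
/// Pull a string field out of one NDJSON line. Stand-in for serde_json;
/// handles only unescaped values, which is enough for this demo.
fn extract_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let pat = format!("\"{key}\":\"");
    let start = line.find(&pat)? + pat.len();
    let end = line[start..].find('"')?;
    Some(&line[start..start + end])
}

/// Accumulate streamed tokens until a line reports "done":true.
fn accumulate(lines: &[&str]) -> String {
    let mut completion = String::new();
    for line in lines {
        if let Some(token) = extract_field(line, "response") {
            completion.push_str(token);
        }
        if line.contains("\"done\":true") {
            break; // final line: generation finished
        }
    }
    completion
}

fn main() {
    let stream = [
        r#"{"response":"fn ","done":false}"#,
        r#"{"response":"main","done":false}"#,
        r#"{"response":"","done":true}"#,
    ];
    println!("{}", accumulate(&stream)); // prints "fn main"
}
```

Time-to-first-token would be measured by recording an `Instant` before the request and noting the elapsed time when the first non-empty `response` arrives.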
…for Ollama
**Accept/Reject Telemetry**
- Reports "Ollama Completion Accepted" event when user accepts a completion
- Reports "Ollama Completion Discarded" event when user rejects/discards
- Includes model name in event properties for analytics
**did_show() Callback**
- Reports "Ollama Completion Shown" event when completion is displayed
- Tracks completion_length for understanding suggestion quality
- Enables measuring impression-to-acceptance rate
**Configurable Parameters**
- Added `temperature` setting (default: 0.2) - controls randomness
- Added `max_tokens` setting (default: 256) - controls completion length
- Both configurable via edit_predictions.ollama in settings.json
- Example: `"edit_predictions": { "ollama": { "temperature": 0.3, "max_tokens": 512 } }`
**Settings Schema**
- Updated OllamaEditPredictionSettingsContent with new fields
- Updated OllamaSettings runtime struct
- Wired settings through edit_prediction_registry
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Just started trying out Zed today, and I was looking for exactly this! Amazing work @akhil-p-git and good timing :) When do you think this will be released?
Owner
Author
Sorry about that, life got busy. I can merge it any time!
Summary
This PR implements Ollama as an inline code completion provider for Zed, addressing issue #15968. It enables users to get AI-powered code completions from locally-running Large Language Models (LLMs) without requiring external API keys or cloud services.
Why Ollama?
Ollama is a popular open-source tool for running LLMs locally on macOS, Linux, and Windows. By integrating Ollama as an edit prediction provider, Zed users gain local, private code completions with no external API keys or cloud services required.
Features
1. Core Completion Provider
A new `ollama_completion` crate (~950 lines) implementing the `EditPredictionProvider` trait.

2. User Configuration
Full settings support via `settings.json`:

```json
{
  "features": { "edit_prediction_provider": "ollama" },
  "edit_predictions": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "model": "qwen2.5-coder:7b",
      "temperature": 0.2,
      "max_tokens": 256
    }
  }
}
```

| Setting | Default |
| --- | --- |
| `api_url` | `http://localhost:11434` |
| `model` | `qwen2.5-coder:7b` |
| `temperature` | `0.2` |
| `max_tokens` | `256` |

3. Error Handling & Status Indicators
Comprehensive error handling with visual feedback:
- Red-dot error indicator on the status bar button when the connection fails
- Helpful error messages in the context menu (e.g., "Ollama server is not running")
- Green "Connected" indicator when Ollama is responding
4. Model Health Check
Automatic validation on first completion request:
- Checks that the Ollama server is running
- Verifies the configured model is available via the `/api/tags` endpoint
- Lists the available models in error messages when the configured model is missing
- Runs once per session to avoid repeated network calls

5. Context Window Optimization
Smart prompt building that respects model context limits:
- Prefix limited to 4KB, suffix to 1KB
- Line-boundary detection avoids cutting code mid-line
6. Completion Caching
LRU cache eliminates redundant API calls:
- Max 50 entries with hash-based O(1) lookup
- Oldest entries evicted automatically when the cache is full
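A cache along these lines might look like the sketch below. The structure and names are assumptions, not the PR's actual code, and eviction here is oldest-insertion (as the PR's commit message describes) rather than strict recency-based LRU:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

const MAX_ENTRIES: usize = 50;

/// Hash-keyed completion cache with insertion-order eviction.
struct CompletionCache {
    map: HashMap<u64, String>,
    order: VecDeque<u64>, // oldest key at the front
}

impl CompletionCache {
    fn new() -> Self {
        Self { map: HashMap::new(), order: VecDeque::new() }
    }

    /// Hash the full prompt so lookups never compare long strings.
    fn key(prompt: &str) -> u64 {
        let mut h = DefaultHasher::new();
        prompt.hash(&mut h);
        h.finish()
    }

    fn get(&self, prompt: &str) -> Option<&String> {
        self.map.get(&Self::key(prompt))
    }

    fn insert(&mut self, prompt: &str, completion: String) {
        let k = Self::key(prompt);
        if self.map.insert(k, completion).is_none() {
            self.order.push_back(k);
            if self.order.len() > MAX_ENTRIES {
                // Evict the oldest entry once the cache is full.
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
        }
    }
}

fn main() {
    let mut cache = CompletionCache::new();
    cache.insert("let x ", "= 1;".to_string());
    println!("{:?}", cache.get("let x ")); // prints Some("= 1;")
}
```

A cache hit short-circuits the network request entirely, which is why the telemetry reports hits with zero latency and zero tokens.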
7. Streaming Completions
Lower perceived latency with streaming API:
- Requests set `stream: true` for incremental responses
- Line-delimited JSON is processed as tokens arrive
- Time-to-first-token is tracked for performance monitoring

8. Telemetry Integration
Comprehensive analytics for quality measurement:
- `Ollama Completion`
- `Ollama Completion Shown`
- `Ollama Completion Accepted`
- `Ollama Completion Discarded`
- `Ollama Health Check Passed`
- `Ollama Health Check Failed`

Technical Implementation
Architecture
FIM Prompt Format
The model generates the middle portion that belongs between the given prefix and suffix.

Supported models: Qwen2.5-Coder, CodeLlama, DeepSeek-Coder, StarCoder, Codestral
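For Qwen2.5-Coder, the FIM prompt is assembled from sentinel tokens around the text before and after the cursor; other model families use different sentinels (CodeLlama, for instance, uses `<PRE>`/`<SUF>`/`<MID>`). A minimal sketch, not the PR's actual code:

```rust
/// Build a Fill-In-Middle prompt using Qwen2.5-Coder's sentinel tokens.
/// The model generates the text that belongs between prefix and suffix.
fn build_fim_prompt(prefix: &str, suffix: &str) -> String {
    format!("<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>")
}

fn main() {
    let prompt = build_fim_prompt("fn add(a: i32, b: i32) -> i32 {\n    ", "\n}");
    println!("{prompt}");
}
```

Because the suffix is part of the prompt, the model can complete a function body that correctly meets the code already written below the cursor.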
Files Changed
- crates/ollama_completion/Cargo.toml
- crates/ollama_completion/src/ollama_completion.rs
- crates/settings/src/settings_content/language.rs
- crates/language/src/language_settings.rs
- crates/zed/src/zed/edit_prediction_registry.rs
- crates/edit_prediction_button/src/edit_prediction_button.rs
- crates/edit_prediction_button/Cargo.toml
- crates/agent_ui/src/agent_ui.rs
- crates/zed/Cargo.toml
- Cargo.toml

Key Data Structures
Installation & Usage
Prerequisites
Install Ollama: https://ollama.ai
Pull a code completion model: `ollama pull qwen2.5-coder:7b`
Recommended models:
- `qwen2.5-coder:7b`
- `codellama:7b`
- `deepseek-coder:6.7b`
- `starcoder2:7b`

Start the Ollama server (if not running as a service): `ollama serve`
Configuration
Add to `~/.config/zed/settings.json`:

```json
{ "features": { "edit_prediction_provider": "ollama" } }
```

Usage
Test Plan
Automated Tests (17 total)
Manual Testing Checklist
- `cargo build -p zed` completes without errors
- `cargo test -p ollama_completion` passes all tests

Performance Considerations
Commits
`80df76c`, `4e76750`, `f61d00e`, `5b5bd8f`, `1aff3ce`, `dee4579`

Future Enhancements
Potential improvements for future PRs:
Related Issues
Compatibility
🤖 Generated with Claude Code