
edit_prediction: Add Ollama as inline completion provider#1

Merged
akhil-p-git merged 6 commits into main from feat/ollama-inline-completion on Dec 13, 2025

Conversation


@akhil-p-git commented Dec 1, 2025

Summary

This PR implements Ollama as an inline code completion provider for Zed, addressing issue #15968. It enables users to get AI-powered code completions from locally-running Large Language Models (LLMs) without requiring external API keys or cloud services.

Why Ollama?

Ollama is a popular open-source tool for running LLMs locally on macOS, Linux, and Windows. By integrating Ollama as an edit prediction provider, Zed users gain:

  • Privacy: All code stays on the local machine; no data is sent to external servers
  • No API costs: Use open-source models without usage fees or rate limits
  • Offline capability: Works without internet connectivity
  • Model flexibility: Choose from hundreds of available models (Qwen, CodeLlama, DeepSeek, etc.)
  • Hardware optimization: Models run on local GPU/CPU, optimized for user's hardware

Features

1. Core Completion Provider

A new ollama_completion crate (~950 lines) implementing the EditPredictionProvider trait:

| Feature | Description |
| --- | --- |
| FIM prompting | Uses the standardized Fill-In-Middle format for context-aware completions |
| Intelligent debouncing | 75ms delay prevents excessive API calls while typing |
| Optimized parameters | 256 max tokens, 0.2 temperature for focused completions |
| Smart stop sequences | Auto-stops at double newlines and FIM tokens |
| Grapheme-aware matching | Proper Unicode handling when computing edit ranges |
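The FIM prompt assembly and stop sequences above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the helper names (`build_fim_prompt`, `stop_sequences`) are hypothetical, and the sentinel-token spellings shown are the Qwen2.5-Coder style — other model families use different tokens.

```rust
/// Illustrative sketch: assemble a Fill-In-Middle prompt from the text
/// before and after the cursor, using Qwen2.5-Coder-style FIM tokens.
fn build_fim_prompt(prefix: &str, suffix: &str) -> String {
    format!("<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>")
}

/// Stop sequences: generation halts at any FIM token or a blank line,
/// matching the "Smart stop sequences" behavior described above.
fn stop_sequences() -> Vec<&'static str> {
    vec!["<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>", "\n\n"]
}
```

The double-newline stop keeps the model from running past the current block, which pairs with the end-of-line trimming done later in post-processing.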

2. User Configuration

Full settings support via settings.json:

```json
{
  "features": {
    "edit_prediction_provider": "ollama"
  },
  "edit_predictions": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "model": "qwen2.5-coder:7b",
      "temperature": 0.2,
      "max_tokens": 256
    }
  }
}
```
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `api_url` | String | `http://localhost:11434` | Ollama server URL |
| `model` | String | `qwen2.5-coder:7b` | Model for completions |
| `temperature` | Float | 0.2 | Sampling temperature (0.0-2.0) |
| `max_tokens` | Integer | 256 | Maximum tokens to generate |

3. Error Handling & Status Indicators

Comprehensive error handling with visual feedback:

| State | UI Indicator | Context Menu Message |
| --- | --- | --- |
| Connected | Normal | "Connected to Ollama" (green) |
| Server down | Red dot | "Ollama server is not running. Start with `ollama serve`." |
| Model missing | Red dot | "Model not found. Available: [list]. Run `ollama pull <model>`." |
| Other error | Red dot | Truncated error message |

4. Model Health Check

Automatic validation on first completion request:

  • Verifies Ollama server is running via /api/tags endpoint
  • Confirms configured model exists on the server
  • Provides actionable error messages with available models list
  • Runs once per session to minimize overhead
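The model-existence half of the health check can be sketched as pure logic over the model names returned by `/api/tags` (the function name `check_model` and the exact error wording are illustrative, not the PR's actual code):

```rust
/// Illustrative sketch: given the model names reported by Ollama's
/// /api/tags endpoint, verify the configured model is present, and
/// produce an actionable error message (listing available models and
/// the pull command) when it is not.
fn check_model(configured: &str, available: &[String]) -> Result<(), String> {
    if available.iter().any(|m| m.as_str() == configured) {
        return Ok(());
    }
    Err(format!(
        "Model not found. Available: [{}]. Run 'ollama pull {}'.",
        available.join(", "),
        configured
    ))
}
```

Because the result only changes when the user pulls or removes models, running this once per session (and caching the outcome in a `health_checked` flag) avoids a network round-trip on every keystroke.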

5. Context Window Optimization

Smart prompt building that respects model context limits:

| Setting | Value | Purpose |
| --- | --- | --- |
| Prefix limit | 4KB | Code before the cursor |
| Suffix limit | 1KB | Code after the cursor |
| Line boundary | Auto | Clean truncation at newlines |
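The line-boundary truncation can be sketched like this (an assumed implementation, not the PR's actual code; note the explicit UTF-8 boundary handling, since byte offsets can land mid-character):

```rust
/// Illustrative: keep at most `limit` bytes of prefix, then advance to
/// the next newline so the prompt starts on a clean line.
fn truncate_prefix(prefix: &str, limit: usize) -> &str {
    if prefix.len() <= limit {
        return prefix;
    }
    // Align the cut point to a valid UTF-8 boundary.
    let mut start = prefix.len() - limit;
    while !prefix.is_char_boundary(start) {
        start += 1;
    }
    // Prefer to resume at the first full line inside the window.
    match prefix[start..].find('\n') {
        Some(i) => &prefix[start + i + 1..],
        None => &prefix[start..],
    }
}

/// Illustrative: keep at most `limit` bytes of suffix, ending at the
/// last complete line so no trailing line is cut in half.
fn truncate_suffix(suffix: &str, limit: usize) -> &str {
    if suffix.len() <= limit {
        return suffix;
    }
    let mut end = limit;
    while !suffix.is_char_boundary(end) {
        end -= 1;
    }
    match suffix[..end].rfind('\n') {
        Some(i) => &suffix[..i],
        None => &suffix[..end],
    }
}
```

Dropping the partial line at each edge trades a few bytes of context for prompts that never start or end mid-statement, which tends to produce cleaner completions.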

6. Completion Caching

LRU cache eliminates redundant API calls:

| Property | Value |
| --- | --- |
| Cache size | 50 entries |
| Lookup | O(1) hash-based |
| Eviction | Automatic LRU |
| Key | Prompt hash |
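The cache fields shown later in the provider struct (`completion_cache: HashMap<u64, String>` plus `cache_order: Vec<u64>`) suggest a scheme along these lines. This is a sketch under that assumption, not the PR's actual code; a linked-list LRU would make eviction bookkeeping O(1), whereas the `Vec`-based recency update here is O(n) over at most 50 entries.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const MAX_CACHE_ENTRIES: usize = 50;

/// Hash the FIM prompt to a u64 cache key.
fn prompt_hash(prompt: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prompt.hash(&mut h);
    h.finish()
}

/// Illustrative LRU cache: HashMap for O(1) lookup, Vec for eviction order.
struct CompletionCache {
    entries: HashMap<u64, String>,
    order: Vec<u64>, // front = least recently used, back = most recent
}

impl CompletionCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), order: Vec::new() }
    }

    fn get(&mut self, key: u64) -> Option<&String> {
        if self.entries.contains_key(&key) {
            // Refresh recency: move the key to the back of the order list.
            self.order.retain(|k| *k != key);
            self.order.push(key);
        }
        self.entries.get(&key)
    }

    fn insert(&mut self, key: u64, completion: String) {
        if self.entries.len() >= MAX_CACHE_ENTRIES && !self.entries.contains_key(&key) {
            // Evict the least recently used entry.
            let oldest = self.order.remove(0);
            self.entries.remove(&oldest);
        }
        self.order.retain(|k| *k != key);
        self.order.push(key);
        self.entries.insert(key, completion);
    }
}
```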

7. Streaming Completions

Lower perceived latency with streaming API:

  • Uses stream: true for incremental responses
  • Processes tokens as they arrive
  • Tracks time-to-first-token metric
  • Accumulates into final completion
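The accumulation loop over the streamed chunks can be sketched as below. `StreamChunk` stands in for the parsed form of each NDJSON line from Ollama (whose real response objects carry more fields than shown); the `accumulate` helper is illustrative, not the PR's actual code.

```rust
use std::time::{Duration, Instant};

/// Each line of Ollama's streaming /api/generate response roughly
/// deserializes to this shape (other fields omitted).
struct StreamChunk {
    response: String,
    done: bool,
}

/// Accumulate streamed chunks into the final completion text while
/// recording the time-to-first-token latency metric.
fn accumulate(chunks: impl IntoIterator<Item = StreamChunk>) -> (String, Option<Duration>) {
    let started = Instant::now();
    let mut text = String::new();
    let mut first_token = None;
    for chunk in chunks {
        if first_token.is_none() && !chunk.response.is_empty() {
            // First non-empty token: capture elapsed time once.
            first_token = Some(started.elapsed());
        }
        text.push_str(&chunk.response);
        if chunk.done {
            break;
        }
    }
    (text, first_token)
}
```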

8. Telemetry Integration

Comprehensive analytics for quality measurement:

| Event | Properties | When |
| --- | --- | --- |
| Ollama Completion | model, token_count, total_time_ms, time_to_first_token_ms, cached, success | After each request |
| Ollama Completion Shown | model, completion_length | Ghost text displayed |
| Ollama Completion Accepted | model | User presses Tab |
| Ollama Completion Discarded | model | User dismisses |
| Ollama Health Check Passed | (none) | Successful check |
| Ollama Health Check Failed | error | Failed check |

Technical Implementation

Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         User Types Code                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Editor triggers refresh()                     │
│                      (with 75ms debounce)                        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│               OllamaCompletionProvider.refresh()                 │
│                                                                  │
│  1. Check completion cache ────────────────► Return if cached    │
│  2. Health check (first request only)                            │
│  3. Build optimized FIM prompt                                   │
│  4. Stream completion from Ollama API                            │
│  5. Cache result, report telemetry                               │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              Ollama API (/api/generate, stream=true)             │
│                                                                  │
│  Request:                                                        │
│  {                                                               │
│    "model": "qwen2.5-coder:7b",                                  │
│    "prompt": "<|fim_prefix|>...<|fim_suffix|>...<|fim_middle|>", │
│    "stream": true,                                               │
│    "options": { "num_predict": 256, "temperature": 0.2 }         │
│  }                                                               │
│                                                                  │
│  Response: Line-delimited JSON with incremental tokens           │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                   completion_from_text()                         │
│                                                                  │
│  1. Trim to end of line (unless multiline)                       │
│  2. Compare with buffer text (grapheme-aware)                    │
│  3. Generate minimal edit operations                             │
│  4. Return EditPrediction::Local { edits, ... }                  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              Editor displays ghost text / inline hint            │
│              User presses Tab to accept                          │
└─────────────────────────────────────────────────────────────────┘
```
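The trim-to-end-of-line step in `completion_from_text()` can be sketched as follows. The leading-newline exception mirrors the behavior exercised by `test_trim_to_end_of_line_unless_leading_newline`; the helper shown here is an assumed reconstruction, not the PR's actual code.

```rust
/// Illustrative: keep only the first line of a completion, unless the
/// model deliberately started with a newline (treated as a multiline
/// continuation and passed through unchanged).
fn trim_to_end_of_line(completion: &str) -> &str {
    if completion.starts_with('\n') {
        return completion;
    }
    match completion.find('\n') {
        Some(i) => &completion[..i],
        None => completion,
    }
}
```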

FIM Prompt Format

```
<|fim_prefix|>fn calculate_sum(numbers: &[i32]) -> i32 {
    let mut sum = 0;
    for n in numbers {
        sum += <|fim_suffix|>
    }
    sum
}<|fim_middle|>
```

The model generates: `*n;`

Supported models: Qwen2.5-Coder, CodeLlama, DeepSeek-Coder, StarCoder, Codestral

Files Changed

| File | Lines | Description |
| --- | --- | --- |
| crates/ollama_completion/Cargo.toml | +34 | New crate manifest |
| crates/ollama_completion/src/ollama_completion.rs | +950 | Core implementation |
| crates/settings/src/settings_content/language.rs | +31 | Settings schema |
| crates/language/src/language_settings.rs | +23 | Runtime settings |
| crates/zed/src/zed/edit_prediction_registry.rs | +20 | Provider registration |
| crates/edit_prediction_button/src/edit_prediction_button.rs | +102 | UI integration |
| crates/edit_prediction_button/Cargo.toml | +1 | Dependency |
| crates/agent_ui/src/agent_ui.rs | +1 | Action filter |
| crates/zed/Cargo.toml | +1 | Workspace dependency |
| Cargo.toml | +2 | Workspace member |

Key Data Structures

```rust
pub struct OllamaCompletionProvider {
    http_client: Arc<dyn HttpClient>,
    api_url: String,
    model: String,
    temperature: f32,
    max_tokens: i32,
    buffer_id: Option<EntityId>,
    completion_text: Option<String>,
    pending_refresh: Option<Task<Result<()>>>,
    completion_position: Option<Anchor>,
    completion_cache: HashMap<u64, String>,  // LRU cache
    cache_order: Vec<u64>,                   // Eviction order
    health_checked: bool,                    // One-time flag
}

pub enum OllamaConnectionStatus {
    Unknown,
    Connected,
    Error(String),
}

struct CompletionMetrics {
    model: String,
    token_count: u32,
    total_time_ms: u64,
    time_to_first_token_ms: Option<u64>,
    cached: bool,
}
```

Installation & Usage

Prerequisites

  1. Install Ollama: https://ollama.ai

    # macOS
    brew install ollama
    
    # Or download from https://ollama.ai/download
  2. Pull a code completion model:

    ollama pull qwen2.5-coder:7b

    Recommended models:

    | Model | Size | Notes |
    | --- | --- | --- |
    | qwen2.5-coder:7b | 4.7GB | Best quality/speed balance |
    | codellama:7b | 3.8GB | Meta's code model |
    | deepseek-coder:6.7b | 3.8GB | Strong code understanding |
    | starcoder2:7b | 4.0GB | Multi-language support |
  3. Start Ollama server (if not running as service):

    ollama serve

Configuration

Add to ~/.config/zed/settings.json:

```json
{
  "features": {
    "edit_prediction_provider": "ollama"
  }
}
```

Usage

  1. Open any code file in Zed
  2. Start typing - completions appear as ghost text after 75ms
  3. Press Tab to accept the completion
  4. Check the status bar icon for connection status

Test Plan

Automated Tests (17 total)

```
running 17 tests
test tests::test_completion_metrics ... ok
test tests::test_completion_metrics_cached ... ok
test tests::test_has_leading_newline ... ok
test tests::test_ollama_connection_status_default ... ok
test tests::test_ollama_connection_status_equality ... ok
test tests::test_completion_cache ... ok
test tests::test_cache_lru_eviction ... ok
test tests::test_prompt_hashing ... ok
test tests::test_provider_initial_state ... ok
test tests::test_provider_builder_methods ... ok
test tests::test_generate_response_deserialization ... ok
test tests::test_streaming_response_deserialization ... ok
test tests::test_streaming_request_serialization ... ok
test tests::test_generate_request_serialization ... ok
test tests::test_tags_response_deserialization ... ok
test tests::test_tags_response_empty ... ok
test tests::test_trim_to_end_of_line_unless_leading_newline ... ok

test result: ok. 17 passed; 0 failed; 0 ignored
```

Manual Testing Checklist

  • cargo build -p zed completes without errors
  • cargo test -p ollama_completion passes all tests
  • Ollama provider appears in edit prediction dropdown
  • Selecting Ollama updates settings correctly
  • Completions appear when typing with Ollama running
  • Tab accepts completions correctly
  • Error indicator shows when Ollama not running
  • Context menu shows appropriate status messages
  • Custom URL and model settings are respected
  • Temperature and max_tokens settings work
  • Switching between providers works
  • Health check runs on first request
  • Cache hits avoid redundant API calls
  • Streaming provides responsive completions

Performance Considerations

| Aspect | Implementation |
| --- | --- |
| Debounce | 75ms delay prevents API spam |
| Caching | LRU cache (50 entries) eliminates redundant calls |
| Context | Smart truncation (4KB prefix, 1KB suffix) |
| Health check | One-time per session |
| Streaming | Lower perceived latency |
| Memory | Cache eviction prevents unbounded growth |

Commits

| Commit | Description |
| --- | --- |
| 80df76c | Core provider with FIM support |
| 4e76750 | User configuration |
| f61d00e | Error handling UI and tests |
| 5b5bd8f | Health check, context optimization, caching |
| 1aff3ce | Streaming completions and telemetry |
| dee4579 | Telemetry callbacks and configurable parameters |

Future Enhancements

Potential improvements for future PRs:

| Feature | Complexity | Description |
| --- | --- | --- |
| Multi-completion cycling | Medium | Generate and cycle through multiple suggestions |
| Language filtering | Medium | Enable/disable per language |
| Model capability validation | Medium | Verify the model supports code completion |
| Stop sequences config | Low | User-configurable stop tokens |

Related Issues

  • Closes zed-industries#15968 (Support using ollama as an inline_completion_provider)
Compatibility

  • Zed Version: Built against current main branch
  • Ollama Version: Tested with Ollama 0.4.x+
  • Platforms: macOS (tested), Linux/Windows (should work)

🤖 Generated with Claude Code

akhil-p-git and others added 6 commits December 1, 2025 15:36
Adds support for using Ollama as an edit prediction provider, enabling
users to get code completions from locally-running LLMs without requiring
external API keys.

Implementation:
- New `ollama_completion` crate implementing `EditPredictionProvider` trait
- Uses Fill-In-Middle (FIM) prompt format for context-aware completions
- Configured with 75ms debounce, 256 max tokens, 0.2 temperature
- Default model: qwen2.5-coder:7b at localhost:11434

To use:
1. Install Ollama and run: `ollama pull qwen2.5-coder:7b`
2. Add to settings: `{ "features": { "edit_prediction_provider": "ollama" } }`

Closes zed-industries#15968

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Allows users to configure Ollama edit prediction settings:
- `api_url`: Custom Ollama server URL (default: http://localhost:11434)
- `model`: Model to use for completions (default: qwen2.5-coder:7b)

Settings can be configured in settings.json:
```json
{
  "edit_predictions": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "model": "qwen2.5-coder:7b"
    }
  }
}
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add global OllamaConnectionStatus tracking (Unknown, Connected, Error)
- Show error indicator (red dot) on status bar button when connection fails
- Display helpful error messages in context menu (e.g., "Ollama server is not running")
- Show success indicator (green "Connected") when Ollama is responding
- Add 7 unit tests covering helper functions, status tracking, and serialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…for Ollama

This commit adds three enhancements to the Ollama completion provider:

**Model Health Check**
- Validates Ollama server is running on first completion request
- Verifies configured model is available via /api/tags endpoint
- Provides actionable error messages (e.g., lists available models)
- Only runs once per session to avoid repeated network calls

**Context Optimization**
- Limits prefix to 4KB and suffix to 1KB for efficient context windows
- Smart line-boundary detection avoids cutting code mid-line
- Prevents malformed prompts that could degrade completion quality

**Completion Caching**
- LRU cache (max 50 entries) stores recent completions
- Hash-based lookup for O(1) cache hits
- Eliminates redundant API calls for identical prompts
- Automatic eviction of oldest entries when cache is full

Adds 6 new unit tests (13 total):
- test_prompt_hashing
- test_completion_cache
- test_cache_lru_eviction
- test_tags_response_deserialization
- test_tags_response_empty
- test_provider_initial_state

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
**Streaming Completions**
- Replaced non-streaming API with streaming for lower perceived latency
- Processes line-delimited JSON responses incrementally
- Tracks time-to-first-token for performance monitoring
- Accumulates tokens as they arrive from Ollama

**Telemetry Integration**
- Reports completion events: model, token count, latency metrics
- Tracks cache hits separately (zero latency, zero tokens)
- Reports health check success/failure events
- Reports completion errors with error messages
- Uses standard telemetry::event! macro for consistency

**Metrics Captured**
- Model name used for completion
- Token count from streaming response
- Total completion time (ms)
- Time to first token (ms) - key latency metric
- Cache hit status

Adds 4 new unit tests (17 total):
- test_streaming_response_deserialization
- test_streaming_request_serialization
- test_completion_metrics
- test_completion_metrics_cached

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…for Ollama

**Accept/Reject Telemetry**
- Reports "Ollama Completion Accepted" event when user accepts a completion
- Reports "Ollama Completion Discarded" event when user rejects/discards
- Includes model name in event properties for analytics

**did_show() Callback**
- Reports "Ollama Completion Shown" event when completion is displayed
- Tracks completion_length for understanding suggestion quality
- Enables measuring impression-to-acceptance rate

**Configurable Parameters**
- Added `temperature` setting (default: 0.2) - controls randomness
- Added `max_tokens` setting (default: 256) - controls completion length
- Both configurable via edit_predictions.ollama in settings.json
- Example: `"edit_predictions": { "ollama": { "temperature": 0.3, "max_tokens": 512 } }`

**Settings Schema**
- Updated OllamaEditPredictionSettingsContent with new fields
- Updated OllamaSettings runtime struct
- Wired settings through edit_prediction_registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@robertherber
Just started trying out Zed today - and was looking for exactly this! Amazing work @akhil-p-git and good timing :) When do you think this will be released?

@akhil-p-git

> Just started trying out Zed today - and was looking for exactly this! Amazing work @akhil-p-git and good timing :) When do you think this will be released?

Sorry about that, life got busy. I can merge it any time!

@akhil-p-git akhil-p-git merged commit a42b879 into main Dec 13, 2025


Development

Successfully merging this pull request may close these issues.

Support using ollama as an inline_completion_provider