Conversation
Co-authored-by: micr0-dev <26364458+micr0-dev@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Add option for persistent Ollama model serving
Add persistent Ollama model serving option with configurable keep-alive
Oct 2, 2025
micr0-dev
approved these changes
Oct 2, 2025
Owner
micr0-dev
left a comment
Holy shit, it actually did a decent job what the heck? And this is pretty clean too, honestly I thought we were gonna have to do this via ollama serve, but this is much cleaner. I like.
Problem
Ollama automatically unloads models from RAM after each request, causing significant response time delays as the model needs to be reloaded for every alt-text generation. This is particularly problematic for active Altbot instances handling frequent requests, where users experience noticeable latency.
Solution
This PR adds a configurable `keep_alive` parameter that controls how long Ollama models stay loaded in RAM. The feature leverages Ollama's built-in `--keepalive` flag to maintain models in memory between requests.
Changes
Configuration
New `ollama_keep_alive` config option with three modes:
- `-1`: persistent serving (model stays loaded indefinitely)
- `0`: immediate unload after each request
- A duration string such as `5m`, `30m`, or `1h`

Defaults to `5m` (5 minutes) if not specified.
Implementation
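As a rough sketch of how such a config value could be validated before being passed along to Ollama (the function name and behavior here are hypothetical, not the PR's actual code), the three modes map naturally onto a small normalizer:

```go
package main

import (
	"fmt"
	"time"
)

// normalizeKeepAlive validates an ollama_keep_alive config value.
// Hypothetical helper: "-1" means persistent, "0" means unload
// immediately, and anything else must parse as a Go duration
// string ("5m", "30m", "1h", ...). An empty value falls back to
// the documented default of "5m".
func normalizeKeepAlive(v string) (string, error) {
	switch v {
	case "":
		return "5m", nil // default when not specified
	case "-1", "0":
		return v, nil
	}
	if _, err := time.ParseDuration(v); err != nil {
		return "", fmt.Errorf("invalid keep-alive value %q: %w", v, err)
	}
	return v, nil
}

func main() {
	for _, v := range []string{"", "-1", "30m", "bogus"} {
		got, err := normalizeKeepAlive(v)
		fmt.Printf("%q -> %q (err: %v)\n", v, got, err)
	}
}
```

Rejecting bad values at startup, rather than passing them through to Ollama, surfaces configuration typos immediately instead of as silent unload behavior.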
- Extended `OllamaProvider` to include keep-alive state
- Requests pass the `--keepalive` parameter
- When set to `-1`, the model is pre-loaded during startup with user feedback
User Experience
Usage Example
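The example itself was not preserved in this page capture; the following is a hypothetical configuration snippet, assuming a TOML-style config file and the option name described above:

```toml
# Keep the Ollama model loaded in RAM indefinitely (persistent serving)
ollama_keep_alive = "-1"

# Or: unload after 30 minutes of inactivity
# ollama_keep_alive = "30m"

# Or: unload immediately after each request
# ollama_keep_alive = "0"
```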
Performance Impact
- `-1` (persistent): no reload delay between requests; the model stays in RAM
- `5m` (default): the model reloads only after 5 minutes of inactivity
- `0` (immediate): the model reloads on every request

Benefits
This feature is ideal for the main Altbot instance and other high-traffic deployments where response time is critical.
Resolves #40