Add persistent Ollama model serving option with configurable keep-alive#75

Merged
micr0-dev merged 3 commits into main from
copilot/fix-40ca4948-bea5-4ddf-a1e3-ab41c569592c
Oct 2, 2025

Conversation

Contributor

Copilot AI commented Oct 2, 2025

Problem

Ollama automatically unloads models from RAM after each request, causing significant response time delays as the model needs to be reloaded for every alt-text generation. This is particularly problematic for active Altbot instances handling frequent requests, where users experience noticeable latency.

Solution

This PR adds a configurable keep-alive parameter that controls how long Ollama models stay loaded in RAM. The feature leverages Ollama's built-in --keepalive flag to maintain models in memory between requests.

Changes

Configuration

  • Added ollama_keep_alive config option with three modes:
    • -1: Persistent serving (model stays loaded indefinitely)
    • 0: Immediate unload after each request
    • Duration strings: 5m, 30m, 1h, etc.
  • Default: 5m (5 minutes) if not specified
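A minimal sketch of how the three accepted modes could be validated, written in Go (the project's language). The helper name `validateKeepAlive` and the default handling are illustrative assumptions, not Altbot's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// validateKeepAlive normalizes an ollama_keep_alive config value.
// Hypothetical helper: accepts "-1" (persistent), "0" (immediate unload),
// or a duration string such as "5m" or "1h"; empty falls back to "5m".
func validateKeepAlive(v string) (string, error) {
	switch v {
	case "":
		return "5m", nil // default when not specified
	case "-1", "0":
		return v, nil
	}
	if _, err := time.ParseDuration(v); err != nil {
		return "", fmt.Errorf("invalid ollama_keep_alive %q: %w", v, err)
	}
	return v, nil
}

func main() {
	for _, v := range []string{"", "-1", "30m", "bogus"} {
		got, err := validateKeepAlive(v)
		fmt.Printf("%q -> %q err=%v\n", v, got, err)
	}
}
```

Note that `time.ParseDuration` happens to accept the same `5m`/`1h` style strings Ollama uses, so no custom parser is needed in this sketch.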

Implementation

  • Updated OllamaProvider to include keep-alive state
  • Modified all Ollama command invocations to include --keepalive parameter
  • When set to -1, model is pre-loaded during startup with user feedback
  • Applied consistently across image processing and translation operations
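A hedged sketch of how the keep-alive value might be threaded into each command invocation. The function and parameter names here are hypothetical; only the `ollama run --keepalive` flag itself comes from the PR:

```go
package main

import (
	"fmt"
	"os/exec"
)

// buildRunCmd assembles an `ollama run` invocation, appending the
// --keepalive flag whenever a keep-alive value is configured.
// Illustrative only; Altbot's real provider code may be structured differently.
func buildRunCmd(model, keepAlive, prompt string) *exec.Cmd {
	args := []string{"run", model}
	if keepAlive != "" {
		args = append(args, "--keepalive", keepAlive)
	}
	args = append(args, prompt)
	return exec.Command("ollama", args...)
}

func main() {
	cmd := buildRunCmd("llava-phi3", "-1", "Describe this image.")
	fmt.Println(cmd.Args)
}
```

Centralizing the flag in one builder is what makes "applied consistently across image processing and translation operations" cheap: every call site gets the same behavior.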

User Experience

  • Enhanced setup wizard with interactive keep-alive configuration
  • Clear explanation of options and trade-offs
  • Recommends persistent mode for active instances

Usage Example

[llm]
provider = "ollama"
ollama_model = "llava-phi3"
ollama_keep_alive = "-1"  # Keep model loaded persistently

Performance Impact

Keep-Alive Value   RAM Usage         Response Time   Best For
-1 (persistent)    Continuous        Instant         High-traffic instances
5m (default)       5 min after use   Medium          Moderate usage
0 (immediate)      Minimal           Slow            Resource-constrained

Benefits

  1. Dramatically improved response times for active instances (no model loading delays)
  2. Configurable to balance performance vs. resource usage
  3. Backward compatible with sensible defaults
  4. User-friendly setup wizard guides optimal configuration

This feature is ideal for the main Altbot instance and other high-traffic deployments where response time is critical.

Resolves #40

Original prompt

This section details the original issue you should resolve

<issue_title>Add option for persistent Ollama model serving</issue_title>
<issue_description>so that Ollama doesn't unload the model from ram while Altbot is running, this will greatly improve response times with Ollama based models, but will increase continuous server load. Good for very active Altbot instances like the main Altbot instance.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #74


Copilot AI and others added 2 commits October 2, 2025 08:29
Co-authored-by: micr0-dev <26364458+micr0-dev@users.noreply.github.com>
Co-authored-by: micr0-dev <26364458+micr0-dev@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add option for persistent Ollama model serving" to "Add persistent Ollama model serving option with configurable keep-alive" Oct 2, 2025
Copilot AI requested a review from micr0-dev October 2, 2025 08:35
Owner

@micr0-dev left a comment


Holy shit, it actually did a decent job what the heck? And this is pretty clean too, honestly I thought we were gonna have to do this via ollama serve, but this is much cleaner. I like.

@micr0-dev marked this pull request as ready for review October 2, 2025 08:43
@micr0-dev merged commit cdef8a9 into main Oct 2, 2025
@micr0-dev deleted the copilot/fix-40ca4948-bea5-4ddf-a1e3-ab41c569592c branch October 2, 2025 20:30