
[Bug]: Ollama: agent bootstrap hardcodes KvSize=262144 (256K context), ignoring all config attempts to lower num_ctx — unusable on <32GB RAM #35436

@lohithburra01

Description

Bug type

Behavior bug (incorrect output/state without crash)

Environment

  • OpenClaw: 2026.3.2
  • OS: Windows 11
  • Node: 24.14.0
  • Ollama model: qwen3:4b (Q4_K_M, 2.5GB)
  • RAM: 16GB system + 11GB swap
  • GPU: RTX 2060 6GB VRAM

Summary

OpenClaw hardcodes a 256K-token KV cache (KvSize=262144) in the agent bootstrap and sends it directly to the Ollama runner, regardless of any config values. This causes Ollama to request roughly 36 GiB of memory for a 2.5 GB model, making it completely unusable on any machine with less than ~40 GB of RAM.

Error

model requires more system memory (34.4 GiB) than is available (17.7 GiB)

Ollama runner log confirming hardcoded value

msg=load request="{Operation:fit ... KvSize:262144 ...}"
msg="kv cache" device=CUDA0 size="36.0 GiB"
msg="model weights" device=CUDA0 size="2.3 GiB"
msg="total memory" size="39.5 GiB"

Config attempts that were all ignored

All of the following were tried and had zero effect:

"models.providers.ollama.models[].contextWindow": 4096
"models.providers.ollama.models[].numCtx": 4096
"models.providers.ollama.modelOptions.num_ctx": 4096
$env:OLLAMA_MAX_NUM_CTX = "4096"

Switching the provider's api type from openai-completions to ollama also had no effect.
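
For reference, a minimal sketch of the provider block as tested (the id field name and the surrounding structure are assumptions; the context keys follow the paths listed above):

"models": {
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "models": [
        { "id": "qwen3:4b", "contextWindow": 4096, "numCtx": 4096 }
      ],
      "modelOptions": { "num_ctx": 4096 }
    }
  }
}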

Workaround (working)

HTTP proxy on port 11435 that intercepts requests and rewrites num_ctx before forwarding to Ollama on 11434:

// ollama-proxy.js
const http = require('http');
const PROXY_PORT = 11435;
const OLLAMA_PORT = 11434;
const MAX_CTX = 16000; // OpenClaw minimum is 16000

const server = http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => { body += chunk.toString(); });
  req.on('end', () => {
    let modifiedBody = body;
    if (body && (req.headers['content-type'] || '').includes('application/json')) {
      try {
        const parsed = JSON.parse(body);
        if (parsed.options && parsed.options.num_ctx) {
          parsed.options.num_ctx = MAX_CTX;
        } else {
          if (!parsed.options) parsed.options = {};
          parsed.options.num_ctx = MAX_CTX;
        }
        modifiedBody = JSON.stringify(parsed);
      } catch (e) {}
    }
    const options = {
      hostname: '127.0.0.1',
      port: OLLAMA_PORT,
      path: req.url,
      method: req.method,
      headers: { ...req.headers, 'host': `127.0.0.1:${OLLAMA_PORT}`, 'content-length': Buffer.byteLength(modifiedBody) }
    };
    const proxyReq = http.request(options, (proxyRes) => {
      res.writeHead(proxyRes.statusCode, proxyRes.headers);
      proxyRes.pipe(res, { end: true });
    });
    proxyReq.on('error', (err) => { res.writeHead(502); res.end('Proxy error: ' + err.message); });
    proxyReq.write(modifiedBody);
    proxyReq.end();
  });
});

server.listen(PROXY_PORT, '127.0.0.1', () => {
  console.log(`Proxy running on ${PROXY_PORT}, forwarding to ${OLLAMA_PORT}`);
});

Then set baseUrl in openclaw.json to point to the proxy:

"models": {
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11435"
    }
  }
}
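
Start the proxy with node ollama-proxy.js before launching the gateway. If the rewrite takes effect, the load request in the Ollama runner log should show a KvSize derived from MAX_CTX (16000) rather than 262144.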

Expected behavior

contextWindow or an equivalent config key under models.providers.ollama should control the num_ctx sent to Ollama.
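
Concretely, with contextWindow: 4096 the payload forwarded to Ollama's /api/chat would be expected to carry the value in the standard options field, something like (illustrative request; the message content is a placeholder):

{
  "model": "qwen3:4b",
  "messages": [{ "role": "user", "content": "..." }],
  "options": { "num_ctx": 4096 }
}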

Impact

This bug makes OpenClaw completely unusable with Ollama on any normal consumer PC. The workaround requires running a separate proxy process on every startup, which is not acceptable for a production setup.

Steps to reproduce

  1. Install Ollama and pull any model (tested: qwen3:4b)
  2. Add Ollama as a provider in openclaw.json with contextWindow: 4096
  3. Start openclaw gateway
  4. Send any message in TUI or Telegram
  5. Observe Ollama logs

Expected behavior

OpenClaw should pass num_ctx=4096 (or whatever contextWindow is set to)
to the Ollama runner.

Actual behavior

OpenClaw sends KvSize=262144 (a 256K-token KV cache) to the Ollama runner regardless
of any config. Ollama requests 36 GiB of RAM for a 2.5 GB model and fails with:
"model requires more system memory (34.4 GiB) than is available (17.7 GiB)"

OpenClaw version

2026.3.2 (build 85377a2)

Operating system

Windows 11

Install method

npm install -g openclaw

Logs, screenshots, and evidence

msg=load request="{Operation:fit KvSize:262144 ...}"
msg="kv cache" device=CUDA0 size="36.0 GiB"
msg="model weights" device=CUDA0 size="2.3 GiB"
msg="total memory" size="39.5 GiB"
msg="Load failed" error="model requires more system memory (34.4 GiB) than is available (17.7 GiB)"

Impact and severity

Blocks workflow — completely. Any user running Ollama on a machine with
less than ~40GB RAM cannot use OpenClaw with local models at all.
This affects the majority of consumer hardware (most PCs have 16-32GB RAM).
Happens every time, 100% reproducible. Workaround requires running a
separate Node.js proxy process on every startup.

Additional information

Tried all of these config keys — all ignored:

  • models.providers.ollama.models[].contextWindow: 4096
  • models.providers.ollama.models[].numCtx: 4096
  • OLLAMA_MAX_NUM_CTX=4096 env var

Workaround: HTTP proxy on port 11435 that rewrites num_ctx in the
request body before forwarding to Ollama on 11434. See proxy code in
the body above.
