[Feature] Bundle a zero-config CPU-only local model (llama.cpp + Qwen2.5-1.5B) as the default model — true offline-first experience #51

@SonicBotMan

🎯 Feature Goal

Currently, openclaw-portable runs OpenClaw entirely from a USB drive — Node.js, config, workspace — but still requires an external API key or a separately installed Ollama to actually talk to any AI model. This is the last missing piece for a true zero-dependency offline experience.

This issue proposes bundling a small, CPU-only local model directly inside the portable package, launching it as a sidecar process alongside the OpenClaw gateway, and auto-configuring it as the default model — so users plug in the USB and AI just works, with no internet, no API key, no pre-installed software.


🏗️ Architecture Analysis (based on current repo structure)

Based on a review of the codebase, the integration fits cleanly into the existing launch flow:

start.sh / start.bat
  │
  ├─ [2/5] Set up environment (Node, OpenClaw binary check)
  ├─ [NEW] [3/5] Launch bundled llama-server sidecar  ← INSERT HERE
  ├─ [4/5] Initialize working directory
  └─ [5/5] openclaw gateway start

The install-models.js + deepMerge config mechanism already provides the exact hook needed to inject the bundled model provider config into openclaw.json — no architectural changes required.


📦 Recommended Model: Qwen2.5-1.5B-Instruct Q4_K_M

| Attribute | Value |
| --- | --- |
| Disk size | ~900 MB |
| RAM usage | ~1.2 GB |
| Inference engine | llama.cpp llama-server (static binary, no install) |
| Tool calling | ✅ Native support |
| Context window | 32k tokens |
| CPU speed (4-core) | ~8–12 tok/s |
| License | Apache 2.0 ✅ redistributable |

This appears to be the smallest Qwen2.5 variant with reliable tool-calling support, which OpenClaw's agent runtime requires. The smaller 0.5B variant lacks consistent JSON function-call formatting.


📁 Proposed Directory Layout

Add the following structure inside the portable package (alongside existing node/, config/, data/):

openclaw-portable/
├── node/                    # existing
├── npm-global/              # existing  
├── config/                  # existing
├── data/                    # existing
├── llm/                     # NEW
│   ├── bin/
│   │   ├── llama-server-linux-x86_64      # ~8MB static binary
│   │   ├── llama-server-macos-arm64       # ~9MB
│   │   ├── llama-server-macos-x86_64      # ~9MB
│   │   └── llama-server-win32-avx2.exe    # ~10MB
│   ├── models/
│   │   └── qwen2.5-1.5b-instruct-q4_k_m.gguf   # ~900MB
│   └── server.log           # runtime, gitignored
├── start.sh                 # MODIFIED
├── start.bat                # MODIFIED
├── stop.sh                  # MODIFIED
└── stop.bat                 # MODIFIED

The llm/models/ directory should be listed in .gitignore and distributed via GitHub Releases as a separate download or via the build script.


🔧 Implementation Details

1. start.sh — Add sidecar launch step (between step 2 and current step 3)

# ============================================
# [NEW] 3/6 Start the bundled local model (llama-server)
# ============================================
echo -e "${BLUE}[3/6] Starting bundled local model...${NC}"

LLM_DIR="$USB_PATH/llm"
LLM_PORT=18080
LLM_PID_FILE="$LLM_DIR/server.pid"
LLM_LOG="$LLM_DIR/server.log"

# Detect platform binary (only the bundled OS/architecture pairs are matched,
# so e.g. Linux on ARM64 falls through to the "not found" branch)
OS_TYPE="$(uname -s)"
ARCH_TYPE="$(uname -m)"

LLM_BIN=""
case "$OS_TYPE/$ARCH_TYPE" in
  Linux/x86_64)  LLM_BIN="$LLM_DIR/bin/llama-server-linux-x86_64" ;;
  Darwin/arm64)  LLM_BIN="$LLM_DIR/bin/llama-server-macos-arm64" ;;
  Darwin/x86_64) LLM_BIN="$LLM_DIR/bin/llama-server-macos-x86_64" ;;
esac

LLM_MODEL="$LLM_DIR/models/qwen2.5-1.5b-instruct-q4_k_m.gguf"
LLM_BUNDLED_READY=0

# Test with -f rather than -x: the executable bit may be missing after copying
# to the USB drive, and the chmod below restores it where the filesystem allows.
if [ -f "$LLM_BIN" ] && [ -f "$LLM_MODEL" ]; then
  # Check whether something is already listening on the port
  if ! lsof -i :"$LLM_PORT" -sTCP:LISTEN -t >/dev/null 2>&1; then
    # Use all logical cores but one, with a minimum of one thread
    THREADS=$(( $(nproc 2>/dev/null || sysctl -n hw.logicalcpu 2>/dev/null || echo 2) - 1 ))
    THREADS=$(( THREADS < 1 ? 1 : THREADS ))

    chmod +x "$LLM_BIN" 2>/dev/null
    # --log-disable is omitted so that server.log captures startup errors
    nohup "$LLM_BIN" \
      --model "$LLM_MODEL" \
      --port "$LLM_PORT" \
      --host 127.0.0.1 \
      --ctx-size 32768 \
      --threads "$THREADS" \
      --parallel 1 \
      -ngl 0 \
      >> "$LLM_LOG" 2>&1 &
    echo $! > "$LLM_PID_FILE"
    echo -e "${GREEN}✅ llama-server started (PID: $!, port: $LLM_PORT, threads: $THREADS)${NC}"
    echo -e "${YELLOW}   Model loading; the first response takes about 5-15 seconds...${NC}"
    LLM_BUNDLED_READY=1
  else
    echo -e "${GREEN}✅ Bundled model already running (port: $LLM_PORT)${NC}"
    LLM_BUNDLED_READY=1
  fi
else
  echo -e "${YELLOW}⚠️  Bundled model not found, skipping (cloud APIs remain available)${NC}"
fi
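
The launch step marks the sidecar as ready as soon as the process is spawned, while the model itself still needs several seconds to load. A minimal sketch of a readiness wait follows. It assumes curl is available on the host and relies on llama-server's /health endpoint, which answers with an error status while the model is loading; wait_until and probe_llm_health are hypothetical helper names, not part of the existing scripts.

```shell
# probe_llm_health: succeeds once llama-server answers /health with HTTP 200.
# Assumes curl is installed; LLM_PORT defaults to the port used in start.sh.
probe_llm_health() {
  curl -sf -o /dev/null "http://127.0.0.1:${LLM_PORT:-18080}/health"
}

# wait_until CMD [TIMEOUT_SECONDS]: run CMD once per second until it succeeds
# or the timeout elapses. Returns 0 on success, 1 on timeout.
wait_until() {
  local probe="$1" timeout="${2:-30}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$probe"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

In start.sh this could run right after the launch, for example `wait_until probe_llm_health 30 || echo "llama-server did not become ready in time"`, so that LLM_BUNDLED_READY is only set once the server actually responds.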

2. Inject model config into openclaw.json before gateway starts

Add this block after the config file is initialized but before openclaw gateway start:

# Register the bundled provider whenever the sidecar is available; make it the
# default only if the user has not configured a primary model.
if [ "${LLM_BUNDLED_READY:-0}" -eq 1 ]; then
  INJECT_FILE="$TEMP_DIR/bundled-model-inject.json"
  cat > "$INJECT_FILE" <<'JSONEOF'
{
  "models": {
    "mode": "merge",
    "providers": {
      "bundled-local": {
        "baseUrl": "http://127.0.0.1:18080/v1",
        "apiKey": "bundled-no-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen2.5-1.5b",
            "name": "Qwen2.5 1.5B (Bundled CPU)",
            "contextWindow": 32768,
            "maxTokens": 4096,
            "cost": { "input": 0, "output": 0 }
          }
        ]
      }
    }
  }
}
JSONEOF

  # Paths are passed as arguments instead of being interpolated into the
  # script, so quotes or backslashes in $TEMP_DIR cannot break the JavaScript.
  INJECT_RESULT=$(node -e '
    const fs = require("fs");
    const [cfgPath, injectPath] = process.argv.slice(1);
    const cfg = fs.existsSync(cfgPath) ? JSON.parse(fs.readFileSync(cfgPath, "utf8")) : {};
    const inject = JSON.parse(fs.readFileSync(injectPath, "utf8"));

    // Shallow merge at the provider level: existing providers are kept,
    // and bundled-local is added or overwritten.
    cfg.models = cfg.models || {};
    cfg.models.providers = Object.assign({}, cfg.models.providers, inject.models.providers);
    cfg.models.mode = "merge";

    // Set as default only if no primary model is configured
    cfg.agents = cfg.agents || {};
    cfg.agents.defaults = cfg.agents.defaults || {};
    cfg.agents.defaults.model = cfg.agents.defaults.model || {};
    const hadPrimary = Boolean(cfg.agents.defaults.model.primary);
    if (!hadPrimary) cfg.agents.defaults.model.primary = "bundled-local/qwen2.5-1.5b";

    fs.writeFileSync(cfgPath, JSON.stringify(cfg, null, 2));
    console.log(hadPrimary ? "fallback" : "default");
  ' "$TEMP_DIR/openclaw.json" "$INJECT_FILE")
  rm -f "$INJECT_FILE"

  case "$INJECT_RESULT" in
    default)  echo -e "${GREEN}✅ bundled-local/qwen2.5-1.5b registered as the default model${NC}" ;;
    fallback) echo -e "${CYAN}   Primary model already configured; bundled model registered as a fallback provider${NC}" ;;
    *)        echo -e "${YELLOW}⚠️  Could not update openclaw.json; bundled model not registered${NC}" ;;
  esac
fi
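
The injected config hardcodes port 18080 in baseUrl, while the risk table proposes falling back to 18081 on a conflict. If the port can change, the provider fragment has to follow it. A minimal sketch, assuming the fragment is generated from a port argument; render_bundled_provider is a hypothetical helper name, and the JSON fields are copied from the block above.

```shell
# render_bundled_provider [PORT]: print the bundled-local provider fragment
# with baseUrl pointing at the given port (default 18080).
render_bundled_provider() {
  local port="${1:-18080}"
  # Unquoted heredoc delimiter so that ${port} is expanded
  cat <<JSONEOF
{
  "models": {
    "mode": "merge",
    "providers": {
      "bundled-local": {
        "baseUrl": "http://127.0.0.1:${port}/v1",
        "apiKey": "bundled-no-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen2.5-1.5b",
            "name": "Qwen2.5 1.5B (Bundled CPU)",
            "contextWindow": 32768,
            "maxTokens": 4096,
            "cost": { "input": 0, "output": 0 }
          }
        ]
      }
    }
  }
}
JSONEOF
}
```

For example, `render_bundled_provider "$LLM_PORT" > "$TEMP_DIR/bundled-model-inject.json"` would keep the gateway config and the sidecar on the same port whichever one was chosen at launch.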

3. stop.sh — Clean up llama-server on shutdown

# Stop the bundled llama-server
LLM_PID_FILE="$USB_PATH/llm/server.pid"
if [ -f "$LLM_PID_FILE" ]; then
  LLM_PID=$(cat "$LLM_PID_FILE")
  # kill -0 only tests whether the process exists; it sends no signal
  if [ -n "$LLM_PID" ] && kill -0 "$LLM_PID" 2>/dev/null; then
    kill "$LLM_PID"
    echo -e "${GREEN}✅ Bundled model stopped (PID: $LLM_PID)${NC}"
  fi
  rm -f "$LLM_PID_FILE"
fi
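
The shutdown step sends a single SIGTERM and removes the PID file without checking that the process exited. A minimal sketch of a more defensive variant follows, assuming a short grace period before escalating to SIGKILL is acceptable; stop_by_pidfile is a hypothetical helper name.

```shell
# stop_by_pidfile PIDFILE [GRACE_SECONDS]: send SIGTERM, wait up to
# GRACE_SECONDS for the process to exit, then send SIGKILL if it is still alive.
stop_by_pidfile() {
  local pid_file="$1" grace="${2:-5}" pid waited=0
  [ -f "$pid_file" ] || return 0
  pid=$(cat "$pid_file")
  rm -f "$pid_file"
  # Ignore empty or non-numeric contents left by an interrupted write
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;
  esac
  # If the signal cannot be delivered, the process is already gone
  kill "$pid" 2>/dev/null || return 0
  while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
  done
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid" 2>/dev/null
  fi
  return 0
}
```

In stop.sh this would be called as `stop_by_pidfile "$USB_PATH/llm/server.pid" 5`. Note that a stale PID file can point at an unrelated process that reused the PID; checking the process name before killing would be a further safeguard not shown here.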

4. start.bat: Windows equivalent (batch snippet)

REM === Start bundled llama-server ===
SET "LLM_BIN=%USB_PATH%\llm\bin\llama-server-win32-avx2.exe"
SET "LLM_MODEL=%USB_PATH%\llm\models\qwen2.5-1.5b-instruct-q4_k_m.gguf"
SET LLM_PORT=18080
SET LLM_BUNDLED_READY=0

REM Compute the thread count outside the IF block: %VAR% inside a parenthesized
REM block is expanded when the block is parsed, not when it runs.
SET /A THREADS=%NUMBER_OF_PROCESSORS%-1
IF %THREADS% LSS 1 SET THREADS=1

REM Chained IFs bind a trailing ELSE to the inner IF only, so check each file separately.
SET LLM_FILES_OK=1
IF NOT EXIST "%LLM_BIN%" SET LLM_FILES_OK=0
IF NOT EXIST "%LLM_MODEL%" SET LLM_FILES_OK=0

IF "%LLM_FILES_OK%"=="1" (
    REM Match only sockets that are listening on the port
    netstat -ano | findstr /R /C:":%LLM_PORT% .*LISTENING" >nul 2>&1
    IF ERRORLEVEL 1 (
        START /B "" "%LLM_BIN%" --model "%LLM_MODEL%" --port %LLM_PORT% --host 127.0.0.1 --ctx-size 32768 --threads %THREADS% --parallel 1 -ngl 0 >> "%USB_PATH%\llm\server.log" 2>&1
        ECHO [OK] llama-server started on port %LLM_PORT%
    ) ELSE (
        ECHO [OK] Bundled model already running
    )
    SET LLM_BUNDLED_READY=1
) ELSE (
    ECHO [WARN] Bundled model not found, skipping
)

📥 Model Distribution Strategy

The ~900MB GGUF file should NOT be committed to git. Recommended approach:

Option A (recommended): download the model at build time and attach the built package to a GitHub Release

# In build-offline-package.sh, add:
download_bundled_model() {
  MODEL_URL="https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf"
  MODEL_DIR="$SCRIPT_DIR/llm/models"
  MODEL_FILE="$MODEL_DIR/qwen2.5-1.5b-instruct-q4_k_m.gguf"
  mkdir -p "$MODEL_DIR"
  if [ ! -f "$MODEL_FILE" ]; then
    echo "Downloading bundled model (~900MB)..."
    # --fail stops curl from saving an HTTP error page as the model; the .part
    # file is renamed only after a complete download.
    curl -L --fail --progress-bar -o "$MODEL_FILE.part" "$MODEL_URL" \
      && mv "$MODEL_FILE.part" "$MODEL_FILE"
  fi
}
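
A ~900MB download is worth verifying before it is packaged. A minimal sketch of a checksum check follows, assuming sha256sum (Linux) or shasum (macOS) is available on the build machine; verify_sha256 is a hypothetical helper name, and the expected hash must be taken from the model's download page, since none is given in this issue.

```shell
# verify_sha256 FILE EXPECTED_HEX: succeed only if the SHA-256 of FILE equals
# EXPECTED_HEX (lowercase hex). Uses sha256sum if present, otherwise shasum.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  [ "$actual" = "$expected" ]
}
```

A build script could call `verify_sha256 "$MODEL_DIR/qwen2.5-1.5b-instruct-q4_k_m.gguf" "$EXPECTED_SHA256"` after the download and delete the file on mismatch, where EXPECTED_SHA256 is a placeholder for the published hash.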

Option B: First-run auto-download
Add a setup-llm.sh script that users run once to download the model into llm/models/. start.sh degrades gracefully if the model is absent.

.gitignore additions:

llm/models/*.gguf
llm/bin/llama-server*
llm/server.log
llm/server.pid

⚠️ Known Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Cold-start latency (5-15s model load) | Show an explicit "Loading" message; start llama-server before the gateway |
| Port 18080 conflict | Check with lsof/netstat before starting; fall back to 18081 |
| AVX2 not supported (old CPUs) | Detect with grep avx2 /proc/cpuinfo; warn the user and skip |
| USB read speed bottleneck | Copy the model to $TEMP_DIR on first run if USB speed < 50MB/s |
| Context overflow (32k) | Enable OpenClaw compaction in the injected config: "compaction": { "enabled": true } |
| User already has API keys | Respect existing agents.defaults.model.primary; register bundled as fallback only |
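
The AVX2 mitigation can be sketched as a small check that start.sh runs before launching the sidecar. This is a sketch for Linux on x86 only, where /proc/cpuinfo exposes a "flags" line; has_cpu_flag is a hypothetical helper name, and the optional file argument exists only so the check can be exercised without the real /proc/cpuinfo.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]: return 0 if FLAG appears as a whole word
# on the first "flags" line, 1 if it does not, and 2 if the file is unreadable
# (for example on macOS, which has no /proc/cpuinfo).
has_cpu_flag() {
  local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
  [ -r "$cpuinfo" ] || return 2
  # -w matches whole words, so "avx" or "avx512f" do not count as "avx2"
  grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}
```

A launch script could treat exit status 1 as "warn and skip the bundled model" and status 2 as "unknown, try anyway", which keeps the check from blocking platforms where the flag list is unavailable.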

🪜 Suggested Implementation Phases

  • Phase 1 (start.sh/stop.sh): Add llama-server sidecar lifecycle (no model bundled yet, just the plumbing)
  • Phase 2 (build-offline-package.sh): Add download_bundled_model() step; download llama binaries for all 4 platforms
  • Phase 3 (config injection): Auto-register bundled-local provider and set as default model in openclaw.json
  • Phase 4 (start.bat/stop.bat): Windows parity
  • Phase 5 (setup-llm.sh): Standalone first-time model download helper
  • Phase 6 (docs): Update README.md and OFFLINE-GUIDE.md with bundled model section

💡 Why This Matters

This feature would make openclaw-portable a USB-portable AI assistant that:

  1. Requires zero internet after initial setup
  2. Requires zero pre-installed software (no Ollama, no Python, no Docker)
  3. Has zero API cost for basic usage
  4. Works on every bundled target (Linux x86-64, macOS x86-64/ARM64, Windows x86-64 with AVX2): just plug in and run

The existing infrastructure (install-models.js merge logic, start.sh modular step structure, data/.openclaw/ config path) makes this integration very clean — this is essentially adding one new service to an already well-designed process manager.


Interested in contributing a draft PR for Phase 1 if the maintainer approves the direction.
