```
   :###:
  :     :
  :     :
 .'     '.
 :       :
 |_______|
 |kollzsh|
 |-------|
 |       |
 :_______:
```
# koll.zsh: keyvez ollama for zsh

An oh-my-zsh plugin that integrates Ollama AI models with fzf to provide intelligent command suggestions based on user input requirements.

## Features
- Intelligent Command Suggestions: Use Ollama, MLX, llama.cpp, or vLLM to generate relevant terminal commands based on your query or input requirement.
- FZF Integration: Interactively select suggested commands using fzf's fuzzy finder, ensuring you find the right command for your task.
- MLX Support: Run models locally on Apple Silicon using the MLX framework for faster inference without a server.
- llama.cpp Support: Run GGUF models locally using llama.cpp for cross-platform local inference.
- Thinking Mode: Use Ctrl-t to run queries in thinking mode (MLX only) for more complex reasoning tasks.
- REPL Mode: Interactive shell with history support; execute commands and see their output directly, with options to edit, copy, or run.
- Customizable: Configure the default shortcut, model, platform, and number of suggestions to suit your workflow.
## Building the Rust Binary

The plugin includes a Rust binary for fast command generation. To build it:

```shell
# Clone the repository
git clone https://github.com/keyvez/kollzsh.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/kollzsh

# Build the Rust binary
cd ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/kollzsh
cargo build --release
```

If you don't have Rust installed, the plugin will fall back to Python scripts automatically.
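The fallback could be implemented along these lines (a sketch only; `kollzsh_backend` is a hypothetical helper name, not part of the plugin's actual API):

```shell
#!/bin/sh
# Sketch: pick the Rust binary if it was built, otherwise fall back to the
# Python scripts. The path layout mirrors `cargo build --release` output.
kollzsh_backend() {
  plugin_dir="$1"
  if [ -x "$plugin_dir/target/release/kollzsh" ]; then
    echo "rust"
  elif command -v python3 >/dev/null 2>&1; then
    echo "python"
  else
    echo "none"
  fi
}
```

The Rust binary is preferred purely because it avoids Python interpreter startup on every keystroke.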
## Requirements

Core Requirements:

- `fzf` for interactive selection of commands

With Rust binary (recommended):

- Rust toolchain (`rustup` + `cargo`) for building

Without Rust binary (fallback):

- `python3` with the `httpx` package

For Ollama:

- Ollama server running

For MLX:

- `uv` package manager
- Apple Silicon Mac (M1/M2/M3/M4)
- Dependencies (mlx-lm, transformers) are automatically managed by uv at runtime

For llama.cpp:

- llama.cpp installation with the `llama-cli` or `llama-server` binary
- A GGUF model file

For vLLM:

- vLLM server running (`pip install vllm`)
- NVIDIA GPU with CUDA support
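A quick way to confirm the tools for your platform are on `PATH` (a small sketch; extend the loop with the requirements above that apply to your setup):

```shell
#!/bin/sh
# Sketch: report whether each required tool is installed.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Core requirement; add python3, cargo, llama-cli, etc. as needed.
for tool in fzf; do
  check_tool "$tool"
done
```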
## Configuration

The following environment variables can be set to customize the behavior:

| Variable Name | Description | Default Value |
|---|---|---|
| `KOLLZSH_PLATFORM` | Platform to use (`ollama`, `MLX`, `llamacpp`, or `vllm`) | `ollama` |
| `KOLLZSH_MODEL` | Model to use for command generation | `qwen2.5-coder:3b` |
| `KOLLZSH_HOTKEY` | Default shortcut key for triggering the plugin | `^o` (Ctrl-o) |
| `KOLLZSH_THINKING_HOTKEY` | Shortcut key for thinking mode (MLX only) | `^t` (Ctrl-t) |
| `KOLLZSH_REPL_HOTKEY` | Shortcut key for REPL mode | `^x^o` (Ctrl-x Ctrl-o) |
| `KOLLZSH_COMMAND_COUNT` | Number of command suggestions displayed | `5` |
| `KOLLZSH_URL` | API endpoint URL (Ollama only) | `http://localhost:11434` |
| `KOLLZSH_API_KEY` | API key for external APIs (DeepSeek/OpenAI) | None |
| `KOLLZSH_MAX_TOKENS` | Maximum tokens for MLX response | `1024` |
| `KOLLZSH_LLAMACPP_PATH` | Path to llama.cpp installation directory | None |
| `KOLLZSH_LLAMACPP_MODEL` | Path to GGUF model file for llama.cpp | None |
| `KOLLZSH_LLAMACPP_SERVER_URL` | llama.cpp server URL | `http://localhost:8080` |
| `KOLLZSH_LLAMACPP_N_CTX` | Context size for llama.cpp | `2048` |
| `KOLLZSH_LLAMACPP_N_GPU_LAYERS` | GPU layers for llama.cpp (`-1` for all) | `-1` |
| `KOLLZSH_VLLM_SERVER_URL` | vLLM server URL | `http://localhost:8000` |
| `KOLLZSH_VLLM_MODEL` | Model name for vLLM | None (auto-detect) |
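For example, a default Ollama setup in `~/.zshrc` might look like this (values taken from the defaults in the table above):

```shell
# Example ~/.zshrc configuration: defaults shown in the table above.
export KOLLZSH_PLATFORM="ollama"
export KOLLZSH_MODEL="qwen2.5-coder:3b"
export KOLLZSH_HOTKEY="^o"
export KOLLZSH_COMMAND_COUNT="5"
export KOLLZSH_URL="http://localhost:11434"
```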
### Using the DeepSeek API

| Variable Name | Value |
|---|---|
| `KOLLZSH_URL` | `https://api.deepseek.com` |
| `KOLLZSH_API_KEY` | apply for an API key (https://platform.deepseek.com/api_keys) |
| `KOLLZSH_MODEL` | `deepseek-chat` |

```shell
# Use DeepSeek API
export KOLLZSH_URL="https://api.deepseek.com"
export KOLLZSH_API_KEY="your_api_key_here"
export KOLLZSH_MODEL="deepseek-chat"
```

### Using the OpenAI API

| Variable Name | Value |
|---|---|
| `KOLLZSH_URL` | `https://api.openai.com` |
| `KOLLZSH_API_KEY` | apply for an API key (https://platform.openai.com/api-keys) |
| `KOLLZSH_MODEL` | `gpt-4-turbo-preview` |

```shell
# Use OpenAI API
export KOLLZSH_URL="https://api.openai.com"
export KOLLZSH_API_KEY="your_api_key_here"
export KOLLZSH_MODEL="gpt-4-turbo-preview"
```

### Using MLX (Local Inference on Apple Silicon)

| Variable Name | Value |
|---|---|
| `KOLLZSH_PLATFORM` | `MLX` |
| `KOLLZSH_MODEL` | `Qwen/Qwen3-14B-MLX-4bit` |

```shell
# Use local MLX model on Apple Silicon
export KOLLZSH_PLATFORM="MLX"
export KOLLZSH_MODEL="Qwen/Qwen3-14B-MLX-4bit"

# Optional: increase max tokens for longer responses
export KOLLZSH_MAX_TOKENS="2048"
```

Available MLX Models:

- `Qwen/Qwen3-14B-MLX-4bit` - Recommended for general use
- `Qwen/Qwen3-8B-MLX-4bit` - Faster, smaller model
- `mlx-community/Llama-3.2-3B-Instruct-4bit` - Llama-based alternative
- Any model from mlx-community
Thinking Mode (Ctrl-t):

When using the MLX platform, press Ctrl-t to run your query in thinking mode. This enables the model's internal reasoning (using `<think>` tags) for more complex tasks. The thinking process and response are displayed in the terminal.
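A consumer of such output has to separate the reasoning from the final answer. One way to strip `<think>…</think>` blocks (a sketch, not the plugin's actual code; it assumes the tags appear on their own lines) is:

```shell
#!/bin/sh
# Sketch: drop everything between <think> and </think> from model output,
# leaving only the final answer. strip_think is an illustrative name.
strip_think() {
  awk '/<think>/{skip=1} /<\/think>/{skip=0; next} !skip'
}
```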
### Using llama.cpp (Local GGUF Models)

llama.cpp supports two modes: CLI mode (runs `llama-cli` directly) and server mode (connects to `llama-server`). CLI mode is used automatically when no server is running.
| Variable Name | Value |
|---|---|
| `KOLLZSH_PLATFORM` | `llamacpp` |
| `KOLLZSH_LLAMACPP_PATH` | `/path/to/llama.cpp` |
| `KOLLZSH_LLAMACPP_MODEL` | `/path/to/model.gguf` |
CLI Mode (recommended for simplicity):

```shell
# Configure llama.cpp platform with CLI mode
export KOLLZSH_PLATFORM="llamacpp"
export KOLLZSH_LLAMACPP_PATH="/home/user/llama.cpp"
export KOLLZSH_LLAMACPP_MODEL="/home/user/models/qwen2.5-coder-3b-q4_k_m.gguf"

# Optional: customize inference settings
export KOLLZSH_LLAMACPP_N_CTX="4096"
export KOLLZSH_LLAMACPP_N_GPU_LAYERS="-1"  # -1 for all layers on GPU
```

With CLI mode, `llama-cli` runs directly each time you press Ctrl-o. No server needed!
Server Mode (for faster repeated queries):

If you have `llama-server` running, it will be used automatically for faster responses:

```shell
# Start server manually
llama-server -m /path/to/model.gguf -c 2048 -ngl -1 --port 8080

# Or use the helper function
kollzsh-start-llamacpp
```

If you only have a server running (no CLI), just set the URL:

```shell
export KOLLZSH_PLATFORM="llamacpp"
export KOLLZSH_LLAMACPP_SERVER_URL="http://localhost:8080"
```
Recommended GGUF Models:

- `qwen2.5-coder-3b-instruct-q4_k_m.gguf` - Good balance of speed and quality
- `qwen2.5-coder-7b-instruct-q4_k_m.gguf` - Better quality, more resources
- Any instruction-tuned GGUF model from Hugging Face
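The automatic choice between CLI and server mode could be sketched roughly as follows (illustrative only; `llamacpp_mode` is a hypothetical helper, and the probe assumes `llama-server`'s `/health` endpoint is reachable when the server is up):

```shell
#!/bin/sh
# Sketch: prefer llama-server when it answers on the configured URL,
# otherwise fall back to invoking llama-cli directly.
llamacpp_mode() {
  url="${KOLLZSH_LLAMACPP_SERVER_URL:-http://localhost:8080}"
  if curl -fsS --max-time 1 "$url/health" >/dev/null 2>&1; then
    echo "server"
  else
    echo "cli"
  fi
}
```

Server mode is faster for repeated queries because the model stays loaded in memory between requests.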
### Using vLLM

| Variable Name | Value |
|---|---|
| `KOLLZSH_PLATFORM` | `vllm` |
| `KOLLZSH_VLLM_MODEL` | `Qwen/Qwen2.5-Coder-3B-Instruct` |
```shell
# Configure vLLM platform
export KOLLZSH_PLATFORM="vllm"
export KOLLZSH_VLLM_MODEL="Qwen/Qwen2.5-Coder-3B-Instruct"

# Optional: customize server URL (default is port 8000)
export KOLLZSH_VLLM_SERVER_URL="http://localhost:8000"
```

Starting the server:
The vLLM server is NOT auto-started on shell init. Use the helper functions:

```shell
# Start the server using the helper function
kollzsh-start-vllm

# Stop the server when done
kollzsh-stop-vllm
```

Or start manually:

```shell
vllm serve Qwen/Qwen2.5-Coder-3B-Instruct --port 8000
```

Using with an already running server:
```shell
# If you already have vLLM running, just set the platform
export KOLLZSH_PLATFORM="vllm"
export KOLLZSH_VLLM_SERVER_URL="http://localhost:8000"
# Model is auto-detected from the running server
```

Recommended vLLM Models:

- `Qwen/Qwen2.5-Coder-3B-Instruct` - Fast, good for command generation
- `Qwen/Qwen2.5-Coder-7B-Instruct` - Better quality
- `mistralai/Ministral-3B-Instruct-2412` - Lightweight alternative
- Any instruction-tuned model supported by vLLM
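Model auto-detection could work by reading the first model id from the server's OpenAI-compatible `/v1/models` endpoint. A sketch (`first_model_id` is a hypothetical helper; sample JSON stands in for a live server here):

```shell
#!/bin/sh
# Sketch: extract the first model id from a /v1/models response.
first_model_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["data"][0]["id"])'
}

# In practice the JSON would come from the running server, e.g.:
#   curl -s "$KOLLZSH_VLLM_SERVER_URL/v1/models" | first_model_id
```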
## Installation

1. Clone the repository to the oh-my-zsh custom plugin folder:

   ```shell
   git clone https://github.com/keyvez/kollzsh.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/kollzsh
   ```

2. Build the Rust binary (optional but recommended for faster startup):

   ```shell
   cd ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/kollzsh
   cargo build --release
   ```

3. Enable the plugin in `~/.zshrc`:

   ```shell
   plugins=(
     [plugins...]
     kollzsh
   )
   ```

## Usage

1. Input what you want to do, then trigger the plugin:
   - Press Ctrl-o (default) to get command suggestions via fzf
   - Press Ctrl-t (MLX only) to run in thinking mode for complex queries
   - Press Ctrl-x Ctrl-o (or type `kollzsh-repl`) to enter REPL mode

2. Interact with fzf: type a query or input requirement, and fzf will display suggested terminal commands. Select one to execute.
## REPL Mode

REPL mode provides an interactive shell for exploring AI-generated commands:

```
╔════════════════════════════════════════════════════════════╗
║  🍺 KOLLZSH REPL (AI-powered command suggestions)          ║
╠════════════════════════════════════════════════════════════╣
║  • Type a task description and press Enter                 ║
║  • Use ↑/↓ arrows to navigate history                      ║
║  • Select a command with fzf, then choose to run it        ║
║  • Type 'exit', 'quit', or press Ctrl-C/Ctrl-D to exit     ║
╚════════════════════════════════════════════════════════════╝
kollzsh> list all docker containers
```
After selecting a command from fzf, you can:

- [r]un - Execute the command and see its output
- [e]dit - Modify the command before running
- [c]opy - Copy to clipboard
- [s]kip - Skip and ask a new question

History is persisted to `~/.local/share/kollzsh/repl_history`.
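Appending to a history file of that shape is straightforward; a sketch (`repl_log` is a hypothetical helper, and the path argument is only an illustrative override of the documented default):

```shell
#!/bin/sh
# Sketch: append one REPL query to a history file, creating its
# directory on first use (mirrors ~/.local/share/kollzsh/repl_history).
repl_log() {
  hist="${1:-$HOME/.local/share/kollzsh/repl_history}"
  mkdir -p "$(dirname "$hist")"
  printf '%s\n' "$2" >> "$hist"
}
```

Storing one query per line keeps the file trivially compatible with ↑/↓ history navigation and with tools like `tail` or `fzf`.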
## Get Started

Experience the power of AI-driven command suggestions in your macOS terminal! This plugin is perfect for developers, system administrators, and anyone looking to streamline their workflow.
