Blink is a native macOS menu-bar AI companion. Hold ctrl+option for voice push-to-talk, or double-tap ctrl for the text bar. Blink captures your screen, understands what you're looking at, replies in voice, and can drive the UI on your behalf — click buttons, open apps, run searches, fill in text — through the macOS Accessibility tree (no pixel guessing).
Download Blink → — free, universal binary (Apple Silicon + Intel), macOS 14.2+. Bring your own API keys. Installed copies auto-update in place via Sparkle, and updates keep your granted permissions.
- Voice push-to-talk. Hold ctrl+option, ask anything, let go. Blink answers in one or two sentences and only speaks when asked.
- Text mode. Double-tap ctrl to summon a floating composer near the cursor. Slash commands (`/agent`, `/voice`, `/screen`, …) and `@mentions` are inline. The text bar uses the same brain router as voice, so "check the weather in tokyo" reaches `web_search` instead of dead-ending.
- Agentic UI control. Blink finds buttons, menu items, links, checkboxes, tabs in any focused app through the Accessibility tree and invokes them by label: "click Stop Sharing", "press Send", "start sharing screen". When a label miss happens, the agent calls `inspect_ui` to see what's actually on screen as structured data (role + label + coordinates), then clicks the right thing. No randomly-clicked pixels.
- Composite actions. `web_search`, `new_tab`, `open_app`, `click_button`, `inspect_ui`, `type_text`, `key_press`, `scroll`, `wait_for_app` cover most everyday tasks.
- Sees your screen. ScreenCaptureKit screenshots feed the model on demand, so questions like "what is this" or "where do I click" work against the actual UI.
- App-aware RAG. Built-in knowledge for Onshape, Blender, Photoshop, Illustrator, and Figma — answers to "how do I extrude this" use the bundled knowledge base, not just generic web facts.
- Cross-session memory. Blink embeds each exchange and recalls relevant past conversations on later launches, scoped to the app you're focused on — so "what's my dog's name" still works tomorrow. Serverless and on-device: no setup, just an OpenAI or Hugging Face key for the embeddings.
- Agent Mode. Send Blink longer jobs — research, refactors, file work, settings tweaks — and it runs them in the background through a bundled Codex runtime without taking the screen.
- Pluggable transcription. Apple Speech (local), AssemblyAI, Deepgram, OpenAI Whisper, or Mistral Voxtral via the HuggingFace router. Pick a provider in Settings → Voice.
- Apple Liquid Glass surfaces. The floating panel, overlay, and cards use translucent system materials; the settings window uses a flat graphite dark theme. No dark gradients.
- Local-only. API keys live in `~/.config/blink/secrets.env`. Nothing ships through a hosted proxy. A local control bridge at `127.0.0.1:32123` lets other trusted local apps drive the overlay, screenshots, captions, and TTS.
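
A local app that wants to use the control bridge can first check that something is listening on the address above. The sketch below only tests TCP reachability of `127.0.0.1:32123`; the bridge's actual protocol is not described here, so nothing is sent. The function names `port_is_open` and `blink_bridge_status` are hypothetical helpers, not part of Blink.

```shell
#!/usr/bin/env bash
# Return 0 if something accepts TCP connections on host:port, non-zero otherwise.
# Relies on bash's built-in /dev/tcp pseudo-device, so no extra tools are needed.
port_is_open() {
  local host="$1" port="$2"
  # The subshell opens fd 3 on the socket; the fd is closed when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Report whether the bridge address from this README is reachable.
# Hypothetical helper: it says nothing about what the bridge accepts.
blink_bridge_status() {
  if port_is_open 127.0.0.1 32123; then
    echo "bridge: listening"
  else
    echo "bridge: not reachable"
    return 1
  fi
}
```

Calling `blink_bridge_status` before attempting any request lets a script fail early with a clear message when Blink is not running.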
- macOS 14.2 or newer
- Xcode 16 with the macOS SDK
- An Apple Developer team configured in Xcode for local signing
mkdir -p ~/.config/blink
chmod 700 ~/.config/blink
$EDITOR ~/.config/blink/secrets.env
chmod 600 ~/.config/blink/secrets.env

Inside the file:

ANTHROPIC_API_KEY=your_anthropic_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
OPENAI_API_KEY=your_openai_or_codex_key
# Optional: open-source Codex backend AND Voxtral transcription via the
# HuggingFace Inference Router. Set HUGGINGFACE_API_KEY here, then either:
# - enable the HF agent backend:
# defaults write com.blink.blink blinkAgentBackend huggingface
# (default agent model is meta-llama/Llama-3.3-70B-Instruct)
# - or pick Voxtral in Settings → Voice to use
# mistralai/Voxtral-Small-24B-2507 for transcription.
HUGGINGFACE_API_KEY=your_hf_token

Then open the Xcode project, set your signing team, and press Cmd+R:

open Blink.xcodeproj

Blink's long-term memory is serverless and on-device: no background server to run, no extra setup. Each exchange is embedded and stored in a local vector store at `~/Library/Application Support/Blink/conversation-memory.json`, and relevant past conversations are recalled on later launches, scoped to the app you're focused on.
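
Because the store is a single JSON file at a known path, it can be copied with ordinary tools, for example before experimenting with settings. A minimal sketch follows; the `backup_memory` helper and the default backup directory `~/blink-memory-backups` are hypothetical and not something Blink provides.

```shell
#!/bin/sh
# Copy the memory file to a timestamped backup and print the backup path.
# $1 = source file (defaults to the path given in this README)
# $2 = backup directory (hypothetical default, chosen for illustration)
backup_memory() {
  src="${1:-$HOME/Library/Application Support/Blink/conversation-memory.json}"
  dest_dir="${2:-$HOME/blink-memory-backups}"
  # Nothing to back up if Blink has not written any memory yet.
  [ -f "$src" ] || { echo "no memory file at $src" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  dest="$dest_dir/conversation-memory.$(date +%Y%m%d-%H%M%S).json"
  # -p preserves the original modification time and mode.
  cp -p "$src" "$dest" && echo "$dest"
}
```

Running `backup_memory` with no arguments uses the default paths; passing two arguments overrides the source and the destination directory.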
Embeddings are computed client-side: Blink uses OPENAI_API_KEY (text-embedding-3-small) when set, otherwise the HuggingFace router (all-MiniLM-L6-v2). So memory just needs one of those keys — there's nothing else to install. Switching embedding providers changes the vector dimension, so reset memory in Settings if you change which key you use.
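
The selection rule above can be sketched as a small script that reads the secrets file and reports which embedding model would apply. This is an illustration, not Blink's code: `read_key` and `pick_embedding_model` are hypothetical names, the file is assumed to hold plain `KEY=value` lines, and the dimensions (1536 and 384) are the public default output sizes of those two models, which the README itself does not state.

```shell
#!/bin/sh
# Print the value of a key from a KEY=value env file (last assignment wins).
# $1 = key name, $2 = path to the env file. Commented-out lines are ignored
# because they do not start with the key name.
read_key() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Apply the README's rule: OpenAI when OPENAI_API_KEY is set, otherwise the
# HuggingFace router. Prints "<provider> <model> <dimension>".
# Assumption: dimensions are each model's public default, not confirmed for Blink.
pick_embedding_model() {
  env_file="${1:-$HOME/.config/blink/secrets.env}"
  [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 2; }
  if [ -n "$(read_key OPENAI_API_KEY "$env_file")" ]; then
    echo "openai text-embedding-3-small 1536"
  elif [ -n "$(read_key HUGGINGFACE_API_KEY "$env_file")" ]; then
    echo "huggingface all-MiniLM-L6-v2 384"
  else
    echo "no embedding key found" >&2
    return 1
  fi
}
```

If the printed dimension changes after you edit the keys, stored vectors and new vectors no longer have the same length, which is the situation where resetting memory in Settings is needed. Note that a placeholder such as `your_openai_or_codex_key` is non-empty and would count as set in this sketch.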
On first launch, grant Microphone, Accessibility, and Screen Recording when macOS prompts. Accessibility is required for:
- the global ctrl+option push-to-talk shortcut to work outside Blink's own windows,
- the agent's `click_button` / `inspect_ui` tools to read and invoke other apps' UI controls,
- the cursor overlay to position itself over the focused window.
| Action | Shortcut |
|---|---|
| Push-to-talk | hold ctrl + option |
| Text bar | double-tap ctrl |
| Dismiss text bar | esc |
| Clear text bar draft | x button (right side of the bar) |
| Submit text bar | return or ↑ button |
- `Blink/`: app sources (SwiftUI + AppKit bridging)
  - `BlinkAgentLoop.swift`: direct tool-use agent loop, AX-based click helpers, intent router
  - `CompanionManager.swift`: central app state machine, text-mode bar, push-to-talk wiring
  - `Buddy*TranscriptionProvider.swift`, `VoxtralHFTranscriptionProvider.swift`: pluggable transcription providers
  - `BlinkComputerUseRuntime.swift` / `BlinkComputerUseModels.swift`: CGEvent mouse/keyboard primitives, window enumeration
- `BlinkTests/`, `BlinkUITests/`: unit and UI tests
- `BlinkWidgets/`: WidgetKit extension
- `AppResources/Blink/`: bundled Codex runtime, skill packs, and wiki seed

See CONTRIBUTING.md.
MIT.