jasonca2023/blink

Blink

Blink is a native macOS menu-bar AI companion. Hold ctrl+option for voice push-to-talk, or double-tap ctrl for the text bar. Blink captures your screen, understands what you're looking at, replies in voice, and can drive the UI on your behalf — click buttons, open apps, run searches, fill in text — through the macOS Accessibility tree (no pixel guessing).

Download Blink: free, universal binary (Apple Silicon + Intel), for macOS 14.2+. Bring your own API keys. Installed copies auto-update in place via Sparkle, and updates keep your granted permissions.

Features

  • Voice push-to-talk. Hold ctrl+option, ask anything, let go. Blink answers in one or two sentences and only speaks when asked.
  • Text mode. Double-tap ctrl to summon a floating composer near the cursor. Slash commands (/agent, /voice, /screen, …) and @ mentions are inline. The text bar uses the same brain router as voice, so "check the weather in tokyo" reaches web_search instead of dead-ending.
  • Agentic UI control. Blink finds buttons, menu items, links, checkboxes, and tabs in any focused app through the Accessibility tree and invokes them by label: "click Stop Sharing", "press Send", "start sharing screen". When a label lookup misses, the agent calls inspect_ui to see what is actually on screen as structured data (role + label + coordinates), then clicks the right element. No randomly clicked pixels.
  • Composite actions. web_search, new_tab, open_app, click_button, inspect_ui, type_text, key_press, scroll, wait_for_app cover most everyday tasks.
  • Sees your screen. ScreenCaptureKit screenshots feed the model on demand, so questions like "what is this" or "where do I click" work against the actual UI.
  • App-aware RAG. Built-in knowledge for Onshape, Blender, Photoshop, Illustrator, and Figma — answers to "how do I extrude this" use the bundled knowledge base, not just generic web facts.
  • Cross-session memory. Blink embeds each exchange and recalls relevant past conversations on later launches, scoped to the app you're focused on, so "what's my dog's name" still works tomorrow. The memory store is serverless and on-device: there is nothing to set up beyond an OpenAI or Hugging Face key for the embeddings.
  • Agent Mode. Send Blink longer jobs — research, refactors, file work, settings tweaks — and it runs them in the background through a bundled Codex runtime without taking the screen.
  • Pluggable transcription. Apple Speech (local), AssemblyAI, Deepgram, OpenAI Whisper, or Mistral Voxtral via the Hugging Face router. Pick the provider in Settings → Voice.
  • Apple Liquid Glass surfaces. The floating panel, overlay, and cards use translucent system materials; the settings window uses a flat graphite dark theme. No dark gradients.
  • Local-only. API keys live in ~/.config/blink/secrets.env. Nothing ships through a hosted proxy. A local control bridge at 127.0.0.1:32123 lets other trusted local apps drive the overlay, screenshots, captions, and TTS.
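
The label-miss fallback described under Agentic UI control can be pictured roughly as follows. This is a minimal sketch, not Blink's implementation: UIElementInfo, ClickOutcome, and clickButton(labeled:in:) are hypothetical names, the elements are plain values rather than live Accessibility objects, and case-insensitive exact matching is an assumed rule.

```swift
import Foundation

/// Hypothetical flattened view of one Accessibility element
/// (role + label + coordinates), as inspect_ui is described to return.
struct UIElementInfo {
    let role: String
    let label: String
    let x: Double
    let y: Double
}

enum ClickOutcome {
    /// A matching element was found and would be invoked.
    case clicked(UIElementInfo)
    /// Label miss: hand the structured listing back so the caller can choose.
    case needsInspection([UIElementInfo])
}

/// Case-insensitive exact label match (an assumed matching rule);
/// any miss falls back to returning the full element listing.
func clickButton(labeled target: String, in elements: [UIElementInfo]) -> ClickOutcome {
    let wanted = target.lowercased()
    if let hit = elements.first(where: { $0.label.lowercased() == wanted }) {
        return .clicked(hit)
    }
    return .needsInspection(elements)
}

// Toy data standing in for a focused window's controls.
let elements = [
    UIElementInfo(role: "AXButton", label: "Stop Sharing", x: 120, y: 40),
    UIElementInfo(role: "AXButton", label: "Send", x: 300, y: 500),
]

switch clickButton(labeled: "stop sharing", in: elements) {
case .clicked(let element):
    print("would click \(element.label) at (\(element.x), \(element.y))")
case .needsInspection(let listing):
    print("label miss; \(listing.count) elements available to inspect")
}
```

Returning the listing on a miss, instead of guessing at a near match, keeps the decision with the model, which is the behavior the bullet describes.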

Requirements

  • macOS 14.2 or newer
  • Xcode 16 with the macOS SDK
  • An Apple Developer team configured in Xcode for local signing

Setup

mkdir -p ~/.config/blink
chmod 700 ~/.config/blink
# Falls back to nano when $EDITOR is unset
${EDITOR:-nano} ~/.config/blink/secrets.env
chmod 600 ~/.config/blink/secrets.env

Inside the file:

ANTHROPIC_API_KEY=your_anthropic_key
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
OPENAI_API_KEY=your_openai_or_codex_key

# Optional: open-source Codex backend AND Voxtral transcription via the
# HuggingFace Inference Router. Set HUGGINGFACE_API_KEY here, then either:
#   - enable the HF agent backend:
#       defaults write com.blink.blink blinkAgentBackend huggingface
#     (default agent model is meta-llama/Llama-3.3-70B-Instruct)
#   - or pick Voxtral in Settings → Voice to use
#     mistralai/Voxtral-Small-24B-2507 for transcription.
HUGGINGFACE_API_KEY=your_hf_token
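
For anyone scripting against this file, the format above is simple enough to parse in a few lines. The sketch below assumes KEY=value lines, # comments, and ignorable blank lines; parseSecrets is a hypothetical helper and is not taken from Blink's sources.

```swift
import Foundation

/// Hypothetical helper: parses KEY=value lines, skipping blank lines and
/// `#` comments. Not Blink's loader, only an illustration of the format.
func parseSecrets(_ text: String) -> [String: String] {
    var secrets: [String: String] = [:]
    for rawLine in text.split(whereSeparator: \.isNewline) {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        if line.isEmpty || line.hasPrefix("#") { continue }
        // Split on the first "=" only, so values may themselves contain "=".
        guard let eq = line.firstIndex(of: "=") else { continue }
        let key = line[..<eq].trimmingCharacters(in: .whitespaces)
        let value = line[line.index(after: eq)...].trimmingCharacters(in: .whitespaces)
        if !key.isEmpty { secrets[key] = value }
    }
    return secrets
}

// Placeholder values only, mirroring the template above.
let sample = """
ANTHROPIC_API_KEY=your_anthropic_key
# Optional
HUGGINGFACE_API_KEY=your_hf_token
"""
let secrets = parseSecrets(sample)
print(secrets.keys.sorted())
```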

Then open the Xcode project, set your signing team, and press Cmd+R to build and run:

open Blink.xcodeproj

Cross-session memory

Blink's long-term memory is serverless and on-device — no background server to run, no extra setup. Each exchange is embedded and stored in a local vector store at ~/Library/Application Support/Blink/conversation-memory.json, and relevant past conversations are recalled on later launches, scoped to the app you're focused on.

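Blink requests embeddings directly from the provider, with no intermediate server: it uses OPENAI_API_KEY (text-embedding-3-small) when set, otherwise the Hugging Face router (all-MiniLM-L6-v2). Memory therefore needs only one of those keys, and there is nothing else to install. The two models produce vectors of different dimensions (1536 by default for text-embedding-3-small, 384 for all-MiniLM-L6-v2), so stored vectors from one cannot be compared with queries from the other. Reset memory in Settings if you change which key you use.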
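
The recall step described above can be sketched as a similarity search restricted to the focused app. This is an illustration under stated assumptions: MemoryEntry, cosineSimilarity, and recall are hypothetical names, the real conversation-memory.json schema and ranking method are not documented here, and the 3-dimensional vectors are toy data.

```swift
import Foundation

/// Hypothetical record shape; the real on-disk schema may differ.
struct MemoryEntry {
    let app: String
    let text: String
    let embedding: [Double]
}

/// Cosine similarity. Returns nil when the dimensions differ, which is
/// what happens to stored vectors after switching embedding providers.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Returns up to `limit` entries from the focused app, most similar first.
func recall(query: [Double], app: String, from store: [MemoryEntry], limit: Int = 3) -> [MemoryEntry] {
    let scored = store.compactMap { entry -> (entry: MemoryEntry, score: Double)? in
        guard entry.app == app,
              let score = cosineSimilarity(query, entry.embedding) else { return nil }
        return (entry: entry, score: score)
    }
    return scored.sorted { $0.score > $1.score }.prefix(limit).map { $0.entry }
}

// Toy store: one entry belongs to another app, one has a stale dimension.
let store = [
    MemoryEntry(app: "Figma", text: "note about frames", embedding: [1, 0, 0]),
    MemoryEntry(app: "Figma", text: "note about exports", embedding: [0, 1, 0]),
    MemoryEntry(app: "Blender", text: "note about extrude", embedding: [1, 0, 0]),
    MemoryEntry(app: "Figma", text: "stale 2-d vector", embedding: [1, 0]),
]
let hits = recall(query: [0.9, 0.1, 0], app: "Figma", from: store, limit: 2)
print(hits.map { $0.text })
```

Skipping entries whose dimension does not match the query is one way to tolerate a provider switch, but the mismatched entries are then never recalled, which is why resetting memory is the cleaner option.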

On first launch, grant Microphone, Accessibility, and Screen Recording when macOS prompts. Accessibility is required for:

  • the global ctrl+option push-to-talk shortcut to work outside Blink's own windows,
  • the agent's click_button / inspect_ui tools to read and invoke other apps' UI controls,
  • the cursor overlay to position itself over the focused window.

Triggers

| Action | Shortcut |
| --- | --- |
| Push-to-talk | hold ctrl + option |
| Text bar | double-tap ctrl |
| Dismiss text bar | esc |
| Clear text bar draft | x button (right side of the bar) |
| Submit text bar | return or the submit button |
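
A double-tap trigger like the one above comes down to comparing the timestamps of consecutive key presses. The sketch below shows the idea only: DoubleTapDetector is a hypothetical type, the 0.3 second window is an assumed value, and the event-tap plumbing that would deliver real ctrl presses is omitted.

```swift
import Foundation

/// Hypothetical detector: reports a double tap when two presses arrive
/// within `window` seconds. The 0.3 s default is an assumption.
struct DoubleTapDetector {
    let window: TimeInterval
    private var lastTap: TimeInterval?

    init(window: TimeInterval = 0.3) {
        self.window = window
    }

    /// Feed the timestamp of each press; returns true on the second tap of a pair.
    mutating func registerTap(at time: TimeInterval) -> Bool {
        if let previous = lastTap, time - previous <= window {
            lastTap = nil  // reset so a third quick tap starts a new pair
            return true
        }
        lastTap = time
        return false
    }
}

var detector = DoubleTapDetector()
let first = detector.registerTap(at: 0.00)   // first press of a pair
let second = detector.registerTap(at: 0.20)  // within the window
let third = detector.registerTap(at: 1.50)   // too late to pair with anything
print(first, second, third)
```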

Project layout

  • Blink/ — app sources (SwiftUI + AppKit bridging)
    • BlinkAgentLoop.swift — direct tool-use agent loop, AX-based click helpers, intent router
    • CompanionManager.swift — central app state machine, text-mode bar, push-to-talk wiring
    • Buddy*TranscriptionProvider.swift, VoxtralHFTranscriptionProvider.swift — pluggable transcription providers
    • BlinkComputerUseRuntime.swift / BlinkComputerUseModels.swift — CGEvent mouse/keyboard primitives, window enumeration
  • BlinkTests/, BlinkUITests/ — unit and UI tests
  • BlinkWidgets/ — WidgetKit extension
  • AppResources/Blink/ — bundled Codex runtime, skill packs, and wiki seed

Contributing

See CONTRIBUTING.md.

License

MIT.

About

A cursor that listens. macOS menu bar app that replaces your system cursor with a custom triangle that pulses, then transforms in place when you talk to it. Designed to listen and help with any question you ask.
