hermes-livekit

LiveKit WebRTC voice gateway plugin for hermes-agent.

Lets a Hermes gateway join a LiveKit room as an agent participant, transcribe participant speech via Hermes's STT pipeline, run the agent loop, and publish TTS replies back to the room as audio.

Requirements

- An existing hermes-agent install (this plugin attaches to it; it does not vendor hermes itself).
- ffmpeg on PATH, used to decode TTS audio for the WebRTC publish path.
  - macOS: brew install ffmpeg
  - Debian / Ubuntu: sudo apt install ffmpeg
- A reachable LiveKit server (LiveKit Cloud or self-hosted) with an API key / secret pair.

Install

Install into the same Python environment as your hermes-agent:

pip install git+https://github.com/kortexa-ai/hermes-livekit.git

pip resolves the pinned livekit / livekit-api SDK versions automatically. The plugin is auto-discovered through the hermes_agent.plugins entry-point group — no edits to hermes-agent's source tree are required.

Note: `hermes plugins install kortexa-ai/hermes-livekit` is not the right path for this plugin. That command git-clones the repo into `~/.hermes/plugins/` without resolving pip dependencies, so you would then have to run `pip install 'livekit==1.1.7' 'livekit-api==1.1.0'` by hand. The pip install above is a single command and keeps the SDK pins in sync with the plugin version.

Local / editable install

For development on a checkout (e.g. ~/src/hermes-livekit/):

pip install -e ~/src/hermes-livekit

Enable

After install, add livekit to the enabled-plugins list:

hermes plugins enable livekit

(Or edit ~/.hermes/config.yaml and add livekit to plugins.enabled.)

Then enable the platform in the same config:

platforms:
  livekit:
    enabled: true
plugins:
  enabled:
    - livekit

Configure

Set these env vars (or supply equivalents under platforms.livekit.extra in ~/.hermes/config.yaml):

| Var | Required | Notes |
|---|---|---|
| `LIVEKIT_URL` | yes | `wss://your-project.livekit.cloud` or `wss://your-self-hosted/` |
| `LIVEKIT_API_KEY` | yes | from your LiveKit project / server config |
| `LIVEKIT_API_SECRET` | yes | from your LiveKit project / server config |
| `LIVEKIT_ROOM` | no | room the agent joins; default `hermes` |
| `LIVEKIT_AGENT_NAME` | no | display name; default `Hermes` (asks the LLM if unset) |
| `LIVEKIT_AGENT_AVATAR` | no | avatar URL or local image path (encoded as data URI) |
| `LIVEKIT_HOME_CHANNEL` | no | cron / cross-platform delivery target; defaults to `LIVEKIT_ROOM` |
| `LIVEKIT_ALLOWED_USERS` | no | comma-separated participant identities |
| `LIVEKIT_ALLOW_ALL_USERS` | no | `1`/`true` allows any participant (dev only) |
| `LIVEKIT_PRESENCE_POLL_INTERVAL` | no | seconds; auto-picked (cloud 30s, local 5s) |
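The defaults in the table can be summarised as a small loader. This is a hypothetical sketch: `LiveKitSettings` and `load_settings` are illustrative names, not the plugin's API; it assumes that "cloud" for the poll-interval default means a `*.livekit.cloud` host (the plugin's real detection may differ); it leaves `agent_name` unset rather than guessing how the plugin picks a name; and it does not read `platforms.livekit.extra` from config.yaml.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional
from urllib.parse import urlparse


@dataclass
class LiveKitSettings:
    url: str
    api_key: str
    api_secret: str
    room: str = "hermes"
    agent_name: Optional[str] = None  # unset: left to the plugin (see table)
    home_channel: str = ""
    allowed_users: list[str] = field(default_factory=list)
    allow_all_users: bool = False
    presence_poll_interval: float = 5.0


def load_settings(env: Mapping[str, str] = os.environ) -> LiveKitSettings:
    required = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    missing = [k for k in required if not env.get(k)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    url = env["LIVEKIT_URL"]
    room = env.get("LIVEKIT_ROOM") or "hermes"
    # Assumption: a *.livekit.cloud host counts as "cloud"; anything else as local.
    host = urlparse(url).hostname or ""
    default_poll = 30.0 if host.endswith(".livekit.cloud") else 5.0
    poll = env.get("LIVEKIT_PRESENCE_POLL_INTERVAL")
    users = env.get("LIVEKIT_ALLOWED_USERS", "")
    return LiveKitSettings(
        url=url,
        api_key=env["LIVEKIT_API_KEY"],
        api_secret=env["LIVEKIT_API_SECRET"],
        room=room,
        agent_name=env.get("LIVEKIT_AGENT_NAME") or None,
        home_channel=env.get("LIVEKIT_HOME_CHANNEL") or room,  # defaults to the room
        allowed_users=[u.strip() for u in users.split(",") if u.strip()],
        allow_all_users=env.get("LIVEKIT_ALLOW_ALL_USERS", "").strip().lower() in ("1", "true"),
        presence_poll_interval=float(poll) if poll else default_poll,
    )
```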

Or run the interactive prompt:

hermes config

Verify

hermes gateway restart
hermes gateway status      # should show 🎙️ LiveKit as connected

Join the configured room from any LiveKit client (web, mobile, voice-agent desktop). The agent watches the room when empty and joins as soon as a real participant arrives, then transcribes incoming audio and replies via TTS.

Data channel protocol

Outbound traffic (agent → client) is unchanged from earlier voice-only versions: final text replies go out on topic `hermes-chat`, and `agent:*` lifecycle events are sent with no topic. The 0.2.0 release adds an inbound channel for client-driven control and camera snapshots.

Outbound (agent → client)

| Topic | Payload | When |
|---|---|---|
| `hermes-chat` | UTF-8 text | After the agent generates a reply |
| (no topic) | JSON `{"type": "agent:<...>", "payload": {...}}` | Lifecycle events (see below) |

Agent lifecycle event types:

- `agent:listening-start` / `agent:listening-stop`: VAD detected speech start/end
- `agent:user-transcript`: STT (or typed message) finalized; payload `{transcript, final, identity, source?}`
- `agent:thinking-start`: agent about to invoke the LLM
- `agent:speaking-start` / `agent:speaking-stop`: TTS playback boundary
- `agent:agent-transcript`: assistant reply text mirrored on the data channel
- `agent:frame-captured`: a video frame was sampled and queued; payload `{identity, width, height, bytes, timestamp}`
- `agent:frame-capture-failed`: `client:capture-frame` could not be honored; payload `{reason, identity?, detail?}`
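On the client, each received data packet can be classified by topic first and then by its JSON `type`. A minimal sketch with hypothetical names (`decode_outbound`, `ChatReply`, `AgentEvent`); how you obtain `topic` and `data` depends on your LiveKit client SDK. As an assumption, it also accepts envelopes without a `payload` wrapper by treating the remaining fields as the payload, which matches the flat remote-tool events listed next, provided those also arrive without a topic.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class ChatReply:
    text: str


@dataclass
class AgentEvent:
    type: str
    payload: dict[str, Any]


def decode_outbound(topic: Optional[str], data: bytes) -> Union[ChatReply, AgentEvent, None]:
    """Classify one data packet received from the agent; None means 'not ours'."""
    if topic == "hermes-chat":
        return ChatReply(data.decode("utf-8"))
    if topic:  # any other non-empty topic is outside this protocol
        return None
    try:
        msg = json.loads(data)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(msg, dict):
        return None
    kind = msg.get("type")
    if not isinstance(kind, str) or not kind.startswith("agent:"):
        return None
    payload = msg.get("payload")
    if isinstance(payload, dict):
        return AgentEvent(kind, payload)
    # Flat envelope: everything except "type" is treated as the payload.
    return AgentEvent(kind, {k: v for k, v in msg.items() if k != "type"})
```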

Remote-tool events (0.3.0+; flat envelope with no `payload` wrapper; sent only to the owning participant via `destination_identities`):

- `agent:tool-registered`: ack to `client:tool-register`; `{name, success, reason?, detail?}`
- `agent:tool-unregistered`: ack to `client:tool-unregister`; `{name, success, reason?}`
- `agent:tool-call`: agent invoking a client-registered tool; `{call_id, name, arguments}`
- `agent:tool-call-cancelled`: agent loop unwound while the call was in flight; `{call_id, name}`
- `agent:tool-call-timeout`: plugin timed out waiting for a result (default 30s, override via `HERMES_LIVEKIT_TOOL_TIMEOUT_SEC`); `{call_id, name}`
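On the agent side, the timeout behaviour amounts to waiting on a per-call future that the matching `client:tool-result` resolves. This is a minimal asyncio sketch, not the plugin's implementation: `PendingToolCalls`, the `tc_<n>` id format, and raising `RuntimeError` for an `error` reply are hypothetical choices; only the env var name and the 30 s default come from this README. `drop_participant` mirrors the disconnect cleanup described for remote tools.

```python
import asyncio
import itertools
import os
from typing import Any, Optional


class PendingToolCalls:
    """Hypothetical bookkeeping for remote tool calls awaiting a client:tool-result."""

    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}
        self._owners: dict[str, str] = {}  # call_id -> participant identity
        self._ids = itertools.count(1)
        # Env var name and 30 s default are from the README; the parsing is assumed.
        self.timeout = float(os.environ.get("HERMES_LIVEKIT_TOOL_TIMEOUT_SEC", "30"))

    def start(self, owner: str) -> str:
        """Create a future for a new call (must run inside an event loop)."""
        call_id = f"tc_{next(self._ids)}"
        self._futures[call_id] = asyncio.get_running_loop().create_future()
        self._owners[call_id] = owner
        return call_id

    def resolve(self, call_id: str, result: Any = None, error: Optional[str] = None) -> bool:
        """Complete a call from a client:tool-result; late or unknown ids are ignored."""
        fut = self._futures.get(call_id)
        if fut is None or fut.done():
            return False
        if error is not None:
            fut.set_exception(RuntimeError(error))
        else:
            fut.set_result(result)
        return True

    async def wait(self, call_id: str) -> Any:
        """Await the result; raises asyncio.TimeoutError after self.timeout seconds."""
        try:
            return await asyncio.wait_for(self._futures[call_id], self.timeout)
        finally:
            self._futures.pop(call_id, None)
            self._owners.pop(call_id, None)

    def drop_participant(self, owner: str) -> None:
        """Cancel every in-flight call owned by a participant that disconnected."""
        for call_id, who in list(self._owners.items()):
            if who == owner:
                self._futures[call_id].cancel()
```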

Inbound (client → agent), topic `hermes-control`

JSON payloads of the form {"type": "client:<...>", ...}:

// sample the next frame from this client's published video track
{"type": "client:capture-frame"}

// inject a typed message (skips STT). Pending captures attach automatically.
{"type": "client:message", "text": "what's in this picture?"}

// runtime control hooks
{"type": "client:control", "action": "pause"}    // stop sampling audio
{"type": "client:control", "action": "resume"}   // resume sampling audio
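A client only needs to serialise these objects as UTF-8 JSON and publish them as data packets on topic `hermes-control`. The helpers below are hypothetical (names are illustrative); the publish step depends on your LiveKit client SDK (in the Python SDK this is the local participant's `publish_data` call, which accepts a `topic`; check the signature for your SDK version), so it is left out. Restricting `action` to `pause`/`resume` reflects only the two actions listed here; the plugin may accept others.

```python
import json

CONTROL_TOPIC = "hermes-control"


def _encode(msg: dict) -> bytes:
    # Compact UTF-8 JSON; ensure_ascii=False keeps non-ASCII text as-is on the wire.
    return json.dumps(msg, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def capture_frame() -> bytes:
    """Ask the agent to sample the next frame of this client's video track."""
    return _encode({"type": "client:capture-frame"})


def typed_message(text: str) -> bytes:
    """Send a typed message that bypasses STT."""
    return _encode({"type": "client:message", "text": text})


def control(action: str) -> bytes:
    """Pause or resume audio sampling."""
    if action not in ("pause", "resume"):
        raise ValueError(f"unsupported control action: {action!r}")
    return _encode({"type": "client:control", "action": action})
```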

Remote-tool messages (0.3.0+):

// register a tool the agent can call. input_schema is JSON Schema for
// the tool's arguments (`type: object`, with `properties` and `required`).
{
  "type": "client:tool-register",
  "name": "desktop_notify",
  "description": "Show a desktop notification.",
  "input_schema": {
    "type": "object",
    "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
    "required": ["title", "body"]
  }
}

// give back a tool the client no longer wants to offer
{"type": "client:tool-unregister", "name": "desktop_notify"}

// respond to an inbound agent:tool-call (exactly one of result/error)
{"type": "client:tool-result", "call_id": "tc_abc123", "result": {"shown": true}}
{"type": "client:tool-result", "call_id": "tc_abc123", "error": "permission denied"}
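On the client, answering an `agent:tool-call` means looking up the named tool locally and replying with exactly one of `result` or `error`. This is a hypothetical sketch: `TOOLS`, `register_message`, and `handle_tool_call` are illustrative, the `arguments` object is assumed to map onto keyword arguments, results are assumed to be JSON-serialisable, and the `input_schema` check covers only the top-level `type: object` requirement rather than full JSON Schema validation. The returned bytes would be published on `hermes-control` like the other inbound messages.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical local registry: tool name -> Python callable taking keyword arguments.
TOOLS: dict[str, Callable[..., Any]] = {}


def register_message(name: str, description: str, input_schema: dict) -> bytes:
    """Build a client:tool-register message after a minimal schema sanity check."""
    if input_schema.get("type") != "object":
        raise ValueError("input_schema must describe an object")
    msg = {"type": "client:tool-register", "name": name,
           "description": description, "input_schema": input_schema}
    return json.dumps(msg).encode("utf-8")


def handle_tool_call(event: dict) -> Optional[bytes]:
    """Turn a flat agent:tool-call envelope into a client:tool-result reply."""
    if event.get("type") != "agent:tool-call":
        return None
    reply: dict[str, Any] = {"type": "client:tool-result", "call_id": event["call_id"]}
    fn = TOOLS.get(event.get("name", ""))
    try:
        if fn is None:
            raise LookupError(f"unknown tool {event.get('name')!r}")
        reply["result"] = fn(**(event.get("arguments") or {}))
    except Exception as exc:  # any failure becomes an error string, never both fields
        reply["error"] = str(exc)
    return json.dumps(reply).encode("utf-8")
```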

For tools to be visible to the LLM, add hermes-livekit-tools to the livekit toolset list in ~/.hermes/config.yaml (platform_toolsets.livekit). The plugin does not auto-activate the toolset.

Tools and pending calls are cleaned up automatically when the registering participant disconnects. The full design and roadmap (large/binary results, multi-client coexistence, native LiveKit RPC pivot) are in docs/remote-tools-design.md.

Messages with unknown `type` values are silently ignored, which keeps the topic compatible with apps that share the same data channel for unrelated control traffic.
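The ignore-unknown rule can be expressed as a handler table where a missing entry is simply a no-op. A hypothetical agent-side sketch: `make_dispatcher` and the `(sender, topic, data)` callback shape are illustrative, and treating malformed JSON the same way as an unknown `type` is an assumption the README does not state.

```python
import json
from typing import Callable, Optional

Handler = Callable[[str, dict], None]


def make_dispatcher(handlers: dict[str, Handler]) -> Callable[[str, Optional[str], bytes], bool]:
    """Return a data-packet callback; True means a handler consumed the packet."""

    def on_data(sender: str, topic: Optional[str], data: bytes) -> bool:
        if topic != "hermes-control":
            return False  # other topics belong to other consumers
        try:
            msg = json.loads(data)
        except (UnicodeDecodeError, json.JSONDecodeError):
            return False  # assumption: malformed JSON is dropped like unknown types
        if not isinstance(msg, dict):
            return False
        kind = msg.get("type")
        handler = handlers.get(kind) if isinstance(kind, str) else None
        if handler is None:
            return False  # unknown type: ignore silently
        handler(sender, msg)
        return True

    return on_data
```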

Video / camera-frame semantics

The agent does not consume video tracks continuously. When you publish a camera as a video track, the adapter only subscribes to it; no frames are decoded until you ask for one. Send `{"type": "client:capture-frame"}` on `hermes-control` and the agent samples the very next frame, encodes it as JPEG (quality 85), and queues it locally.

The frame attaches to the next user message dispatched by the adapter (either a closed voice utterance or a `client:message`). The hermes agent loop then processes it through its existing `image_input_mode: auto` vision path, the same code path used for image attachments on other platforms.

Frames captured but never claimed by a message are cleaned up on disconnect. Frames attached to a message stay on disk through the agent turn (the agent loop is fire-and-forget after handle_message).
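The frame lifecycle described above (capture, claim by the next message, cleanup on disconnect) can be modelled as a small queue. This is a hypothetical sketch: the class names, keying frames per participant identity, and storing each frame as a file path are assumptions; the README only says that unclaimed frames are removed on disconnect and claimed frames stay on disk through the agent turn.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CapturedFrame:
    identity: str
    path: Path  # JPEG written to disk at capture time (hypothetical layout)


class PendingFrames:
    """Per-participant queue of frames waiting for the next user message."""

    def __init__(self) -> None:
        self._pending: dict[str, list[CapturedFrame]] = {}

    def add(self, frame: CapturedFrame) -> None:
        self._pending.setdefault(frame.identity, []).append(frame)

    def claim(self, identity: str) -> list[CapturedFrame]:
        # The next dispatched message takes every queued frame. Claimed files are
        # not deleted here, because they must survive the agent turn.
        return self._pending.pop(identity, [])

    def discard(self, identity: str) -> int:
        # On disconnect, delete frames that no message ever claimed.
        frames = self._pending.pop(identity, [])
        for f in frames:
            f.path.unlink(missing_ok=True)
        return len(frames)
```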

Status

Experimental. Carved out of the kortexa/gateway-livekit branch on the kortexa-ai/hermes-agent fork (PR NousResearch/hermes-agent#3894) so it can be installed on top of upstream main without patching core.
