# Demonstrates the `unload` on_agent_switch builtin hook.
#
# Two agents share Docker Model Runner but use different models that don't
# fit in GPU memory at the same time. Wiring the `unload` builtin into
# each agent's `on_agent_switch` hook chain asks the previous agent's
# DMR endpoint(s) to release GPU memory every time the active agent
# transfers control. The hook is pure: it reads the model snapshot the
# runtime ships on every on_agent_switch dispatch and POSTs to DMR's
# `_unload` endpoint over plain HTTP — no provider-specific runtime
# coupling. For cloud-only providers (OpenAI, Anthropic, ...) the hook
# is a silent no-op since they don't expose an HTTP unload endpoint.
#
# Switching back and forth between `coder` and `reviewer` therefore costs
# one model load per switch instead of failing on out-of-memory.
agents:
  coder:
    model: qwen3-large
    description: Writes Go code on demand.
    instruction: |
      You write idiomatic, well-tested Go.
      When you finish a change, hand off to `reviewer`.
    handoffs:
      - reviewer
    hooks:
      on_agent_switch:
        - type: builtin
          command: unload

  reviewer:
    model: qwen3-coder
    description: Reviews code for clarity and correctness.
    instruction: |
      You critique Go code written by `coder`. Be concise.
      Hand back to `coder` with concrete change requests.
    handoffs:
      - coder
    hooks:
      on_agent_switch:
        - type: builtin
          command: unload

models:
  qwen3-large:
    provider: dmr
    model: ai/qwen3
  qwen3-coder:
    provider: dmr
    model: ai/smollm2