Context
Part of #415. Both applyMediaUnderstanding() and applyLinkUnderstanding() pipelines must be gutted.
Architectural decision: Middleware routes media and links to CLI runtimes — it does NOT process them itself. Text extraction, image vision, video vision, and URL content fetching are the CLI agent's job, not ours. The multimodal contract (#385) and runtime implementations (#397, #396) ensure the path from channel → CLI exists. STT/TTS is the only exception because it's channel-level infrastructure.
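As a rough illustration of that decision, the sketch below forwards attachments to the CLI runtime untouched and only invokes STT for audio. Every name here (`Attachment`, `InboundMessage`, `RoutedMessage`, `Transcribe`, `routeToCli`) is hypothetical and does not come from the codebase. Whether audio is also forwarded alongside its transcript is an assumption of this sketch.

```typescript
// Hypothetical types for illustration only; none of these exist in the repo.
type Attachment = { kind: "image" | "video" | "audio" | "document"; url: string };
type InboundMessage = { text: string; attachments: Attachment[] };

type RoutedMessage = {
  text: string;
  forwarded: Attachment[]; // handed to the CLI runtime without any processing
  transcripts: string[];   // produced by channel-level STT only
};

// Stand-in for whatever entry point src/stt/ exposes (assumed, synchronous for brevity).
type Transcribe = (audio: Attachment) => string;

function routeToCli(msg: InboundMessage, transcribe: Transcribe): RoutedMessage {
  // STT is the single exception: audio is transcribed at the channel level.
  const transcripts = msg.attachments
    .filter((a) => a.kind === "audio")
    .map((a) => transcribe(a));

  // No text extraction, vision, or URL fetching happens here:
  // attachments pass through unchanged for the CLI agent to handle.
  return { text: msg.text, forwarded: msg.attachments.slice(), transcripts };
}
```

The point of the shape is that the middleware has no branch for images, video, or links at all; anything beyond STT is the CLI agent's responsibility.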
Blast radius
src/media-understanding/ — 49 files, ~5.5k lines
Consumers to decouple/remove:
| Consumer | Imports | Action |
| --- | --- | --- |
| src/auto-reply/reply/get-reply.ts | applyMediaUnderstanding | Remove call |
| src/auto-reply/templating.ts | Types | Remove media-understanding types |
| src/auto-reply/status.ts | MediaUnderstandingDecision | Remove media decision display |
| src/auto-reply/reply/commands-status.ts | MediaUnderstandingDecision | Remove |
| src/auto-reply/reply/commands-info.ts | Media understanding refs | Remove |
| src/auto-reply/reply/get-reply-inline-actions.ts | Media understanding refs | Remove |
| src/auto-reply/reply/get-reply-directives-apply.ts | Media understanding refs | Remove |
| src/auto-reply/media-note.ts + test | Media decision types | Remove or simplify |
| src/discord/voice/manager.ts | runCapability from runner | Switch to src/stt/ directly |
| src/stt/preflight.ts | isAudioAttachment, runCapability, types | Relocate needed utilities into src/stt/ |
| src/stt/providers/shared.ts | Shared provider utilities | Relocate into src/stt/providers/ |
| src/stt/providers/google/audio.ts | generateGeminiInlineDataText | Relocate into src/stt/ |
| src/stt/providers/audio.test-helpers.ts | Test helpers | Relocate into src/stt/ |
Config schema to remove: MediaUnderstandingScopeSchema, MediaUnderstandingCapabilitiesSchema, MediaUnderstandingAttachmentsSchema, MediaUnderstandingModelSchema, ToolsMediaUnderstandingSchema in src/config/zod-schema.core.ts. Types in src/config/types.tools.ts.
src/link-understanding/ — 6 files, ~333 lines
Consumers to remove:
| Consumer | Imports | Action |
| --- | --- | --- |
| src/auto-reply/reply/get-reply.ts | applyLinkUnderstanding | Remove call |
| src/auto-reply/reply/get-reply.reset-hooks-fallback.test.ts | Mock | Remove mock |
| src/auto-reply/templating.ts | LinkUnderstanding context field | Remove field |
Config schema to remove: LinkToolsConfig, LinkModelConfig types, tools.links config field, schema labels/help entries.
Note: link-understanding imports shared utilities from media-understanding (CLI_OUTPUT_MAX_BUFFER, resolveTimeoutMs, resolveMediaUnderstandingScope) — both modules die together, no relocation needed for these.
Work order
- Relocate STT shared utilities: move what src/stt/ actually needs out of media-understanding/ into src/stt/
- Switch discord/voice: use src/stt/ directly instead of the media-understanding runner
- Remove both pipeline calls from get-reply.ts (applyMediaUnderstanding, applyLinkUnderstanding)
- Remove auto-reply consumers: media decision types, status display, media note, link understanding context field
- Remove config schema: tools.media (image/video/audio vision config), tools.links
- Delete src/media-understanding/ and src/link-understanding/ entirely
- Verify: all tests pass, STT still works, voice messages work, no orphan imports
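For the "no orphan imports" check in the last step, something like the following could be run over the source tree after the deletion. It is a minimal sketch, not an existing script in the repo: `findOrphanImports`, `OrphanImport`, and `REMOVED_MODULES` are hypothetical names, and the function takes file contents as an in-memory map so that reading files from disk is left to the caller.

```typescript
// Directory names removed by this issue.
const REMOVED_MODULES = ["media-understanding", "link-understanding"];

type OrphanImport = { file: string; line: number; text: string };

// Scans source text for import/require specifiers whose path contains
// a removed directory as a full path segment.
function findOrphanImports(sources: Record<string, string>): OrphanImport[] {
  const hits: OrphanImport[] = [];
  for (const file of Object.keys(sources)) {
    const lines = (sources[file] ?? "").split("\n");
    lines.forEach((text, index) => {
      // Covers `from "x"`, `import "x"`, `import("x")`, and `require("x")`.
      const specifier = /(?:from|import|require)\s*\(?\s*["']([^"']+)["']/g;
      let match: RegExpExecArray | null;
      while ((match = specifier.exec(text)) !== null) {
        const segments = (match[1] ?? "").split("/");
        if (segments.some((s) => REMOVED_MODULES.indexOf(s) !== -1)) {
          hits.push({ file, line: index + 1, text: text.trim() });
        }
      }
    });
  }
  return hits;
}
```

An empty result means no remaining file references either deleted directory. Matching on whole path segments avoids false positives on unrelated paths that merely contain the same substring.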
Depends on
Does NOT depend on
Related