fix(media): resize images once at ingest instead of re-inlining them every turn (#1296)#1345
Conversation
…v#1296) OpenSpec change proposing a fix for netclaw-dev#1296: image egress reads each file in full and inline-base64-copies it on every LLM turn, an OOM vector on the 1Gi daemon. Introduces a shared SkiaSharp-backed ImageNormalizer (shrink-on-decode via SKCodec sample-size) with two seams: normalize-at-ingestion for chat attachments (OpenCode model) and downscale-at-egress + content-hash cache for file_read images (Codex model). Bounds by long-edge (~1568px) AND base64 byte budget (~5MB); fails loud (drop-with-note, never raw passthrough); configurable caps with schema sync. SkiaSharp chosen over ImageSharp (Six Labors Split License), NetVips (LGPL native), Magick.NET (memory-heavy), MagicScaler (asymmetric cross-platform). Files API and A/V egress (netclaw-dev#1266/netclaw-dev#1297) explicitly out of scope. Artifacts: proposal, design, specs (new bounded-image-egress capability + modified netclaw-tools), tasks. Passes strict openspec validation.
…law-dev#1296) Implements tasks 1.1 + 2.x of the bound-image-egress change: the shared, injectable IImageNormalizer that downscales an image to a bounded payload (long-edge cap + base64 byte budget) and re-encodes once, with fail-loud drop. No runtime wiring yet — this is the self-contained, fully-tested core. - Add SkiaSharp 3.119.4 + Linux NoDependencies native asset (MIT; avoids ImageSharp's Six Labors Split License). Referenced from Netclaw.Media. - ImageDecodeMath: pure ChooseDecodeSampleSize (drives JPEG/WebP shrink-on-decode so a full-res bitmap is never materialized) + Base64Length. - SkiaImageNormalizer: scaled decode -> resize-to-cap -> JPEG quality ladder / size shrink to byte budget; opaque->JPEG, alpha->PNG; passthrough when already bounded; never throws (returns Dropped). - 59 passing tests, fixtures generated in-test with SkiaSharp (no binaries). Two design refinements found while building (documented in design.md): - Memory bound is NATIVE (Skia heap), so the planned GC.GetAllocatedBytes assertion was dropped; the bound is proven via unit-tested sample-size + output-dimension assertions instead. - Scaled decode is format-dependent (JPEG/WebP only); PNG/GIF/BMP get a fail-loud 256MiB decode ceiling (drop, don't OOM) rather than full decode.
…fig (netclaw-dev#1296) Revises the change after verifying the file_read data flow in code: - file_read images are NOT re-read from disk each turn. MaterializeModelInputFiles calls SessionMediaStore.CopyFile, persisting them into session media on first use — exactly like chat attachments (WriteDataContent). Both origins are read back via the single ChatMessageConverter egress path. - Therefore the content-hash cache is redundant: normalize at the two SessionMediaStore writers and every image is bounded once at the source. Drop the cache (old task group 4) and collapse "two seams" to one. - Drop runtime config: the byte budget is the memory-safety lever, so a user-raisable knob could re-open the OOM. Bounds are fixed constants; no netclaw-config.v1.schema.json change. Net: removes a capability requirement, a task group, and the schema work; more aligned with the memory-safety goal. The "-and-cache" in the change name is now vestigial (noted in design Non-Goals). Strict validation passes.
…dev#1296) Wires the image normalizer into the single media-store write boundary, so every image bound for a model is bounded exactly once on write and the egress read path never materializes an unbounded payload. - WriteDataContent (chat attachments + persisted message media) and CopyFile (file_read model-input handoff) normalize images before writing; non-image media passes through untouched. The normalized artifact (PNG may become JPEG) replaces the original; the reference's length/MIME reflect it. - CopyFile becomes nullable: a drop returns null, the caller skips it and the existing RequestedCount > MediaReferences.Count gap drives the model-input handoff warning — so a refused image is surfaced, not silent. - Shared stateless SkiaImageNormalizer instance (no DI plumbing through the static call sites); fixed constant bounds (no runtime config). - 6 seam tests: oversized chat/file_read image bounded; non-image passthrough; small image passthrough; undecodable image dropped + nothing written.
…law-dev#1296) The media-store seam decodes every image on write, so fake magic-byte PNG stubs are now correctly dropped as undecodable. Replace them with real, decodable fixtures via a shared Netclaw.Tests.Utilities.TestImages helper: - TestImages.SmallPng() — real opaque PNG small enough to pass through the normalizer byte-for-byte (round-trip assertions hold). - TestImages.Image(w,h,...) — detailed image for downscale/budget fixtures. Updated ChatMessageConverterTests, SessionToolExecutionPipelineTests, LlmSessionImageDeliveryTests, SubAgentActorTests. Full Netclaw.Actors.Tests suite green (2200), solution builds clean, slopwatch + headers pass.
…tclaw-dev#1296) Completes the fail-loud contract for the chat-attachment path. Previously a dropped chat image (undecodable or un-boundable) returned null and vanished silently; now WriteDataContent returns a MediaWriteResult that distinguishes a silent skip (non-media/empty) from a drop carrying the reason. - WriteDataContent -> MediaWriteResult { Reference?, DroppedReason? }. - ChannelPipeline (inbound attachments) and ChatMessageConverter.FromAiMessage append a visible "[image omitted: <reason>]" note to the message content on a drop, so the model sees an attachment was present but couldn't be included. - CopyFile (file_read path) unchanged — it still surfaces drops via the existing model-input handoff warning. - Tests: converter appends the note + preserves text on drop; seam tests assert the reason is carried. Full Actors suite green (2201).
…it (netclaw-dev#1296) Adds a `netclaw doctor` check that fails loud if SkiaSharp's native libSkiaSharp can't load — turning a packaging regression (the native asset missing from the binary) into an up-front diagnostic instead of a runtime throw deep in the channel/tool pipeline when the first image arrives. - ImageNormalizerProbe.TryProbe() (Netclaw.Media): encode -> normalize round-trip; returns null on success or a short error. Keeps native usage encapsulated in the media library. - ImageProcessingDoctorCheck (Netclaw.Cli): thin check that maps the probe to Pass/Error with remediation; registered alongside the other native-dep probe (sqlite). - Netclaw.Cli.csproj: IncludeNativeLibrariesForSelfExtract=true so a manual `dotnet publish` of the CLI embeds libSkiaSharp (mirrors the daemon csproj; the release script already passes the flag to every component). Docker (1.2): no Dockerfile change needed — self-contained single-file publish with IncludeNativeLibrariesForSelfExtract embeds + self-extracts the native lib (same mechanism as e_sqlite3); the single-binary COPY already carries it. Tests: probe returns null with the real native lib; doctor check passes. All 143 Doctor tests green; solution builds clean; slopwatch + headers pass.
…claw-dev#1296) Group 5: evals not triggered (no identity/skill/tool-schema/memory/model change; model-facing contract unchanged) and no checked-in benchmark warranted — the OOM metric is native (Skia) memory, which BenchmarkDotNet's MemoryDiagnoser can't see, and normalization is off the hot path. Recorded one-off before/after numbers in design.md (e.g. 64MP source decoded bitmap 244MB -> 15MB; payload -> 123KB).
…ormat (netclaw-dev#1296) Resolves the max-effort review findings: - HIGH: the normalizer no longer throws. Normalize wraps its SkiaSharp calls in try/catch and returns Dropped (honoring the IImageNormalizer "never throws" contract), so a native-load failure / native OOM / malformed codec degrades to a per-image drop instead of faulting the whole tool turn or the channel ingestion stream. - The normalizer now ONLY resizes and keeps the source container format — a PNG stays PNG, a JPEG stays JPEG. No PNG->JPEG transcode, no quality ladder, no iterative byte-shrink. Screenshots/diagrams keep their lossless fidelity. - GIF (and any format Skia can't re-encode) passes through untouched when within the byte budget, else drops — animation is never silently flattened. - Only model-input-eligible images are normalized; bmp/tiff (image but not model-input, never inlined) are stored byte-for-byte. - Batch budget releases the over-reservation after CopyFile, so large-but- downscalable file_read images are no longer rejected on pre-resize source size. - Enabled=false bypass returns a null MediaType so the caller keeps its declared MIME (was mislabeling JPEG/WebP as image/png). - Cleanups: SKData.CreateCopy(span) drops a managed copy; drop-reason byte budget is formatted B/KB/MB (no more "0MB"). Media tests 62, Actors 2202, doctor probe green; solution builds clean.
Aaronontheweb
left a comment
There was a problem hiding this comment.
Overall looks good - probably need to do another pass on some of this in a follow-up PR
| } | ||
|
|
||
| [Fact] | ||
| public void FromAiMessage_appends_omitted_note_when_image_is_dropped() |
| /// <summary>Disposable temp directory for session media tests.</summary> | ||
| // A real, decodable PNG small enough that the egress image normalizer passes it | ||
| // through byte-for-byte (so media round-trip assertions still hold). | ||
| private static byte[] SmallPng(int size = 16) |
| /// <summary> | ||
| /// Outcome of running an image through <see cref="IImageNormalizer"/>. | ||
| /// </summary> | ||
| public enum ImageNormalizationOutcome |
| /// <see cref="ImageNormalizationOutcome.Dropped"/> the caller MUST attach no image | ||
| /// and surface <see cref="Reason"/> as a visible note (fail-loud, never raw passthrough). | ||
| /// </summary> | ||
| public sealed record ImageNormalizationResult |
There was a problem hiding this comment.
probably should use some value objects here depending on how frequently this stuff is used.
| /// Runs a tiny encode → normalize round-trip. Returns <c>null</c> when imaging | ||
| /// works, or a short error string describing why it does not. | ||
| /// </summary> | ||
| public static string? TryProbe() |
There was a problem hiding this comment.
probably should use a real result type here rather than relying on null
| <ItemGroup> | ||
| <PackageReference Include="Microsoft.Extensions.Http" /> | ||
| <PackageReference Include="SkiaSharp" /> | ||
| <PackageReference Include="SkiaSharp.NativeAssets.Linux.NoDependencies" /> |
There was a problem hiding this comment.
what about Windows and OS X?
There was a problem hiding this comment.
They come in transitively — the base SkiaSharp package depends on NativeAssets.Win32 and NativeAssets.macOS. Desktop Linux is the only platform SkiaSharp doesn't pull by default (the fontconfig thing), so it's the only explicit add. CI confirmed it: Test-windows-latest and Test-macos-26 both passed here. Added a comment in #1348 noting this.
| public sealed class ImageProcessingDoctorCheckTests | ||
| { | ||
| [Fact] | ||
| public async Task Passes_when_native_imaging_library_is_available() |
| /// egress fails at runtime. This check turns that packaging regression into a loud, | ||
| /// up-front diagnostic. | ||
| /// </summary> | ||
| public sealed class ImageProcessingDoctorCheck : IDoctorCheck |
There was a problem hiding this comment.
glad we check for this because this would be a tremendous PITA to debug without it
| var mediaRef = SessionMediaStore.WriteDataContent(data, sessionDir); | ||
| if (mediaRef is not null) | ||
| mediaRefs.Add(mediaRef); | ||
| var write = SessionMediaStore.WriteDataContent(data, sessionDir); |
There was a problem hiding this comment.
this was a very smooth way to integrate the image re-sizing / sampling into the main LLM session pipeline without too much fanfare. very nice.
| var mediaRef = SessionMediaStore.WriteDataContent(data, sessionDir); | ||
| if (mediaRef is not null) | ||
| mediaRefs.Add(mediaRef); | ||
| var write = SessionMediaStore.WriteDataContent(data, sessionDir); |
There was a problem hiding this comment.
why is this duplicated between the ChannelPipeline and here?
…t, dedup (#1348) Follow-up to the image-egress PR: - Wrap the normalizer's bare ints in tidy value objects so px and bytes can't be confused at a call site: Pixels, ByteSize, JpegQuality. ByteSize.ToString folds in the human B/KB/MB formatting (replacing the FormatBudget helper). - ImageNormalizerProbe returns a real ImagingProbeResult { IsWorking, Error } instead of null-as-success. - Dedup the "write DataContent -> add reference or append [image omitted] note" logic that was copy-pasted in ChannelPipeline and ChatMessageConverter into one SessionMediaStore.WriteMediaInto helper; AppendOmittedImageNote is now private. - Comments: spell out what hits the "not a decodable image" drop; leave a breadcrumb where an iterative size-reduction loop would go if over-budget drops get common; note in Tests.Utilities.csproj that Win/macOS native assets come transitively from base SkiaSharp (only Linux needs the explicit add). Media tests 65, Actors 2211; solution builds clean; slopwatch + headers pass.
Closes #1296.
The problem
Every image still in the un-compacted window gets read off disk in full and base64'd into the request on every turn. Nothing resizes it; the only cap is admission (25MB/file, 10 files/message). So one message can put ~660MB of base64 into a single request, then do it again next turn. That's an OOM waiting to happen on a 1Gi daemon, and it's wasted work: providers downscale server-side anyway, so the model never sees the extra pixels.
What this does
Adds a SkiaSharp-backed
ImageNormalizerinNetclaw.Mediathat does exactly one thing: resize. If an image's long edge is over ~1568px, it scales down and re-encodes in the same format it came in as. PNG stays PNG, JPEG stays JPEG. No transcoding, no quality games. Scaled decode (JPEG DCT) means we don't fully materialize an oversized source, and a 256MiB decode ceiling drops anything pathological rather than risk the OOM we're here to fix.It hooks in at one place: the two
SessionMediaStorewriters,WriteDataContent(chat attachments) andCopyFile(thefile_readhandoff). Both get read back through the sameChatMessageConverteregress path, so an image is bounded once when it lands and that's it. No per-turn work, no cache to keep coherent — the media store already is the dedup.If an image can't be bounded by resizing alone (or it's a format Skia can't re-encode, like a GIF that's over budget), we drop it and say so: a visible
[image omitted: ...]note on the chat path, the handoff warning on thefile_readpath. We never silently degrade it or ship it raw.The caps are constants, not config. The byte budget is what keeps us under the memory ceiling, so a knob to raise it would just hand the OOM back.
Out of scope
file_id. None of the OSS coding agents I looked at use it for images, and the Anthropic SDK we wrap can't reach it anyway.Packaging
The self-contained single-file publish already embeds and self-extracts
libSkiaSharp(same pathe_sqlite3takes), so no Dockerfile change. I added anetclaw doctorcheck that loads the native lib and fails loudly if it can't, so a packaging regression shows up as a clean diagnostic instead of a stack trace the first time someone sends an image. Also flippedIncludeNativeLibrariesForSelfExtracton the CLI so the probe's copy ships too.What it saves
Native decoded-bitmap memory, which is the metric that actually OOMs us:
Per image, and paid once at ingestion now instead of every turn.
On the review
I ran this through a multi-agent review before opening it and it earned its keep — found 6 real bugs, all fixed here:
file_read) or tear down channel ingestion (chat). It now catches and returns a drop, which is what the contract claimed all along.Enabled=falseMIME mislabel and two cleanups."Just resize, keep the format" is actually the second design. The first one transcoded opaque PNGs to JPEG, which is both lossier (ringing on screenshots) and often bigger. Glad that one didn't ship.
Tests
Netclaw.Media.Tests(62): the decode-sample math, resize/passthrough/drop behavior, GIF passthrough and over-budget drop, format preservation, the memory ceiling, the doctor probe.Netclaw.Actors.Tests(2202): both media-store writers, non-image and bmp/tiff passthrough, the drop +[image omitted]note, model-input materialization.Planning and specs live under
openspec/changes/bound-image-egress-with-downscale-and-cache/if you want the long version.