Skip to content

fix(media): resize images once at ingest instead of re-inlining them every turn (#1296)#1345

Merged
Aaronontheweb merged 14 commits into
netclaw-dev:devfrom
Aaronontheweb:investigate-1296-image-egress
Jun 7, 2026
Merged

fix(media): resize images once at ingest instead of re-inlining them every turn (#1296)#1345
Aaronontheweb merged 14 commits into
netclaw-dev:devfrom
Aaronontheweb:investigate-1296-image-egress

Conversation

@Aaronontheweb

@Aaronontheweb Aaronontheweb commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Closes #1296.

The problem

Every image still in the un-compacted window gets read off disk in full and base64'd into the request on every turn. Nothing resizes it; the only cap is admission (25MB/file, 10 files/message). So one message can put ~660MB of base64 into a single request, then do it again next turn. That's an OOM waiting to happen on a 1Gi daemon, and it's wasted work: providers downscale server-side anyway, so the model never sees the extra pixels.

What this does

Adds a SkiaSharp-backed ImageNormalizer in Netclaw.Media that does exactly one thing: resize. If an image's long edge is over ~1568px, it scales down and re-encodes in the same format it came in as. PNG stays PNG, JPEG stays JPEG. No transcoding, no quality games. Scaled decode (JPEG DCT) means we don't fully materialize an oversized source, and a 256MiB decode ceiling drops anything pathological rather than risk the OOM we're here to fix.

It hooks in at one place: the two SessionMediaStore writers, WriteDataContent (chat attachments) and CopyFile (the file_read handoff). Both get read back through the same ChatMessageConverter egress path, so an image is bounded once when it lands and that's it. No per-turn work, no cache to keep coherent — the media store already is the dedup.

If an image can't be bounded by resizing alone (or it's a format Skia can't re-encode, like a GIF that's over budget), we drop it and say so: a visible [image omitted: ...] note on the chat path, the handoff warning on the file_read path. We never silently degrade it or ship it raw.

The caps are constants, not config. The byte budget is what keeps us under the memory ceiling, so a knob to raise it would just hand the OOM back.

Out of scope

Packaging

The self-contained single-file publish already embeds and self-extracts libSkiaSharp (same path e_sqlite3 takes), so no Dockerfile change. I added a netclaw doctor check that loads the native lib and fails loudly if it can't, so a packaging regression shows up as a clean diagnostic instead of a stack trace the first time someone sends an image. Also flipped IncludeNativeLibrariesForSelfExtract on the CLI so the probe's copy ships too.

What it saves

Native decoded-bitmap memory, which is the metric that actually OOMs us:

Source decoded bitmap, full → shrink-on-decode
8000×8000 (64MP) 244 MB → 15 MB
6000×4000 (24MP) 92 MB → 23 MB
3840×2160 (4K) 32 MB → 8 MB

Per image, and paid once at ingestion now instead of every turn.

On the review

I ran this through a multi-agent review before opening it and it earned its keep — found 6 real bugs, all fixed here:

  • The normalizer could throw on a native failure, which would fault the whole turn (file_read) or tear down channel ingestion (chat). It now catches and returns a drop, which is what the contract claimed all along.
  • GIF/BMP/TIFF were getting silently transcoded, and GIFs lost their animation. Now we keep the source format, and bmp/tiff (which the model can't ingest anyway) are stored byte-for-byte.
  • The batch byte budget charged the pre-resize source size, so it rejected large-but-downscalable images that would've fit fine after resizing.
  • Plus an Enabled=false MIME mislabel and two cleanups.

"Just resize, keep the format" is actually the second design. The first one transcoded opaque PNGs to JPEG, which is both lossier (ringing on screenshots) and often bigger. Glad that one didn't ship.

Tests

  • Netclaw.Media.Tests (62): the decode-sample math, resize/passthrough/drop behavior, GIF passthrough and over-budget drop, format preservation, the memory ceiling, the doctor probe.
  • Netclaw.Actors.Tests (2202): both media-store writers, non-image and bmp/tiff passthrough, the drop + [image omitted] note, model-input materialization.
  • Clean build (0 warnings), slopwatch clean, headers verified.

Planning and specs live under openspec/changes/bound-image-egress-with-downscale-and-cache/ if you want the long version.

…v#1296)

OpenSpec change proposing a fix for netclaw-dev#1296: image egress reads each file in
full and inline-base64-copies it on every LLM turn, an OOM vector on the 1Gi
daemon. Introduces a shared SkiaSharp-backed ImageNormalizer (shrink-on-decode
via SKCodec sample-size) with two seams: normalize-at-ingestion for chat
attachments (OpenCode model) and downscale-at-egress + content-hash cache for
file_read images (Codex model). Bounds by long-edge (~1568px) AND base64 byte
budget (~5MB); fails loud (drop-with-note, never raw passthrough); configurable
caps with schema sync.

SkiaSharp chosen over ImageSharp (Six Labors Split License), NetVips (LGPL
native), Magick.NET (memory-heavy), MagicScaler (asymmetric cross-platform).
Files API and A/V egress (netclaw-dev#1266/netclaw-dev#1297) explicitly out of scope.

Artifacts: proposal, design, specs (new bounded-image-egress capability +
modified netclaw-tools), tasks. Passes strict openspec validation.
…law-dev#1296)

Implements tasks 1.1 + 2.x of the bound-image-egress change: the shared,
injectable IImageNormalizer that downscales an image to a bounded payload
(long-edge cap + base64 byte budget) and re-encodes once, with fail-loud drop.
No runtime wiring yet — this is the self-contained, fully-tested core.

- Add SkiaSharp 3.119.4 + Linux NoDependencies native asset (MIT; avoids
  ImageSharp's Six Labors Split License). Referenced from Netclaw.Media.
- ImageDecodeMath: pure ChooseDecodeSampleSize (drives JPEG/WebP shrink-on-decode
  so a full-res bitmap is never materialized) + Base64Length.
- SkiaImageNormalizer: scaled decode -> resize-to-cap -> JPEG quality ladder /
  size shrink to byte budget; opaque->JPEG, alpha->PNG; passthrough when already
  bounded; never throws (returns Dropped).
- 59 passing tests, fixtures generated in-test with SkiaSharp (no binaries).

Two design refinements found while building (documented in design.md):
- Memory bound is NATIVE (Skia heap), so the planned GC.GetAllocatedBytes
  assertion was dropped; the bound is proven via unit-tested sample-size +
  output-dimension assertions instead.
- Scaled decode is format-dependent (JPEG/WebP only); PNG/GIF/BMP get a
  fail-loud 256MiB decode ceiling (drop, don't OOM) rather than full decode.
…fig (netclaw-dev#1296)

Revises the change after verifying the file_read data flow in code:

- file_read images are NOT re-read from disk each turn. MaterializeModelInputFiles
  calls SessionMediaStore.CopyFile, persisting them into session media on first
  use — exactly like chat attachments (WriteDataContent). Both origins are read
  back via the single ChatMessageConverter egress path.
- Therefore the content-hash cache is redundant: normalize at the two
  SessionMediaStore writers and every image is bounded once at the source. Drop
  the cache (old task group 4) and collapse "two seams" to one.
- Drop runtime config: the byte budget is the memory-safety lever, so a
  user-raisable knob could re-open the OOM. Bounds are fixed constants; no
  netclaw-config.v1.schema.json change.

Net: removes a capability requirement, a task group, and the schema work; more
aligned with the memory-safety goal. The "-and-cache" in the change name is now
vestigial (noted in design Non-Goals). Strict validation passes.
…dev#1296)

Wires the image normalizer into the single media-store write boundary, so every
image bound for a model is bounded exactly once on write and the egress read path
never materializes an unbounded payload.

- WriteDataContent (chat attachments + persisted message media) and CopyFile
  (file_read model-input handoff) normalize images before writing; non-image
  media passes through untouched. The normalized artifact (PNG may become JPEG)
  replaces the original; the reference's length/MIME reflect it.
- CopyFile becomes nullable: a drop returns null, the caller skips it and the
  existing RequestedCount > MediaReferences.Count gap drives the model-input
  handoff warning — so a refused image is surfaced, not silent.
- Shared stateless SkiaImageNormalizer instance (no DI plumbing through the
  static call sites); fixed constant bounds (no runtime config).
- 6 seam tests: oversized chat/file_read image bounded; non-image passthrough;
  small image passthrough; undecodable image dropped + nothing written.
…law-dev#1296)

The media-store seam decodes every image on write, so fake magic-byte PNG stubs
are now correctly dropped as undecodable. Replace them with real, decodable
fixtures via a shared Netclaw.Tests.Utilities.TestImages helper:

- TestImages.SmallPng() — real opaque PNG small enough to pass through the
  normalizer byte-for-byte (round-trip assertions hold).
- TestImages.Image(w,h,...) — detailed image for downscale/budget fixtures.

Updated ChatMessageConverterTests, SessionToolExecutionPipelineTests,
LlmSessionImageDeliveryTests, SubAgentActorTests. Full Netclaw.Actors.Tests
suite green (2200), solution builds clean, slopwatch + headers pass.
…tclaw-dev#1296)

Completes the fail-loud contract for the chat-attachment path. Previously a
dropped chat image (undecodable or un-boundable) returned null and vanished
silently; now WriteDataContent returns a MediaWriteResult that distinguishes a
silent skip (non-media/empty) from a drop carrying the reason.

- WriteDataContent -> MediaWriteResult { Reference?, DroppedReason? }.
- ChannelPipeline (inbound attachments) and ChatMessageConverter.FromAiMessage
  append a visible "[image omitted: <reason>]" note to the message content on a
  drop, so the model sees an attachment was present but couldn't be included.
- CopyFile (file_read path) unchanged — it still surfaces drops via the existing
  model-input handoff warning.
- Tests: converter appends the note + preserves text on drop; seam tests assert
  the reason is carried. Full Actors suite green (2201).
…it (netclaw-dev#1296)

Adds a `netclaw doctor` check that fails loud if SkiaSharp's native
libSkiaSharp can't load — turning a packaging regression (the native asset
missing from the binary) into an up-front diagnostic instead of a runtime
throw deep in the channel/tool pipeline when the first image arrives.

- ImageNormalizerProbe.TryProbe() (Netclaw.Media): encode -> normalize
  round-trip; returns null on success or a short error. Keeps native usage
  encapsulated in the media library.
- ImageProcessingDoctorCheck (Netclaw.Cli): thin check that maps the probe to
  Pass/Error with remediation; registered alongside the other native-dep probe
  (sqlite).
- Netclaw.Cli.csproj: IncludeNativeLibrariesForSelfExtract=true so a manual
  `dotnet publish` of the CLI embeds libSkiaSharp (mirrors the daemon csproj;
  the release script already passes the flag to every component).

Docker (1.2): no Dockerfile change needed — self-contained single-file publish
with IncludeNativeLibrariesForSelfExtract embeds + self-extracts the native lib
(same mechanism as e_sqlite3); the single-binary COPY already carries it.

Tests: probe returns null with the real native lib; doctor check passes. All
143 Doctor tests green; solution builds clean; slopwatch + headers pass.
…claw-dev#1296)

Group 5: evals not triggered (no identity/skill/tool-schema/memory/model change;
model-facing contract unchanged) and no checked-in benchmark warranted — the OOM
metric is native (Skia) memory, which BenchmarkDotNet's MemoryDiagnoser can't see,
and normalization is off the hot path. Recorded one-off before/after numbers in
design.md (e.g. 64MP source decoded bitmap 244MB -> 15MB; payload -> 123KB).
…ormat (netclaw-dev#1296)

Resolves the max-effort review findings:

- HIGH: the normalizer no longer throws. Normalize wraps its SkiaSharp calls in
  try/catch and returns Dropped (honoring the IImageNormalizer "never throws"
  contract), so a native-load failure / native OOM / malformed codec degrades to
  a per-image drop instead of faulting the whole tool turn or the channel
  ingestion stream.
- The normalizer now ONLY resizes and keeps the source container format — a PNG
  stays PNG, a JPEG stays JPEG. No PNG->JPEG transcode, no quality ladder, no
  iterative byte-shrink. Screenshots/diagrams keep their lossless fidelity.
- GIF (and any format Skia can't re-encode) passes through untouched when within
  the byte budget, else drops — animation is never silently flattened.
- Only model-input-eligible images are normalized; bmp/tiff (image but not
  model-input, never inlined) are stored byte-for-byte.
- Batch budget releases the over-reservation after CopyFile, so large-but-
  downscalable file_read images are no longer rejected on pre-resize source size.
- Enabled=false bypass returns a null MediaType so the caller keeps its declared
  MIME (was mislabeling JPEG/WebP as image/png).
- Cleanups: SKData.CreateCopy(span) drops a managed copy; drop-reason byte budget
  is formatted B/KB/MB (no more "0MB").

Media tests 62, Actors 2202, doctor probe green; solution builds clean.
@Aaronontheweb Aaronontheweb changed the title fix(media): bound image egress — resize once at the media-store seam (#1296) fix(media): resize images once at ingest instead of re-inlining them every turn (#1296) Jun 7, 2026
@Aaronontheweb Aaronontheweb added tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc. performance Memory, CPU, and I/O optimization issues. labels Jun 7, 2026
@Aaronontheweb Aaronontheweb merged commit 098f7a2 into netclaw-dev:dev Jun 7, 2026
15 checks passed
@Aaronontheweb Aaronontheweb deleted the investigate-1296-image-egress branch June 7, 2026 12:39

@Aaronontheweb Aaronontheweb left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good - probably need to do another pass on some of this in a follow-up PR

}

[Fact]
public void FromAiMessage_appends_omitted_note_when_image_is_dropped()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// <summary>Disposable temp directory for session media tests.</summary>
// A real, decodable PNG small enough that the egress image normalizer passes it
// through byte-for-byte (so media round-trip assertions still hold).
private static byte[] SmallPng(int size = 16)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// <summary>
/// Outcome of running an image through <see cref="IImageNormalizer"/>.
/// </summary>
public enum ImageNormalizationOutcome

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

/// <see cref="ImageNormalizationOutcome.Dropped"/> the caller MUST attach no image
/// and surface <see cref="Reason"/> as a visible note (fail-loud, never raw passthrough).
/// </summary>
public sealed record ImageNormalizationResult

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should use some value objects here depending on how frequently this stuff is used.

/// Runs a tiny encode → normalize round-trip. Returns <c>null</c> when imaging
/// works, or a short error string describing why it does not.
/// </summary>
public static string? TryProbe()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should use a real result type here rather than relying on null

<ItemGroup>
<PackageReference Include="Microsoft.Extensions.Http" />
<PackageReference Include="SkiaSharp" />
<PackageReference Include="SkiaSharp.NativeAssets.Linux.NoDependencies" />

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about Windows and OS X?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They come in transitively — the base SkiaSharp package depends on NativeAssets.Win32 and NativeAssets.macOS. Desktop Linux is the only platform SkiaSharp doesn't pull by default (the fontconfig thing), so it's the only explicit add. CI confirmed it: Test-windows-latest and Test-macos-26 both passed here. Added a comment in #1348 noting this.

public sealed class ImageProcessingDoctorCheckTests
{
[Fact]
public async Task Passes_when_native_imaging_library_is_available()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// egress fails at runtime. This check turns that packaging regression into a loud,
/// up-front diagnostic.
/// </summary>
public sealed class ImageProcessingDoctorCheck : IDoctorCheck

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad we check for this because this would be a tremendous PITA to debug without it

var mediaRef = SessionMediaStore.WriteDataContent(data, sessionDir);
if (mediaRef is not null)
mediaRefs.Add(mediaRef);
var write = SessionMediaStore.WriteDataContent(data, sessionDir);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a very smooth way to integrate the image re-sizing / sampling into the main LLM session pipeline without too much fanfare. very nice.

var mediaRef = SessionMediaStore.WriteDataContent(data, sessionDir);
if (mediaRef is not null)
mediaRefs.Add(mediaRef);
var write = SessionMediaStore.WriteDataContent(data, sessionDir);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this duplicated between the ChannelPipeline and here?

Aaronontheweb added a commit that referenced this pull request Jun 7, 2026
…t, dedup (#1348)

Follow-up to the image-egress PR:

- Wrap the normalizer's bare ints in tidy value objects so px and bytes can't be
  confused at a call site: Pixels, ByteSize, JpegQuality. ByteSize.ToString folds
  in the human B/KB/MB formatting (replacing the FormatBudget helper).
- ImageNormalizerProbe returns a real ImagingProbeResult { IsWorking, Error }
  instead of null-as-success.
- Dedup the "write DataContent -> add reference or append [image omitted] note"
  logic that was copy-pasted in ChannelPipeline and ChatMessageConverter into one
  SessionMediaStore.WriteMediaInto helper; AppendOmittedImageNote is now private.
- Comments: spell out what hits the "not a decodable image" drop; leave a
  breadcrumb where an iterative size-reduction loop would go if over-budget drops
  get common; note in Tests.Utilities.csproj that Win/macOS native assets come
  transitively from base SkiaSharp (only Linux needs the explicit add).

Media tests 65, Actors 2211; solution builds clean; slopwatch + headers pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Memory, CPU, and I/O optimization issues. tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Image egress reads the whole file and inline-base64-copies it every LLM call (no downscale, no streaming)

1 participant