netclawd runs Server GC by default (Web SDK); Workstation GC suits the daemon's container footprint #1294

@Aaronontheweb

What happens

netclawd runs with Server GC because Netclaw.Daemon builds on the Web SDK and nothing overrides the default. In a memory-limited container (we run these with a 1Gi limit), Server GC is the wrong trade-off: it reserves larger heap segments, keeps one heap per core, collects Gen2 and the LOH less eagerly, and is slow to return memory to the OS. The result is a higher peak RSS and a lower chance of riding out a transient spike before the cgroup OOM killer steps in.

Why

The daemon project uses Microsoft.NET.Sdk.Web, which turns ServerGarbageCollection on by default, and there is no override anywhere in the project tree.
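For readers less familiar with SDK-style projects: the SDK is selected on the root `<Project>` element, and the Web SDK's Server GC default only takes effect when the project leaves `ServerGarbageCollection` unset. A minimal sketch of such a project file follows. It is hypothetical, not copied from the Netclaw.Daemon csproj, and the `net8.0` target framework is an assumption for illustration.

```xml
<!-- Hypothetical minimal project file; not the actual Netclaw.Daemon csproj -->
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <!-- Assumed target framework, for illustration only -->
    <TargetFramework>net8.0</TargetFramework>
    <!-- ServerGarbageCollection is not set here, so the Web SDK default (true) applies -->
  </PropertyGroup>
</Project>
```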

netclawd is a single-tenant, low-concurrency process: one operator, a handful of sessions, a SignalR gateway, and a webhook listener. That's the workload Workstation GC is meant for. Server GC's throughput benefit (saturating many cores under heavy allocation churn) doesn't apply here, so we pay its memory cost without getting anything back.

This showed up in practice: a daemon doing autonomous log-pulling spiked past its 1Gi limit and got SIGKILL'd (the entrypoint restarts the inner process, so the pod restart count stays 0 and it's easy to miss). The underlying allocation problem is the unbounded shell read in the companion issue; Server GC is what turns that spike into a kill instead of a hiccup.

Suggested direction

Switch the daemon to Workstation GC, ideally with background collection on:

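```xml
<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>         <!-- opt out of the Web SDK's Server GC default -->
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>  <!-- keep background Gen2 collection on -->
</PropertyGroup>
```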

(This is equivalent to setting DOTNET_gcServer=0 in the deployment environment, but baking it into the build keeps it consistent across every deployment.) Optionally, pair it with DOTNET_GCConserveMemory for the most constrained containers.
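DOTNET_GCConserveMemory corresponds to the `System.GC.ConserveMemory` runtime setting, which takes a value from 0 to 9: 0 disables it, and higher values trade more GC work for a smaller heap. If it ever needs to ship with the build rather than as a per-container variable, one option is an MSBuild `RuntimeHostConfigurationOption` item, which the SDK writes into the app's runtimeconfig.json. A minimal sketch, where the value `5` is an arbitrary example and not a tested recommendation for netclawd:

```xml
<ItemGroup>
  <!-- Emits a System.GC.ConserveMemory entry into configProperties in runtimeconfig.json -->
  <!-- 5 is an illustrative value only; 0 disables the setting -->
  <RuntimeHostConfigurationOption Include="System.GC.ConserveMemory" Value="5" />
</ItemGroup>
```

Keeping it as an environment variable on only the tightest containers avoids imposing the extra collection work on every deployment.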

A PR doing exactly this is open alongside this issue. It's worth a sanity check that no high-throughput self-hosted deployment depends on Server GC, but for the agent's actual concurrency profile, Workstation GC should be the better default.

Metadata

Labels: docker (Docker image packaging, publishing, and containerized workflows), reliability (Retries, resilience, graceful degradation)