Fix high CPU usage from log writer spin-wait, regex recompilation, and byte-by-byte tunnel I/O #197

Merged
atauenis merged 6 commits into atauenis:dev from paulscan:performance-fixes on Feb 2, 2026

paulscan (Contributor) commented on Feb 1, 2026

I was experiencing extremely high CPU usage (900%+) after only a bit of use when running WebOne in a Docker container on an Apple Silicon Mac. The proxy would become unusable and the container would consume all available CPU cores.

After investigating, I identified three independent issues that compound under load:

1. Log writer spin-wait

LogAgent.WriteLine spawns a Task for each log message that busy-waits with while (!LogStreamWriterReady) { }. Under high connection volume, these Tasks pile up and spin, burning CPU cycles.
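For illustration, here is a minimal sketch of how a lock-based writer avoids the busy-wait described above. The class name LockedLogWriter and its shape are hypothetical and are not WebOne's actual LogAgent code. It only shows the general technique: a waiting thread blocks on a monitor instead of spinning on a flag.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for a log writer; not WebOne's actual LogAgent.
public sealed class LockedLogWriter
{
    private readonly object _gate = new object();
    private readonly TextWriter _writer;

    public LockedLogWriter(TextWriter writer) => _writer = writer;

    // The problematic pattern would look roughly like:
    //   while (!LogStreamWriterReady) { }   // keeps a core busy until the flag flips
    // With lock(), a thread that cannot enter is parked by the runtime
    // and consumes no CPU while it waits.
    public void WriteLine(string message)
    {
        lock (_gate)
        {
            _writer.WriteLine(message);
            _writer.Flush();
        }
    }
}

public static class LogDemo
{
    // Writes 1000 messages from many threads and returns the number of lines written.
    public static int Run()
    {
        var sink = new StringWriter();
        var log = new LockedLogWriter(sink);
        Parallel.For(0, 1000, i => log.WriteLine("request " + i));
        return sink.ToString()
                   .Split('\n', StringSplitOptions.RemoveEmptyEntries)
                   .Length;
    }
}
```

Because every write happens inside the lock, the messages cannot interleave even though the underlying writer is not thread-safe on its own.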

2. Regex patterns recompiled on every request

The default configuration has ~140 Edit rules with regex patterns. Each request recompiles these patterns via new Regex() calls in hot paths like HttpTransit.ProcessTransit(). Regex compilation is expensive.
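A minimal sketch of the caching idea, assuming a cache keyed by the pattern string alone. The helper name RegexCache is hypothetical, and the use of RegexOptions.Compiled is an assumption rather than something confirmed by this PR. The 5-second value is passed as the match timeout of the Regex constructor.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical helper; the name and shape are assumptions for illustration.
public static class RegexCache
{
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    // Upper bound on a single match attempt; a pattern that backtracks
    // for longer throws RegexMatchTimeoutException instead of hanging.
    private static readonly TimeSpan MatchTimeout = TimeSpan.FromSeconds(5);

    // Returns the cached instance for a pattern, constructing it on first use.
    // Under a race the factory may run more than once, but only one
    // instance is stored and returned to all callers.
    public static Regex Get(string pattern) =>
        Cache.GetOrAdd(pattern,
            p => new Regex(p, RegexOptions.Compiled, MatchTimeout));
}
```

A call site would then use RegexCache.Get(pattern).IsMatch(input) in place of new Regex(pattern).IsMatch(input). If rules needed different RegexOptions, the options would have to become part of the cache key.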

3. Byte-by-byte tunnel I/O

The tunnel servers (HttpSecurePassthroughServer, HttpSecureNonHttpServer, HttpSecureNonHttpDecryptServer) read and write one byte at a time using BinaryReader.ReadByte(). This causes excessive system call overhead for every byte transferred.
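As a rough sketch of the buffered alternative, the loop below copies one direction of a tunnel through a reusable 8 KB buffer. The class and method names (TunnelCopy, Pump) are hypothetical and the actual server code may be structured differently. Stream.Read may return fewer bytes than requested, so short messages can still be forwarded without waiting for the buffer to fill.

```csharp
using System;
using System.IO;

// Hypothetical helper; not the actual WebOne tunnel code.
public static class TunnelCopy
{
    // Copies from source to destination until end of stream.
    // Returns the total number of bytes copied.
    public static long Pump(Stream source, Stream destination, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // One Read call can move up to bufferSize bytes, instead of
        // one call per byte as with BinaryReader.ReadByte().
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            destination.Flush(); // forward each chunk right away
            total += read;
        }
        return total;
    }
}
```

A bidirectional tunnel would run one such loop per direction, for example on two separate tasks.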

Solution

  1. Log writer: Replace spin-wait with lock() synchronization
  2. Regex: Cache compiled Regex instances in a ConcurrentDictionary, each created with a 5-second timeout
  3. Tunnel I/O: Use 8KB buffered reads/writes instead of byte-by-byte

Testing

  • Before fixes: 900%+ CPU, container unusable
  • After fixes: 0.03% CPU when idle, peaking at ~7% under load with multiple browsers + Dropbox through the proxy
  • Tested on macOS with Apple M4. Originally ran via Rosetta (x86 emulation), then recompiled for native ARM64 — high CPU issue occurred on both.

Note

This fix was developed with assistance from Claude Opus 4.5. All changes were tested on my local environment before submission.

atauenis and others added 6 commits May 25, 2025 22:13
Replace spin-wait with proper lock for log file writes. The previous
implementation used a busy-wait loop that consumed CPU cycles while
waiting for LogStreamWriterReady, causing high CPU usage under load.

Co-Authored-By: Claude <noreply@anthropic.com>

Cache compiled Regex instances using ConcurrentDictionary to avoid
recompiling the same patterns on every request. Patterns are compiled
with a 5-second timeout to prevent ReDoS attacks.

This significantly reduces CPU usage when processing requests through
Edit rules, as the ~140 default regex patterns no longer need to be
recompiled for each request.

Co-Authored-By: Claude <noreply@anthropic.com>

Replace byte-by-byte I/O with 8KB buffered reads/writes in tunnel
servers. The previous implementation read and wrote one byte at a time,
causing excessive system call overhead and high CPU usage during
tunnel connections.

Affected servers:
- HttpSecurePassthroughServer (CONNECT passthrough)
- HttpSecureNonHttpServer (non-HTTP SSL tunneling)
- HttpSecureNonHttpDecryptServer (decrypted non-HTTP tunneling)

Co-Authored-By: Claude <noreply@anthropic.com>
atauenis changed the base branch from master to dev on February 2, 2026 07:45
atauenis (Owner) commented on Feb 2, 2026

Thank you for these important fixes!

The high CPU load is a rare bug that I have not managed to catch manually over the last ~18 major versions. The simple loop-based lock was not a good idea (although it did not cause any problems under low load, as long as writes to the log succeeded). But when writing to the log failed, it caused a hard freeze with 100%+ CPU load. Now that is gone.

The one-byte buffer for non-HTTPS tunnels was initially introduced to lower latency for protocols based on short, variable-length messages, such as IRC, FTP or Telnet. But it seems that IRC still works correctly even with 8K buffers, so the larger buffer is not really too expensive.

atauenis merged commit e373e3a into atauenis:dev on Feb 2, 2026