Use memory pools where needed to help reduce allocations #541

Merged: NGTmeaty merged 2 commits into main from memory-optimizations on Jan 16, 2026
@NGTmeaty NGTmeaty commented Jan 7, 2026

Collaborator

No description provided.

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces memory pools using sync.Pool to reduce memory allocations in performance-critical paths of the web crawler, particularly in body processing, XML parsing, and data copying operations.

Key changes:

  • Optimized UUID generation to use uuid.NewString() instead of uuid.New().String()
  • Added sync.Pool for bufio.Reader instances in XML parsing
  • Added sync.Pool for bytes.Buffer instances in headless and general body processing
  • Added sync.Pool for byte slice buffers in the connection utility's copy operations

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| pkg/models/item.go | Optimized UUID string generation to avoid intermediate allocation |
| internal/pkg/postprocessor/extractor/xml.go | Added bufio.Reader pool for XML parsing to reuse reader instances |
| internal/pkg/archiver/headless/body.go | Added bytes.Buffer pool for headless body processing with proper cleanup |
| internal/pkg/archiver/general/body.go | Added bytes.Buffer pool for MIME detection, but contains a critical double-Put bug |
| internal/pkg/archiver/connutil/connutil.go | Added byte slice pool for copy operations to avoid repeated 4KB allocations |

Comment thread internal/pkg/archiver/general/body.go Outdated
codecov-commenter commented Jan 7, 2026


Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.61%. Comparing base (fbb1f0e) to head (fbd49fc).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/pkg/archiver/general/body.go | 85.71% | 1 Missing ⚠️ |
| internal/pkg/archiver/headless/body.go | 90.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #541      +/-   ##
==========================================
+ Coverage   56.48%   56.61%   +0.13%     
==========================================
  Files         131      131              
  Lines        6581     6606      +25     
==========================================
+ Hits         3717     3740      +23     
- Misses       2489     2491       +2     
  Partials      375      375              

| Flag | Coverage Δ |
| --- | --- |
| e2etests | 42.17% <74.19%> (+0.11%) ⬆️ |
| unittests | 28.92% <64.51%> (+0.13%) ⬆️ |


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@willmhowes willmhowes left a comment

Collaborator

Changes look good, and I compiled locally and successfully ran a test crawl as a sanity check!

@NGTmeaty NGTmeaty merged commit a7d82f1 into main Jan 16, 2026
5 checks passed
@NGTmeaty NGTmeaty deleted the memory-optimizations branch January 16, 2026 01:13
4 participants