Skip to content

Conversation

@samm81
Copy link
Contributor

@samm81 samm81 commented Oct 4, 2025

Close file descriptors after mmap on Unix to reduce the total number of open file descriptors.

On Unix-like platforms, close the file descriptor immediately after a successful mmap.Map of .sndx index cache files. This keeps the mapping valid until Unmap (per POSIX semantics) and significantly reduces steady-state FD usage when many indexes are open, helping avoid EMFILE ("too many open files").

  • Split mmapOpenWithRetry into platform-specific implementations:
    • committed_content_index_disk_cache_unix.go (!windows):
      • Map RDONLY, close FD immediately.
      • Return closer that only unmaps.
    • committed_content_index_disk_cache_windows.go (windows):
      • Keep FD open until Unmap.
      • Return closer that unmaps and closes FD.
  • Remove old mmapOpenWithRetry and mmap import from repo/content/committed_content_index_disk_cache.go
  • Add Linux-only unit test verifying FD count does not grow proportionally: repo/content/committed_content_index_fd_linux_test.go
    • Creates N small indexes, opens them all, checks /proc/self/fd delta stays low.

Notes:

  • Behavior unchanged on Windows due to OS semantics.
  • Mapping failures close the FD to avoid leaks.
  • Unlink semantics remain correct; mappings stay valid until Unmap.

@samm81
Copy link
Contributor Author

samm81 commented Oct 4, 2025

trying to solve my own issue #4831 . works ok on my machine locally so I figured I'd push it up if it would help anyone else.

@jkowalski
Copy link
Contributor

please rebase and fix merge conflict, we should test this on Mac and Linux and then we can merge

@samm81 samm81 force-pushed the reduce-open-fds-unix branch from 8a2f7bd to f682b7d Compare October 9, 2025 08:09
Copy link
Collaborator

@julio-lopez julio-lopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for proposing the changes, they are pointing in the right direction.
Please take a look at the inline comments.

Thanks again,

// Do not run in parallel to avoid fd count noise from other tests.
ctx := testlogging.Context(t)
ft := faketime.NewClockTimeWithOffset(0)
cache := &diskCommittedContentIndexCache{testutil.TempDirectory(t), ft.NowFunc(), func() int { return 3 }, testlogging.Printf(t.Logf, ""), DefaultIndexCacheSweepAge}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samm81 This failed to compile in a previous CI job and it is going to fail again due to a type mismatch for the "log" type.

See https://github.com/kopia/kopia/actions/runs/18611420883/job/53293937511?pr=4866

Comment on lines 63 to 65
if delta := after - before; delta > maxDelta {
t.Fatalf("fd count grew too much after opening %d indexes: before=%d after=%d delta=%d (max allowed %d)", N, before, after, delta, maxDelta)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simplified as:

Suggested change
if delta := after - before; delta > maxDelta {
t.Fatalf("fd count grew too much after opening %d indexes: before=%d after=%d delta=%d (max allowed %d)", N, before, after, delta, maxDelta)
}
require.LessOrEqualf(t, after - before, maxDelta "fd count grew too much after opening %d indexes:", N)

The delta (that is after - before) and maxDelta will be already included in the message when the assertion fails.

ft := faketime.NewClockTimeWithOffset(0)
cache := &diskCommittedContentIndexCache{testutil.TempDirectory(t), ft.NowFunc(), func() int { return 3 }, testlogging.Printf(t.Logf, ""), DefaultIndexCacheSweepAge}

const N = 200
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
const N = 200
const indexCount = 200

samm81 and others added 10 commits October 22, 2025 19:41
On Unix-like platforms, close the file descriptor immediately after a successful
mmap.Map of .sndx index cache files. This keeps the mapping valid until Unmap
(per POSIX semantics) and significantly reduces steady-state FD usage when many
indexes are open, helping avoid EMFILE ("too many open files").

- Split mmapOpenWithRetry into platform-specific implementations:
  - committed_content_index_disk_cache_unix.go (!windows):
    - Map RDONLY, close FD immediately.
    - Return closer that only unmaps.
  - committed_content_index_disk_cache_windows.go (windows):
    - Keep FD open until Unmap.
    - Return closer that unmaps and closes FD.
- Remove old mmapOpenWithRetry and mmap import from
  repo/content/committed_content_index_disk_cache.go.
- Add Linux-only unit test verifying FD count does not grow proportionally:
  repo/content/committed_content_index_fd_linux_test.go
  - Creates N small indexes, opens them all, checks /proc/self/fd delta stays low.

Notes:
- Behavior unchanged on Windows due to OS semantics.
- Mapping failures close the FD to avoid leaks.
- Unlink semantics remain correct; mappings stay valid until Unmap.
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
Co-authored-by: Julio Lopez <1953782+julio-lopez@users.noreply.github.com>
@julio-lopez julio-lopez requested a review from Copilot October 23, 2025 03:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes file descriptor usage on Unix platforms by closing FDs immediately after memory-mapping index files, while preserving the existing behavior on Windows due to OS-specific requirements.

Key changes:

  • Splits mmapOpenWithRetry into platform-specific implementations with Unix closing FDs post-mmap and Windows retaining them
  • Removes the original cross-platform implementation and mmap import from the base file
  • Adds a Linux-specific test verifying FD count remains low when many indexes are opened

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
repo/content/committed_content_index_disk_cache.go Removes original mmapOpenWithRetry implementation and mmap-go import
repo/content/committed_content_index_disk_cache_unix.go New Unix implementation that closes FD immediately after successful mmap
repo/content/committed_content_index_disk_cache_windows.go New Windows implementation preserving retry logic and keeping FD open
repo/content/committed_content_index_fd_linux_test.go Adds test verifying FD count doesn't grow proportionally when opening many indexes

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

func (c *diskCommittedContentIndexCache) mmapOpenWithRetry(_ context.Context, path string) (mmap.MMap, func() error, error) {
f, err := os.Open(path) //nolint:gosec
if err != nil {
return nil, nil, errors.Wrap(err, "unable to open file despite retries")
Copy link

Copilot AI Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'unable to open file despite retries' is misleading on Unix since no retry logic exists in this implementation. Consider changing to a simpler message like 'unable to open file' or 'error opening file for mmap'.

Suggested change
return nil, nil, errors.Wrap(err, "unable to open file despite retries")
return nil, nil, errors.Wrap(err, "unable to open file")

Copilot uses AI. Check for mistakes.
retryCount := 0
for err != nil && retryCount < maxRetries {
retryCount++
contentlog.Log2(ctx, c.log, "retry unable to mmap.Open()",
Copy link

Copilot AI Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Corrected spacing in log message from 'retry unable to mmap.Open()' to 'retry: unable to mmap.Open()' for better readability.

Suggested change
contentlog.Log2(ctx, c.log, "retry unable to mmap.Open()",
contentlog.Log2(ctx, c.log, "retry: unable to mmap.Open()",

Copilot uses AI. Check for mistakes.
@julio-lopez julio-lopez changed the title feat(content): close FD after mmap on Unix to reduce open FDs feat(general): reduce number of open file descriptors used for indexes on Unix Oct 23, 2025
@julio-lopez julio-lopez changed the title feat(general): reduce number of open file descriptors used for indexes on Unix feat(general): reduce number of open file descriptors for indexes on Unix Oct 23, 2025
@codecov
Copy link

codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 45.83333% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.71%. Comparing base (cb455c6) to head (cc34ff7).
⚠️ Report is 692 commits behind head on master.

Files with missing lines Patch % Lines
...content/committed_content_index_disk_cache_unix.go 45.83% 11 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4866      +/-   ##
==========================================
+ Coverage   75.86%   76.71%   +0.84%     
==========================================
  Files         470      537      +67     
  Lines       37301    40968    +3667     
==========================================
+ Hits        28299    31427    +3128     
- Misses       7071     7494     +423     
- Partials     1931     2047     +116     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@julio-lopez julio-lopez merged commit 11765d6 into kopia:master Oct 23, 2025
29 of 31 checks passed
@julio-lopez
Copy link
Collaborator

@samm81 Thanks for your digging into this issue and for the corresponding improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants