Skip to content

chore: add logging to Filestore.purger#26089

Merged
davidby-influx merged 4 commits intomaster-1.xfrom
DSB_purger_logging
Mar 5, 2025
Merged

chore: add logging to Filestore.purger#26089
davidby-influx merged 4 commits intomaster-1.xfrom
DSB_purger_logging

Conversation

@davidby-influx
Copy link
Copy Markdown
Contributor

Also fixes error type checks in
TestCompactor_CompactFull_InProgress

@davidby-influx davidby-influx marked this pull request as draft March 3, 2025 17:00
@davidby-influx davidby-influx marked this pull request as ready for review March 5, 2025 00:34
Copy link
Copy Markdown

@devanbenz devanbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few code nits and a question about logging file names post purge.

removeErr := c.RemoveTmpFiles(files)
if removeErr == nil {
return errors.Join(originalErrs...)
if len(originalErrs) == 1 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? Wouldn't errors.Join(originalErrs...) capture a single error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but this is a quick hack to allow older code looking for a specific error by casting instead of using errors.Is to continue to function.

Comment thread tsdb/engine/tsm1/compact_test.go Outdated
e := errors.Unwrap(err)
assert.NotNil(t, e, "expected an error wrapped by errCompactionInProgress")
assert.Truef(t, errors.Is(e, fs.ErrExist), "error did not indicate file existence: %v", e)
assert.ErrorContains(t, err, "file exists", "unexpected error writing snapshot for %s", f2Name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.ErrorContainsf for string formatting?

go func() {
var purgeCount int
defer func() {
logger.Info("removed", zap.Int("files", purgeCount))
Copy link
Copy Markdown

@devanbenz devanbenz Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea to log out the file names that were removed as well as the count? I see we log the names out above, this may make it easier to compare the lists in the logs if any issues arise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We print them out as they are deleted when the log level is debug

@davidby-influx davidby-influx requested a review from devanbenz March 5, 2025 19:10
Copy link
Copy Markdown

@devanbenz devanbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@davidby-influx davidby-influx merged commit 2ab5aad into master-1.x Mar 5, 2025
@davidby-influx davidby-influx deleted the DSB_purger_logging branch March 5, 2025 19:46
davidby-influx added a commit that referenced this pull request Mar 5, 2025
Also fixes error type checks in
TestCompactor_CompactFull_InProgress

(cherry picked from commit 2ab5aad)
@philjb
Copy link
Copy Markdown
Contributor

philjb commented Mar 5, 2025

I like the logging and error improvements here! they are much better than what was there.

I note several other issues though in the purging code (not related to the pr changes)

  • purger uses a go map, which don't release backing memory ever - so it'll be as large as the largest set of compaction input files (which might be a lot for a full compaction). Minor issue, but could remedy by allocating a new map when len(p.files) == 0.
  • The purger seems to suffer from a similar issue that @geoffrey Wossum identified with the retention service. It calls tsmfile.InUse() and then later tsmfile.Close() without blocking new readers in the inbetween - a time of check, time of use flaw. Therefore tsmfile.Close() which waits on readers to finish (Unref()) has an unbounded runtime. The purger's lock it held over the close() call, which means the lock is held for an unbounded time, which you might say is ok because this happens in the purger's "purge" goroutine, but the purger.Add(files[]) call needs to get that lock too and purger.Add() is called synchronously in Filestore.replace() which means replace() has an unbounded runtime. All this means, if i'm reading it right, that purging tsm files could freeze up compaction.
    • One could make it go f.purger.add(inuse) to decouple the purger and compaction which is nice, but it'd be better to use a sync map and/or reduce the purger lock holding time and/or put tsm file closing into a goroutine and/or call SetNewReadersBlocked before the InUse check, like the retention service does.

EDIT: extracted into ticket

davidby-influx added a commit that referenced this pull request Mar 5, 2025
Also fixes error type checks in
TestCompactor_CompactFull_InProgress

(cherry picked from commit 2ab5aad)
devanbenz pushed a commit that referenced this pull request May 9, 2025
Also fixes error type checks in
TestCompactor_CompactFull_InProgress

(cherry picked from commit 2ab5aad)
(cherry picked from commit efebf4d)
devanbenz pushed a commit that referenced this pull request May 12, 2025
Also fixes error type checks in
TestCompactor_CompactFull_InProgress

(cherry picked from commit 2ab5aad)
(cherry picked from commit efebf4d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants