Skip to content

backupccl: avoid over-shrinking memory monitor#79503

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:ssd/manifest-reading-mem-monitor-bug
Apr 6, 2022
Merged

backupccl: avoid over-shrinking memory monitor#79503
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:ssd/manifest-reading-mem-monitor-bug

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

This change updates DecryptFile to optionally use a memory monitor and
adjusts the relevant calling functions.

Previously, the readManifest function that used DecryptFile could
result in shrinking too many bytes from the bound account:

ERROR: a panic has occurred!
root: no bytes in account to release, current 29571, free 30851
(1) attached stack trace
  -- stack trace:
  | runtime.gopanic
  | 	GOROOT/src/runtime/panic.go:1038
  | [...repeated from below...]
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/util/log/logcrash.ReportOrPanic
  | 	github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:374
  | github.com/cockroachdb/cockroach/pkg/util/mon.(*BoundAccount).Shrink
  | 	github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:709
  | github.com/cockroachdb/cockroach/pkg/ccl/backupccl.(*restoreResumer).doResume.func1
  | 	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/pkg/ccl/backupccl/restore_job.go:1244

This was the result of us freeing cap(descBytes) despite the fact
that we never accounted for the fact that descBytes had been
re-assigned after the decryption without accounting for changes in the
capacity used by the old vs new buffer. That is, we called mem.Grow
with the capcity of the old buffer, but mem.Shrink with the capacity
of the new buffer.

We generally expect that the size of encrypted data to be larger than
the size of the plaintext data, so in most cases, the existing code
would work without error because it was freeing an amount smaller than
the original allocation.

However, while the plaintext data is smaller than the encrypted data,
that doesn't mean that the capacity of the slice holding the
plaintext data
is smaller than the capacity of the slice holding the
encrypted data
.

The slice holding the encrypted data was created with mon.ReadAll
which has slice growth strategy of doubling the size until it reaches
8MB. The slice holding the plaintext data was created by
ioutil.ReadAll that defers to appends slice growth strategy, which
differs from that used in mon.ReadAll.

In the current implementations, for byte sizes around 30k, the
capacity of the buffer create by append's strategy is larger than that
created by mon.ReadAll despite the len of the buffer being smaller:

before decrypt: len(descBytes) = 27898, cap(descBytes) = 32768
 after decrypt: len(descBytes) = 27862, cap(descBytes) = 40960

We could have fixed this by simply adjusting the memory account by the
difference. Instead, I've opted to thread the memory monitor into
DecryptFile to better account for the fact that we technically do hold
2 copies of the data during this decryption.

Fixes #79488

Release note: None

@stevendanna stevendanna requested review from a team and adityamaru and removed request for a team April 6, 2022 14:56
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

return nil, err
}
statsBytes, err = storageccl.DecryptFile(statsBytes, encryptionKey)
statsBytes, err = storageccl.DecryptFile(ctx, statsBytes, encryptionKey, nil)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nil /* mm */

require.NoError(t, writeBackupManifest(ctx, st, storage, "testmanifest", encOpts, desc))
_, sz, err := readBackupManifest(ctx, &mem, storage, "testmanifest", encOpts)
require.NoError(t, err)
mem.Shrink(ctx, sz)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we also Close() the bound account here?

This change updates DecryptFile to optionally use a memory monitor and
adjusts the relevant calling functions.

Previously, the readManifest function that used DecryptFile could
result in shrinking too many bytes from the bound account:

```
ERROR: a panic has occurred!
root: no bytes in account to release, current 29571, free 30851
(1) attached stack trace
  -- stack trace:
  | runtime.gopanic
  | 	GOROOT/src/runtime/panic.go:1038
  | [...repeated from below...]
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/util/log/logcrash.ReportOrPanic
  | 	github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:374
  | github.com/cockroachdb/cockroach/pkg/util/mon.(*BoundAccount).Shrink
  | 	github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:709
  | github.com/cockroachdb/cockroach/pkg/ccl/backupccl.(*restoreResumer).doResume.func1
  | 	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/pkg/ccl/backupccl/restore_job.go:1244
```

This was the result of us freeing `cap(descBytes)` despite the fact
that we never accounted for the fact that `descBytes` had been
re-assigned after the decryption without accounting for changes in the
capacity used by the old vs new buffer. That is, we called mem.Grow
with the capcity of the old buffer, but mem.Shrink with the capacity
of the new buffer.

We generally expect that the size of encrypted data to be larger than
the size of the plaintext data, so in most cases, the existing code
would work without error because it was freeing an amount smaller than
the original allocation.

However, while the plaintext data is smaller than the encrypted data,
that doesn't mean that the _capacity of the slice holding the
plaintext data_ is smaller than the _capacity of the slice holding the
encrypted data_.

The slice holding the encrypted data was created with mon.ReadAll
which has slice growth strategy of doubling the size until it reaches
8MB. The slice holding the plaintext data was created by
ioutil.ReadAll that defers to appends slice growth strategy, which
differs from that used in mon.ReadAll.

In the current implementations, for byte sizes around 30k, the
capacity of the buffer create by append's strategy is larger than that
created by mon.ReadAll despite the len of the buffer being smaller:

    before decrypt: len(descBytes) = 27898, cap(descBytes) = 32768
     after decrypt: len(descBytes) = 27862, cap(descBytes) = 40960

We could have fixed this by simply adjusting the memory account by the
difference. Instead, I've opted to thread the memory monitor into
DecryptFile to better account for the fact that we technically do hold
2 copies of the data during this decryption.

Fixes cockroachdb#79488

Release note: None
@stevendanna stevendanna force-pushed the ssd/manifest-reading-mem-monitor-bug branch from d04e281 to 30419a8 Compare April 6, 2022 17:40
@stevendanna
Copy link
Copy Markdown
Collaborator Author

bors r=adityamaru

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 6, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: backup/KMS/GCS/n3cpu4 failed

3 participants