colserde: perform memory accounting for scratch slices in the converter#94660
Conversation
fba13da to
2815410
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewed 57 of 57 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/col/colserde/arrowbatchconverter.go line 82 at r1 (raw file):
// acc must be a non-nil unlimited memory account unless ArrowToBatchOnly mode // is used. The account will be shrunk in Release, however, it is the caller's // responsibility to close the account.
[nit] Might be worth it to call out that Release can't be called after the caller has closed the account.
pkg/col/colserde/arrowbatchconverter.go line 361 at r1 (raw file):
for i := range c.scratch.values { // Some of these might be nil in which case footprint won't change. footprint += int64(len(c.scratch.values[i]))
Should we use cap here instead of len?
This commit introduces the memory accounting for values and offsets scratch slices that are being reused across `BatchToArrow` calls since recently. This required a lot of plumbing to propagate the memory account from all places. The converter is careful to only grow and shrink the account by its own usage, so the same memory account can be reused across multiple converters (as long as there is no concurrency going on, and we only can have concurrency for hash router outputs, so there we give each output a separate account). An additional minor change is that now in `diskQueue.Enqueue` we properly `Close` the `FileSerializer` before `nil`-ing it out. This isn't a problem per se since it is the caller's responsibility to close the account used by the serializer, but it's nice to properly release the accounted for bytes. Release note: None
2815410 to
2922777
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)
pkg/col/colserde/arrowbatchconverter.go line 82 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Might be worth it to call out that
Releasecan't be called after the caller has closed the account.
Good point, done.
pkg/col/colserde/arrowbatchconverter.go line 361 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should we use
caphere instead oflen?
Indeed, nice catch, thanks.
|
TFTR! bors r+ |
|
Build succeeded: |
This commit introduces the memory accounting for values and offsets
scratch slices that are being reused across
BatchToArrowcalls sincerecently. This required a lot of plumbing to propagate the memory
account from all places. The converter is careful to only grow and
shrink the account by its own usage, so the same memory account can be
reused across multiple converters (as long as there is no concurrency
going on, and we only can have concurrency for hash router outputs, so
there we give each output a separate account).
An additional minor change is that now in
diskQueue.Enqueueweproperly
ClosetheFileSerializerbeforenil-ing it out. Thisisn't a problem per se since it is the caller's responsibility to close
the account used by the serializer, but it's nice to properly release
the accounted for bytes.
Epic: None
Release note: None