fix(table): fix refcount leak in enrichRecordsWithPosDeleteFields#762
Conversation
…teFields Arrays returned by NewArray() have refcount=1. NewRecordBatch calls Retain() on each column, bumping to refcount=2. Without an explicit Release() on the temporary arrays, the count never drops back to 1 when the record batch is released by the caller. Fix by assigning NewArray() results to local variables and deferring their Release(), so the lifecycle is: NewArray() -> refcount 1, NewRecordBatch Retain() -> refcount 2, deferred Release() -> refcount 1 (owned by outData), caller releases outData -> refcount 0 -> freed. Also extend TestEnrichRecordsWithPosDeleteFields to use memory.NewCheckedAllocator with mem.AssertSize(t, 0) to catch this class of leak going forward. Fixes leak introduced in apache#721.
|
@zeroshade i see some flackiness in one integration test, i will spend some cycle to try to track this, but this PR seems reviewable now. |
zeroshade
left a comment
There was a problem hiding this comment.
just a couple small nitpicks, but otherwise looks good!
table/arrow_scanner.go
Outdated
| panic("position delete schema should have required field 'pos'") | ||
| } | ||
|
|
||
| newColsSchema := arrow.NewSchema([]arrow.Field{filePathField[0], posField[0]}, nil) |
There was a problem hiding this comment.
since we control the PositionalDeleteArrowSchema as a global, do we really need these checks?
Can't we just use PositionalDeleteArrowSchema as-is without needing to create an entirely new schema object here?
There was a problem hiding this comment.
make sense, fixed.
table/arrow_scanner.go
Outdated
| filePathBuilder, ok := rb.Field(0).(*array.StringBuilder) | ||
| if !ok { | ||
| return nil, errors.New("file path field is not a string") | ||
| } | ||
| posBuilder, ok := rb.Field(1).(*array.Int64Builder) | ||
| if !ok { | ||
| return nil, errors.New("pos field is not a int64") | ||
| } |
There was a problem hiding this comment.
Since we control the schema, and are making the builder, we don't need the type checks here, we should be able to just assume they are correct (unit tests can validate that the output schema is correct) and just do filePathBuilder, posBuilder := rb.Field(0).(*array.StringBuilder), rb.Field(1).(*array.Int64Builder)
There was a problem hiding this comment.
i'm a bit paranoic here :D
There was a problem hiding this comment.
yeah, a bit too paranoic.
Arrays returned by NewArray() have refcount=1. NewRecordBatch calls Retain() on each column, bumping to refcount=2. Without an explicit Release() on the temporary arrays, the count never drops back to 1 when the record batch is released by the caller.
Fix by assigning NewArray() results to local variables and deferring their Release(), so the lifecycle is: NewArray() -> refcount 1, NewRecordBatch Retain() -> refcount 2, deferred Release() -> refcount 1 (owned by outData), caller releases outData -> refcount 0 -> freed.
Also extend TestEnrichRecordsWithPosDeleteFields to use memory.NewCheckedAllocator with mem.AssertSize(t, 0) to catch this class of leak going forward.
Fixes leak introduced in #721.