Describe the bug, including details regarding any error messages, version, and platform.
When reading a subset of columns from a struct field (partial column projection), the getReader function in pqarrow/file_reader.go fails to filter childFields in sync with childReaders, causing a length mismatch and potential panic.
In file_reader.go lines 594-604, only childReaders is pruned to remove nil entries, but childFields retains zero-valued arrow.Field{} entries for skipped children:
    // because we performed getReader concurrently, we need to prune out any empty readers
    childReaders = slices.DeleteFunc(childReaders,
        func(r *ColumnReader) bool { return r == nil })
    if len(childFields) == 0 {
        return nil, nil
    }
    filtered := arrow.Field{
        Name: arrowField.Name, Nullable: arrowField.Nullable,
        Metadata: arrowField.Metadata, Type: arrow.StructOf(childFields...),
    }
    out = newStructReader(&rctx, &filtered, field.LevelInfo, childReaders, fr.Props)
This causes a panic in arrow.StructOf(childFields...), which rejects any field whose Type is nil:
    func StructOf(fs ...Field) *StructType {
        n := len(fs)
        if n == 0 {
            return &StructType{}
        }

        t := &StructType{
            fields: make([]Field, n),
            index:  make(map[string][]int, n),
        }
        for i, f := range fs {
            if f.Type == nil {
                panic("arrow: field with nil DataType")
            }
            // ... (excerpt truncated)
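The mismatch can be illustrated without the Arrow dependency. The following is a minimal sketch, not the library's code: `field` and `reader` are hypothetical stand-ins for arrow.Field and *ColumnReader, and a nil `Type` pointer plays the role of a nil arrow.DataType. It only shows that pruning one slice leaves the other with a zero-valued slot.

```go
package main

import (
	"fmt"
	"slices"
)

// field and reader are hypothetical stand-ins for arrow.Field and
// *pqarrow.ColumnReader; only their nil-ness matters for this sketch.
type field struct {
	Name string
	Type *string // nil stands in for a nil arrow.DataType
}

type reader struct{ name string }

// pruneReadersOnly mimics the behaviour described in this report: only the
// reader slice is compacted, so the field slice keeps its zero-valued slots.
func pruneReadersOnly(readers []*reader, fields []field) ([]*reader, []field) {
	readers = slices.DeleteFunc(readers, func(r *reader) bool { return r == nil })
	return readers, fields
}

// countNilTypes reports how many fields would trip a nil-type check.
func countNilTypes(fields []field) int {
	n := 0
	for _, f := range fields {
		if f.Type == nil {
			n++
		}
	}
	return n
}

func main() {
	f64 := "float64"
	// Child "a" was selected; child "b" was skipped and left as zero values.
	readers := []*reader{{name: "a"}, nil}
	fields := []field{{Name: "a", Type: &f64}, {}}

	readers, fields = pruneReadersOnly(readers, fields)
	fmt.Println(len(readers), len(fields), countNilTypes(fields))
}
```

After pruning, one reader remains but two fields do, and one of those fields still has a nil type, which is the condition StructOf panics on.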
Reproduction
    package main

    import (
        "bytes"
        "context"

        "github.com/apache/arrow-go/v18/arrow"
        "github.com/apache/arrow-go/v18/arrow/array"
        "github.com/apache/arrow-go/v18/arrow/memory"
        "github.com/apache/arrow-go/v18/parquet/file"
        "github.com/apache/arrow-go/v18/parquet/pqarrow"
    )

    func main() {
        schema := arrow.NewSchema([]arrow.Field{
            {Name: "nested", Type: arrow.StructOf(
                arrow.Field{Name: "a", Type: arrow.PrimitiveTypes.Float64},
                arrow.Field{Name: "b", Type: arrow.PrimitiveTypes.Float64},
            )},
        }, nil)

        buf := new(bytes.Buffer)
        writer, _ := pqarrow.NewFileWriter(schema, buf, nil, pqarrow.DefaultWriterProps())
        b := array.NewRecordBuilder(memory.DefaultAllocator, schema)
        sb := b.Field(0).(*array.StructBuilder)
        sb.Append(true)
        sb.FieldBuilder(0).(*array.Float64Builder).Append(1.0)
        sb.FieldBuilder(1).(*array.Float64Builder).Append(2.0)
        writer.Write(b.NewRecord())
        writer.Close()

        pf, _ := file.NewParquetReader(bytes.NewReader(buf.Bytes()))
        fr, _ := pqarrow.NewFileReader(pf, pqarrow.ArrowReadProperties{}, memory.DefaultAllocator)

        // Only read nested.a (leaf index 0), not nested.b (leaf index 1)
        partialLeaves := map[int]bool{0: true}
        fieldIdx, _ := fr.Manifest.GetFieldIndices([]int{0})

        // This panics due to childFields/childReaders length mismatch
        fr.GetFieldReader(context.Background(), fieldIdx[0], partialLeaves, []int{0})
    }
Partial struct column reads are expected to work correctly, returning a reader for only the selected fields. Instead, the call panics because childFields still holds zero-valued entries for the skipped children, so its length no longer matches childReaders.
Suggested Fix
Filter childFields alongside childReaders:
    childReaders = slices.DeleteFunc(childReaders,
        func(r *ColumnReader) bool { return r == nil })
    childFields = slices.DeleteFunc(childFields,
        func(f arrow.Field) bool { return f.Type == nil })
    if len(childFields) == 0 {
        return nil, nil
    }
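The effect of the suggested fix can be checked in isolation. This is a minimal sketch under the same assumptions as the report: `field` and `reader` are hypothetical stand-ins for arrow.Field and *ColumnReader, and it assumes a skipped child always leaves both a nil reader and a field with a nil type at the same position.

```go
package main

import (
	"fmt"
	"slices"
)

// field and reader are hypothetical stand-ins for arrow.Field and
// *pqarrow.ColumnReader.
type field struct {
	Name string
	Type *string // nil stands in for a nil arrow.DataType
}

type reader struct{ name string }

// pruneBoth mirrors the suggested fix: nil readers and nil-typed fields are
// both removed, so the two slices stay the same length as long as skipped
// children are zero-valued in both.
func pruneBoth(readers []*reader, fields []field) ([]*reader, []field) {
	readers = slices.DeleteFunc(readers, func(r *reader) bool { return r == nil })
	fields = slices.DeleteFunc(fields, func(f field) bool { return f.Type == nil })
	return readers, fields
}

func main() {
	f64 := "float64"
	// Child "a" was selected; child "b" was skipped and left as zero values.
	readers := []*reader{{name: "a"}, nil}
	fields := []field{{Name: "a", Type: &f64}, {}}

	readers, fields = pruneBoth(readers, fields)
	fmt.Println(len(readers), len(fields), fields[0].Name)
}
```

With both slices pruned, only the selected child remains in each, so every field passed on to the struct type has a non-nil type.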
Environment
arrow-go version: v18.5.0
Go version: 1.25.5
Component(s)
Parquet
Referenced source: arrow-go/parquet/pqarrow/file_reader.go, lines 594 to 604 in 3a01d9f; arrow-go/arrow/datatype_nested.go, lines 406 to 419 in ca6e0eb.