-
Notifications
You must be signed in to change notification settings - Fork 4.1k
[Go][Parquet] Panic when writing a ListOf(DeltaBinaryPacked field) with no data #39309
Description
Describe the bug, including details regarding any error messages, version, and platform.
It's been quite hard to understand how to write code for using arrow in go, but I believe I've managed to get it working. One of the thing that still stumbles me is the following scenario (I did my best to make it minimal, so it's a bit strange, but as long as you don't think I'm doing something that should be "forbidden" altogether, then consider it's the short form of my use case).
package main
import (
"os"
"github.com/apache/arrow/go/v14/arrow"
"github.com/apache/arrow/go/v14/arrow/array"
"github.com/apache/arrow/go/v14/arrow/memory"
"github.com/apache/arrow/go/v14/parquet"
"github.com/apache/arrow/go/v14/parquet/pqarrow"
)
func main() {
schema := arrow.NewSchema(
[]arrow.Field{
{Name: "ts", Type: arrow.ListOf(arrow.PrimitiveTypes.Uint64)},
}, nil)
builder := array.NewRecordBuilder(memory.DefaultAllocator, schema)
listBuilder := builder.Field(0).(*array.ListBuilder)
listBuilder.Append(true)
arrowRec := builder.NewRecord()
f, err := os.CreateTemp(".", "test.parquet")
if err != nil {
panic(err)
}
fileWriter, err := pqarrow.NewFileWriter(
arrowRec.Schema(),
f,
parquet.NewWriterProperties(
parquet.WithDictionaryFor("ts.list.element", false),
parquet.WithEncodingFor("ts.list.element", parquet.Encodings.DeltaBinaryPacked),
),
pqarrow.DefaultWriterProps(),
)
parquetWriter := fileWriter
if err != nil {
panic(err)
}
parquetWriter.WriteBuffered(arrowRec)
arrowRec.Release()
}(I have no idea whether this is the right way to build a parquet with the lib, feel free to point me to better ways if there are)
So in human words: I'm creating an Arrow schema containing a list of uints. Then I write a single empty list. I indicate that parquet should write the uints as DeltaBinaryPacked. Finally, I write the parquet.
Without the Encoding part, this works, but when I specify DeltaBinaryPacked, I get a panic.
Failure is in BitWriter.Written, but the BitWriter pointer is nil.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xc6a3fc]
goroutine 1 [running]:
github.com/apache/arrow/go/v14/parquet/internal/utils.(*BitWriter).Written(0x0)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/internal/utils/bit_writer.go:109 +0x1c
github.com/apache/arrow/go/v14/parquet/internal/encoding.(*deltaBitPackEncoder).EstimatedDataEncodedSize(0xc0002be900)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/internal/encoding/delta_bit_packing.go:461 +0x2c
github.com/apache/arrow/go/v14/parquet/file.(*columnWriter).commitWriteAndCheckPageLimit(0xc0003d8000, 0x1, 0x0)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/file/column_writer.go:255 +0x86
github.com/apache/arrow/go/v14/parquet/file.(*Int64ColumnChunkWriter).WriteBatchSpaced.func1(0x0, 0x1)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/file/column_writer_types.gen.go:350 +0x412
github.com/apache/arrow/go/v14/parquet/file.doBatches(0x1, 0x400, 0xc0002d00b0)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/file/column_writer.go:631 +0xf5
github.com/apache/arrow/go/v14/parquet/file.(*Int64ColumnChunkWriter).WriteBatchSpaced(0xc0003d8000, {0x0, 0x0, 0x0}, {0xc00036c3c0, 0x1, 0x80}, {0xc00036c500, 0x1, 0x80}, ...)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/file/column_writer_types.gen.go:336 +0x247
github.com/apache/arrow/go/v14/parquet/pqarrow.writeDenseArrow(0xc000346730, {0x17c89b0, 0xc0003d8000}, {0x17c57a0, 0xc00033ba80}, {0xc00036c3c0, 0x1, 0x80}, {0xc00036c500, 0x1, ...}, ...)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/pqarrow/encode_arrow.go:436 +0x57a9
github.com/apache/arrow/go/v14/parquet/pqarrow.WriteArrowToColumn({0x17b4180, 0xc00037a870}, {0x17c89b0, 0xc0003d8000}, {0x17c57a0, 0xc00033ba80}, {0xc00036c3c0, 0x1, 0x80}, {0xc00036c500, ...}, ...)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/pqarrow/encode_arrow.go:232 +0x3cf
github.com/apache/arrow/go/v14/parquet/pqarrow.(*ArrowColumnWriter).Write(0xc0002d1860, {0x17b4180, 0xc00037a870})
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/pqarrow/encode_arrow.go:193 +0x832
github.com/apache/arrow/go/v14/parquet/pqarrow.(*FileWriter).WriteColumnChunked(0xc0003d4000, 0xc00033ba00, 0x0, 0x1)
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/pqarrow/file_writer.go:313 +0x27a
github.com/apache/arrow/go/v14/parquet/pqarrow.(*FileWriter).WriteColumnData(0xc0003d4000, {0x17c5aa0, 0xc0003466e0})
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/pqarrow/file_writer.go:322 +0x17d
github.com/apache/arrow/go/v14/parquet/pqarrow.(*FileWriter).WriteBuffered(0xc0003d4000, {0x17c3ae8, 0xc00037a4e0})
/home/joachim/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.2/parquet/pqarrow/file_writer.go:186 +0x83b
main.main()
/home/joachim/src/goplayground/play/main.go:66 +0x472
Runs on
v14, supposedly on master, but I new enough in go that I'm not really sure it's really master and not the latest stable v14 tag, go version go1.21.4 linux/amd64, ubuntu 22.04 LTS
Component(s)
Go, Parquet