Describe the bug, including details regarding any error messages, version, and platform.
When a ParquetWriter is configured with the following properties, the sum of RowGroupTotalBytesWritten() across all Write() calls does not match the number of bytes actually received by the target io.Writer:
parquetProps := parquet.NewWriterProperties(
	parquet.WithAllocator(memory.DefaultAllocator),
	parquet.WithCompression(compress.Codecs.Snappy),
	parquet.WithCompressionLevel(flate.DefaultCompression),
	parquet.WithDictionaryDefault(false),
	parquet.WithStats(false),
	parquet.WithMaxRowGroupLength(math.MaxInt64),
)
arrowProps := pqarrow.NewArrowWriterProperties(pqarrow.WithAllocator(memory.DefaultAllocator))
In this specific case, a 13 MB file reported only about 10 MB written via RowGroupTotalBytesWritten() calls. Some of the discrepancy can be attributed to metadata that is not part of the row groups, but that likely doesn't explain the entire difference. We should investigate the root cause and either fix it or document the explanation for future users of this API.
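For anyone trying to reproduce this, one way to get the sink-side number is to wrap the target io.Writer in a byte counter. The sketch below uses only the standard library; countingWriter and measure are hypothetical names, not part of the Arrow Go API, and the payloads in main are stand-ins rather than real Parquet data. In an actual reproduction the wrapped writer would presumably be handed to the Parquet file writer, and its count compared against the summed RowGroupTotalBytesWritten() values.

```go
package main

import (
	"fmt"
	"io"
)

// countingWriter is a hypothetical helper (not part of the Arrow Go API).
// It wraps an io.Writer and records how many bytes the sink accepted.
type countingWriter struct {
	w io.Writer
	n int64
}

// Write forwards p to the wrapped writer and adds the accepted byte count.
func (c *countingWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.n += int64(n)
	return n, err
}

// measure writes each chunk through a countingWriter backed by io.Discard
// and returns the total number of bytes the sink saw.
func measure(chunks [][]byte) int64 {
	cw := &countingWriter{w: io.Discard}
	for _, chunk := range chunks {
		if _, err := cw.Write(chunk); err != nil {
			panic(err)
		}
	}
	return cw.n
}

func main() {
	// Stand-in payloads: a leading marker, a body, and a trailing marker.
	// These are illustrative only and do not form a valid Parquet file.
	total := measure([][]byte{[]byte("PAR1"), make([]byte, 1024), []byte("PAR1")})
	fmt.Println("bytes seen by io.Writer:", total)
}
```

Counting at the io.Writer boundary captures everything that reaches the file, including whatever is written outside the row groups, so the gap between the two numbers is the quantity this issue is asking about.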
Related to arrow-adbc@1456
Component(s)
Go, Parquet