[Go] 32-bit panic: utils.GetMinMaxInt32() #40672
Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
While working on a Parquet file parser, I ran our tests on a 32-bit system and hit a panic. It is reproducible with the in-tree parquet_reader, using the Parquet file generated by the script at the end of this report. First, it works as expected on my 64-bit system:
$ git clone git@github.com:apache/arrow
$ cd arrow/go/parquet/cmd/parquet_reader
$ cp ~/input.parquet .
$ go run . input.parquet
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0 ---
--- Total Bytes: 201 ---
--- Rows: 1 ---
Column 0
Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
Uncompressed Size: 92, Compressed Size: 96
Column 1
Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value |timestamp |
42 |1710683608143228692|
Once I force a 32-bit architecture (GOARCH=386), you can see the crash:
$ GOARCH=386 go run . input.parquet
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0 ---
--- Total Bytes: 201 ---
--- Rows: 1 ---
Column 0
Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
Uncompressed Size: 92, Compressed Size: 96
Column 1
Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value |timestamp |
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x83502c7]
goroutine 1 [running]:
github.com/apache/arrow/go/v16/internal/utils.GetMinMaxInt32(...)
/home/powersj/test/arrow/go/internal/utils/min_max.go:190
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*Int64DictConverter).IsValid(0x9492fe0, {0x9413270, 0x1, 0x1})
/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:495 +0x27
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDictInt64(0x94e0708, {0x8488974, 0x9492fe0}, {0x95b4c00, 0x1, 0x80})
/home/powersj/test/arrow/go/parquet/internal/utils/typed_rle_dict.gen.go:378 +0x228
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDict(0x94e0708, {0x8488974, 0x9492fe0}, {0x839e4e0, 0x9511a84})
/home/powersj/test/arrow/go/parquet/internal/utils/rle.go:417 +0x14e
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*dictDecoder).decode(...)
/home/powersj/test/arrow/go/parquet/internal/encoding/decoder.go:146
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*DictInt64Decoder).Decode(0x95ae700, {0x95b4c00, 0x1, 0x80})
/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:436 +0x75
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch.func1(0x0, 0x1)
/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:93 +0xc3
github.com/apache/arrow/go/v16/parquet/file.(*columnChunkReader).readBatch(0x959c428, 0x80, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, 0x80}, 0x9511b6c)
/home/powersj/test/arrow/go/parquet/file/column_reader.go:514 +0x2ab
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch(0x959c428, 0x80, {0x95b4c00, 0x80, 0x80}, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, ...})
/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:92 +0xa3
main.(*Dumper).readNextBatch(0x94a0820)
/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:88 +0x27f
main.(*Dumper).Next(0x94a0820)
/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:163 +0x61
main.main()
/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/main.go:359 +0x2a69
exit status 2
The parquet file I used was generated via the following:
#!/usr/bin/env python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
df = pd.DataFrame({
    'value': [42],
    'timestamp': ["1710683608143228692"]
})
pq.write_table(pa.Table.from_pandas(df), "input.parquet")
Component(s)
Go
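The trace ends in utils.GetMinMaxInt32 with a fault at addr=0x0, which is consistent with calling a Go function value that was never assigned. The sketch below illustrates that suspected failure mode under an assumption this report does not confirm: that GetMinMaxInt32 dispatches through a package-level function variable that is assigned only in architecture-specific files, so it stays nil on GOARCH=386. Every name in it (minMaxInt32Impl, minMaxInt32Pure, callUnguarded, getMinMaxInt32) is hypothetical and does not mirror Arrow's source.

```go
package main

import (
	"fmt"
	"math"
)

// minMaxInt32Impl is a HYPOTHETICAL package-level dispatch variable standing in
// for the per-architecture function pointer assumed to back GetMinMaxInt32.
// It is left nil here, as it would be on a GOARCH (such as 386) that has no
// architecture-specific file assigning it.
var minMaxInt32Impl func([]int32) (int32, int32)

// minMaxInt32Pure is a portable pure-Go min/max scan that works on any GOARCH.
// For an empty slice it returns (MaxInt32, MinInt32).
func minMaxInt32Pure(values []int32) (int32, int32) {
	lo, hi := int32(math.MaxInt32), int32(math.MinInt32)
	for _, v := range values {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

// callUnguarded invokes the dispatch variable directly and converts the
// resulting panic into an error, mirroring the SIGSEGV at addr=0x0 above.
func callUnguarded(values []int32) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	minMaxInt32Impl(values) // nil func value: panics with a nil pointer dereference
	return nil
}

// getMinMaxInt32 guards the dispatch and falls back to the pure-Go scan when
// no architecture-specific implementation was registered.
func getMinMaxInt32(values []int32) (int32, int32) {
	if minMaxInt32Impl == nil {
		return minMaxInt32Pure(values)
	}
	return minMaxInt32Impl(values)
}

func main() {
	fmt.Println(callUnguarded([]int32{42}))
	lo, hi := getMinMaxInt32([]int32{7, -3, 42})
	fmt.Println(lo, hi)
}
```

Running it prints the recovered runtime error followed by `-3 42`. The nil check is only one possible remedy; providing a pure-Go implementation for every otherwise unsupported GOARCH (for example, via a build-constrained file) would be another, and this report does not say which approach, if either, was taken upstream.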