[Go] 32-bit panic: utils.GetMinMaxInt32() #40672

@powersj

Describe the bug, including details regarding any error messages, version, and platform.

While working on a parquet file parser, I ran our tests on a 32-bit system and hit a panic. It is reproducible with the in-tree parquet_reader, using the parquet file described at the end of this report. First, it works as expected on my 64-bit system:

$ git clone git@github.com:apache/arrow
$ cd arrow/go/parquet/cmd/parquet_reader
$ cp ~/input.parquet .
$ go run . input.parquet 
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0  ---
--- Total Bytes: 201  ---
--- Rows: 1  ---
Column 0
 Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 92, Compressed Size: 96
Column 1
 Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value             |timestamp         |
42                |1710683608143228692|

Once I force a 32-bit architecture with GOARCH=386, the same command prints the metadata and then crashes when it starts reading values:

$ GOARCH=386 go run . input.parquet 
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0  ---
--- Total Bytes: 201  ---
--- Rows: 1  ---
Column 0
 Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 92, Compressed Size: 96
Column 1
 Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value             |timestamp         |
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x83502c7]

goroutine 1 [running]:
github.com/apache/arrow/go/v16/internal/utils.GetMinMaxInt32(...)
	/home/powersj/test/arrow/go/internal/utils/min_max.go:190
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*Int64DictConverter).IsValid(0x9492fe0, {0x9413270, 0x1, 0x1})
	/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:495 +0x27
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDictInt64(0x94e0708, {0x8488974, 0x9492fe0}, {0x95b4c00, 0x1, 0x80})
	/home/powersj/test/arrow/go/parquet/internal/utils/typed_rle_dict.gen.go:378 +0x228
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDict(0x94e0708, {0x8488974, 0x9492fe0}, {0x839e4e0, 0x9511a84})
	/home/powersj/test/arrow/go/parquet/internal/utils/rle.go:417 +0x14e
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*dictDecoder).decode(...)
	/home/powersj/test/arrow/go/parquet/internal/encoding/decoder.go:146
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*DictInt64Decoder).Decode(0x95ae700, {0x95b4c00, 0x1, 0x80})
	/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:436 +0x75
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch.func1(0x0, 0x1)
	/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:93 +0xc3
github.com/apache/arrow/go/v16/parquet/file.(*columnChunkReader).readBatch(0x959c428, 0x80, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, 0x80}, 0x9511b6c)
	/home/powersj/test/arrow/go/parquet/file/column_reader.go:514 +0x2ab
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch(0x959c428, 0x80, {0x95b4c00, 0x80, 0x80}, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, ...})
	/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:92 +0xa3
main.(*Dumper).readNextBatch(0x94a0820)
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:88 +0x27f
main.(*Dumper).Next(0x94a0820)
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:163 +0x61
main.main()
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/main.go:359 +0x2a69
exit status 2
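
The top frame is a nil pointer dereference at addr=0x0 inside GetMinMaxInt32. One possible reading, which is an assumption on my part and not something I have confirmed in the source, is that the function dispatches through a function variable that is only assigned on some architectures and is left nil on 386. The sketch below only illustrates that failure mode. The names minMaxInt32, pureGoMinMax and tryMinMax are hypothetical and are not the arrow code.

```go
package main

import "fmt"

// minMaxInt32 is a hypothetical dispatch variable. In this sketch it is
// assumed to be assigned only by architecture-specific init code, so it
// stays nil on a platform that has no such init.
var minMaxInt32 func(values []int32) (lo, hi int32)

// pureGoMinMax is a portable fallback that works on any architecture.
func pureGoMinMax(values []int32) (lo, hi int32) {
	if len(values) == 0 {
		return 0, 0
	}
	lo, hi = values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

// tryMinMax calls through the dispatch variable and turns a panic
// into an error so both cases can be shown in one run.
func tryMinMax(values []int32) (lo, hi int32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	lo, hi = minMaxInt32(values)
	return lo, hi, nil
}

func main() {
	// Dispatch variable never assigned: calling it panics.
	_, _, err := tryMinMax([]int32{0})
	fmt.Println("without fallback:", err)

	// With a pure-Go fallback assigned, the call succeeds.
	minMaxInt32 = pureGoMinMax
	lo, hi, err := tryMinMax([]int32{3, 0, 7})
	fmt.Println("with fallback:", lo, hi, err)
}
```

If that reading is right, assigning a pure-Go implementation on architectures without an optimized one would avoid the crash.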

The parquet file I used was generated via the following:

#!/usr/bin/env python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# One row: an integer column and a timestamp stored as a string.
df = pd.DataFrame({
    'value': [42],
    'timestamp': ["1710683608143228692"]
})

# Convert the frame to an Arrow table and write it out as parquet.
pq.write_table(pa.Table.from_pandas(df), "input.parquet")

Component(s)

Go
