[Go][Parquet] panic when writing particular dataset encoded by DeltaBitPacked #37102

@Illyrix

Description

Describe the bug, including details regarding any error messages, version, and platform.

Version: v12.0.1

Here is the test case that causes the panic (in parquet/internal/encoding/encoding_test.go):

func TestWriteDeltaBitPackedInt64(t *testing.T) {
	column := schema.NewColumn(schema.NewInt64Node("int64", parquet.Repetitions.Required, -1), 0, 0)

	tests := []struct {
		name     string
		toencode []int64
		expected []byte
	}{
		{"panic data", []int64{
			0, 3000000000000000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
			0, 3000000000000000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
			0, 3000000000000000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
			0, 3000000000000000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
			0, 0,
		}, []byte{0, 0, 0, 0, 0}}, // placeholder; the expected bytes are irrelevant because the panic occurs first
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			enc := encoding.NewEncoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, false, column, memory.DefaultAllocator)

			enc.(encoding.Int64Encoder).Put(tt.toencode)
			buf, _ := enc.FlushValues() // <-- panics here
			defer buf.Release()

			assert.Equal(t, tt.expected, buf.Bytes())

			dec := encoding.NewDecoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, column, memory.DefaultAllocator)

			dec.(encoding.Int64Decoder).SetData(len(tt.toencode), buf.Bytes())
			out := make([]int64, len(tt.toencode))
			dec.(encoding.Int64Decoder).Decode(out)
			assert.Equal(t, tt.toencode, out)
		})
	}

	// other subtests...
}

What I expect:

No panic, and the data is packed into a byte array.

What I actually get:

Running tool: /usr/local/go/bin/go test -timeout 30s -run ^TestWriteDeltaBitPackedInt64$ github.com/apache/arrow/go/v13/parquet/internal/encoding -v

=== RUN   TestWriteDeltaBitPackedInt64
=== RUN   TestWriteDeltaBitPackedInt64/panic_data
--- FAIL: TestWriteDeltaBitPackedInt64 (0.00s)
    --- FAIL: TestWriteDeltaBitPackedInt64/panic_data (0.00s)
panic: runtime error: slice bounds out of range [:1026] with capacity 1024 [recovered]
        panic: runtime error: slice bounds out of range [:1026] with capacity 1024

goroutine 19 [running]:
testing.tRunner.func1.2({0x103b6c000, 0x1400019c618})
        /usr/local/go/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1529 +0x364
panic({0x103b6c000, 0x1400019c618})
        /usr/local/go/src/runtime/panic.go:884 +0x1f4
github.com/apache/arrow/go/v13/parquet/internal/encoding.(*deltaBitPackEncoder).FlushValues(0x140001eec00)
        /Users/illyrix/Workspace/arrow/go/parquet/internal/encoding/delta_bit_packing.go:457 +0x33c
github.com/apache/arrow/go/v13/parquet/internal/encoding_test.TestWriteDeltaBitPackedInt64.func1(0x0?)
        /Users/illyrix/Workspace/arrow/go/parquet/internal/encoding/encoding_test.go:642 +0xac
testing.tRunner(0x14000185ba0, 0x14000401680)
        /usr/local/go/src/testing/testing.go:1576 +0x104
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1629 +0x370
FAIL    github.com/apache/arrow/go/v13/parquet/internal/encoding        0.976s

There is another issue about delta_bit_packing (#35718), but it may be a different bug: all values in our test case are non-null.

Component(s)

Go, Parquet
