Different decimal precision value when reading with parquet native reader v3 #87413

@Selfeer

Description

Company or project name

No response

Describe what's wrong

When reading a Parquet file that contains a Decimal column, the reported precision depends on whether the native reader v3 is enabled: Decimal(4, 2) with it disabled, Decimal(9, 2) with it enabled.

input_format_parquet_use_native_reader_v3 = 0

DESCRIBE TABLE file('int32_decimal_1.parquet')
FORMAT TabSeparated
SETTINGS input_format_parquet_use_native_reader_v3 = 0

value	Nullable(Decimal(4, 2))					

input_format_parquet_use_native_reader_v3 = 1

DESCRIBE TABLE file('int32_decimal_1.parquet')
FORMAT TabSeparated
SETTINGS input_format_parquet_use_native_reader_v3 = 1

value	Nullable(Decimal(9, 2))		
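The report does not establish why the two readers disagree. One detail that may help triage: per the ClickHouse documentation, `Decimal(P, S)` with P from 1 to 9 is stored as Decimal32, so 9 is the widest precision within the same storage class that holds `Decimal(4, 2)`. The sketch below only illustrates that mapping. The helper name `clickhouse_decimal_storage` is hypothetical, and whether the v3 reader actually derives the precision this way is an assumption, not something confirmed here.

```python
def clickhouse_decimal_storage(precision: int) -> str:
    """Return the ClickHouse storage class for Decimal(precision, S).

    Hypothetical helper for illustration; the ranges follow the documented
    Decimal32 / Decimal64 / Decimal128 / Decimal256 precision limits.
    """
    if not 1 <= precision <= 76:
        raise ValueError(f"precision must be in [1, 76], got {precision}")
    if precision <= 9:
        return "Decimal32"
    if precision <= 18:
        return "Decimal64"
    if precision <= 38:
        return "Decimal128"
    return "Decimal256"


# Both types reported in this issue map to the same storage class.
legacy_reader = clickhouse_decimal_storage(4)  # Decimal(4, 2), v3 disabled
native_v3 = clickhouse_decimal_storage(9)      # Decimal(9, 2), v3 enabled
print(legacy_reader, native_v3)
```

If that assumption holds, the two readers would agree on the storage width and differ only in the precision they report.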

Does it reproduce on the most recent release?

Yes

How to reproduce

ClickHouse Version: 25.8.4.13
Settings: input_format_parquet_use_native_reader_v3=1

Here is the Python code that generated the Parquet file:

from decimal import Decimal, ROUND_HALF_EVEN
import pyarrow as pa
import pyarrow.parquet as pq
import os

OUT_PATH = "test_decimal/int32_decimal.parquet"

# Make sure the output directory exists.
os.makedirs(os.path.dirname(OUT_PATH), exist_ok=True)

# 24 values from 1.00 to 24.00, each with exactly two fractional digits.
scale = Decimal("0.01")
values = [Decimal(i).quantize(scale, rounding=ROUND_HALF_EVEN) for i in range(1, 25)]

# Declare the column as a nullable decimal with precision 4 and scale 2.
dtype = pa.decimal128(4, 2)
arr = pa.array(values, type=dtype)
table = pa.Table.from_arrays(
    [arr], schema=pa.schema([pa.field("value", dtype, nullable=True)])
)

# Write with format version 1.0, no compression, and no dictionary encoding.
pq.write_table(table, OUT_PATH, version="1.0", compression=None, use_dictionary=False)
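As a quick sanity check on the generator, the values it produces can be tested against the declared `decimal128(4, 2)` type using only the standard library. The helper `fits_decimal` below is a hypothetical name introduced for illustration; it checks that a value has at most `scale` fractional digits and at most `precision` digits in total.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether value is exactly representable as Decimal(precision, scale)."""
    # Shift the decimal point right by `scale` digits to get the unscaled value.
    unscaled = value.scaleb(scale)
    # Any remaining fractional part means the value needs more than `scale` digits.
    if unscaled != unscaled.to_integral_value():
        return False
    # The unscaled integer must have at most `precision` digits.
    return abs(int(unscaled)) < 10 ** precision


# Same values as the generator script in this issue.
scale = Decimal("0.01")
values = [Decimal(i).quantize(scale, rounding=ROUND_HALF_EVEN) for i in range(1, 25)]

all_fit = all(fits_decimal(v, 4, 2) for v in values)
print(min(values), max(values), all_fit)  # prints: 1.00 24.00 True
```

Every value fits in four digits, so the data itself does not call for a wider precision; the difference lies in the type each reader reports.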

Expected behavior

No response

Error message and/or stacktrace

No response

Additional context

No response

Metadata

Labels

bug: Confirmed user-visible misbehaviour in official release
comp-formats: Input/output formats (CSV/JSON/Parquet/ORC/Arrow/Protobuf/etc.).
