-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Closed
Closed
Copy link
Labels
Description
Company or project name
No response
Describe what's wrong
Timezone info lost when reading DateTime columns from Glue DataLakeCatalog.
Example of reading same data with rest and glue catalog:
Glue:
SELECT * FROM database_59caa3c5_46be_11f0_a37e_159f99dc2bbb.\`namespace_59caa3c4_46be_11f0_a37e_159f99dc2bbb.table_59caa3c3_46be_11f0_a37e_159f99dc2bbb\` ORDER BY tuple(*) FORMAT TabSeparated2024-01-01 15:00:00.000000 2024-01-01 15:00:00.000000
SHOW CREATE TABLE database_59caa3c5_46be_11f0_a37e_159f99dc2bbb.\`namespace_59caa3c4_46be_11f0_a37e_159f99dc2bbb.table_59caa3c3_46be_11f0_a37e_159f99dc2bbb\`CREATE TABLE database_59caa3c5_46be_11f0_a37e_159f99dc2bbb.`namespace_59caa3c4_46be_11f0_a37e_159f99dc2bbb.table_59caa3c3_46be_11f0_a37e_159f99dc2bbb`\n(\n `timestamp` Nullable(DateTime64(6)),\n `timestamptz` Nullable(DateTime64(6))\n)\nENGINE = Iceberg(\'s3://glue***')
Rest:
SELECT * FROM database_92115b01_46be_11f0_a37e_159f99dc2bbb.\`namespace_92115b00_46be_11f0_a37e_159f99dc2bbb.table_92115aff_46be_11f0_a37e_159f99dc2bbb\` ORDER BY tuple(*) FORMAT TabSeparated2024-01-01 15:00:00.000000 2024-01-01 12:00:00.000000
SHOW CREATE TABLE database_92115b01_46be_11f0_a37e_159f99dc2bbb.\`namespace_92115b00_46be_11f0_a37e_159f99dc2bbb.table_92115aff_46be_11f0_a37e_159f99dc2bbb\`CREATE TABLE database_92115b01_46be_11f0_a37e_159f99dc2bbb.`namespace_92115b00_46be_11f0_a37e_159f99dc2bbb.table_92115aff_46be_11f0_a37e_159f99dc2bbb`\n(\n `timestamp` Nullable(DateTime64(6)),\n `timestamptz` Nullable(DateTime64(6, \'UTC\'))\n)\nENGINE = Iceberg(\'http://minio:9000/warehouse/data/\', \'admin\', \'[HIDDEN]\')
So I see that timezone is there Nullable(DateTime64(6, 'UTC'))
Does it reproduce on the most recent release?
Yes
How to reproduce
- Create iceberg table with TimestamptzType column. Insert data with timezone.
I use PyIceberg:
schema = Schema(
NestedField(
field_id=1, name="timestamp", field_type=TimestampType(), required=False
),
NestedField(
field_id=2,
name="timestamptz",
field_type=TimestamptzType(),
required=False,
),
)
data = [
{
"timestamp": datetime(2024, 1, 1, hour=12, minute=0, second=0, microsecond=0),
"timestamptz": datetime(
2024,
1,
1,
hour=12,
minute=0,
second=0,
microsecond=0,
tzinfo=pytz.timezone("UTC"),
),
}
]
df = pa.Table.from_pylist(data)
table.append(df)- Create database DataLakeCatalog with type=glue
- Select data from table
Expected behavior
No response
Error message and/or stacktrace
No response
Additional context
I use aws glue.
Reactions are currently unavailable