[Java API] Rough edges when partitioning by time types #11899
Closed
Labels
bug (Something isn't working)
Description
Apache Iceberg version
1.7.1 (latest release)
Query engine
Other
Please describe the bug 🐞
We've been developing an Iceberg connector in Apache Beam using the Java API, and I noticed some rough edges around partitioning by time types (i.e. with the year, month, day, or hour transforms).
See the following code:
    org.apache.iceberg.Schema schema =
        new org.apache.iceberg.Schema(
            Types.NestedField.required(1, "year", Types.TimestampType.withoutZone()),
            Types.NestedField.required(2, "day", Types.TimestampType.withoutZone()));
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .year("year")
        .day("day")
        .build();
    Table table = catalog.createTable(TableIdentifier.parse("db.table"), schema, spec);
    PartitionKey pk = new PartitionKey(spec, schema);
    LocalDateTime val = LocalDateTime.parse("2024-10-08T13:18:20.053");
    Record rec = GenericRecord.create(schema).copy(
        ImmutableMap.of(
            "year", val,
            "day", val));
    pk.partition(rec);

I'm applying a simple partition to my original record and would expect it to work normally, but the last line fails with the following error:
java.lang.IllegalStateException: Not an instance of java.lang.Long: 2024-10-08T13:18:20.053
    at org.apache.iceberg.data.GenericRecord.get(GenericRecord.java:123)
    at org.apache.iceberg.Accessors$PositionAccessor.get(Accessors.java:71)
    at org.apache.iceberg.Accessors$PositionAccessor.get(Accessors.java:58)
    at org.apache.iceberg.StructTransform.wrap(StructTransform.java:78)
    at org.apache.iceberg.PartitionKey.wrap(PartitionKey.java:30)
    at org.apache.iceberg.PartitionKey.partition(PartitionKey.java:64)
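
The exception says the partition path wants a java.lang.Long for the timestamp column rather than a LocalDateTime. As far as I understand, Iceberg represents timestamps internally as microseconds from the Unix epoch. Below is a minimal JDK-only sketch of that conversion, assuming that representation. The class and method names (TimestampMicrosSketch, toMicros, fromMicros) are hypothetical and not part of the Iceberg API.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not an Iceberg class.
final class TimestampMicrosSketch {
  // Epoch for a timestamp without zone, expressed as a LocalDateTime.
  private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

  // Converts a LocalDateTime to microseconds from the epoch (assumed internal form).
  static long toMicros(LocalDateTime value) {
    return ChronoUnit.MICROS.between(EPOCH, value);
  }

  // Inverse conversion, handy for checking that the value round-trips.
  static LocalDateTime fromMicros(long micros) {
    return EPOCH.plus(micros, ChronoUnit.MICROS);
  }

  public static void main(String[] args) {
    LocalDateTime val = LocalDateTime.parse("2024-10-08T13:18:20.053");
    long micros = toMicros(val);
    System.out.println(micros);
    System.out.println(fromMicros(micros).equals(val));
  }
}
```

A record whose timestamp fields hold Long values produced this way should pass the type check that fails in the stack trace above, although I have not verified that against every transform.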
We've been able to work around it with this logic, replicated below:
Work-around
    private Record getPartitionableRecord(
        Record record, PartitionSpec spec, org.apache.iceberg.Schema schema) {
      if (spec.isUnpartitioned()) {
        return record;
      }
      // Only the partition source fields are copied into the output record.
      Record output = GenericRecord.create(schema);
      for (PartitionField partitionField : spec.fields()) {
        Transform<?, ?> transform = partitionField.transform();
        Types.NestedField field = schema.findField(partitionField.sourceId());
        String name = field.name();
        Object value = record.getField(name);
        // Re-parse the value's string form as a literal of the field's type,
        // e.g. a timestamp becomes a Long. to() returns null if it cannot convert.
        @Nullable Literal<Object> literal = Literal.of(value.toString()).to(field.type());
        if (literal == null || transform.isVoid() || transform.isIdentity()) {
          output.setField(name, value);
        } else {
          output.setField(name, literal.value());
        }
      }
      return output;
    }

So that instead we have this:
    Record partitionableRec = getPartitionableRecord(rec, spec, schema);
    pk.partition(partitionableRec);

This feels a little hacky and I would expect the Iceberg API to handle this by itself. Let me know if I'm missing something!
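
For reference, here is a rough sketch of the partition values I would expect the time transforms to yield for the example timestamp. It assumes the transforms count whole years, months, days, and hours from 1970-01-01T00:00. It uses only the JDK, the class name TimeTransformSketch is hypothetical, and it only covers timestamps at or after the epoch, because ChronoUnit.between truncates toward zero for earlier values.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of time-transform ordinals; not an Iceberg class.
final class TimeTransformSketch {
  private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

  // Whole years elapsed since 1970.
  static long year(LocalDateTime ts) {
    return ChronoUnit.YEARS.between(EPOCH, ts);
  }

  // Whole months elapsed since 1970-01.
  static long month(LocalDateTime ts) {
    return ChronoUnit.MONTHS.between(EPOCH, ts);
  }

  // Whole days elapsed since 1970-01-01.
  static long day(LocalDateTime ts) {
    return ChronoUnit.DAYS.between(EPOCH, ts);
  }

  // Whole hours elapsed since 1970-01-01T00:00.
  static long hour(LocalDateTime ts) {
    return ChronoUnit.HOURS.between(EPOCH, ts);
  }

  public static void main(String[] args) {
    LocalDateTime val = LocalDateTime.parse("2024-10-08T13:18:20.053");
    System.out.println("year=" + year(val) + " month=" + month(val)
        + " day=" + day(val) + " hour=" + hour(val));
  }
}
```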
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time