-
Notifications
You must be signed in to change notification settings - Fork 4.1k
[Java] Dictionary decoding not using the compression factory from the ArrowReader #37841
Description
I am trying to decode in Java records generated in Go (simple type + dictionaries) using ZSTD compression (using Arrow 13.0.0)
Although this is working fine for the simple types, I am getting this error when decoding dictionaries
java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionFactory for ZSTD
at org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:69)
at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:82)
at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:256)
at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:247)
at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:167)
The Go part is essentially
dtyp := &arrow.DictionaryType{
IndexType: arrow.PrimitiveTypes.Int8,
ValueType: arrow.BinaryTypes.LargeString,
}
bldrDictString := arrowarray.NewDictionaryBuilder(memory.DefaultAllocator, dtyp)
defer bldrDictString.Release()
bldrDictString.(*arrowarray.BinaryDictionaryBuilder).AppendString("foo")
columnTypes := make([]arrow.Field, 0, 1)
columnArrays := make([]arrow.Array, 0, 1)
columnArrays = append(columnArrays, bldrDictString.NewArray())
columnTypes = append(columnTypes, arrow.Field{Name: k.key, Type: dtyp, Nullable: nulls.Any()})
schema := arrow.NewSchema(columnTypes, nil)
rec := arrowarray.NewRecord(schema, columnArrays, int64(size))
var buf bytes.Buffer
writer := ipc.NewWriter(&buf, ipc.WithSchema(schema), ipc.WithZstd())
err := writer.Write(rec)
err = writer.Close()And the Java side
import org.apache.arrow.compression.CommonsCompressionFactory;
try (ArrowStreamReader reader =
new ArrowStreamReader(
new ByteArrayInputStream(format.getArrow().toByteArray()),
bufferAllocator,
CommonsCompressionFactory.INSTANCE)) {
reader.loadNextBatch();
...
} catch (IOException e) {
throw new RuntimeException(e);
}I am able to get it to not throw by making the VectorLoader used when loading the dictionary use the compression factory defined in the reader (it is currently defaulting to NoCompression)
see this change, note I was not able to make it fail using the java arrow test.
I am probably doing something wrong, and also wondering if dictionaries are compressed the same in go and java writers which could explain why the java test is not failing ?
Anyhow, unless I am doing something wrong, this looks like a bug.
Thanks !
Component(s)
Java