[Java] Dictionary decoding not using the compression factory from the ArrowReader  #37841

@freakyzoidberg

Description

I am trying to decode, in Java, records generated in Go (simple types + dictionaries) with ZSTD compression (using Arrow 13.0.0).

Although this works fine for the simple types, I get the following error when decoding the dictionaries:

java.lang.IllegalArgumentException: Please add arrow-compression module to use CommonsCompressionFactory for ZSTD
	at org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:69)
	at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:82)
	at org.apache.arrow.vector.ipc.ArrowReader.load(ArrowReader.java:256)
	at org.apache.arrow.vector.ipc.ArrowReader.loadDictionary(ArrowReader.java:247)
	at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:167)

The Go part is essentially:

dtyp := &arrow.DictionaryType{
	IndexType: arrow.PrimitiveTypes.Int8,
	ValueType: arrow.BinaryTypes.LargeString,
}
bldrDictString := arrowarray.NewDictionaryBuilder(memory.DefaultAllocator, dtyp)
defer bldrDictString.Release()

bldrDictString.(*arrowarray.BinaryDictionaryBuilder).AppendString("foo")

columnTypes := make([]arrow.Field, 0, 1)
columnArrays := make([]arrow.Array, 0, 1)

columnArrays = append(columnArrays, bldrDictString.NewArray())
columnTypes = append(columnTypes, arrow.Field{Name: k.key, Type: dtyp, Nullable: nulls.Any()})

schema := arrow.NewSchema(columnTypes, nil)
rec := arrowarray.NewRecord(schema, columnArrays, int64(size))

var buf bytes.Buffer
writer := ipc.NewWriter(&buf, ipc.WithSchema(schema), ipc.WithZstd())
if err := writer.Write(rec); err != nil {
	panic(err)
}
if err := writer.Close(); err != nil {
	panic(err)
}

And the Java side:

import org.apache.arrow.compression.CommonsCompressionFactory;


try (ArrowStreamReader reader =
         new ArrowStreamReader(
             new ByteArrayInputStream(format.getArrow().toByteArray()),
             bufferAllocator,
             CommonsCompressionFactory.INSTANCE)) {
  reader.loadNextBatch();
  ...
} catch (IOException e) {
  throw new RuntimeException(e);
}

I was able to stop it from throwing by making the VectorLoader used when loading the dictionary use the compression factory defined on the reader (it currently defaults to NoCompression).

See this change; note that I was not able to make it fail using the Java Arrow tests.
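For reference, here is a rough sketch of the change I mean, written against my reading of the 13.0.0 ArrowReader sources (the helper shape and the compressionFactory field name are my assumptions, not a verified patch):

```java
// Sketch of org.apache.arrow.vector.ipc.ArrowReader's private dictionary-load
// helper. Not a verified patch: method layout and field names are assumed from
// the 13.0.0 sources.
private void load(ArrowDictionaryBatch dictionaryBatch, FieldVector vector) {
  VectorSchemaRoot root = new VectorSchemaRoot(
      Collections.singletonList(vector.getField()),
      Collections.singletonList(vector), 0);
  // Before: new VectorLoader(root), which falls back to NoCompressionCodec and
  // throws as soon as the dictionary batch carries a ZSTD-compressed buffer.
  // After: reuse the factory the reader was constructed with, exactly as the
  // record-batch path already does.
  VectorLoader loader = new VectorLoader(root, this.compressionFactory);
  try (ArrowRecordBatch dictionaryRecordBatch = dictionaryBatch.getDictionary()) {
    loader.load(dictionaryRecordBatch);
  }
}
```

With that one-line change, the same CommonsCompressionFactory passed to the ArrowStreamReader constructor is applied to dictionary batches as well as record batches.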

I am probably doing something wrong, and I am also wondering whether dictionaries are compressed the same way by the Go and Java writers, which could explain why the Java test does not fail.

Anyhow, unless I am doing something wrong, this looks like a bug.

Thanks !

Component(s)

Java
