[Go] Allow access to the underlying MemoTable of a dictionary builder #38988

@ella-chao

Describe the enhancement requested

I have a use case where knowing the size of the dictionary as values are appended to a dictionary builder would be useful. Specifically, I am indexing data where the number of unique values is not known in advance. Because the number of unique values is likely to be relatively small, I use a BinaryDictionaryBuilder and fall back to a LargeStringBuilder only when I detect that the dictionary will grow too large.

The issue is that there is currently no easy way to get the size of the dictionary in a BinaryDictionaryBuilder. As a workaround, after each AppendString call on the BinaryDictionaryBuilder I do the following:

lastDictIndex := bldrDictString.(*arrowarray.BinaryDictionaryBuilder).GetValueIndex(i)
if lastDictIndex+1 > cardinality {
    cardinality = lastDictIndex + 1
}

where i is the index of the value appended.

It would be more convenient, and potentially less costly, if the MemoTable, or even just the size of the dictionary, were exposed. Is this something you would be open to? I would be happy to open a PR if so.
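
To make the request concrete, here is a minimal, self-contained sketch of the idea. It does not use the Arrow API: toyDictBuilder, its memo map, DictionarySize, and trackCardinality are all hypothetical names made up for illustration, and the toy builder only mimics the way a new value receives the next free dictionary index. It compares the current workaround (tracking the highest index seen) with a direct size accessor, assuming the builder starts out empty.

```go
package main

import "fmt"

// toyDictBuilder is a hypothetical stand-in for a dictionary builder.
// It is NOT the Arrow implementation; memo plays the role of the MemoTable
// by mapping each distinct value to its dictionary index.
type toyDictBuilder struct {
	memo    map[string]int
	indices []int
}

func newToyDictBuilder() *toyDictBuilder {
	return &toyDictBuilder{memo: make(map[string]int)}
}

// AppendString records v, assigning the next free index to unseen values.
func (b *toyDictBuilder) AppendString(v string) {
	idx, ok := b.memo[v]
	if !ok {
		idx = len(b.memo)
		b.memo[v] = idx
	}
	b.indices = append(b.indices, idx)
}

// GetValueIndex returns the dictionary index of the i-th appended value.
func (b *toyDictBuilder) GetValueIndex(i int) int { return b.indices[i] }

// DictionarySize is the hypothetical accessor this issue asks for.
func (b *toyDictBuilder) DictionarySize() int { return len(b.memo) }

// trackCardinality appends values one at a time using the workaround from
// the issue, and stops as soon as the cardinality exceeds limit, which is
// the point where a caller would fall back to a plain string builder.
func trackCardinality(values []string, limit int) (cardinality int, exceeded bool) {
	b := newToyDictBuilder()
	for i, v := range values {
		b.AppendString(v)
		// Workaround: the highest index seen so far, plus one.
		if last := b.GetValueIndex(i); last+1 > cardinality {
			cardinality = last + 1
		}
		// With a size accessor the bookkeeping above would be unnecessary.
		if cardinality != b.DictionarySize() {
			panic("workaround and size accessor disagree")
		}
		if cardinality > limit {
			return cardinality, true
		}
	}
	return cardinality, false
}

func main() {
	values := []string{"a", "b", "a", "c"}
	cardinality, exceeded := trackCardinality(values, 2)
	fmt.Println("cardinality:", cardinality, "exceeded limit:", exceeded)
}
```

In this sketch the two approaches always agree, so a size accessor would let the caller drop the per-append index lookup and the running maximum.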

Component(s)

Go
