[Go] Allow access to the underlying MemoTable of a dictionary builder #38988
Description
Describe the enhancement requested
I have a case where knowing the size of the dictionary as values are appended to the dictionary builder would be useful. Specifically, I am indexing data where the number of unique values is unknown. Because the number of unique values is likely to be relatively small, I start with a BinaryDictionaryBuilder and fall back to a LargeStringBuilder only when I detect that the dictionary is growing too large.
The issue is that there is currently no easy way to find out the size of the dictionary in a BinaryDictionaryBuilder. As a workaround, after each AppendString to the BinaryDictionaryBuilder I do the following:
    lastDictIndex := bldrDictString.(*arrowarray.BinaryDictionaryBuilder).GetValueIndex(i)
    if lastDictIndex+1 > cardinality {
        cardinality = lastDictIndex + 1
    }
where i is the index of the value appended.
It would be more convenient, and potentially less costly, if the MemoTable, or even just the size of the dictionary, were exposed. Is this something you would be open to? I would be happy to open a PR if so.
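To make the comparison concrete, here is a minimal, self-contained sketch of the bookkeeping. It deliberately does not use the Arrow API: `toyDictBuilder`, `trackCardinality`, and `DictionarySize` are hypothetical names invented for illustration, and the map stands in for the memo table. It assumes the builder is empty when tracking starts, so that the loop index matches the index of the appended value.

```go
package main

import "fmt"

// toyDictBuilder is a hypothetical stand-in for a dictionary builder. It is
// NOT the Arrow API, only enough structure to show the bookkeeping.
type toyDictBuilder struct {
	memo    map[string]int // value -> dictionary index (plays the memo table role)
	indices []int          // dictionary index of every appended value
}

func newToyDictBuilder() *toyDictBuilder {
	return &toyDictBuilder{memo: make(map[string]int)}
}

// AppendString records v, assigning the next dictionary index on first sight.
func (b *toyDictBuilder) AppendString(v string) {
	idx, ok := b.memo[v]
	if !ok {
		idx = len(b.memo)
		b.memo[v] = idx
	}
	b.indices = append(b.indices, idx)
}

// GetValueIndex returns the dictionary index of the i-th appended value.
func (b *toyDictBuilder) GetValueIndex(i int) int { return b.indices[i] }

// DictionarySize is the kind of accessor this issue asks for (hypothetical name).
func (b *toyDictBuilder) DictionarySize() int { return len(b.memo) }

// trackCardinality appends values to an empty builder and tracks cardinality
// with the workaround above: read back the index of the value just appended.
// It returns the tracked cardinality and whether maxCardinality was exceeded,
// which is the point where a caller would fall back to a plain string builder.
func trackCardinality(b *toyDictBuilder, values []string, maxCardinality int) (int, bool) {
	cardinality := 0
	for i, v := range values {
		b.AppendString(v)
		if last := b.GetValueIndex(i); last+1 > cardinality {
			cardinality = last + 1
		}
		if cardinality > maxCardinality {
			return cardinality, true
		}
	}
	return cardinality, false
}

func main() {
	b := newToyDictBuilder()
	card, exceeded := trackCardinality(b, []string{"a", "b", "a", "c", "b"}, 10)
	fmt.Println(card, exceeded, b.DictionarySize()) // prints: 3 false 3
}
```

In this sketch the tracked cardinality always equals `DictionarySize()`, so a size accessor on the real builder would let callers drop the per-append index lookup entirely.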
Component(s)
Go