[DRAFT] plumbing: fully support TREE, REUC, LINK, UNTR, EOIE, FSMN, IEOT index extensions #1622
christian-roggia wants to merge 12 commits into
NOTE: The TREE extension decoder has been updated to match the behavior of the C implementation. The decoder now keeps reading each entry's value until it reaches a newline, so that the buffer advances correctly to the next entry. Invalidated TREE entries are also preserved rather than discarded, as they were in the original logic. Preserving these entries retains valuable information and makes it possible to re-encode the index byte for byte.
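The decoding rule described in the note can be sketched as follows. This is a minimal illustration, not the PR's actual code: `treeEntry`, `readTreeEntry` and `decodeTree` are hypothetical names, and the sketch assumes SHA-1 object names (20 raw bytes) and the entry layout from Git's index format documentation (NUL-terminated path, then `<entry count> <subtree count>` and a newline, then a hash unless the entry count is -1).

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// hashSize assumes SHA-1 object names (20 raw bytes).
const hashSize = 20

// treeEntry is a hypothetical type for this sketch, not go-git's own.
type treeEntry struct {
	Path    string
	Entries int    // -1 marks an invalidated entry
	Trees   int    // number of subtrees
	Hash    []byte // nil when the entry is invalidated
}

// readTreeEntry reads one entry: a NUL-terminated path, the line
// "<entry count> <subtree count>\n", then a hash unless invalidated.
func readTreeEntry(r *bufio.Reader) (treeEntry, error) {
	path, err := r.ReadString(0)
	if err != nil {
		return treeEntry{}, err
	}
	// Always consume the whole line, even for invalidated entries,
	// so the reader lands exactly on the next entry.
	line, err := r.ReadString('\n')
	if err != nil {
		return treeEntry{}, err
	}
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return treeEntry{}, fmt.Errorf("malformed TREE entry line %q", line)
	}
	count, err := strconv.Atoi(fields[0])
	if err != nil {
		return treeEntry{}, err
	}
	trees, err := strconv.Atoi(fields[1])
	if err != nil {
		return treeEntry{}, err
	}
	e := treeEntry{Path: strings.TrimSuffix(path, "\x00"), Entries: count, Trees: trees}
	if count < 0 {
		return e, nil // invalidated: keep the entry, no hash follows
	}
	e.Hash = make([]byte, hashSize)
	_, err = io.ReadFull(r, e.Hash)
	return e, err
}

// decodeTree decodes every entry in data, keeping invalidated ones.
func decodeTree(data []byte) ([]treeEntry, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	var out []treeEntry
	for {
		if _, err := r.Peek(1); err == io.EOF {
			return out, nil
		}
		e, err := readTreeEntry(r)
		if err != nil {
			return nil, err
		}
		out = append(out, e)
	}
}

func main() {
	// An invalidated root entry followed by a valid "sub" entry.
	data := append([]byte("\x00-1 1\nsub\x002 0\n"), make([]byte, hashSize)...)
	entries, err := decodeTree(data)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("path=%q entries=%d trees=%d hash=%d bytes\n", e.Path, e.Entries, e.Trees, len(e.Hash))
	}
}
```

Because the invalidated root entry is kept with its subtree count, an encoder working from this slice has what it needs to write the same bytes back out.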
NOTE: The REUC decoder has been updated to decode stages in the intended order. Previously it iterated over a map, and because Go does not guarantee map iteration order, the stages were processed in a random order.
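One way to get a stable order, sketched under assumptions: rather than ranging over the map, walk the stages explicitly from 1 to 3. The `stage` type and `orderedStages` function are hypothetical names for illustration and are not taken from the PR.

```go
package main

import "fmt"

// stage is a hypothetical type for this sketch; REUC records stages 1 to 3.
type stage int

// orderedStages returns the stages that have a non-zero mode, always in
// the order 1, 2, 3, regardless of how the map happens to iterate.
func orderedStages(modes map[stage]uint32) []stage {
	var out []stage
	for s := stage(1); s <= 3; s++ {
		if modes[s] != 0 {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Stage 2 is missing; the map literal is deliberately out of order.
	modes := map[stage]uint32{3: 0100644, 1: 0100644}
	fmt.Println(orderedStages(modes)) // prints "[1 3]"
}
```

Sorting the map keys would also work; the fixed loop is used here only because the set of stages is small and known in advance.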
pjbgf left a comment
@christian-roggia thanks for looking into this. The changes are looking good, although I need to take a closer look around the extensions on a follow-up review.
Please add some tests around the ewah code and rebase the PR.
)

func ReadFrom(r io.Reader) (*Bitmap, error) {
	var bits uint32
	RLWLargestLiteralCount = (1 << RLWLiteralBits) - 1
)

func GetRunBit(rlw uint64) bool {

- func GetRunBit(rlw uint64) bool {
+ // RunBit returns whether the run bit in rlw is set.
+ func RunBit(rlw uint64) bool {
	return rlw&1 != 0
}

func GetRunningLen(rlw uint64) uint64 {

- func GetRunningLen(rlw uint64) uint64 {
+ // RunningLen extracts rlw's running length.
+ func RunningLen(rlw uint64) uint64 {
	return uint64((rlw >> 1) & RLWLargestRunningCount)
}

func GetLiteralWords(rlw uint64) uint64 {

- func GetLiteralWords(rlw uint64) uint64 {
+ // LiteralWords extracts the number of literal words in rlw.
+ func LiteralWords(rlw uint64) uint64 {
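As a starting point for the requested ewah tests, here is a hedged sketch of the three accessors under their suggested names, plus a round-trip check. The constant values are assumptions based on the layout used by Git's C implementation (1 run bit, 32 running-length bits, 31 literal-count bits in a 64-bit word), and `packRLW` is a hypothetical helper that does not appear in the PR.

```go
package main

import "fmt"

// Assumed layout of a 64-bit running length word (RLW):
// bit 0 = run bit, next 32 bits = running length, top 31 bits = literal count.
const (
	rlwRunningBits         = 32
	rlwLiteralBits         = 64 - 1 - rlwRunningBits
	rlwLargestRunningCount = (uint64(1) << rlwRunningBits) - 1
	rlwLargestLiteralCount = (uint64(1) << rlwLiteralBits) - 1
)

// RunBit returns whether the run bit in rlw is set.
func RunBit(rlw uint64) bool { return rlw&1 != 0 }

// RunningLen extracts rlw's running length.
func RunningLen(rlw uint64) uint64 { return (rlw >> 1) & rlwLargestRunningCount }

// LiteralWords extracts the number of literal words in rlw.
func LiteralWords(rlw uint64) uint64 { return rlw >> (1 + rlwRunningBits) }

// packRLW is a hypothetical helper that builds an RLW from its three
// fields, so the accessors can be checked with a round trip.
func packRLW(runBit bool, runningLen, literalWords uint64) uint64 {
	var rlw uint64
	if runBit {
		rlw = 1
	}
	rlw |= (runningLen & rlwLargestRunningCount) << 1
	rlw |= (literalWords & rlwLargestLiteralCount) << (1 + rlwRunningBits)
	return rlw
}

func main() {
	rlw := packRLW(true, 5, 3)
	fmt.Println(RunBit(rlw), RunningLen(rlw), LiteralWords(rlw)) // prints "true 5 3"
}
```

A table-driven test over boundary values (zero, one, and the largest running and literal counts) would exercise the masks and shifts without needing a full bitmap.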
	return false
}

// ForEach calls fn() for each set bit.

- // ForEach calls fn() for each set bit.
+ // ForEach calls fn() for each set bit.
+ // The returning bool from fn defines whether iteration should continue.
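The contract in the suggested comment can be illustrated with a small sketch. `wordBitmap` is a hypothetical, uncompressed stand-in for the PR's `Bitmap`, and having `ForEach` itself return a bool is an assumption made here for illustration; only the meaning of the callback's return value comes from the review comment.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wordBitmap is a hypothetical, uncompressed stand-in for the PR's Bitmap.
type wordBitmap []uint64

// ForEach calls fn for each set bit in ascending order.
// Iteration stops as soon as fn returns false. The result reports
// whether every set bit was visited (an assumption of this sketch).
func (b wordBitmap) ForEach(fn func(pos uint64) bool) bool {
	for i, w := range b {
		for w != 0 {
			pos := uint64(i)*64 + uint64(bits.TrailingZeros64(w))
			if !fn(pos) {
				return false
			}
			w &= w - 1 // clear the lowest set bit
		}
	}
	return true
}

// firstN collects at most n set-bit positions, stopping iteration early.
func firstN(b wordBitmap, n int) []uint64 {
	if n <= 0 {
		return nil
	}
	var out []uint64
	b.ForEach(func(pos uint64) bool {
		out = append(out, pos)
		return len(out) < n
	})
	return out
}

func main() {
	b := wordBitmap{0x5, 1 << 63} // bits 0, 2 and 127 are set
	fmt.Println(firstN(b, 2))     // prints "[0 2]"
}
```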
	}
}

func (b *Bitmap) NumBits() uint64 {
What's the core difference between NumBits and Bits? Please document both funcs.
	return uint64(rlw >> (1 + RLWRunningBits))
}

func (b *Bitmap) Get(pos uint64) bool {
Wouldn't At be a better name here? I'm assuming this checks whether a bit is set at a given position. Is that right?
- func (b *Bitmap) Get(pos uint64) bool {
+ func (b *Bitmap) At(pos uint64) bool {
Please document this func.
	if idx.ResolveUndo != nil {
		if err := e.encodeREUC(idx.ResolveUndo); err != nil {
			return err
		}
	}
Is this not duplicated with L233-L237?
- 	if idx.ResolveUndo != nil {
- 		if err := e.encodeREUC(idx.ResolveUndo); err != nil {
- 			return err
- 		}
- 	}
This pull request introduces full support for the TREE, REUC, LINK, UNTR, FSMN, IEOT, and EOIE index extensions. Partial decoding support for the TREE, EOIE and REUC extensions already existed, but encoding was missing. There are a few other official extensions not yet implemented, which can be added in future updates.
I would appreciate an initial review of these changes as I continue testing and validating support for the new index extensions in our environment.