Skip to content

Speed up flush of softdelete by intoBitset#14552

Merged
gf2121 merged 25 commits intoapache:mainfrom
gf2121:soft_delete_flush
May 9, 2025
Merged

Speed up flush of softdelete by intoBitset#14552
gf2121 merged 25 commits intoapache:mainfrom
gf2121:soft_delete_flush

Conversation

@gf2121
Copy link
Contributor

@gf2121 gf2121 commented Apr 24, 2025

Speed up the flush of softdelete by intoBitset.

relates: #14521

@gf2121 gf2121 marked this pull request as draft April 24, 2025 18:01
@gf2121
Copy link
Contributor Author

gf2121 commented Apr 25, 2025

I managed to get some numbers on this change:

Baseline: Soft delete flush total took: 30311ms, IndexedDISI#writeBitset total took: 8324ms.
Candidate: Soft delete flush total took: 22635ms, IndexedDISI#writeBitset total took: 964ms.

BTW, i added some docvalue fields in my benchmark. Then i'm seeing rather huge cost of virtual calls represented by vtable stub when computing min/max/gcd.

image

@gf2121 gf2121 marked this pull request as ready for review April 25, 2025 06:40
@gf2121
Copy link
Contributor Author

gf2121 commented Apr 28, 2025

I made some effort to simplify the change, this is ready for review now.

@gf2121
Copy link
Contributor Author

gf2121 commented Apr 28, 2025

Hi @jpountz , may i have your opinion on this optimization?

@gf2121 gf2121 requested a review from jpountz April 28, 2025 21:30
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some small comments but the change makes sense to me, nice optimization.

// we need a scratch bitset because the param bitset doesn't allow bits to be cleared.
if (scratch == null || scratch.length() < upTo - offset) {
// intoBitSet is usually called with fixed window size so we do not do overSize here.
scratch = new FixedBitSet(upTo - offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use FixedBitSet#ensureCapacity?

}

// we need a scratch bitset because the param bitset doesn't allow bits to be cleared.
if (scratch == null || scratch.length() < upTo - offset) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we compare scratch.length() with bitSet.length() rather than upTo - offset? My concern is that there are multiple call sites that call intoBitSet with upTo=NO_MORE_DOCS, so this could create a giant bit set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, nice catch!

@Override
public int docIDRunEnd() throws IOException {
return in.docIDRunEnd();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract this change to a separate PR and cover all wrappers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #14618 to cover SingletonSortedNumericDocValues and SingletonSortedSetDocValues. I think all these work that implement methods for wrappers could be avoided after #14475.

@@ -45,11 +45,12 @@ public final class FixedBitSet extends BitSet {

/**
* If the given {@link FixedBitSet} is large enough to hold {@code numBits+1}, returns the given
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is typically called with pattern like:

FixedBitSet bits = FixedBitSet.ensureCapacity(bits, doc);
bits.set(doc):

So the returned bitSet is required to hold 0(inclusive) to numBits(inclusive), which is numBits+1 bits.

@gf2121 gf2121 merged commit 7494bd1 into apache:main May 9, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants