Roaring bitmap to JDK bit set conversion #661

xtonik · 2023-10-29T09:10:11Z

SUMMARY

Added missing routine for conversion RoaringBitmap to BitSet from JDK. Done in very simple - iterating value by value, it is far from optimal possible solution.

Automated Checks

I have run ./gradlew test and made sure that my PR does not break any unit test.
I have run ./gradlew checkstyleMain or the equivalent and corrected the formatting warnings reported.

lemire · 2023-10-30T21:43:05Z

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java

+   * @param bitmap original bitmap
+   * @return bit set equivalent to roaring bitmap
+   */
+  public static BitSet bitSetOf(ImmutableRoaringBitmap bitmap) {


If we implement this for ImmutableRoaringBitmap, we should implement it for RoaringBitmap.

What about to use their common ancestor ImmutableBitmapDataProvider?

ImmutableBitmapDataProvider is an interface.

I mean use one of OOP principle "code against interface, not implementation", but I you suggest, I will duplicate the code for RoaringBitmap.

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java

RoaringBitmap/src/test/java/org/roaringbitmap/buffer/TestBitSetUtil.java

lemire · 2023-10-30T21:47:10Z

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java

+
+  /**
+   * Converts an immutable roaring bitmap to JDK bit set.
+   *


This should be documented as being an inefficient implementation, until we optimize it. I am fine with a 'convenient' but slow routine, but we need to tell the users that it is slow.

I have created some benchmark to optimize conversion. There is trade off between too much code and performance. What variant do you suggest to choose? Should I create separate PR for this or it can be included here?

Benchmark inputData Mode Cnt Score Error Units

iterateThroughContainers CONSECUTIVE avgt 5 52.782 ±3.963 ms/op

iterateThroughContainers RANDOM avgt 5 66.945 ±8.262 ms/op

negativeValuesSeparately CONSECUTIVE avgt 5 62.152 ±1.104 ms/op

negativeValuesSeparately RANDOM avgt 5 85.505 ±12.954 ms/op

negativeValuesSeparatelyRegisterPreviousValue CONSECUTIVE avgt 5 49.961 ±8.429 ms/op

negativeValuesSeparatelyRegisterPreviousValue RANDOM avgt 5 185.567 ±5.139 ms/op

preallocated CONSECUTIVE avgt 5 67.221 ±1.658 ms/op

preallocated RANDOM avgt 5 78.673 ±5.566 ms/op

original CONSECUTIVE avgt 5 75.124 ±27.983 ms/op

original RANDOM avgt 5 84.717 ±3.251 ms/op

streamingApi CONSECUTIVE avgt 5 38 ± ms/op

streamingApi RANDOM avgt 5 85 ± ms/op

The variant using streaming API inspired by #556 was added afterwards and it is magically the fastest for consecutive values.

There is trade off between too much code and performance. What variant do you suggest to choose?

My concern here was that the library user should know what to expect. If we provide a convenience function that is possibly slow, we should warn them... otherwise it can be considered a performance bug.

Not everything needs to be super optimized with complex code, but the user should get some warning if the performance can be poor.

And what variant you suggest we should use? As I wrote earlier, I have created benchmarks for this issue, where are already optimized implementations. Why I ask you what to choose? If you look on benchmark results, there is no solution dominating for both tested cases - for consecutive values is the best variant using streamingApi and for random values iterateThroughContainers. Your suggestions can be included also. After that I can merge chosen one into this PR and do some code clean up - at least comments as they are wrong if I remember right.

Ok. The issue is that your implementation is fine, but you do bit-by-bit conversion.
If the RoaringBitmap contains a bitset container, for example. Then you can just use valueOf(long[] longs) to map directly, at a highspeed, the content of the Roaring container to a JDK BitSet. You are obviously aware of this because you have it in a comment.

I don't think we even need benchmarks, it is going to be much faster...

Now. It is fine not implementing this, but I would request that you then document the performance in the document of the function. (E.g., "this function may set the bits one by one and may be computationally intensive".) That's important because some users might decide to systematically convert their Roaring bitmaps to BitSet. We need to warn them about the performance implications.

The Go roaring implementation and the C Roaring implementations have equivalent functionality btw.

Here is the C implementation: https://github.com/RoaringBitmap/CRoaring/blob/daad140f013cbb4e8b515da264403843d8dc1755/src/roaring.c#L3306

Here is the Go implementation:
https://github.com/RoaringBitmap/roaring/blob/88e9207dc61ee814891a2a7cce4aa8abf1fbac1f/roaring.go#L186

I am not asking you to use these sorts of implementations. They require more testing and more effort. I am just asking you to send the users a signal regarding how it is currently implemented.

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java

lemire

We could merge this, but I'd like you to add a comment (maybe just one sentence) in the function description, so that the users know that this do bit-by-bit construction. Further, we should warn them that the memory usage of the result could be significant.

Further, we would want to have the equivalent for ImmutableRoaringBitmap.

lemire · 2024-01-28T17:09:16Z

@xtonik Let me repeat my request. If I were a user of this library, I would assume that a library-provided function that converts a roaring bitmap into a BitSet would try, at least in some cases, to be highly efficient. The implementation you propose is fine, but it will have quite low efficiency in some instances. We need to warn the users so that they cannot complain later if their performance is poor.

lemire · 2024-03-25T13:16:51Z

We are going to go with PR #713 as it provides better performance. Thanks for all your contributions, and keep them coming !!!

roaring bitmap to JDK bit set conversion

712ba9f

lemire reviewed Oct 30, 2023

View reviewed changes

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java Outdated Show resolved Hide resolved

lemire reviewed Oct 30, 2023

View reviewed changes

RoaringBitmap/src/test/java/org/roaringbitmap/buffer/TestBitSetUtil.java Outdated Show resolved Hide resolved

lemire reviewed Oct 30, 2023

View reviewed changes

JDK BitSet does not support unsigned int values

56e8e6a

lemire reviewed Nov 2, 2023

View reviewed changes

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java Outdated Show resolved Hide resolved

lemire reviewed Nov 2, 2023

View reviewed changes

RoaringBitmap/src/main/java/org/roaringbitmap/BitSetUtil.java Show resolved Hide resolved

xtonik added 2 commits November 5, 2023 19:13

two bitsets are always created

372c0b5

Merge branch 'RoaringBitmap:master' into ROAR_BITMAP_TO_BITSET

06d2073

lemire requested changes Nov 17, 2023

View reviewed changes

lemire added the Proposal made to submitter / waiting for submitter label Jan 27, 2024

lemire mentioned this pull request Mar 23, 2024

RoaringBitmap to BitSet/long[]/byte[] #713

Merged

lemire closed this Mar 25, 2024

Benchmark	inputData	Mode	Cnt	Score	Error	Units
iterateThroughContainers	CONSECUTIVE	avgt	5	52.782	±3.963	ms/op
iterateThroughContainers	RANDOM	avgt	5	66.945	±8.262	ms/op
negativeValuesSeparately	CONSECUTIVE	avgt	5	62.152	±1.104	ms/op
negativeValuesSeparately	RANDOM	avgt	5	85.505	±12.954	ms/op
negativeValuesSeparatelyRegisterPreviousValue	CONSECUTIVE	avgt	5	49.961	±8.429	ms/op
negativeValuesSeparatelyRegisterPreviousValue	RANDOM	avgt	5	185.567	±5.139	ms/op
preallocated	CONSECUTIVE	avgt	5	67.221	±1.658	ms/op
preallocated	RANDOM	avgt	5	78.673	±5.566	ms/op
original	CONSECUTIVE	avgt	5	75.124	±27.983	ms/op
original	RANDOM	avgt	5	84.717	±3.251	ms/op
streamingApi	CONSECUTIVE	avgt	5	38	±	ms/op
streamingApi	RANDOM	avgt	5	85	±	ms/op

Roaring bitmap to JDK bit set conversion #661

Roaring bitmap to JDK bit set conversion #661

Uh oh!

Conversation

xtonik commented Oct 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SUMMARY

Automated Checks

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xtonik Nov 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xtonik Jan 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lemire left a comment

Choose a reason for hiding this comment

Uh oh!

lemire commented Jan 28, 2024

Uh oh!

lemire commented Mar 25, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xtonik commented Oct 29, 2023 •

edited

Loading

xtonik Nov 5, 2023 •

edited

Loading

xtonik Jan 27, 2024 •

edited

Loading