[nomerge] faster ++ and union for HashSet #8589
Conversation
(force-pushed from 7b93385 to e68f694)
Note - the perf data is outdated; I'm waiting on a new run after the improvements. For info, the old results are in https://gist.github.com/mkeskells/a3ec74d3dd79c6564ac957c8387c8fda. As a summary here though: some of it is unexpected, but based on these benchmarks this PR is generally faster for ++.
[before/after benchmark results]
(force-pushed from ab40692 to 81ff240)
(force-pushed from 81ff240 to df747da)
improved, and rebased
(force-pushed from 1027a00 to b480b04)
@joshlemer can you have a look, as you did lots of work on the 2.13 maps and sets?
(force-pushed from 644b86d to 01a4794)
I'm surprised that adding overlapping sets without structural sharing is 50% slower whereas adding distinct sets is 3x faster. This doesn't sound right. Why would they behave differently if the difference lies in the treatment of structural sharing (which doesn't apply in either case)? |
|
Overlapping sets contain the same values, so they hit the same hashcodes and therefore merge at the HashSet1 level, whereas the distinct case just merges a HashSet1 into a new trie, or sometimes merges a whole trie because none of the hashes collide in any of its children. The benchmarks are out of date though, and I did some work on the merging of the distinct cases to improve the performance. I should have the results in the next 24 hours.
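The collision behaviour described here can be sketched in a few lines. The `Collider` class below is hypothetical, used only to force every element into the same hash bucket; it is not part of the PR:

```scala
import scala.collection.immutable.HashSet

// Hypothetical class whose instances all collide on hashCode, so a HashSet
// holding them keeps every element in a single collision bucket
// (a HashSetCollision1 node internally). ++ over such sets must merge
// element by element, whereas sets of values with distinct hashes can be
// merged a whole trie branch at a time.
final case class Collider(id: Int) {
  override def hashCode: Int = 42 // every instance lands in the same bucket
}

val a = HashSet(Collider(1), Collider(2))
val b = HashSet(Collider(2), Collider(3)) // overlaps with a on Collider(2)
val merged = a ++ b
assert(merged.size == 3) // element-wise merge of the collision bucket
```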
|
@mkeskells Do you have the new benchmark results? |
|
@szeiger I have some, but they are from running on a laptop, and some are less stable than I hoped. The results and a comparison are on sheet 3 - change the dataset and the operation to see the effect. I am less sure how to visualise the data: what we have is a range of different sizes, datasets, and operations, with CPU/allocation to compare. The operations tested include '+' and '++', both operating for different amounts of overlap, e.g. opContainedSharedWithData - a subset with a superset (e.g. smallSet ++ superSet), with structural sharing. The tests allow data that heavily collides on hashcode, or doesn't, but I don't have the data for colliding data yet. So for ++ with opWithOverlapShared data, the results for different degrees of sharing are:
[benchmark results attachment]
All of the ++ operations look good to me, with one exception: dataWithContainedShared/unshared.
Generally there is a marked improvement for '++' and sameElements, but there are some anomalies, e.g. for diff on large sets on opWithOverlapShared.
|
szeiger
left a comment
Performance looks good. I couldn't figure out how to use the Excel sheet (there's one data set at the top of sheet 3 and a lot of invalid fields, strangely enough with error messages in German, below) but I prefer a summary of the interesting cases anyway.
    var res = HashSet.this
    override def apply(v1: A): Unit = res += v1
  }
  that foreach acc
foreach can run in parallel for a ParSet. This implementation should be restricted to Set.
reworked as
if (that.isInstanceOf[Set[A]])
that foreach acc
else
that.iterator foreach acc
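The reworked dispatch can be sketched as a standalone method. This is a simplified illustration with hypothetical names: the real code accepts a GenTraversableOnce (which admits parallel collections), while this sketch uses Iterable so it stays version-neutral:

```scala
import scala.collection.immutable.HashSet

// Sketch of the accumulator rework discussed above: a sequential Set can
// drive the mutable accumulator via foreach, but any other source (e.g. a
// ParSet in the real code) is drained through its iterator, which is always
// sequential, so `var res` is never updated from multiple threads.
def addAll[A](base: HashSet[A], that: Iterable[A]): HashSet[A] = {
  var res = base
  val acc: A => Unit = v1 => res += v1
  that match {
    case s: scala.collection.Set[A] => s foreach acc       // known sequential
    case other                      => other.iterator foreach acc
  }
  res
}

assert(addAll(HashSet(1, 2), List(2, 3)) == HashSet(1, 2, 3))
```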
|
To use the spreadsheet - there are 2 cells at the top of sheet 3 with a border around them. When you only get one value, it's a test which doesn't have the degree of sharing, e.g. opWithDistinct - if the data sets are distinct, there is no sharing.
|
@mkeskells If I enable editing by copying the file to my account, I can open the dropdown menus but whatever I change them to, I get "illegal value" errors (with the error message in German, despite my account and my system settings all being set to US-English) in all fields that previously contained values. |
(force-pushed from a0a2e6a to 1182fa7)
Haha, if you are confused by it, it makes me feel less stupid.
(force-pushed from 1182fa7 to 1075a9c)
szeiger
left a comment
Needs to be squashed and rebased but apart from that it looks good to go.
An additional scalacheck test might be a good idea. While there are tests for all the individual cases in the implementation, there is no guarantee that the test data actually utilizes the intended branch. This is a situation where I would go for property-based testing with generated data.
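A hand-rolled sketch of such a property check (shown with scala.util.Random rather than ScalaCheck so it stays dependency-free; the generator sizes are arbitrary): for random pairs of sets, ++ must agree with a foldLeft of +, and merging a subset into its superset must give the superset.

```scala
import scala.collection.immutable.HashSet
import scala.util.Random

val rnd = new Random(0) // fixed seed so the check is reproducible
for (_ <- 1 to 100) {
  val a = HashSet(Seq.fill(rnd.nextInt(50) + 1)(rnd.nextInt(100)): _*)
  val b = HashSet(Seq.fill(rnd.nextInt(50) + 1)(rnd.nextInt(100)): _*)

  // Property 1: the optimised ++ must agree with the obvious reference
  // implementation, adding one element at a time.
  assert((a ++ b) == b.foldLeft(a)(_ + _))

  // Property 2: a subset ++ its superset must equal the superset
  // (the case the identity fast paths in this PR target).
  val superSet = a ++ b
  assert((a ++ superSet) == superSet)
}
```

A real ScalaCheck test would express the same two assertions as `forAll` properties, letting shrinking find minimal counterexamples.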
(force-pushed from e95f30f to a70006b)
EmptySet and SetBuilder adjusted to take advantage of ++ optimisation for simple and common cases
eliminate unneeded allocations for HashSets where the result is already built for -
subSet ++ superSet
subSet union superSet
superSet union subSet
superSet ++ subSet
make a fast path when there is structural sharing in the HashSet
for union, guarantee internal operations will only return a new HashSet if one of the existing HashSet parameters or internal values can't be used
use System.arraycopy rather than Array.copy as it avoids the JVM nulling the array
add missing `eq` fast path to intersect0 and diff0
minor improvements to + to reduce allocations
reduce calls to HashSet.size which can be a bottleneck
reduce allocations in ListSet ++, + and -
no allocations in ++ or + if the data is contained in the original sets
HashSetCollision1 stores its length so as to avoid calls to ListSet.size which is O(n)
take advantage of ListSet guarantees on identity for the return of + or -
avoid calling ListSet.size where we can
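The subset/superset cases in the commit message can be illustrated as follows. These are shown as equality checks; the allocation-free behaviour (returning the existing instance rather than a rebuilt trie) is what this PR adds for 2.12's HashSet and is not asserted here, since it depends on the library version:

```scala
import scala.collection.immutable.HashSet

val superSet = HashSet(1 to 100: _*)
val subSet   = HashSet(1 to 10: _*)

// All four orderings collapse to the superset; post-PR, the HashSet code can
// return the existing superSet instance instead of allocating a new trie.
assert((subSet ++ superSet)    == superSet)
assert((superSet ++ subSet)    == superSet)
assert((subSet union superSet) == superSet)
assert((superSet union subSet) == superSet)
```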
(force-pushed from a70006b to 2613eda)
Regressed in scala#8589
|
We've just realised that this optimization is wrong for subclasses of /cc @mkeskells |
for 2.12 only - this will not merge directly into 2.13. A separate PR will be needed if any of these changes need to be ported to 2.13
faster operations (mostly union and ++) for HashSet
EmptySet and SetBuilder adjusted to take advantage of ++ optimisation for simple and common cases
eliminate unneeded allocations for HashSets where the result is already built for -
subSet ++ superSet
subSet union superSet
superSet union subSet
superSet ++ subSet
make a fast path when there is structural sharing in the HashSet
for union, guarantee internal operations will only return a new HashSet if one of the existing HashSet parameters or internal values can't be used
use System.arraycopy rather than Array.copy as it avoids the JVM nulling the array
add missing `eq` fast path to intersect0 and diff0
minor improvements to + to reduce allocations
reduce calls to size which is expensive, especially when we know the size beforehand
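The System.arraycopy point above can be sketched with a minimal illustration of copying a trie's element array into an enlarged one. The names here are illustrative, not the PR's actual code:

```scala
// Allocate the target once and bulk-copy with System.arraycopy; the PR's
// rationale is that this avoids the extra work the generic Array.copy helper
// can incur (it must handle mixed primitive/reference arrays reflectively).
val elems    = Array[AnyRef]("a", "b", "c")
val enlarged = new Array[AnyRef](elems.length + 1)
System.arraycopy(elems, 0, enlarged, 0, elems.length)
enlarged(elems.length) = "d" // append the new element in the freed slot
assert(enlarged.toList == List("a", "b", "c", "d"))
```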