Skip to content

Improved performance of @UniqueElements(by=NOT_SET)`#491

Merged
jlink merged 1 commit intojqwik-team:mainfrom
DirkToewe:main
Jun 18, 2023
Merged

Improved performance of @UniqueElements(by=NOT_SET)`#491
jlink merged 1 commit intojqwik-team:mainfrom
DirkToewe:main

Conversation

@DirkToewe
Copy link
Copy Markdown
Contributor

Overview

This PR aims at significantly improving the performance of @UniqueElements(by=NOT_SET), mainly by replacing calls to uniqueElements(x->x) by uniqueElements() wherever possible.

Details

If we look at the following two property tests:

@PropertyDefaults( tries = 100_000 )
public class UniqueElementsTest
{
  @Property void testA( @ForAll @UniqueElements int[] a ) {
    Arrays.sort(a);
    for( int i=1; i < a.length; i++ )
      assert a[i-1] <= a[i];
  }

  @Provide Arbitrary<int[]> arrays() {
    return Arbitraries.integers().array(int[].class).uniqueElements();
  }

  @Property void testB( @ForAll("arrays") int[] a ) {
    Arrays.sort(a);
    for( int i=1; i < a.length; i++ )
      assert a[i-1] <= a[i];
  }
}

On my machine, it takes 20 seconds for testA to finish while testB only takes 2 seconds. The problem becomes much more emphasized with larger array sizes.


I hereby agree to the terms of the jqwik Contributor Agreement.

// instead of `items instanceof HashSet`, because subclasses
// of HashSet may break Set conventions.
return items -> items.getClass().equals(HashSet.class)
|| items.size() == items.stream().distinct().count();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a suboptimal way to check uniqueness. You can stop as soon as you detect the first non-unique element.

At the same time, elements in hashset are always unique, aren't they?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh You're right, it would be much faster to stop early. I will fix that.

Yes that's why added the HashSet clause. It might speed things up sometimes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the logic. Now it should stop early.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. There's one more case doing stream.distinct.count:

long uniqueCount = elements.stream().map(this::applySafe).distinct().count();

I wonder if it makes sense to unify them.

WDYT?

@jlink
Copy link
Copy Markdown
Collaborator

jlink commented Jun 17, 2023

@DirkToewe As soon as you deem it ready, just rebase it, so that I can merge it.

@DirkToewe
Copy link
Copy Markdown
Contributor Author

I did a soft reset and re-committed the changes. All merge conflicts should be gone now.

The PR now also includes the change to FeatureExtractor::areUnique suggested by @vlsi. It felt like it fits the theme of the PR. I hope that's okay.

@jlink jlink merged commit b87a3ed into jqwik-team:main Jun 18, 2023
@jlink
Copy link
Copy Markdown
Collaborator

jlink commented Jun 18, 2023

Published in 1.7.4-SNAPSHOT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants