Improved performance of `@UniqueElements(by=NOT_SET)` by DirkToewe · Pull Request #491 · jqwik-team/jqwik

DirkToewe · 2023-06-17T14:41:14Z

Overview

This PR aims to significantly improve the performance of @UniqueElements(by=NOT_SET), mainly by replacing calls to uniqueElements(x->x) with uniqueElements() wherever possible.

Details

Consider the following two property tests:

import java.util.Arrays;
import net.jqwik.api.*;
import net.jqwik.api.constraints.UniqueElements;

@PropertyDefaults( tries = 100_000 )
public class UniqueElementsTest
{
  @Property void testA( @ForAll @UniqueElements int[] a ) {
    Arrays.sort(a);
    for( int i=1; i < a.length; i++ )
      assert a[i-1] <= a[i];
  }

  @Provide Arbitrary<int[]> arrays() {
    return Arbitraries.integers().array(int[].class).uniqueElements();
  }

  @Property void testB( @ForAll("arrays") int[] a ) {
    Arrays.sort(a);
    for( int i=1; i < a.length; i++ )
      assert a[i-1] <= a[i];
  }
}

On my machine, testA takes 20 seconds to finish, while testB takes only 2 seconds. The difference becomes much more pronounced with larger array sizes.

I hereby agree to the terms of the jqwik Contributor Agreement.

vlsi · 2023-06-17T14:48:06Z

+			// instead of `items instanceof HashSet`, because subclasses
+			// of HashSet may break Set conventions.
+			return items -> items.getClass().equals(HashSet.class)
+								|| items.size() == items.stream().distinct().count();


This is a suboptimal way to check uniqueness. You can stop as soon as you detect the first non-unique element.

At the same time, elements in a HashSet are always unique, aren't they?

DirkToewe · Jun 17, 2023

Oh, you're right, it would be much faster to stop early. I will fix that.

Yes, that's why I added the HashSet clause. It might speed things up sometimes.

Changed the logic. Now it should stop early.
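For readers following along, the early-exit idea discussed above can be sketched in plain Java. This is only an illustration under assumptions, not the code that was actually committed: the class name UniquenessCheck and the method areUnique are hypothetical. It relies on Set.add returning false when the element is already present, and keeps the exact-HashSet shortcut from the diff.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not jqwik's actual implementation.
final class UniquenessCheck {

    private UniquenessCheck() {}

    /** Returns true if no two elements of items are equal per equals/hashCode. */
    static <T> boolean areUnique(Collection<T> items) {
        // An exact HashSet (not a subclass) cannot contain duplicates,
        // so no scan is needed in that case.
        if (items.getClass().equals(HashSet.class)) {
            return true;
        }
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            // Set.add returns false if the element was already in the set.
            if (!seen.add(item)) {
                return false; // stop at the first duplicate
            }
        }
        return true;
    }
}
```

Compared with items.size() == items.stream().distinct().count(), this version returns as soon as the first duplicate is seen instead of always traversing the whole collection.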

vlsi · Jun 17, 2023

Nice. There's one more case doing stream.distinct.count:

jqwik/engine/src/main/java/net/jqwik/engine/properties/FeatureExtractor.java

Line 32 in 124d7af

long uniqueCount = elements.stream().map(this::applySafe).distinct().count();

I wonder if it makes sense to unify them.

WDYT?
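One possible shape for such a unification, sketched under assumptions: a single early-exit helper that takes a feature function, with the plain uniqueness check delegating to it via the identity function. The names UniqueBy, areUniqueBy and areUnique are hypothetical, and the sketch does not reproduce FeatureExtractor's applySafe handling.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical shared helper for illustration; not jqwik's actual API.
final class UniqueBy {

    private UniqueBy() {}

    /** Returns true if feature maps every element to a distinct value. */
    static <T> boolean areUniqueBy(Collection<? extends T> items,
                                   Function<? super T, ?> feature) {
        Set<Object> seen = new HashSet<>();
        for (T item : items) {
            // Stop as soon as two elements share the same feature value.
            if (!seen.add(feature.apply(item))) {
                return false;
            }
        }
        return true;
    }

    /** Plain uniqueness is the special case where the feature is the element itself. */
    static <T> boolean areUnique(Collection<? extends T> items) {
        return areUniqueBy(items, Function.<T>identity());
    }
}
```

With this shape, a call such as UniqueBy.areUniqueBy(names, s -> s.length()) and a call to UniqueBy.areUnique(names) would share the same loop, so the early exit only has to be written once.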

jlink · 2023-06-17T17:48:56Z

@DirkToewe As soon as you deem it ready, just rebase it, so that I can merge it.

DirkToewe · 2023-06-17T23:42:07Z

I did a soft reset and re-committed the changes. All merge conflicts should be gone now.

The PR now also includes the change to FeatureExtractor::areUnique suggested by @vlsi. It felt like it fits the theme of the PR. I hope that's okay.

jlink · 2023-06-18T07:24:00Z

Published in 1.7.4-SNAPSHOT

vlsi reviewed Jun 17, 2023


Improve performance of @UniqueElements(by=NOT_SET)

02876ed

DirkToewe force-pushed the main branch from 21417b3 to 02876ed Compare June 17, 2023 23:14

jlink merged commit b87a3ed into jqwik-team:main Jun 18, 2023

jlink merged 1 commit into jqwik-team:main from DirkToewe:main
