Skip to content

uniqueItems check only validates the first two entries for types with a __len__ implementation  #866

@omaskery

Description

@omaskery

After updating to a newer version of jsonschema, we now find that it does not notice violations of the uniqueItems constraint on lists.

By stepping through the code, I think I've found the issue in this code (originally introduced in this commit in this PR, but now found here):

        sort = sorted(unbool(i) for i in container)
        sliced = itertools.islice(sort, 1, None)

        for i, j in zip(sort, sliced):
            return not _sequence_equal(i, j)

Notice that the for loop will always return on the first iteration, without examining subsequent values.

This code looks like it's supposed to be seeing if any two successive elements is the same, but with the return statement it would only detect unique constraint violations between the first two elements.

I think this bug has escaped notice so far for two reasons:

  • I believe that all of the unit tests for this code only test unique constraint violations in the first two elements
  • This issue only presents if the elements of the container have a __len__ implementation. If a TypeError or NotImplementedError is raised then the "slow, brute force" path runs - and it does not have the same bug.

I wonder if this code is instead supposed to be:

         sort = sorted(unbool(i) for i in container)
         sliced = itertools.islice(sort, 1, None)
 
         for i, j in zip(sort, sliced):
-            return not _sequence_equal(i, j)
+            if _sequence_equal(i, j):
+                return False

For what it's worth, here's some noddy test code I put together to explore the issue:

def test(container):
    print(f"uniq({container}) = {uniq(container)}")


TEST_CASES = [
    [],
    [1],
    [1, 2],
    [1, 2, 3, 4, 5],
    [1, 2, 3, 3, 5],
    [1, 1, 3, 4, 5],

    ["a"],
    ["a", "b"],
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "c", "e"],
    ["a", "a", "c", "d", "e"],
]


for test_case in TEST_CASES:
    test(test_case)

And here is the output I got:

uniq([]) = True
uniq([1]) = True
uniq([1, 2]) = True
uniq([1, 2, 3, 4, 5]) = True
uniq([1, 2, 3, 3, 5]) = False
uniq([1, 1, 3, 4, 5]) = False
uniq(['a']) = True
uniq(['a', 'b']) = True
uniq(['a', 'b', 'c', 'd', 'e']) = True
uniq(['a', 'b', 'c', 'c', 'e']) = True
uniq(['a', 'a', 'c', 'd', 'e']) = False

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugSomething doesn't work the way it should.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions