Skip to content

Clusivity, coverage and performance #113

@curiousleo

Description

@curiousleo

Recent discussions about pseudo-random floating point numbers have revealed potential shortcomings of the current API in two aspects.

Clusivity describes whether a range is inclusive or exclusive in its bounds.

Coverage describes how many representable values in a range can be reached.

Integral types

For integral types, coverage is not a problem: all Uniform and UniformRange instances for integral types we currently have generate all representable values in the relevant ranges.

Clusivity is also simpler for integral types. It is straightforward to implement express exclusive ranges in terms of inclusive ranges and vice versa. Concretely, (a, b) == [a+1, b-1], for example.

Non-integral types

For non-integral types, full coverage for arbitrary ranges is difficult. The only project I know of that achieves this is rademacher-fpl.

We can improve coverage by generating a floating point number in the unit interval with full coverage and then translating it into the target range. This is the approach taken in #102, more details in #105.

The performance overhead to improve coverage in this way is small but non-zero, see #102.

Clusivity is also more complicated for non-integral types.

We have [a, b) == (a, b] + x and (a, b] == [a, b) - x for some x. To go from an inclusive-inclusive range to any other range, I believe that rejection sampling is necessary.

In addition, I know of no simple method (without full coverage) that generates an inclusive-inclusive range.

To summarise which methods generate which clusivity:

  • Downey (full coverage in unit interval): inclusive-inclusive; other clusivities via rejection sampling
  • simple methods (sparse coverage): inclusive-exclusive / exclusive-inclusive

As a result, clusivity and coverage appear to be linked for non-integral types.

Proposals and questions

Q: Do users benefit from being able to control coverage? Reduced coverage in itself is never useful, but some users may want the performance boost. See #102 for benchmark results.

#104 as it currently stands exposes clusivity in the API.

#104 (comment) (snippet no. 2) suggests a more general way to parameterise ranges. AFAICT this could be used to expose different coverages too, as well as more general "intervals" like open / closed complex disk.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions