getall benchmarks like test_cimultidict_getall_istr_hit[py], test_multidict_getall_str_hit[ci-py], test_multidict_getall_str_miss[cs-py], and others have very unstable execution times when the pure-Python implementation is tested.
This generates misleading noise for all multidict PRs, even those unrelated to changes in the Python implementation.
I have no clear strategy for mitigating this yet.
From my understanding, the source of the unstable execution time is Python's hash randomization.
Because of it, the hashes of the Python objects used in the tests differ on every CI run. Sometimes this leads to hash collisions when methods like _HtKeys.iter_hash() are called, and sometimes the hashes don't collide.
In case of a collision, the hash iterator needs more loops to do its job.
The C implementation shares the same algorithm, but it executes much faster than the Python version.
As a result, the timings of the C benchmarks are stable, while some Python tests vary a lot.
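To illustrate the effect (a minimal sketch, not multidict's actual table layout — the table size and key names below are hypothetical), which bucket a key lands in depends on its randomized hash, so the collision pattern changes on every interpreter start:

```python
# Sketch: map keys into a small power-of-two table the way open-addressed
# hash tables typically do. With hash randomization on (the default),
# which keys share a bucket differs between interpreter runs, so a
# benchmark hitting a colliding key pays extra probing loops only on
# some CI runs.
TABLE_SIZE = 8  # hypothetical small table, not multidict's real size

def bucket(key: str) -> int:
    return hash(key) & (TABLE_SIZE - 1)

keys = ["key0", "key1", "key2", "key3"]
buckets = [bucket(k) for k in keys]
collisions = len(buckets) - len(set(buckets))
# `collisions` varies from run to run unless PYTHONHASHSEED is pinned.
```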
I don't care much about pure-Python execution time; in the real world the C version should be used anyway.
But we should do something about our benchmarks.
The simplest solution could be setting the PYTHONHASHSEED envvar to a constant value. Another option is rewriting the benchmarks to run against a set of keys instead of a single constant key. Some of them might still collide, but the amortized time should be more or less the same.
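A sketch of the multi-key idea (`bench_getall`, `md`, and the key set are hypothetical stand-ins for the real benchmark fixtures, not existing multidict code):

```python
import timeit

def bench_getall(md, keys, number=1000):
    """Time getall() across a whole key set instead of one constant key,
    so per-run collision luck averages out.  `md` is any mapping with a
    getall() method (e.g. a multidict.MultiDict); `keys` should mix
    colliding and non-colliding keys."""
    getall = md.getall
    total = timeit.timeit(lambda: [getall(k) for k in keys], number=number)
    return total / (number * len(keys))  # amortized seconds per lookup
```

The reported number is an average over `len(keys)` lookups, so one unlucky collision shifts it far less than it shifts a single-key benchmark.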
Perhaps PYTHONHASHSEED is the easiest thing we could apply.
If anybody has an idea -- please share it here.
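One caveat for the PYTHONHASHSEED route: the variable must be set before interpreter startup (e.g. exported in the CI step), since changing os.environ inside a running process has no effect on hash randomization. A minimal sketch of re-exec'ing with a pinned seed:

```python
import os
import subprocess
import sys

def run_with_fixed_seed(argv, seed="0"):
    """Run a command in a child interpreter with PYTHONHASHSEED pinned,
    making string hashes (and therefore collision patterns and benchmark
    timings) reproducible across CI runs."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    return subprocess.run(argv, env=env, capture_output=True, text=True)

# With the seed pinned, hash('key') is identical in every child run:
a = run_with_fixed_seed([sys.executable, "-c", "print(hash('key'))"]).stdout
b = run_with_fixed_seed([sys.executable, "-c", "print(hash('key'))"]).stdout
```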