getall benchmarks like test_cimultidict_getall_istr_hit[py], test_multidict_getall_str_hit[ci-py], test_multidict_getall_str_miss[cs-py], and others have very unstable execution times when the pure-Python implementation is tested.
This generates misleading noise for all multidict PRs, even those unrelated to changes in the Python implementation.
I have no clear strategy for mitigating this yet.
From my understanding, the source of the unstable execution time is Python's hash randomization.
Because of it, the hashes of the Python objects used in the tests differ on every CI run. Sometimes this leads to hash collisions when methods like _HtKeys.iter_hash() are called, and sometimes the hashes don't collide.
In case of a collision, the hash iterator needs more loops to do its job.
The C implementation shares the same algorithm, but it executes much faster than the Python version.
As a result, the timings of the C benchmarks are stable, while some Python tests vary a lot.
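To illustrate the effect (a minimal sketch, not multidict's actual table layout — the table size and key names below are hypothetical), which bucket a key lands in depends on its randomized hash, so the collision pattern changes on every interpreter start:

```python
# Sketch: map keys into a small power-of-two table the way open-addressed
# hash tables typically do. With hash randomization on (the default),
# which keys share a bucket differs between interpreter runs, so a
# benchmark hitting a colliding key pays extra probing loops only on
# some CI runs.
TABLE_SIZE = 8  # hypothetical small table, not multidict's real size

def bucket(key: str) -> int:
    return hash(key) & (TABLE_SIZE - 1)

keys = ["key0", "key1", "key2", "key3"]
buckets = [bucket(k) for k in keys]
collisions = len(buckets) - len(set(buckets))
# `collisions` varies from run to run unless PYTHONHASHSEED is pinned.
```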
I don't care much about pure-Python execution time; in the real world the C version should be used anyway.
But we should do something about our benchmarks.
The simplest solution could be setting the PYTHONHASHSEED envvar to a constant value. Another option is rewriting the benchmarks to run against a set of keys instead of a single constant key. Some of them might still collide, but the amortized time should be more or less the same.
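A sketch of the multi-key idea (`bench_getall`, `md`, and the key set are hypothetical stand-ins for the real benchmark fixtures, not existing multidict code):

```python
import timeit

def bench_getall(md, keys, number=1000):
    """Time getall() across a whole key set instead of one constant key,
    so per-run collision luck averages out.  `md` is any mapping with a
    getall() method (e.g. a multidict.MultiDict); `keys` should mix
    colliding and non-colliding keys."""
    getall = md.getall
    total = timeit.timeit(lambda: [getall(k) for k in keys], number=number)
    return total / (number * len(keys))  # amortized seconds per lookup
```

The reported number is an average over `len(keys)` lookups, so one unlucky collision shifts it far less than it shifts a single-key benchmark.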
Perhaps PYTHONHASHSEED is the easiest thing we could apply.
If anybody has an idea -- please share it here.
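One caveat for the PYTHONHASHSEED route: the variable must be set before interpreter startup (e.g. exported in the CI step), since changing os.environ inside a running process has no effect on hash randomization. A minimal sketch of re-exec'ing with a pinned seed:

```python
import os
import subprocess
import sys

def run_with_fixed_seed(argv, seed="0"):
    """Run a command in a child interpreter with PYTHONHASHSEED pinned,
    making string hashes (and therefore collision patterns and benchmark
    timings) reproducible across CI runs."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    return subprocess.run(argv, env=env, capture_output=True, text=True)

# With the seed pinned, hash('key') is identical in every child run:
a = run_with_fixed_seed([sys.executable, "-c", "print(hash('key'))"]).stdout
b = run_with_fixed_seed([sys.executable, "-c", "print(hash('key'))"]).stdout
```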