Skip to content

running_median() with windowed data#1040

Merged
bbayles merged 13 commits intomore-itertools:masterfrom
rhettinger:windowed_median
Aug 3, 2025
Merged

running_median() with windowed data#1040
bbayles merged 13 commits intomore-itertools:masterfrom
rhettinger:windowed_median

Conversation

@rhettinger
Copy link
Copy Markdown
Contributor

@rhettinger rhettinger commented Aug 1, 2025

Open questions:

  • Possibly rename the iterable parameter to data, consistent with statistics.median? Or keep as-is to match more_itertools conventions and to emphasize the lazy evaluation which is the principal use case for running_median()?
  • Happy with the O(n) but mostly fast steps to maintain a sorted window? Or add more complex O(log n) code (IndexableSkiplist, blist, SortedContainers, etc)? Based on Grant Jenks' notes, the current list insort/bisect/del technique can be expected to win for window sizes up to several thousand.

Solved questions:

  • Named the size parameter maxlen because it caps the size of the window and also allows smaller sizes, like the maxlen parameter for deque.
  • Let running_median start yielding values before the window is full. This is more convenient to use. Also, it invariant that the unwindowed case gives the same result as having a window larger than the input data.

@rhettinger rhettinger changed the title Draft of running_median() with windowed data running_median() with windowed data Aug 2, 2025
@rhettinger
Copy link
Copy Markdown
Contributor Author

rhettinger commented Aug 2, 2025

For the record, the issue with Decimal context rounding is that double negation does not always round-trip:

>>> from decimal import Decimal
>>> from math import pi
>>> Decimal(pi)
Decimal('3.141592653589793115997963468544185161590576171875')
>>> - - Decimal(pi)
Decimal('3.141592653589793115997963469')
>>> Decimal(pi) == - - Decimal(pi)
False

So a value that goes into the negated lo heap may not exactly match that value that comes out. This only happens when the inputs have more precision than the current context precision. That arises when converting binary-floats to decimal-floats.

@bbayles
Copy link
Copy Markdown
Collaborator

bbayles commented Aug 3, 2025

Possibly rename the iterable parameter to data, consistent with statistics.median? Or keep as-is to match more_itertools conventions and to emphasize the lazy evaluation which is the principal use case for running_median()?

+0 for as-is on this one.

Happy with the O(n) but mostly fast steps to maintain a sorted window? Or add more complex O(log n) code (IndexableSkiplist, blist, SortedContainers, etc)? Based on Grant Jenks' notes, the current list insort/bisect/del technique can be expected to win for window sizes up to several thousand.

+1 on the current implementation - it's reasonably clear what's going on, and the other options are probably overkill for this general purpose library.

@rhettinger
Copy link
Copy Markdown
Contributor Author

Okay, I think we're good to go.

@bbayles
Copy link
Copy Markdown
Collaborator

bbayles commented Aug 3, 2025

This is great, thanks - can't wait to find a place to use it.

@bbayles bbayles merged commit aea2ffb into more-itertools:master Aug 3, 2025
6 checks passed
@rhettinger rhettinger deleted the windowed_median branch August 15, 2025 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants