| """Return a *k*-length list of elements chosen (without replacement) | ||
| from the *iterable*. Like :func:`random.sample`, but works on iterables | ||
| of unknown length. | ||
| from the *iterable*. Similar to :func:`random.sample`, but works on |
Consider documenting that if the input iterable is sized k or smaller, more_itertools.sample() will return all of the input in shuffled order, possibly giving a result length smaller than k.
Alternatively, consider adding a strict option to raise an exception as random.sample() does. That would let the user decide whether it is an error to say, "choose 100 distinct notes on an 88-key piano."
That would be similar to what we did for batched() where only the user could know whether a short batch would be acceptable.
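To make the two options concrete, here is a minimal sketch of a reservoir sampler with a hypothetical `strict` flag. It is not the library's implementation: the name `sample_sketch`, the flag, and the use of the basic "Algorithm R" replacement rule are all assumptions made for illustration.

```python
import random
from itertools import islice


def sample_sketch(iterable, k, strict=False):
    """Reservoir-sample up to k items; `strict` is a hypothetical flag."""
    if k < 0:
        raise ValueError("k must be non-negative")
    iterator = iter(iterable)
    # Fill the reservoir with the first k items (fewer if the input is short).
    reservoir = list(islice(iterator, k))
    if strict and len(reservoir) < k:
        # Mirrors random.sample(), which rejects k larger than the population.
        raise ValueError("sample larger than population")
    # Basic Algorithm R: the n-th item (1-based) replaces a random slot
    # with probability k / n.
    for n, item in enumerate(iterator, start=k + 1):
        j = random.randrange(n)
        if j < k:
            reservoir[j] = item
    # Shuffle so the output order carries no information about input order.
    random.shuffle(reservoir)
    return reservoir


piano_keys = range(88)
short = sample_sketch(piano_keys, 100)  # all 88 keys, shuffled
```

With `strict=False` the piano example quietly returns 88 items; with `strict=True` the same call raises `ValueError`, leaving the decision to the caller.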
|
Overall, this looks like an excellent set of edits. Thank you. |
|
Please also fix the bad variable name starting in Both private functions ONLY work with iterators, not generic iterables because the |
|
I added in the I didn't change the input parameter names on the main function (don't want to break the theoretical person who's using it as a keyword), but did on the private functions. |
|
Unless I'm missing something, the variable names inside the main function can be changed (leaving the user visible API unchanged). Something like this: Would it make sense to add stubs for the private functions to let a type checker verify that iterators are passed in? |
|
The explanation of the weights parameter seems way off, "The relative weight of each item determines the probability that it appears late in the permutation." The wording in the Efraimidis paper is better, "In weighted random sampling (WRS) the items are weighted and the probability of each item to be selected is determined by its relative weight." However, even that only explains the probability of being added to the reservoir. From a user's point of view, it misses an essential point: when an item with high weight is selected, the probabilities of the remaining items increase. Ideally, there should be some wording that allows a user to predict this outcome: |
|
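One way to see the effect described above is to compute exact probabilities for a tiny population. The sketch below assumes the sequential-draw model of weighted sampling without replacement: each draw picks one remaining item with probability proportional to its weight, and the item is then removed. The function name and the example weights are made up for illustration, and integer weights are assumed so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import permutations


def inclusion_probabilities(weights, k):
    """Exact chance that each item lands in a k-item weighted sample
    drawn one item at a time without replacement."""
    result = {item: Fraction(0) for item in weights}
    for order in permutations(weights, k):
        remaining = sum(weights.values())
        p = Fraction(1)
        for item in order:
            # Probability of drawing `item` from what is still left.
            p *= Fraction(weights[item], remaining)
            remaining -= weights[item]
        for item in order:
            result[item] += p
    return result


probs = inclusion_probabilities({"a": 3, "b": 1, "c": 1}, k=2)
print(probs)  # a: 9/10, b: 11/20, c: 11/20
```

On the first draw `b` has a 1/5 chance, but once `a` is taken its chance on the second draw rises to 1/2. The resulting inclusion probabilities (9/10, 11/20, 11/20) are therefore not in the 3:1:1 ratio of the weights.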
I'm open to suggestions on how much detail is appropriate. I could link to the papers, perhaps? |
|
The papers are a difficult read. They only explain the algorithm and don't elaborate on the goal. Some documentation edit does need to be made because the current sentence is just wrong. There is no notion of "appearing late in the permutation". Maybe this:
|

This PR makes some changes to `sample`:

- `shuffle` is called before returning the output
- `random.sample` is added to the docstring
- `ValueError` is raised for negative `k` values

Closes #854