ENH: Extend peak finding capabilities in scipy.signal#8264
ENH: Extend peak finding capabilities in scipy.signal#8264larsoner merged 9 commits intoscipy:masterfrom
Conversation
scipy/signal/_peak_finding.py
Outdated
| if wmax is not None: | ||
| keep &= (widths <= wmax) | ||
| peaks = peaks[keep] | ||
| if full_output: |
There was a problem hiding this comment.
I almost wonder if the dictionary data structure should just be used directly for storing values, if the extra return data remains part of the API.
There was a problem hiding this comment.
Do you suggest using, for example, infodict['peaks'] instead of peaks? I think that would be possible. In that case infodict would always be used internally but only returned if full_output is True. However I think the trade off would be a slight decrease in readability because the line length would grow significantly...
Note: I have copied this style from scipy.integrate.odeint.
There was a problem hiding this comment.
Yeah, something like that. I just wanted to make sure the point was brought up & trade-offs at least briefly considered. It looks like there was already quite some discussion of API matters in the original issue you linked.
scipy/signal/_peak_finding.py
Outdated
| if height is not None: | ||
| hmin, hmax = _parse_filter_args(height, vector, peaks) | ||
| heights = vector[peaks] | ||
| keep = np.ones(peaks.size, dtype=bool) |
There was a problem hiding this comment.
Is the initialization of keep needed here? I'm just thinking that the first comparison below would automatically generate a bool or mask array anyway.
There was a problem hiding this comment.
Yes it is needed. Look at the possible valid inputs for the argument height:
height=hmin->hmaxisNone, only first comparison is evaluatedheight=(None, hmax)->hminisNone, only second comparison is evaluatedheight=(hmin, hmax)-> both comparisons are evaluatedheight=(None, None)-> no comparison is evaluated (has no effect but is valid)
Case 3 dictates that the second evaluation must use &= (keep must be initialized). Considering the 2nd case one can easily see that initialization is not guarantied to happen in the first comparison.
There was a problem hiding this comment.
Alright, looks like a lot of input permutation handling is indeed needed.
scipy/signal/_peak_finding.py
Outdated
| if hmin is not None: | ||
| keep &= (vector[peaks] >= hmin) | ||
| if hmax is not None: | ||
| keep &= (vector[peaks] <= hmax) |
There was a problem hiding this comment.
Since this logic is repeated, I wonder if there's a way to streamline things a bit? Not sure if it is worth abstracting to i.e., a function or something or if this can be condensed to do both checks at once (maybe not because of the None pre-check requirement).
- argument ret_bases makes the function return the position of peak bases as well. Useful for further analysis and validation of calculated prominences - extended notes to summarize algorithm for prominence calculation - if no peaks are supplied empty arrays are returned - remove upper boundary for wlen which is not necessary - fix typo wleft -> wright (L. 338) See scipy#8264.
- Maximal peak width is from base to base. This also allows to reduce the window in which for intersection points is searched, improving performance. - Exact intersection point is found with linear interpolation. - Basic algorithm is now described in docstring. - Use kwargs and "Other Parameters" for optional pre-computed arguments. - If no peaks are given the empty arrays are returned. - rel_height can be zero. See scipy#8264.
Introduces major changes in the structure of the respective function. This was partly necessary to comply with the changes to the underlying functions peak_prominences and peak_widths. The most important changes of its behavior are: - The height for width calculation is now given as a ratio making it compliant with peak_widths. This approach is more flexible than the former one. - Input and error handling was generally improved. - _filter_generic is now used where applicable which removes duplicated code. - The properties dict (former infodict) now returns more intermediate results. All results of peak_prominences and peak_widths are included. See scipy#8264.
Adds basic tests (full coverage) for the functions: - scipy.signal.peak_promineces - scipy.signal.peak_widths - scipy.signal._peak_finding._unpack_filter_args - scipy.signal.find_peaks Also add notice to THANKS.txt and use better function order in signals toctree. See scipy#8264.
| @@ -420,7 +1000,7 @@ def filt_func(line): | |||
| def find_peaks_cwt(vector, widths, wavelet=None, max_distances=None, | |||
There was a problem hiding this comment.
I find this function (find_peaks_cwt) particularly confusing. It mixes the signal pre-processing and the find peaks function.
Maybe it is worth including some kind of note mentioning that the (new) function find_peaks provide an easier, matlab-like, interface. This would probably make the life of someone looking for it easier.
In the long term, I think we should even considering deprecating this function.
There was a problem hiding this comment.
I agree with these issues to some extend. The note could be a good idea.
But I don't like the idea of removing the function find_peaks_cwt entirely. Wavelet transformation can be a really powerful tool for noisy signals and fills a niche that the new find_peaks can't fill the same way. Improving the API and / or documentation of find_peaks_cwt could allevate the the issues you mentioned above.
ilayn
left a comment
There was a problem hiding this comment.
In general, I would really recommend staying away from the ret_xxxx type output arguments. They are not shortening substantially anyways so you type almost everything hence return_xxxxx is better. But the more important issue is that the output argument number is changing according to bool switches. I would argue that this is causing more trouble than it actually solves.
I ran out of steam and will try to check the rest asap.
scipy/signal/_peak_finding.py
Outdated
| @@ -257,24 +257,39 @@ def peak_prominences(vector, peaks, wlen=None): | |||
| Indices of peaks in `vector`. | |||
| wlen : int, optional | |||
| A window lenght in samples that limits the search for the lowest | |||
scipy/signal/_peak_finding.py
Outdated
| contour line two a symmetric interval around the evaluated peak. If | ||
| not given the entire vector is used. Use this parameter to speed up | ||
| the calculation significantly for large vectors. | ||
| contour line two a symmetric interval around the evaluated peak. If not |
There was a problem hiding this comment.
sentence is not complete I guess, "for the lowest contour line two a symmetric interval around [...]" ?
There was a problem hiding this comment.
Ah, that's a typo. The sentence should be "[...] for the lowest contour line to a symmetric interval around the evaluated peak."
scipy/signal/_peak_finding.py
Outdated
| the calculation significantly for large vectors. | ||
| contour line two a symmetric interval around the evaluated peak. If not | ||
| given the entire vector is used. Use this parameter to speed up the | ||
| calculation significantly for large vectors. |
There was a problem hiding this comment.
When is it large? Why don't we always use it then ? and so on. I get what you are saying but it's not helping a standard user.
There was a problem hiding this comment.
This is vague on purpose because I can't exactly pinpoint an array size after which there is a significant speedup. Maybe I could try to reword or expand on this in the notes? Something like:
"Searching for the peak's bases can be slow for large vector because the full vector needs to be evaluated for each peak. This evaluation area can be limited with the parameter wlen which restricts the algorithm to a window around the current peak and can shorten the calculation time significantly if wlen is significantly smaller than vector. However this may stop the algorithm from finding the true global contour line if the peak's bases are outside this window. Instead a higher contour line is found within the restricted window leading to a smaller calculated prominence. In practice this is only relevant for the largest set of peaks in vector. This behavior may even be used intentionally to calculate "local" prominences.
There was a problem hiding this comment.
+1 for saying see Notes somewhere here and adding this note below, I agree it depends on a lot of factors so we can't be specific
There was a problem hiding this comment.
yes see notes would be indeed better
scipy/signal/_peak_finding.py
Outdated
| contour line two a symmetric interval around the evaluated peak. If not | ||
| given the entire vector is used. Use this parameter to speed up the | ||
| calculation significantly for large vectors. | ||
| ret_bases : bool |
There was a problem hiding this comment.
3 more letters and we have full readability return_bases. Otherwise compute_bases feels like more scipy-ish. Also note that you are already allocating these arrays so there is nothing gained from not returning them. Had you not even allocated or touched any of those arrays then maybe for very large problems it would have made a change.
Now it's just a variable number of outputs ala matlab which drove some of us crazy. Instead you might want to just always return these and remove the bool.
There was a problem hiding this comment.
Okay, I can see why this is a code smell. When thinking about it the only disadvantage / annoyance I can think of is in when a function returns many values and the user is only interested in one. In Python 2.7 the syntax
widths, *_ = peak_widths(...)
seems to be unsupported which would force the "ugly" peak_widths()[0] (maybe that's only me). We would definitely have to write the examples this way (at least as long as Scipy supports 2.7).
Anyway, I think I will remove the ret_... handles and just use a constant output size.
For peak_finding it doesn't matter anyway. Instead the current properties argument could force the function to calculate all possible properties even if no filter options are supplied. However this would convolute the if-logic a bit.
There was a problem hiding this comment.
FWIW git grep ")\[0\]" | wc -l on the SciPy codebase gives 399 results (and another codebase I work on with ~200k lines gives 639 results), so I think it's a sufficiently common pattern to expect users to do it :)
There was a problem hiding this comment.
Alright, that is a convincing argument. :D
scipy/signal/_peak_finding.py
Outdated
| ------- | ||
| prominences : ndarray | ||
| The calculated prominences for each peak in `peaks`. | ||
| left_bases, right_bases : ndarray, returned if ret_bases is true |
There was a problem hiding this comment.
after ndarray a newline is needed for rendering the type correctly.
There was a problem hiding this comment.
Which type? I got this from a the example in scipy.integrate.odeint (returned parameter infodict).
There was a problem hiding this comment.
I think the odeint example is less than ideal in this respect
scipy/signal/_peak_finding.py
Outdated
| @@ -293,13 +308,13 @@ def peak_prominences(vector, peaks, wlen=None): | |||
| Find all peaks and calculate prominences | |||
|
|
|||
| >>> from scipy import signal | |||
There was a problem hiding this comment.
from scipy.signal import find_peaks, peak_prominences to make it a bit more concise and to remove the signal repetition
scipy/signal/_peak_finding.py
Outdated
| Calculate the height of each peaks contour line | ||
| Calculate the height of each peak's contour line | ||
|
|
||
| >>> contour_heights = vector[peaks] - prominences |
There was a problem hiding this comment.
If not used you can combine this line to the group below.
There was a problem hiding this comment.
I don't quite understand. It is used in the call to vlines below. This is an extra line with comment to make the example as simple and explicit as possible.
There was a problem hiding this comment.
I meant if this is only going to be used in the plotting section below you don't need to make it as a separate code block. Now it is divided into three code blocks for no apparent reason. You can see the output here
You actually write what I meant in the scipy.signal.find_peaks docstring 😃
scipy/signal/_peak_finding.py
Outdated
|
|
||
| >>> from scipy import signal | ||
| >>> peaks = signal.argrelmax(vector)[0] | ||
| >>> peaks = signal.find_peaks(vector) |
There was a problem hiding this comment.
The array to be searched for peaks. Defined in line 306 (second row) in the folded section above.
scipy/signal/_peak_finding.py
Outdated
|
|
||
|
|
||
| def peak_prominences(vector, peaks, wlen=None): | ||
| def peak_prominences(vector, peaks, wlen=None, ret_bases=False): |
There was a problem hiding this comment.
can't you just use array instead of vector which is more of a physical concept that also needs directions so on?
There was a problem hiding this comment.
I used vector because I want to convey that the input is expected to be one-dimensional and find_peaks_cwt already uses this name. array is so broad we could go with x as well which seems to be the SciPy standard anyway. ts for timeseries would be possible as well but I'm not a huge fan. I think if vector is not wanted we should go with x.
Actually I would prefer the name signal but that one is even worse considering the module name... I don't suppose we could change the package name to scipy.dsp? 😉
There was a problem hiding this comment.
+1 for x as we use that most other places
scipy/signal/_peak_finding.py
Outdated
| raise ValueError('peak positions must be integers') | ||
| if wlen is not None and not 3 <= abs(wlen) <= vector.size: | ||
| raise ValueError('window lenght must in range [3, vector.size]') | ||
| if wlen is not None and not 3 <= wlen: |
There was a problem hiding this comment.
if wlen is not None and wlen < 4 and also don't you need to check the integerness of this?
There was a problem hiding this comment.
Not really. To calculate the window borders I do this:
wleft = peak - int(wlen / 2)
wright = peak + int(wlen / 2)
# Handle border cases
if wleft < 0:
wleft = 0
elif vector.size < wright:
wright = vector.size
# Use slice for prominence calculation
window = vector[wleft:wright]
# Correct peak position in vector
peak -= wleftSo wlen / 2 get's floored / converted to an integer anyway.
And 3 is a valid window range although not really useful.
There was a problem hiding this comment.
Ah OK, then my first comment is only about using negated ifs. wlen < 4 is easier to read.
Also in your second code snipped, if the first branch of if is done wleft<0 then it won't go into the second check right?
you can simply write
n = vector.size
window = vector[0 if wleft<0 else wleft:n if n<wright else wright]
one liner ifs are a bit dirty but if you want you can also take them out and write
n = vector.size
wleft = 0 if wleft<0 else wleft
wright = n if n<wright else wright
window = vector[wleft:wright]
and so on. I don't mean to be a python grammar-nazi but just suggesting a bit more cleaner flow of code. So that someone else can follow the logic before installing breakpoints and other printing stuff for debugging when needed.
There was a problem hiding this comment.
I'm happy to learn so feel free to be a Python grammar-nazi. :) I'm happy to accept your second suggestion. But why is wlen < 4 easier to read than wlen < 3?
There was a problem hiding this comment.
I think that bit is just a typo, his suggestion is essentially the same as mine regarding the wlen check -- don't negate with not, instead change the relational operator itself without changing the effective truth table
|
The "modern" alternative to "return_xxx" options is to always return a
solution object or named tuple that has all the items.
|
|
Finally, regarding the function name --- in the initial PR for
"find_peaks_cwt" the name was also "find_peaks" but was changed because
there are many ways to find peaks with no canonical way to do it AFAIK.
The same question should be asked here: is the very generic "find_peaks"
name really appropriate? Maybe better "find_peaks_XXX" with some "XXX"
that describes the method?
|
| from scipy.stats import scoreatpercentile | ||
|
|
||
|
|
||
| __all__ = ['argrelmin', 'argrelmax', 'argrelextrema', 'find_peaks_cwt'] |
There was a problem hiding this comment.
Why a permissions change on this file?
There was a problem hiding this comment.
What do you mean? I suppose you are not speaking of the new line wrapping?
There was a problem hiding this comment.
No I'm not (though I wouldn't change that either, other modules I see don't line break like this), look at the diff of this file on GitHub you'll see the permissions were changed 100644 → 100755
There was a problem hiding this comment.
Ah, I see what you mean. That was unintentional. I must have messed that up somehow. I'll revert that as well as the line wrapping.
I have several thoughts on this:
|
scipy/signal/_peak_finding.py
Outdated
| raise ValueError('vector and peaks must have exactly one dimension') | ||
| if peaks.max() >= vector.size: | ||
| raise ValueError('peak position exceeds the range of vector') | ||
| if not np.issubdtype(peaks.dtype, int): |
There was a problem hiding this comment.
Some of the builds / test fail with the the warning:
FutureWarning: Conversion of the second argument of issubdtype from
inttonp.signedintegeris deprecated. In future, it will be treated asnp.int64 == np.dtype(int).type.
I have a feeling that using np.issubdtype(peaks.dtype, np.integer) would be the fix for this but I can't reproduce this warning on my system. Any insights before I blindly push this?
|
Maybe |
scipy/signal/_peak_finding.py
Outdated
| raise ValueError('peak position exceeds the range of vector') | ||
| if not np.issubdtype(peaks.dtype, int): | ||
| raise ValueError('peak positions must be integers') | ||
| if wlen is not None and not 3 <= wlen: |
There was a problem hiding this comment.
not 3 <= wlen is not so readable, I'd change this to:
if wlen is not None:
wlen = operators.index(wlen) # ensure int-ness
if wlen < 3:
raise ValueError(...)
scipy/signal/_peak_finding.py
Outdated
| # the full vector | ||
| if wlen is not None and wlen < vector.size * 2: | ||
| # Calculate window borders around the evaluated peak | ||
| wleft = peak - int(wlen / 2) |
There was a problem hiding this comment.
... then here you can do peak + wlen // 2 instead of this int conversion
Notable changes: - Refactored `vector` to `x` - No "ret_xxx" style arguments are used to determine the number of returned parameters which is now always the same. - Improved docstrings, removed typos and made error messages more descriptive. - Hopefully addressed issue that caused builds to fail.
- wlen can be given as float and is implicitly converted to int. This has the advantage that the wlen can be more easily specified using the inverse of a frequency: e.g. wlen=(fs * 1.3) - _unpack_filter_args now raises a ValueError if interval borders are supplied as arrays whose size doesn't match x. - modified _filter_generic to be more consistent with other filter_ functions (interval is given after unpacking) - updated documentation - updated tests for new behavior and increased coverage
| References | ||
| ---------- | ||
| .. [1] Wikipedia Article for Topograhpic Prominence: | ||
| https://en.wikipedia.org/wiki/Topographic_prominence |
There was a problem hiding this comment.
Is there a more authoritative reference for this (journal article)?
There was a problem hiding this comment.
Also, typo in Topographic on the line above
There was a problem hiding this comment.
I don't have a satisfying response as I come up empty handed when searching for papers. The stuff that I do find is in line with the linked article and deals with topographic analysis of peaks and mountains (nothing DSP related). Of course I could use some of the links on that Wikipedia article itself but I think that wouldn't improve the situation.
Honestly I first used Matlab's explanation as a guideline and later this Wikipedia article. Maybe something can be found later on but for now I'd like to leave this reference as it explains the term "prominence" (although 2D) in more detail than appropriate for this docstring.
There was a problem hiding this comment.
Of course I could use some of the links on that Wikipedia article itself but I think that wouldn't improve the situation.
Yeah those don't look useful.
I had a look as well, it just doesn't seem to be a thing in DSP. So let's leave the Wikipedia reference
scipy/signal/_peak_finding.py
Outdated
| are accessible. See `scipy.signal.peak_widths` for a description of | ||
| their content. | ||
|
|
||
| Other Parameters |
There was a problem hiding this comment.
We don't use this section anymore, content should all go under Parameters
| This function takes a one-dimensional array and finds all local maxima by | ||
| simple comparision of neighbouring values. A subset of these peaks can be | ||
| selected with different filters described below. | ||
|
|
There was a problem hiding this comment.
It would be good to have one or two sentences here about what this function's limitations are - e.g. for noisy signals the approach doesn't work too well, for that it's better to use find_peaks_cwt.
There was a problem hiding this comment.
Something like:
"Because these filters depend on the calculation of properties for each peak this function can be slow for noisy data that has many small peaks relative to its length. In this case it is recommended to smooth the signal before peak finding and / or reduce the number of peaks with fast filter options first."
This feels more like it belongs in the Notes section.
There was a problem hiding this comment.
I wasn't thinking about speed, more about accuracy. Something like "This function directly searches for maxima, hence for noisy signals the determined peak locations can be off if the noise changes the position of a local maximum. In those cases, find_peaks_cwt or a peak fitting method may be more appropriate to use".
And yes, you're probably right that this should go in the notes section.
There was a problem hiding this comment.
Yeah, I should have thought of that. I definitely add this or something like it either here or in the Notes section. 👍
|
Here is another suggestion concerning the discussion around the correct name for It implies that a filtered subset of peaks in Note: The current name is a pythonized version of Matlab's |
|
@ilayn Do the introduced changes match your pending requests? |
|
Yes LGTM. Sorry I forgot to click the button |
|
The problem I see with using |
This Cython function is intended to be used for a peak finding function proposded in scipy#8264. Exposure to the user is not intended. It is meant as an alternative to scipy.signal.argrelmax` and compares as follows: - It only works on 1D arrays - It should be faster for all use cases - It can detect flat maxima (more than one sample wide). Unit tests are provided as well.
This Cython function is intended to be used for a peak finding function proposded in scipy#8264. Exposure to the user is not intended. It is meant as an alternative to scipy.signal.argrelmax` and compares as follows: - It only works on 1D arrays - It should be faster for all use cases - It can detect flat maxima (more than one sample wide). Unit tests are provided as well.
This function allows to calculate the prominince of a peak in a signal. The prominence is a measure for how much a peak stands out from the base line of a signal. See scipy#8264.
This function allows to calculate the width of a peak in a signal. The width is the horizontal distance between a peaks rising and falling slope at a chosen height. See scipy#8264
This function allows to find a subset of all local maxima in a vector by specifying requirements e.g. peak height, distance, etc. It wraps and uses the two public functions - peak_prominences - peak_widths and the private functions - _unpack_filter_args - _filter_generic - _filter_by_peak_threshold - _filter_by_peak_distance See scipy#8264
Adds basic tests (full coverage) for the functions: - scipy.signal.peak_promineces - scipy.signal.peak_widths - scipy.signal._peak_finding._unpack_filter_args - scipy.signal.find_peaks Also add notice to THANKS.txt and use better function order in signals toctree. See scipy#8264.
- The prefix `scipy.signal.` is not needed to create references to other functions within the module. The shorter version is more readable. - The example in `peak_widths` was broken for Python 2.7 because it relied on implicit float division with integers. - The example now uses the argument `rel_height` to be more explicit. See scipy#8264
- Replace kwargs in peak_widths with optional keyword arguments which improves introspection and resistance against typos. - Removed obsolete if-statement if number of peaks was 0. That way the content of `properties` gets populated properly. Extended unit tests to account for this. - Removed unnecessary indentation for prominence and with filtering. - Refactored "wheights" to "width_heights" which is more consistent with "peak_heights" and more descriptive. - More readable format for properties-key-list in docstring of find_peaks. - Explained usage of open interval `(None, None)` and made unit tests use this construct. - Replaced example in find_peaks's docstring with a more complex one that demonstrates simple and more advanced usage.
- Don't use the term "filter" because in the context of this module it already has a meaning. Instead use "condition", "select" or other terms. - "filter options" are now named "conditions" which has a clearer meaning. - Remove unnecessary argument `peaks` in for function `_select_by_property`. - Use "in samples" where horizontal distance is required. - Added some cases to unit tests and refactored a few objects.
|
Rebased because of merge conflicts after #8499 was introduced. |
|
Thanks, I'll merge tomorrow unless someone else looks in the meantime |
|
Also @lagru I noticed you commented on #3749 among others, if you think this PR will effectively fix them can you add Also this makes me think that we should maybe add a note to the |
|
Done.
The note sounds like a good idea and would prevent confusion and new issues on the topic. |
|
@lagru for now let's just add the notes and do any deprecation etc. in a subsequent PR. |
larsoner
left a comment
There was a problem hiding this comment.
Just a latin nitpick then we're good to go!
scipy/signal/_peak_finding.py
Outdated
| This function uses `argrelextrema` with np.less as comparator. | ||
| This function uses `argrelextrema` with np.less as comparator. Therefore it | ||
| requires a strict inequality on both sides of a value to consider it a | ||
| minima. This means flat minima (more than one sample wide) are not detected. |
scipy/signal/_peak_finding.py
Outdated
| This function uses `argrelextrema` with np.greater as comparator. | ||
| This function uses `argrelextrema` with np.greater as comparator. Therefore | ||
| it requires a strict inequality on both sides of a value to consider it a | ||
| maxima. This means flat maxima (more than one sample wide) are not detected. |
A reference to find_peaks which detects flat maxima was added as well where appropriate.
|
@larsoner Just a small reminder to merge this if you feel that it's ready. |
|
Thanks @lagru! Feel free to move on to the tutorial stuff (or other things) if you still have time. Bummer to hear about GSoC not being a good fit time-wise, but hope you continue to make PRs as time permits! |
|
And thank you guys for taking the time to deal with this PR and your exclusively constructive feedback. I'm positive I'll enjoy contributing here in the future. 🙂 |
* ENH: Add function to calculate peak prominence to scipy.signal This function allows to calculate the prominince of a peak in a signal. The prominence is a measure for how much a peak stands out from the base line of a signal. See scipy#8264. * ENH: Add function to calculate peak width to scipy.signal This function allows to calculate the width of a peak in a signal. The width is the horizontal distance between a peaks rising and falling slope at a chosen height. See scipy#8264 * ENH: Add new find_peaks function to scipy.signal This function allows to find a subset of all local maxima in a vector by specifying requirements e.g. peak height, distance, etc. It wraps and uses the two public functions - peak_prominences - peak_widths and the private functions - _unpack_filter_args - _filter_generic - _filter_by_peak_threshold - _filter_by_peak_distance See scipy#8264 * TST: Add tests for new functions in scipy.signal Adds basic tests (full coverage) for the functions: - scipy.signal.peak_promineces - scipy.signal.peak_widths - scipy.signal._peak_finding._unpack_filter_args - scipy.signal.find_peaks Also add notice to THANKS.txt and use better function order in signals toctree. See scipy#8264. * DOC: Shorter function references & fix example in peak_widths - The prefix `scipy.signal.` is not needed to create references to other functions within the module. The shorter version is more readable. - The example in `peak_widths` was broken for Python 2.7 because it relied on implicit float division with integers. - The example now uses the argument `rel_height` to be more explicit. See scipy#8264 * MAINT: Address suggested improvements in scipy#8264 - Replace kwargs in peak_widths with optional keyword arguments which improves introspection and resistance against typos. - Removed obsolete if-statement if number of peaks was 0. That way the content of `properties` gets populated properly. Extended unit tests to account for this. - Removed unnecessary indentation for prominence and with filtering. - Refactored "wheights" to "width_heights" which is more consistent with "peak_heights" and more descriptive. - More readable format for properties-key-list in docstring of find_peaks. - Explained usage of open interval `(None, None)` and made unit tests use this construct. - Replaced example in find_peaks's docstring with a more complex one that demonstrates simple and more advanced usage. * MAINT: Improve terminology in documentation and code - Don't use the term "filter" because in the context of this module it already has a meaning. Instead use "condition", "select" or other terms. - "filter options" are now named "conditions" which has a clearer meaning. - Remove unnecessary argument `peaks` in for function `_select_by_property`. - Use "in samples" where horizontal distance is required. - Added some cases to unit tests and refactored a few objects. * DOC: Add new functions for peak finding to release notes 1.1.0 * DOC: Describe behavior on flat extrema for argrelmax & argrelmin A reference to find_peaks which detects flat maxima was added as well where appropriate.
This follows #8211 and I feel the progress is advanced enough to make feedback possible. Thank you all in advance for taking the time to review this PR and any feedback given.
Motivation
The motivation for this pull request is that SciPy and other scientific Python libraries lack a function for simple peak finding and filtering like Matlab's findpeaks function.
In my opinion the use case (simple peak finding and filtering) is currently not very well covered by SciPy and its find_peaks_cwt function. I think these new functions would fit well into SciPy's signal module and its potential usefulness in many domains is underlined by Matlab's implementation which got extended and improved over the years.
Short description of changes
(I'll keep this up-to date with the current progress)
peak_prominences: This function allows to calculate the prominince of a peak in a signal. The prominence is a measure for how much a peak stands out from the base line of a signal.peak_widths: This function allows to calculate the width of a peak in a signal. The width is the horizontal distance between a peaks rising and falling slope at a chosen height.find_peaks: This function allows to find a subset of all local maxima in a vectorby specifying conditions e.g. peak height, distance, etc. It wraps and uses the two public functions described above and makes use of the following private functions (if this organization is useful is up for debate)
_unpack_condition_args: Parse condition arguments forfind_peaks._select_by_property: Evaluate where the generic property of peaks confirms to an interval._select_by_peak_threshold: Evaluate which peaks fulfill the threshold condition._select_by_peak_distance: Evaluate which peaks fulfill the distance condition.find_peaks_cwtto make distinction to new function clearer.To do & open questions
Still missing are unit tests for the new functions. Which will be added next.Ensure that_filter_by_peak_distancecomplies with the license of Marcos Duarte's detect_peaks function (MIT, should be compatible to SciPy's license). I'm not really sure how to include the copyright?How to handle if invalid peaks are passed topeak_prominencesandpeak_widths? Is it okay to describe this as undefined behavior in the documentation?argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here. This may be a better solution (download notebook here)Should this be included in the tutorial for-> I'd prefer to include a tutorial if wanted in another PR because this one is already quite large.scipy.signal? I feel like that would be the best place to show more complex examples which are out of place in the docstrings itself.find_peaksshould have a special postfix likefind_peaks_cwt. If so which one?Related discussions and links