Skip to content

ENH: Extend peak finding capabilities in scipy.signal#8264

Merged
larsoner merged 9 commits intoscipy:masterfrom
lagru:issue-8211
Mar 5, 2018
Merged

ENH: Extend peak finding capabilities in scipy.signal#8264
larsoner merged 9 commits intoscipy:masterfrom
lagru:issue-8211

Conversation

@lagru
Copy link
Copy Markdown
Contributor

@lagru lagru commented Jan 2, 2018

This follows #8211 and I feel the progress is advanced enough to make feedback possible. Thank you all in advance for taking the time to review this PR and any feedback given.

Motivation

The motivation for this pull request is that SciPy and other scientific Python libraries lack a function for simple peak finding and filtering like Matlab's findpeaks function.

In my opinion the use case (simple peak finding and filtering) is currently not very well covered by SciPy and its find_peaks_cwt function. I think these new functions would fit well into SciPy's signal module and its potential usefulness in many domains is underlined by Matlab's implementation which got extended and improved over the years.

Short description of changes

(I'll keep this up-to date with the current progress)

  • New function peak_prominences: This function allows to calculate the prominince of a peak in a signal. The prominence is a measure for how much a peak stands out from the base line of a signal.
  • New function peak_widths: This function allows to calculate the width of a peak in a signal. The width is the horizontal distance between a peaks rising and falling slope at a chosen height.
  • New function find_peaks: This function allows to find a subset of all local maxima in a vector
    by specifying conditions e.g. peak height, distance, etc. It wraps and uses the two public functions described above and makes use of the following private functions (if this organization is useful is up for debate)
    • _unpack_condition_args: Parse condition arguments for find_peaks.
    • _select_by_property: Evaluate where the generic property of peaks confirms to an interval.
    • _select_by_peak_threshold: Evaluate which peaks fulfill the threshold condition.
    • _select_by_peak_distance: Evaluate which peaks fulfill the distance condition.
  • Changes to the docstring of the old function find_peaks_cwt to make distinction to new function clearer.

To do & open questions

  • Still missing are unit tests for the new functions. Which will be added next.
  • Ensure that _filter_by_peak_distance complies with the license of Marcos Duarte's detect_peaks function (MIT, should be compatible to SciPy's license). I'm not really sure how to include the copyright?
  • How to handle if invalid peaks are passed to peak_prominences and peak_widths? Is it okay to describe this as undefined behavior in the documentation?
  • argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here. This may be a better solution (download notebook here)
  • Should this be included in the tutorial for scipy.signal? I feel like that would be the best place to show more complex examples which are out of place in the docstrings itself. -> I'd prefer to include a tutorial if wanted in another PR because this one is already quite large.
  • Decide whether find_peaks should have a special postfix like find_peaks_cwt. If so which one?

Related discussions and links

@tylerjereddy tylerjereddy added scipy.signal enhancement A new feature or improvement labels Jan 3, 2018
if wmax is not None:
keep &= (widths <= wmax)
peaks = peaks[keep]
if full_output:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost wonder if the dictionary data structure should just be used directly for storing values, if the extra return data remains part of the API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest using, for example, infodict['peaks'] instead of peaks? I think that would be possible. In that case infodict would always be used internally but only returned if full_output is True. However I think the trade off would be a slight decrease in readability because the line length would grow significantly...

Note: I have copied this style from scipy.integrate.odeint.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like that. I just wanted to make sure the point was brought up & trade-offs at least briefly considered. It looks like there was already quite some discussion of API matters in the original issue you linked.

if height is not None:
hmin, hmax = _parse_filter_args(height, vector, peaks)
heights = vector[peaks]
keep = np.ones(peaks.size, dtype=bool)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the initialization of keep needed here? I'm just thinking that the first comparison below would automatically generate a bool or mask array anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is needed. Look at the possible valid inputs for the argument height:

  1. height=hmin -> hmax is None, only first comparison is evaluated
  2. height=(None, hmax) -> hmin is None, only second comparison is evaluated
  3. height=(hmin, hmax) -> both comparisons are evaluated
  4. height=(None, None) -> no comparison is evaluated (has no effect but is valid)

Case 3 dictates that the second evaluation must use &= (keep must be initialized). Considering the 2nd case one can easily see that initialization is not guarantied to happen in the first comparison.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, looks like a lot of input permutation handling is indeed needed.

if hmin is not None:
keep &= (vector[peaks] >= hmin)
if hmax is not None:
keep &= (vector[peaks] <= hmax)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this logic is repeated, I wonder if there's a way to streamline things a bit? Not sure if it is worth abstracting to i.e., a function or something or if this can be condensed to do both checks at once (maybe not because of the None pre-check requirement).

lagru added a commit to lagru/scipy that referenced this pull request Jan 8, 2018
- argument ret_bases makes the function return the position of peak
  bases as well. Useful for further analysis and validation of
  calculated prominences
- extended notes to summarize algorithm for prominence calculation
- if no peaks are supplied empty arrays are returned
- remove upper boundary for wlen which is not necessary
- fix typo wleft -> wright (L. 338)

See scipy#8264.
lagru added a commit to lagru/scipy that referenced this pull request Jan 8, 2018
- Maximal peak width is from base to base. This also allows to reduce
  the window in which for intersection points is searched, improving
  performance.
- Exact intersection point is found with linear interpolation.
- Basic algorithm is now described in docstring.
- Use kwargs and "Other Parameters" for optional pre-computed arguments.
- If no peaks are given the empty arrays are returned.
- rel_height can be zero.

See scipy#8264.
lagru added a commit to lagru/scipy that referenced this pull request Jan 8, 2018
Introduces major changes in the structure of the respective function.
This was partly necessary to comply with the changes to the underlying
functions peak_prominences and peak_widths.
The most important changes of its behavior are:
- The height for width calculation is now given as a ratio making it
  compliant with peak_widths. This approach is more flexible than the
  former one.
- Input and error handling was generally improved.
- _filter_generic is now used where applicable which removes duplicated
  code.
- The properties dict (former infodict) now returns more intermediate
  results. All results of peak_prominences and peak_widths are included.

See scipy#8264.
lagru added a commit to lagru/scipy that referenced this pull request Jan 8, 2018
Adds basic tests (full coverage) for the functions:
- scipy.signal.peak_promineces
- scipy.signal.peak_widths
- scipy.signal._peak_finding._unpack_filter_args
- scipy.signal.find_peaks

Also add notice to THANKS.txt and use better function order in signals
toctree.

See scipy#8264.
@@ -420,7 +1000,7 @@ def filt_func(line):
def find_peaks_cwt(vector, widths, wavelet=None, max_distances=None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this function (find_peaks_cwt) particularly confusing. It mixes the signal pre-processing and the find peaks function.

Maybe it is worth including some kind of note mentioning that the (new) function find_peaks provide an easier, matlab-like, interface. This would probably make the life of someone looking for it easier.

In the long term, I think we should even considering deprecating this function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with these issues to some extend. The note could be a good idea.

But I don't like the idea of removing the function find_peaks_cwt entirely. Wavelet transformation can be a really powerful tool for noisy signals and fills a niche that the new find_peaks can't fill the same way. Improving the API and / or documentation of find_peaks_cwt could allevate the the issues you mentioned above.

Copy link
Copy Markdown
Member

@ilayn ilayn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I would really recommend staying away from the ret_xxxx type output arguments. They are not shortening substantially anyways so you type almost everything hence return_xxxxx is better. But the more important issue is that the output argument number is changing according to bool switches. I would argue that this is causing more trouble than it actually solves.

I ran out of steam and will try to check the rest asap.

@@ -257,24 +257,39 @@ def peak_prominences(vector, peaks, wlen=None):
Indices of peaks in `vector`.
wlen : int, optional
A window lenght in samples that limits the search for the lowest
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo : length

contour line two a symmetric interval around the evaluated peak. If
not given the entire vector is used. Use this parameter to speed up
the calculation significantly for large vectors.
contour line two a symmetric interval around the evaluated peak. If not
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence is not complete I guess, "for the lowest contour line two a symmetric interval around [...]" ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a typo. The sentence should be "[...] for the lowest contour line to a symmetric interval around the evaluated peak."

the calculation significantly for large vectors.
contour line two a symmetric interval around the evaluated peak. If not
given the entire vector is used. Use this parameter to speed up the
calculation significantly for large vectors.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is it large? Why don't we always use it then ? and so on. I get what you are saying but it's not helping a standard user.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is vague on purpose because I can't exactly pinpoint an array size after which there is a significant speedup. Maybe I could try to reword or expand on this in the notes? Something like:

"Searching for the peak's bases can be slow for large vector because the full vector needs to be evaluated for each peak. This evaluation area can be limited with the parameter wlen which restricts the algorithm to a window around the current peak and can shorten the calculation time significantly if wlen is significantly smaller than vector. However this may stop the algorithm from finding the true global contour line if the peak's bases are outside this window. Instead a higher contour line is found within the restricted window leading to a smaller calculated prominence. In practice this is only relevant for the largest set of peaks in vector. This behavior may even be used intentionally to calculate "local" prominences.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for saying see Notes somewhere here and adding this note below, I agree it depends on a lot of factors so we can't be specific

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes see notes would be indeed better

contour line two a symmetric interval around the evaluated peak. If not
given the entire vector is used. Use this parameter to speed up the
calculation significantly for large vectors.
ret_bases : bool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 more letters and we have full readability return_bases. Otherwise compute_bases feels like more scipy-ish. Also note that you are already allocating these arrays so there is nothing gained from not returning them. Had you not even allocated or touched any of those arrays then maybe for very large problems it would have made a change.

Now it's just a variable number of outputs ala matlab which drove some of us crazy. Instead you might want to just always return these and remove the bool.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can see why this is a code smell. When thinking about it the only disadvantage / annoyance I can think of is in when a function returns many values and the user is only interested in one. In Python 2.7 the syntax

widths, *_ = peak_widths(...)

seems to be unsupported which would force the "ugly" peak_widths()[0] (maybe that's only me). We would definitely have to write the examples this way (at least as long as Scipy supports 2.7).

Anyway, I think I will remove the ret_... handles and just use a constant output size.

For peak_finding it doesn't matter anyway. Instead the current properties argument could force the function to calculate all possible properties even if no filter options are supplied. However this would convolute the if-logic a bit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW git grep ")\[0\]" | wc -l on the SciPy codebase gives 399 results (and another codebase I work on with ~200k lines gives 639 results), so I think it's a sufficiently common pattern to expect users to do it :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, that is a convincing argument. :D

-------
prominences : ndarray
The calculated prominences for each peak in `peaks`.
left_bases, right_bases : ndarray, returned if ret_bases is true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after ndarray a newline is needed for rendering the type correctly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which type? I got this from a the example in scipy.integrate.odeint (returned parameter infodict).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the odeint example is less than ideal in this respect

@@ -293,13 +308,13 @@ def peak_prominences(vector, peaks, wlen=None):
Find all peaks and calculate prominences

>>> from scipy import signal
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from scipy.signal import find_peaks, peak_prominences to make it a bit more concise and to remove the signal repetition

Calculate the height of each peaks contour line
Calculate the height of each peak's contour line

>>> contour_heights = vector[peaks] - prominences
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not used you can combine this line to the group below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand. It is used in the call to vlines below. This is an extra line with comment to make the example as simple and explicit as possible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant if this is only going to be used in the plotting section below you don't need to make it as a separate code block. Now it is divided into three code blocks for no apparent reason. You can see the output here

https://5174-1460385-gh.circle-artifacts.com/0/home/ubuntu/scipy/doc/build/html-scipyorg/generated/scipy.signal.peak_prominences.html#scipy.signal.peak_prominences

You actually write what I meant in the scipy.signal.find_peaks docstring 😃


>>> from scipy import signal
>>> peaks = signal.argrelmax(vector)[0]
>>> peaks = signal.find_peaks(vector)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is vector?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array to be searched for peaks. Defined in line 306 (second row) in the folded section above.



def peak_prominences(vector, peaks, wlen=None):
def peak_prominences(vector, peaks, wlen=None, ret_bases=False):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't you just use array instead of vector which is more of a physical concept that also needs directions so on?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used vector because I want to convey that the input is expected to be one-dimensional and find_peaks_cwt already uses this name. array is so broad we could go with x as well which seems to be the SciPy standard anyway. ts for timeseries would be possible as well but I'm not a huge fan. I think if vector is not wanted we should go with x.

Actually I would prefer the name signal but that one is even worse considering the module name... I don't suppose we could change the package name to scipy.dsp? 😉

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for x as we use that most other places

raise ValueError('peak positions must be integers')
if wlen is not None and not 3 <= abs(wlen) <= vector.size:
raise ValueError('window lenght must in range [3, vector.size]')
if wlen is not None and not 3 <= wlen:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if wlen is not None and wlen < 4 and also don't you need to check the integerness of this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. To calculate the window borders I do this:

wleft = peak - int(wlen / 2)
wright = peak + int(wlen / 2)
# Handle border cases
if wleft < 0:
    wleft = 0
elif vector.size < wright:
    wright = vector.size
# Use slice for prominence calculation
window = vector[wleft:wright]
# Correct peak position in vector
peak -= wleft

So wlen / 2 get's floored / converted to an integer anyway.

And 3 is a valid window range although not really useful.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, then my first comment is only about using negated ifs. wlen < 4 is easier to read.

Also in your second code snipped, if the first branch of if is done wleft<0 then it won't go into the second check right?

you can simply write

n = vector.size
window = vector[0 if wleft<0 else wleft:n if n<wright else wright]

one liner ifs are a bit dirty but if you want you can also take them out and write

n = vector.size
wleft = 0 if wleft<0 else wleft
wright = n if n<wright else wright
window = vector[wleft:wright]

and so on. I don't mean to be a python grammar-nazi but just suggesting a bit more cleaner flow of code. So that someone else can follow the logic before installing breakpoints and other printing stuff for debugging when needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to learn so feel free to be a Python grammar-nazi. :) I'm happy to accept your second suggestion. But why is wlen < 4 easier to read than wlen < 3?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that bit is just a typo, his suggestion is essentially the same as mine regarding the wlen check -- don't negate with not, instead change the relational operator itself without changing the effective truth table

@pv
Copy link
Copy Markdown
Member

pv commented Jan 9, 2018 via email

@pv
Copy link
Copy Markdown
Member

pv commented Jan 9, 2018 via email

from scipy.stats import scoreatpercentile


__all__ = ['argrelmin', 'argrelmax', 'argrelextrema', 'find_peaks_cwt']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a permissions change on this file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? I suppose you are not speaking of the new line wrapping?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I'm not (though I wouldn't change that either, other modules I see don't line break like this), look at the diff of this file on GitHub you'll see the permissions were changed 100644 → 100755

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. That was unintentional. I must have messed that up somehow. I'll revert that as well as the line wrapping.

@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Jan 9, 2018

Maybe better "find_peaks_XXX" with some "XXX" that describes the method? - @pv

I have several thoughts on this:

  • The peak finding in find_peaks is currently based on argrelmax which simply compares neighboring values. The rest is just calculation of several peak properties and filtering. So in theory the "peak finding" bit could be swapped out for other functions / algorithms (even find_peaks_cwt with a smoothed vector). As you can see in the Todo section in the first post I'm not happy with argrelmax and would prefer my own solution which is similar but deals with flat peaks.
  • In both cases I can't think of a good abbreviation find_peaks_... for the current state. Maybe something abbreviation of "discrete local maxima" (dlm)?

raise ValueError('vector and peaks must have exactly one dimension')
if peaks.max() >= vector.size:
raise ValueError('peak position exceeds the range of vector')
if not np.issubdtype(peaks.dtype, int):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the builds / test fail with the the warning:

FutureWarning: Conversion of the second argument of issubdtype from int to np.signedinteger is deprecated. In future, it will be treated as np.int64 == np.dtype(int).type.

I have a feeling that using np.issubdtype(peaks.dtype, np.integer) would be the fix for this but I can't reproduce this warning on my system. Any insights before I blindly push this?

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Jan 9, 2018

Maybe find_peaks_zd short for (naively) finding points with a derivative value of zero?

raise ValueError('peak position exceeds the range of vector')
if not np.issubdtype(peaks.dtype, int):
raise ValueError('peak positions must be integers')
if wlen is not None and not 3 <= wlen:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 3 <= wlen is not so readable, I'd change this to:

if wlen is not None:
    wlen = operators.index(wlen)  # ensure int-ness
    if wlen < 3:
        raise ValueError(...)

# the full vector
if wlen is not None and wlen < vector.size * 2:
# Calculate window borders around the evaluated peak
wleft = peak - int(wlen / 2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... then here you can do peak + wlen // 2 instead of this int conversion

lagru added a commit to lagru/scipy that referenced this pull request Jan 9, 2018
Notable changes:
- Refactored `vector` to `x`
- No "ret_xxx" style arguments are used to determine the number of
  returned parameters which is now always the same.
- Improved docstrings, removed typos and made error messages more
  descriptive.
- Hopefully addressed issue that caused builds to fail.
@lagru lagru changed the title WIP: ENH: Extend peak finding capabilities in scipy.signal ENH: Extend peak finding capabilities in scipy.signal Jan 9, 2018
lagru added a commit to lagru/scipy that referenced this pull request Jan 26, 2018
- wlen can be given as float and is implicitly converted to int. This
  has the advantage that the wlen can be more easily specified using
  the inverse of a frequency: e.g. wlen=(fs * 1.3)
- _unpack_filter_args now raises a ValueError if interval borders are
  supplied as arrays whose size doesn't match x.
- modified _filter_generic to be more consistent with other filter_
  functions (interval is given after unpacking)
- updated documentation
- updated tests for new behavior and increased coverage
References
----------
.. [1] Wikipedia Article for Topograhpic Prominence:
https://en.wikipedia.org/wiki/Topographic_prominence
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more authoritative reference for this (journal article)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, typo in Topographic on the line above

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a satisfying response as I come up empty handed when searching for papers. The stuff that I do find is in line with the linked article and deals with topographic analysis of peaks and mountains (nothing DSP related). Of course I could use some of the links on that Wikipedia article itself but I think that wouldn't improve the situation.

Honestly I first used Matlab's explanation as a guideline and later this Wikipedia article. Maybe something can be found later on but for now I'd like to leave this reference as it explains the term "prominence" (although 2D) in more detail than appropriate for this docstring.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course I could use some of the links on that Wikipedia article itself but I think that wouldn't improve the situation.

Yeah those don't look useful.

I had a look as well, it just doesn't seem to be a thing in DSP. So let's leave the Wikipedia reference

are accessible. See `scipy.signal.peak_widths` for a description of
their content.

Other Parameters
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use this section anymore, content should all go under Parameters

This function takes a one-dimensional array and finds all local maxima by
simple comparision of neighbouring values. A subset of these peaks can be
selected with different filters described below.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have one or two sentences here about what this function's limitations are - e.g. for noisy signals the approach doesn't work too well, for that it's better to use find_peaks_cwt.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

"Because these filters depend on the calculation of properties for each peak this function can be slow for noisy data that has many small peaks relative to its length. In this case it is recommended to smooth the signal before peak finding and / or reduce the number of peaks with fast filter options first."

This feels more like it belongs in the Notes section.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't thinking about speed, more about accuracy. Something like "This function directly searches for maxima, hence for noisy signals the determined peak locations can be off if the noise changes the position of a local maximum. In those cases, find_peaks_cwt or a peak fitting method may be more appropriate to use".

And yes, you're probably right that this should go in the notes section.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should have thought of that. I definitely add this or something like it either here or in the Notes section. 👍

@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Jan 29, 2018

Here is another suggestion concerning the discussion around the correct name for find_peaks:

scipy.signal.filtered_peaks(x, height=(1, 6), ...)

It implies that a filtered subset of peaks in x is returned which is exactly what this function does. As I already said: I feel that the current way to find local maxima (with argrelmax) is only an implementation detail which I intend to replace with a solution that can handle flat peaks. Thus I wouldn't consider it a strong basis for naming the function.

Note: The current name is a pythonized version of Matlab's findpeaks which has similar functionality.

@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Jan 29, 2018

@ilayn Do the introduced changes match your pending requests?

@ilayn
Copy link
Copy Markdown
Member

ilayn commented Jan 29, 2018

Yes LGTM. Sorry I forgot to click the button

@larsoner
Copy link
Copy Markdown
Member

The problem I see with using filtered_peaks in the name is that filter often (and in scipy.signal in particular) carries associations with FIR/IIR filtering. In this sense the operation is more like subselection, picking, or restriction. Maybe pick_peaks or something would work, but I don't love it.

lagru added a commit to lagru/scipy that referenced this pull request Jan 31, 2018
This Cython function is intended to be used for a peak finding function
proposded in scipy#8264. Exposure to the user is not intended. It is meant as
an alternative to scipy.signal.argrelmax` and compares as follows:

- It only works on 1D arrays
- It should be faster for all use cases
- It can detect flat maxima (more than one sample wide).

Unit tests are provided as well.
lagru added a commit to lagru/scipy that referenced this pull request Jan 31, 2018
This Cython function is intended to be used for a peak finding function
proposded in scipy#8264. Exposure to the user is not intended. It is meant as
an alternative to scipy.signal.argrelmax` and compares as follows:

- It only works on 1D arrays
- It should be faster for all use cases
- It can detect flat maxima (more than one sample wide).

Unit tests are provided as well.
lagru added 8 commits March 1, 2018 12:26
This function allows to calculate the prominince of a peak in a signal.
The prominence is a measure for how much a peak stands out from the
base line of a signal.

See scipy#8264.
This function allows to calculate the width of a peak in a signal.
The width is the horizontal distance between a peaks rising and
falling slope at a chosen height.

See scipy#8264
This function allows to find a subset of all local maxima in a vector
by specifying requirements e.g. peak height, distance, etc.
It wraps and uses the two public functions
- peak_prominences
- peak_widths
and the private functions
- _unpack_filter_args
- _filter_generic
- _filter_by_peak_threshold
- _filter_by_peak_distance

See scipy#8264
Adds basic tests (full coverage) for the functions:
- scipy.signal.peak_promineces
- scipy.signal.peak_widths
- scipy.signal._peak_finding._unpack_filter_args
- scipy.signal.find_peaks

Also add notice to THANKS.txt and use better function order in signals
toctree.

See scipy#8264.
- The prefix `scipy.signal.` is not needed to create references to other
  functions within the module. The shorter version is more readable.
- The example in `peak_widths` was broken for Python 2.7 because it
  relied on implicit float division with integers.
- The example now uses the argument `rel_height` to be more explicit.

See scipy#8264
- Replace kwargs in peak_widths with optional keyword arguments which
  improves introspection and resistance against typos.
- Removed obsolete if-statement if number of peaks was 0. That way
  the content of `properties` gets populated properly. Extended unit
  tests to account for this.
- Removed unnecessary indentation for prominence and with filtering.
- Refactored "wheights" to "width_heights" which is more consistent
  with "peak_heights" and more descriptive.
- More readable format for properties-key-list in docstring of
  find_peaks.
- Explained usage of open interval `(None, None)` and made unit tests
  use this construct.
- Replaced example in find_peaks's docstring with a more complex one
  that demonstrates simple and more advanced usage.
- Don't use the term "filter" because in the context of this module it
  already has a meaning. Instead use "condition", "select" or other
  terms.
- "filter options" are now named "conditions" which has a clearer
  meaning.
- Remove unnecessary argument `peaks` in for function
  `_select_by_property`.
- Use "in samples" where horizontal distance is required.
- Added some cases to unit tests and refactored a few objects.
@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Mar 1, 2018

Rebased because of merge conflicts after #8499 was introduced.

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Mar 1, 2018

Thanks, I'll merge tomorrow unless someone else looks in the meantime

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Mar 1, 2018

Also @lagru I noticed you commented on #3749 among others, if you think this PR will effectively fix them can you add Closes #3749 to your top comment so they get closed when this is merged?

Also this makes me think that we should maybe add a note to the argrel* functions that they do not work on flat peaks, but find_peaks will. WDYT?

@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Mar 1, 2018

Done.

Also this makes me think that we should maybe add a note to the argrel* functions that they do not work on flat peaks, but find_peaks will. WDYT?

The note sounds like a good idea and would prevent confusion and new issues on the topic.
Furthermore if _argmaxima1d could be vectorized and accept an axis argument like _boolextrema I think there would be quite an overlap in functionality. In that case it may be worth it to consider merging, replacing or depreciating the argrel* functions.

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Mar 2, 2018

@lagru for now let's just add the notes and do any deprecation etc. in a subsequent PR.

Copy link
Copy Markdown
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a latin nitpick then we're good to go!

This function uses `argrelextrema` with np.less as comparator.
This function uses `argrelextrema` with np.less as comparator. Therefore it
requires a strict inequality on both sides of a value to consider it a
minima. This means flat minima (more than one sample wide) are not detected.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a minima -> a minimum

This function uses `argrelextrema` with np.greater as comparator.
This function uses `argrelextrema` with np.greater as comparator. Therefore
it requires a strict inequality on both sides of a value to consider it a
maxima. This means flat maxima (more than one sample wide) are not detected.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same change here here

A reference to find_peaks which detects flat maxima was added as well
where appropriate.
@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Mar 5, 2018

@larsoner Just a small reminder to merge this if you feel that it's ready.

@larsoner larsoner merged commit 386542c into scipy:master Mar 5, 2018
@larsoner
Copy link
Copy Markdown
Member

larsoner commented Mar 5, 2018

Thanks @lagru! Feel free to move on to the tutorial stuff (or other things) if you still have time.

Bummer to hear about GSoC not being a good fit time-wise, but hope you continue to make PRs as time permits!

@lagru
Copy link
Copy Markdown
Contributor Author

lagru commented Mar 5, 2018

And thank you guys for taking the time to deal with this PR and your exclusively constructive feedback. I'm positive I'll enjoy contributing here in the future. 🙂

@ilayn ilayn added this to the 1.1.0 milestone Mar 5, 2018
@lagru lagru deleted the issue-8211 branch March 5, 2018 15:58
adbugger pushed a commit to adbugger/scipy that referenced this pull request Mar 11, 2018
* ENH: Add function to calculate peak prominence to scipy.signal

This function allows to calculate the prominince of a peak in a signal.
The prominence is a measure for how much a peak stands out from the
base line of a signal.

See scipy#8264.

* ENH: Add function to calculate peak width to scipy.signal

This function allows to calculate the width of a peak in a signal.
The width is the horizontal distance between a peaks rising and
falling slope at a chosen height.

See scipy#8264

* ENH: Add new find_peaks function to scipy.signal

This function allows to find a subset of all local maxima in a vector
by specifying requirements e.g. peak height, distance, etc.
It wraps and uses the two public functions
- peak_prominences
- peak_widths
and the private functions
- _unpack_filter_args
- _filter_generic
- _filter_by_peak_threshold
- _filter_by_peak_distance

See scipy#8264

* TST: Add tests for new functions in scipy.signal

Adds basic tests (full coverage) for the functions:
- scipy.signal.peak_promineces
- scipy.signal.peak_widths
- scipy.signal._peak_finding._unpack_filter_args
- scipy.signal.find_peaks

Also add notice to THANKS.txt and use better function order in signals
toctree.

See scipy#8264.

* DOC: Shorter function references & fix example in peak_widths

- The prefix `scipy.signal.` is not needed to create references to other
  functions within the module. The shorter version is more readable.
- The example in `peak_widths` was broken for Python 2.7 because it
  relied on implicit float division with integers.
- The example now uses the argument `rel_height` to be more explicit.

See scipy#8264

* MAINT: Address suggested improvements in scipy#8264

- Replace kwargs in peak_widths with optional keyword arguments which
  improves introspection and resistance against typos.
- Removed obsolete if-statement if number of peaks was 0. That way
  the content of `properties` gets populated properly. Extended unit
  tests to account for this.
- Removed unnecessary indentation for prominence and with filtering.
- Refactored "wheights" to "width_heights" which is more consistent
  with "peak_heights" and more descriptive.
- More readable format for properties-key-list in docstring of
  find_peaks.
- Explained usage of open interval `(None, None)` and made unit tests
  use this construct.
- Replaced example in find_peaks's docstring with a more complex one
  that demonstrates simple and more advanced usage.

* MAINT: Improve terminology in documentation and code

- Don't use the term "filter" because in the context of this module it
  already has a meaning. Instead use "condition", "select" or other
  terms.
- "filter options" are now named "conditions" which has a clearer
  meaning.
- Remove unnecessary argument `peaks` in for function
  `_select_by_property`.
- Use "in samples" where horizontal distance is required.
- Added some cases to unit tests and refactored a few objects.

* DOC: Add new functions for peak finding to release notes 1.1.0

* DOC: Describe behavior on flat extrema for argrelmax & argrelmin

A reference to find_peaks which detects flat maxima was added as well
where appropriate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement A new feature or improvement scipy.signal

Projects

None yet

9 participants