Skip to content

added option to keep or remove the DC for psd calculations issue #11767#11769

Merged
drammock merged 9 commits intomne-tools:mainfrom
chapochn:PSD-DC
Aug 23, 2023
Merged

added option to keep or remove the DC for psd calculations issue #11767#11769
drammock merged 9 commits intomne-tools:mainfrom
chapochn:PSD-DC

Conversation

@chapochn
Copy link
Copy Markdown
Contributor

Reference issue

Fixes #11767

What does this implement/fix?

enhanced the functions psd_array_welch and psd_array_multitaper with option "remove_dc", with True as default, since the previous default behavior was to remove the DC. Updated the relevant documentation and all the related functions.

@chapochn
Copy link
Copy Markdown
Contributor Author

For the failed tests, when I go to CircleCI, I get "block-unregistered-user"

@larsoner
Copy link
Copy Markdown
Member

Can you try logging in there through GitHub, then push another commit?

If that doesn't fix it, I can push an empty commit and it should run

@drammock
Copy link
Copy Markdown
Member

For the failed tests, when I go to CircleCI, I get "block-unregistered-user"

I restarted CircleCI. I think if you create a CircleCI account (or "log in with GitHub"?) then that will stop happening on your PRs. It's a bit annoying, I'll grant, but since we actually pay for CircleCI time it prevents someone (accidentally or maliciously) burning through a lot of our grant money by opening a bunch of doc-build-heavy PRs.

@chapochn
Copy link
Copy Markdown
Contributor Author

I created an account now on CircleCI now, thanks, let's see if I have issues again in the future

@chapochn
Copy link
Copy Markdown
Contributor Author

chapochn commented Jun 30, 2023

Are any of the fails related to my PR? I did not change the functions that fail.

Comment on lines +2497 to +2499
function (e.g., ``n_fft, n_overlap, n_per_seg, average, window, remove_dc``
for Welch method, or
``bandwidth, adaptive, low_bias, normalization`` for multitaper
``bandwidth, adaptive, low_bias, normalization, remove_dc`` for multitaper
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if remove_dc is a common kwarg for both welch and multitaper, then it should probably become part of the main signature of compute_psd rather than being a **method_kwarg (which is meant to handle only the kwargs that differ between methods)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that too. If we put that option outside of method_kwarg, I would need to add it almost "everywhere" in the time_frequency classes/methods, all the constructors, etc... Was wondering if there would be a way to avoid that...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ git grep method_kw_psd
mne/epochs.py:2364:        %(method_kw_psd)s
mne/epochs.py:2455:        %(method_kw_psd)s
mne/evoked.py:1059:        %(method_kw_psd)s
mne/evoked.py:1151:        %(method_kw_psd)s
mne/io/base.py:2170:        %(method_kw_psd)s
mne/time_frequency/spectrum.py:124:        %(method_kw_psd)s
mne/time_frequency/spectrum.py:179:        %(method_kw_psd)s Defaults to ``dict(n_fft=2048)``.
mne/time_frequency/spectrum.py:263:        %(method_kw_psd)s
mne/time_frequency/spectrum.py:1066:    %(method_kw_psd)s
mne/time_frequency/spectrum.py:1201:    %(method_kw_psd)s
mne/utils/docs.py:2493:    "method_kw_psd"

I count ten places. That seems doable? It is only for spectrum classes and the compute_psd methods, not for the time-frequency classes/methods I think.

Copy link
Copy Markdown
Contributor Author

@chapochn chapochn Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I count 56 if you do git grep method_kw, but yeah doable I guess!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I start making all the changes, is there a preference of where it should come in the order of variables? Here is the init of BaseSpectrum for example:

    def __init__(
        self,
        inst,
        method,
        fmin,
        fmax,
        tmin,
        tmax,
        picks,
        proj,
        *,
        n_jobs,
        verbose=None,
        **method_kw,
    ):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i'll wait for your input before making any changes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @drammock! Checking in if you have any updates on this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sorry it took so long to get you an answer. I think we should add the new kwarg to:

  • {BaseRaw, Epochs, Evoked}.compute_psd() and {BaseSpectrum,Spectrum,EpochsSpectrum}.__init__ should all get the new named param remove_dc
    • you'll need to pass it through to ._compute_spectra and from there pass it into self._psd_func.
  • {BaseRaw, Epochs, Evoked}.plot_psd() should NOT --- those are legacy methods and their API shouldn't change. It should still work to pass remove_dc --- it will get bundled into method_kw
  • SpectrumMixin.plot_psd(), .plot_psd_topo() and .plot_psd_topomap() should NOT (same reason: legacy)

Hopefully that works; this is based on git grep and reading/reasoning about the code; I haven't tested it. If it turns out to fail or be really messy it won't be the end of the world to incorporate remove_dc into the method_kw everywhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on it now

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finished, please have a look, i tested in my notebook, all seems ok

@drammock
Copy link
Copy Markdown
Member

I haven't looked closely yet but the names of the failing test functions suggest that they're legitimate failures. For example:

mne/time_frequency/tests/test_multitaper.py:53: in test_multitaper_psd
    assert_array_almost_equal(psd, psd_ni, decimal=4)
/usr/share/miniconda3/envs/mne/lib/python3.8/contextlib.py:75: in inner
    return func(*args, **kwds)
/usr/share/miniconda3/envs/mne/lib/python3.8/contextlib.py:75: in inner
    return func(*args, **kwds)
E   AssertionError: 
E   Arrays are not almost equal to 4 decimals

https://github.com/mne-tools/mne-python/actions/runs/5906791375/job/16023547968?pr=11769#step:13:3989

@drammock
Copy link
Copy Markdown
Member

if you run those tests locally do they pass?

@chapochn
Copy link
Copy Markdown
Contributor Author

Sorry, forgot to run to check locally. I think the issue is because of the new default value for remove_dc. I'll check what is going on

@chapochn
Copy link
Copy Markdown
Contributor Author

chapochn commented Aug 18, 2023

in nitime code there is:

    # de-mean this sucker
    s = remove_bias(s, axis=-1)

https://github.com/nipy/nitime/blob/2d5a7b05a084960567318b39256a523e53f07790/nitime/utils.py#L735

So it seams they remove the mean automatically.
We have a few of options:

  • put back the default of remove_dc to True
  • set remove_dc to True in the test function
  • remove DC before calling ni.algorithms.spectral.multi_taper_psd, and maybe have 2 different tests, with and without DC

@drammock
Copy link
Copy Markdown
Member

So it seams they (NiTime) remove the mean automatically. We have a few of options:

  • put back the default of remove_dc to True
  • set remove_dc to True in the test function
  • remove DC before calling ni.algorithms.spectral.multi_taper_psd, and maybe have 2 different tests, with and without DC

For me I think I lean toward preserving backward compatibility here, which means we should make remove_dc=True be the default. I say this because so far what I've heard is

  • failing to remove DC might cause leakage (e.g. when zero-padding)
  • there might or might not have been a principled reason why it is needed for the algorithms (but so far nobody has had time to look it up)
  • NiTime does it always (and given the code comment: intentionally so)

To me this says "there are at least some cases where demeaning makes sense so we can't consider this a bugfix". Therefore defaulting to remove_dc=False would require a deprecation cycle for the behavior change, and I don't think we should go through that extra trouble if we're not sure that remove_dc=False is the clearly better choice in most cases. We can always change the default later (with a deprecation cycle) if we do become convinced of that.

@larsoner are you convinced by that reasoning?

(note that if remove_dc=True is the default then the test failures should magically go away, which is nice)

@larsoner
Copy link
Copy Markdown
Member

Yes I think unless someone looks up in a signal processing textbook or online material (or already has the knowledge!) the reasoning behind mean removal and finds that it's always safely optional then we should assume that it's at least sometimes unsafe and so shouldn't change our default. I have some vague recollection that you need to do it to prevent leakage as you say so my gut says it's better to be safe than sorry here

@chapochn
Copy link
Copy Markdown
Contributor Author

ok, I set remove_dc = True (although I'll personally use remove_dc=False :) ). All tests pass locally, needed to fix one bug in class initialization.

@chapochn
Copy link
Copy Markdown
Contributor Author

Here are some reasons for detrending in the Welch case (from https://holometer.fnal.gov/GH_FFT.pdf), so probably does make sense to keep remove_dc=True in the default case. But good to have the choice now. For Multitaper, which only has one segment, I imagine it could help with some numerical stability, as it only affects one point of the resulting PSD.

Preprocessing of the data: DC and trend removal
Often the time series we use as input for our algorithm will have a non-zero DC average, which
may even slowly change over time. Such a DC average will show up in the first frequency bin
(m = 0) of the resulting spectrum. If a window function is used or the average changes over
time, it will also leak into adjacent frequency bins, possibly masking low-frequency signals.
The DC average is not usually of interest as a result of the DFT processing, due to calibration
problems (see Section 9) and because there are simpler and better methods to determine it (i.e.
simple averaging or digital low-pass filtering).
Hence we normally want to remove it from the data before starting the DFT processing. There
are several ways to accomplish this, in (roughly) increasing order of perfection:
1. Compute the average of the whole time series (before splitting it into segments) and
subtract that average from all data points.
2. Compute a straight line between the first and last data point of the whole time series
(before splitting it into segments) and subtract that line from all data points.
3. Compute an average trend via linear regression of the whole time series (before splitting
it into segments) and subtract that line from all data points.
4. Compute the average of each segment (before applying the window function) and subtract
that average from the data points.
5. Compute a straight line between the first and last data point of each segment (before
applying the window function) and subtract that line from the data points.
6. Compute an average trend via linear regression of each segment (before applying the
window function) and subtract that line from the data points.
7. Pass the input time series through a digital high-pass filter.
Whether such a procedure should be applied depends on the situation. If the DC average
is known to be negligible (e.g. because of AC coupling) and/or the low frequency bins are
uninteresting anyway, it can be omitted. On the other hand, in some cases (e.g. small fluctuations
of huge numbers such as variations in the Hz range of a beat frequency in the GHz range) it is
even essential to prevent numerical collapse of the FFT algorithm.
For a general-purpose algorithm where the properties of the input data are unknown, it is
recommended to apply at least a simple version of the above procedures since they are cheap
compared with the rest of the processing, increase the usefulness of the lowest frequency bins
and may prevent garbage results.

@agramfort
Copy link
Copy Markdown
Member

agramfort commented Aug 21, 2023 via email

@drammock drammock merged commit c874a7d into mne-tools:main Aug 23, 2023
@chapochn
Copy link
Copy Markdown
Contributor Author

chapochn commented Aug 23, 2023

Glad to see it merged! Do I need to document the changes somewhere?

@chapochn chapochn deleted the PSD-DC branch August 23, 2023 19:23
@drammock
Copy link
Copy Markdown
Member

drammock commented Aug 23, 2023

Glad to see it merged! Do I need to document the changes somewhere?

haha, oops! I was supposed to ask you for a changelog entry before merging. Can you do a quick follow-up to add an entry to doc/changes/devel.rst? In the Enhancements section, something like

Added option to avoid detrending / DC removal when computing Welch or multitaper spectra (:gh:`11769` by `Nikolai Chapochnikov`_)

feel free to edit the changelog entry if you want to mention the specific functions/methods/classes that are affected

chapochn added a commit to chapochn/mne-python that referenced this pull request Aug 23, 2023
@chapochn
Copy link
Copy Markdown
Contributor Author

ok, I added another PR, not sure if there was a simpler way?

@drammock
Copy link
Copy Markdown
Member

ok, I added another PR, not sure if there was a simpler way?

alas there is not really a simpler way.

drammock added a commit that referenced this pull request Aug 23, 2023
Co-authored-by: Daniel McCloy <dan@mccloy.info>
snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
Co-authored-by: Daniel McCloy <dan@mccloy.info>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

psd welch calculated incorrectly at frequency=0

4 participants