
adding a bad channel detection method using LOF algorithm #11234

Merged
larsoner merged 23 commits into mne-tools:main from vpKumaravel:badChannelLOF
Mar 1, 2024

Conversation

@vpKumaravel (Contributor) commented Oct 10, 2022

Reference issue

NA

What does this implement/fix?

This PR adds a new feature: bad channel detection using the Local Outlier Factor (LOF) algorithm.
File name: mne/preprocessing/detect_bad_channels.py

Additional information

The proposed algorithm is used in the Newborns EEG Artifact Removal (NEAR) pipeline published earlier this year. Recently, we analyzed data from adult subjects and found the algorithm adapts well to different populations. This contribution is a first step towards a fully automated preprocessing pipeline based on MNE-Python. Your feedback is greatly appreciated.

Best regards,
Velu

welcome bot commented Oct 10, 2022

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

@drammock (Member) left a comment


Thanks for the contribution! We'll want to take a closer look at your paper before reviewing/merging... in the meantime, here are some quick comments on things I noticed just skimming the docstring and code.

@agramfort (Member) commented Oct 17, 2022

@vpKumaravel what I would do is take some MEG data from OpenNeuro and see how this compares with find_bad_channels_maxwell. Just report the amount of overlap.

You could start with the data we use in our tutorials here. For example, the MNE sample dataset has 2 clear bad channels (one grad and one EEG).
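The overlap report suggested here can be sketched with plain set arithmetic; the channel names below are illustrative placeholders, not actual detection results:

```python
def overlap_report(bads_a, bads_b):
    """Summarize agreement between two bad-channel lists."""
    set_a, set_b = set(bads_a), set(bads_b)
    return {
        "both": sorted(set_a & set_b),    # flagged by both methods
        "only_a": sorted(set_a - set_b),  # flagged by method A only
        "only_b": sorted(set_b - set_a),  # flagged by method B only
    }

# Illustrative channel names, not real results:
report = overlap_report(["MEG 2443", "EEG 053"], ["MEG 2443", "MEG 1032"])
```

The size of `both` relative to the two `only_*` lists is the "amount of overlap" being asked for.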

@vpKumaravel (Contributor, Author)

> @vpKumaravel what I would do is take some MEG data from OpenNeuro and see how this compares with find_bad_channels_maxwell. Just report the amount of overlap.
>
> You could start with the data we use in our tutorials here. For example, the MNE sample dataset has 2 clear bad channels (one grad and one EEG).

@agramfort Sorry for the delay! I was busy with my PhD thesis defense.

Here are the notebook files in which I validated LOF against Maxwell on both the MNE sample data and an OpenNeuro MEG dataset. This is the first time I have validated LOF on MEG data. I think the results are promising. Please let me know what you think.

If you want, I can also do the same for EEG datasets and compare the results against autoreject, since Maxwell filtering does not deal with EEG. However, autoreject considers epoched data, while LOF works on continuous data.

Notebook 1: https://colab.research.google.com/drive/15MiXEWGyExvpQxuM7HedxT0zogwwzjVk?usp=sharing
Notebook 2: https://colab.research.google.com/drive/17X7DJ3p231B20fNb6A75pwX1WSN0Qnm4?usp=sharing

Thanks for your time :)

@agramfort (Member) commented Apr 10, 2023 via email

@vpKumaravel (Contributor, Author)

> This looks like a single-dataset experiment, using our eyes to evaluate the method. I would say it needs a more quantitative evaluation, on a relevant metric and computed over a few datasets, to be fully convincing. You can maybe find inspiration on this in the autoreject paper?


Thanks, Alexandre!
For EEG datasets, I have already done such an extensive analysis in the past, where we used the F1 score metric for comparative evaluation (results here).
I will try to look for MEG datasets with labeled ground-truth bad sensors to do the same.

-V

@drammock drammock added the needs-discussion issues requiring a dev meeting discussion before the way forward is clear label Apr 11, 2023
@drammock (Member)

@agramfort let's discuss at our next dev meeting; on a quick look, the Sensors paper looks like a reasonable quantification of the benefit, for EEG at least.

@agramfort (Member) commented Apr 11, 2023 via email

@drammock (Member)

@jasmainak I wonder if you could weigh in here (with your author-of-autoreject hat on)? @larsoner and @britta-wstnr and I were chatting today and wondering whether this bad channel detection method is a good candidate for inclusion in MNE, vs living in its own small package. Here is the paper with the comparison to other methods (EEG only): https://www.mdpi.com/1424-8220/22/19/7314/htm

@jasmainak (Member)

autoreject is pretty "comprehensive" ... it detects artifacts at the level of single trials, then uses that to detect bad epochs and/or repair bad segments of the data using physics-based interpolation.

I quickly read the code of the LOF algorithm and it seems similar to the FASTER/RANSAC family of algorithms in that it works on the entire time period or epoch. The threshold of such algorithms is typically tuned on a large dataset, so they tend to work fine for a wide variety of cases, and they can be useful as a "quick" preprocessing step before other algorithms like Maxwell filtering, ICA, etc.

However, I suspect there is a chance of losing "good" data by removing channels that are not bad throughout the recording. Conversely, if a channel has 20% of its trials bad and is not marked bad, those 20% of trials will be completely removed during epoching. I think these algorithms are hard to test empirically because it depends on how the performance metric is chosen/designed and what it is sensitive to. It might also be worth considering whether there is an advantage for the user over the domain-agnostic outlier detection methods already available in sklearn.

Having said that, if many users find such a function useful in their workflows, I would say why not.

@vpKumaravel (Contributor, Author)

> autoreject is pretty "comprehensive" ... it detects artifacts at the level of single trials, then uses that to detect bad epochs and/or repair bad segments of the data using physics-based interpolation.
>
> I quickly read the code of the LOF algorithm and it seems similar to the FASTER/RANSAC family of algorithms in that it works on the entire time period or epoch. The threshold of such algorithms is typically tuned on a large dataset, so they tend to work fine for a wide variety of cases, and they can be useful as a "quick" preprocessing step before other algorithms like Maxwell filtering, ICA, etc.
>
> However, I suspect there is a chance of losing "good" data by removing channels that are not bad throughout the recording. Conversely, if a channel has 20% of its trials bad and is not marked bad, those 20% of trials will be completely removed during epoching. I think these algorithms are hard to test empirically because it depends on how the performance metric is chosen/designed and what it is sensitive to. It might also be worth considering whether there is an advantage for the user over the domain-agnostic outlier detection methods already available in sklearn.
>
> Having said that, if many users find such a function useful in their workflows, I would say why not.

Thanks, everyone, for the nice discussion.
If I may add, LOF is also quite different from the FASTER algorithm. First, LOF does not assume any distribution for the data (unlike FASTER), since it detects outliers based on the density of clusters. Second, even though it considers the entire time period, LOF finds outliers within a "local" neighborhood determined by the k-nearest-neighbors algorithm. In other words, LOF does not consider each channel separately; it computes the outlier score based on the relative density distribution of different M/EEG channel clusters. That said, I completely agree, and I suggest users calibrate both the LOF threshold (a default value of 1.5 seems a good starting point) and the number of nearest neighbors (a default value of 20 seems a good starting point).
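A minimal sketch of this idea, using scikit-learn's LocalOutlierFactor with the defaults mentioned above (threshold 1.5, 20 neighbors). The function name and scoring convention here are illustrative only, not the PR's final implementation:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_bad_channels(data, ch_names, n_neighbors=20, threshold=1.5):
    """Flag outlier channels in `data`, shaped (n_channels, n_times)."""
    clf = LocalOutlierFactor(n_neighbors=n_neighbors)
    clf.fit(data)  # each channel's time series is treated as one sample
    scores = -clf.negative_outlier_factor_  # LOF score; ~1 for inliers
    return [ch for ch, s in zip(ch_names, scores) if s > threshold]
```

Because LOF scores each channel relative to its nearest neighbors, a channel is flagged only when its time series is unusually far from the local cluster of channels, not merely when it exceeds a global amplitude cutoff.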

Moreover, there are researchers who look for bad channels across the whole recording rather than epoch-wise. I did a survey on Twitter over a year ago, and here are the responses (link).

@larsoner (Member)

@vpKumaravel one sticking point will be performance on different channel types. MNE is used for EEG but also (quite often) for MEG. Would it be possible for you to compare find_bad_channels_maxwell to your algorithm when applied to MEG data on some datasets (e.g., subjects from https://openneuro.org/datasets/ds000117, and maybe another MEG dataset)? FYI, you'll probably have to process magnetometers (meg='mag') and gradiometers (meg='grad') separately, since they generally have very different scales (and units).
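The per-channel-type concern can be sketched generically: run a detector on each type's channels separately, so any shared threshold is applied within, not across, scales. Everything below, including the toy variance-based detector, is illustrative, not the PR's code:

```python
import numpy as np

def detect_per_type(data, ch_types, detect_fn):
    """Apply detect_fn to each channel type's rows separately.

    data: (n_channels, n_times); ch_types: one type label per channel.
    Returns sorted global indices of flagged channels.
    """
    bads = []
    for this_type in sorted(set(ch_types)):
        picks = [i for i, t in enumerate(ch_types) if t == this_type]
        for local_idx in detect_fn(data[picks]):
            bads.append(picks[local_idx])  # map back to global index
    return sorted(bads)

# Toy detector: flag rows whose std exceeds 5x the per-type median std.
def toy_detect(d):
    stds = d.std(axis=1)
    return [i for i, s in enumerate(stds) if s > 5 * np.median(stds)]
```

A single global threshold would miss a noisy gradiometer whose absolute scale is still tiny next to the magnetometers; grouping by type avoids that.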

@vpKumaravel (Contributor, Author) commented Apr 28, 2023

> @vpKumaravel one sticking point will be performance on different channel types. MNE is used for EEG but also (quite often) for MEG. Would it be possible for you to compare find_bad_channels_maxwell to your algorithm when applied to MEG data on some datasets (e.g., subjects from https://openneuro.org/datasets/ds000117, and maybe another MEG dataset)? FYI, you'll probably have to process magnetometers (meg='mag') and gradiometers (meg='grad') separately, since they generally have very different scales (and units).

Hi @larsoner, thanks for your comment. I would actually be interested to see how LOF performs on MEG datasets and to compare the results with the Maxwell function. But as I stated earlier in this thread, I couldn't find open-source MEG datasets with annotated bad channels. If you know of any, kindly suggest them.

Nevertheless, I took the dataset you shared here and ran the LOF scripts just for subject 01. The results are stored in the CSV files here. You will also find the LOF outlier scores for each detected bad channel, to help interpret the results. I set the threshold to 1.5 (empirically).

Surprisingly, none of these channels were flagged by the Maxwell method. If it helps, I filtered the data between 1 and 40 Hz.

Edit (4/5/23):

I added a couple of PSD plots from before and after LOF preprocessing. To me, it seems LOF removes channels that contain high-frequency (EMG) and low-frequency (EOG) noise. These files correspond to run-01 and run-04, respectively.

sub-01_ses-meg_task-facerecognition_run-01_meg.pdf

sub-01_ses-meg_task-facerecognition_run-04_meg.pdf

Cheers,
Velu

@larsoner (Member) commented May 8, 2023

Having thought about the problem and the results a bit, I'm +1 for including this. I think:

  1. The maintenance overhead will be low.
  2. Mostly we have to think about how to handle multichannel data and the details of the actual API.
  3. The multichannel use case (EEG, MEG-mag, MEG-grad), for example in the sample dataset, should take care of itself in normal use, since neighbors should be found within a particular channel type automatically. So we might not even need scalings (though it might make sense to add it).
  4. We probably do want a picks parameter, though, which should default to the good data channels. And we should just return a list of (additional) bad channels rather than modifying raw itself.

@drammock WDYT?

@drammock (Member) commented May 8, 2023

I'm still in favor of including this in MNE. Other thoughts:

  • As for return values, I'd say a list of bads and a scores dict (optional, if return_scores=True), just like find_bad_channels_maxwell. Agree on not modifying the Raw object. This means:
    • you probably don't need to make a copy of Raw anymore
    • the function name should be find_bad_channels_lof (instead of mark_...)
  • The new file should not be called mne/preprocessing/detect_bad_channels.py; I would suggest mne/preprocessing/_lof.py
  • There are several unaddressed comments from my review last October (re: docstring formatting/content)
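Taken together with the earlier API points, the proposal amounts to a call shape roughly like the following hypothetical stub (the real implementation, and its exact signature, were still to come at this point in the thread):

```python
def find_bad_channels_lof(raw, n_neighbors=20, *, picks=None,
                          threshold=1.5, return_scores=False):
    """Return additional bad channels without modifying `raw`.

    Stub illustrating the proposed interface: `picks` would default to
    the good data channels, and detection would run on the picked data.
    """
    bads, scores = [], {}  # a real implementation would fill these
    return (bads, scores) if return_scores else bads
```

The key design choices from the discussion are visible in the shape alone: the function returns bads (and optionally scores) rather than mutating the Raw object, mirroring find_bad_channels_maxwell.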

@vpKumaravel Note that we've recently started using black for code formatting and ruff for code linting (instead of flake8). Both of these are set up to run on pre-commit hooks too, so you may need to update your environment to get all that working. See this recently changed section of our contributor guide.

@vpKumaravel (Contributor, Author)

@drammock, could you please guide me on how to fix the build error "ModuleNotFoundError: No module named 'sklearn'"?



    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

Member

@vpKumaravel you need to import this inside the function; sklearn is only an optional dependency of mne.

@larsoner larsoner added this to the 1.5 milestone May 22, 2023
@larsoner larsoner modified the milestones: 1.6, 1.7 Nov 7, 2023
@drammock drammock removed the needs-discussion issues requiring a dev meeting discussion before the way forward is clear label Jan 19, 2024
@larsoner (Member)

Hey @vpKumaravel do you have time to come back to this? It would be nice to get this in!

@larsoner (Member)

@vpKumaravel there was a bit of cruft following the rebase, I pushed a commit to fix that and clean up / streamline some stuff. Along the way I added a picks option to the function. Can you see if the diff looks reasonable to you now?

@vpKumaravel (Contributor, Author)

> @vpKumaravel there was a bit of cruft following the rebase, I pushed a commit to fix that and clean up / streamline some stuff. Along the way I added a picks option to the function. Can you see if the diff looks reasonable to you now?

Thank you very much @larsoner. Yes, it all looks good to me. And thanks for helping me clear the PR checks! Hope the contribution is useful.

@larsoner (Member) commented Mar 1, 2024

I think all of @drammock's concerns have been taken care of, so I'll merge main into this branch just to make sure everything is still okay then mark for merge-when-green -- thanks in advance @vpKumaravel !

@larsoner larsoner merged commit ff1cfdd into mne-tools:main Mar 1, 2024
welcome bot commented Mar 1, 2024

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
…11234)

Co-authored-by: Velu Prabhakar Kumaravel <veluprabhakarkumaravel@Velus-MBP.lan>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Co-authored-by: Daniel McCloy <dan@mccloy.info>
@larsoner larsoner mentioned this pull request Jul 24, 2025