Skip to content

WIP fix brainvision event parser#6276

Closed
jona-sassenhagen wants to merge 11 commits intomne-tools:masterfrom
jona-sassenhagen:bv_event_parser
Closed

WIP fix brainvision event parser#6276
jona-sassenhagen wants to merge 11 commits intomne-tools:masterfrom
jona-sassenhagen:bv_event_parser

Conversation

@jona-sassenhagen
Copy link
Copy Markdown
Contributor

Closes #6267

@cbrnr
Copy link
Copy Markdown
Contributor

cbrnr commented May 7, 2019

You have an undefined value vhdr_fname and some PEP8 style errors. If these are fixed, +1 for merge.

@codecov
Copy link
Copy Markdown

codecov bot commented May 7, 2019

Codecov Report

Merging #6276 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #6276      +/-   ##
==========================================
+ Coverage   89.17%   89.24%   +0.06%     
==========================================
  Files         416      417       +1     
  Lines       74954    75228     +274     
  Branches    12353    12381      +28     
==========================================
+ Hits        66843    67137     +294     
+ Misses       5227     5214      -13     
+ Partials     2884     2877       -7


assert _bv_parser("Stimulus/S 11") == 11
assert _bv_parser("Stimulus/S202") == 202
assert _bv_parser("Response/R 12", add_for_r=100) == 112
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this much. You have a parameter that is not really exposed.
Also please use a regexp as this:

int(re.split('(Stimulus|Response)/[RS]\s*', "Response/R 1")[-1])

I am not it has the full logic but it's much more robust and easier to understand what pattern to match

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just remove add_for_r from the call signature?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I changed it to this, is that what you mean:

def _bv_parser(description, add_for_r=1000):
    """Parse BrainVision event codes (like `Stimulus/S 11`) to ints."""
    try:
        *_, kind, code = re.split('(Stimulus|Response)/[RS]\s*', description)
        code = int(code)
        if kind == "Response":
            code += add_for_r
        return code
    except ValueError:  # probably not coercable to int, or no match at all
        warn("Marker '%s' could not be unambiguously mapped to an int and "
             "will be dropped." % description)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use rather re.match rather than a try-catch?
but yes this for me really cleaner.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsoner also see here

@larsoner larsoner added this to the 0.18 milestone May 13, 2019
@larsoner
Copy link
Copy Markdown
Member

@jona-sassenhagen FYI I've marked this for 0.18 which hopefully we can push out this week

return code
except ValueError: # probably not coercable to int
warn("Marker '%s' could not be unambiguously mapped to an int "
"and will be dropped." % description)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Events not matching the pattern should probably not be dropped but assigned arbitrary integers (just like we did before). This will fix the current error in the test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had that at some point and we reverted. (we can do it again)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to, because we want to have all events, not just the ones matching the patterns.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we do that? It must be deterministic, mapping the same code to the same int every time. We could do it externally, or hash it somehow, or use a decorator?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Maybe sort the rest alphabetically and assign numbers starting with 2000?

@agramfort
Copy link
Copy Markdown
Member

agramfort commented May 14, 2019 via email

annotations = raw.annotations

if event_id == "auto":
from .io.brainvision.brainvision import RawBrainVision
Copy link
Copy Markdown
Contributor

@massich massich May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to go that route I would put all these things in a function and register the default behaviors there.

event_id = get_auto_event_id(raw.__class__()) if event_id == 'auto' else event_id

then remove None from the parameters and return the default behaviour if raw does not quack properly. (which is what is happening here: https://github.com/mne-tools/mne-python/blob/master/mne/annotations.py#L778)

it would be something like this:

class RawBrainVision(BaseRaw):

    @classmethod
    def auto_event_id(cls):
        return _bv_parse

...

def get_auto_event_id(kls):
    foo = kls.auto_event_id if 'auto_event_id' in kls.__dict__.keys() else _Counter()
    return foo

(plus we should check that the parameter event_id is only is a dict, a callable or a string strictly equal to 'auto')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can be refactored to make it nicer, but it's imperative it gets in at .18.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that getting this done for 0.18 should be our priority for now. @massich you could also add the classmethod to the base class and return a counter function there, then you don't even have to write that if clause.

pass

raw = read_raw_brainvision(vhdr_path)
events, event_id = events_from_annotations(raw)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to parse the file to make sure that the parsing works.

for key, values in event_id.items():
assert int(key[-3:].lstrip()) == values

assert _bv_parser("Stimulus/S 11") == 11
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather like to see a battery of falling edge cases than the expected.

what would happen if I get something like 'stiMOulus/S', 'Stimulus\S', 'foobar/09876564578' ?

Copy link
Copy Markdown
Contributor Author

@jona-sassenhagen jona-sassenhagen May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not how the Brainvision recorder works (... in principle one could set up pycorder to work like that I guess? But to do that one needs to be sufficiently python savy to quickly come up with a parser)

Again, I'm fine with suggestions for alternatives, but it must get in by .18.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, that we should make sure that the else branch of this line https://github.com/mne-tools/mne-python/pull/6276/files#diff-db95532aad4fae1b22f34cbf50ed7f19R845 does not generate unexpected behaviors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know the recorder, these events will never be interesting for epoching; i.e., it's always NewSegment or something like that.

Right @cbrnr ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are not important for epoching, but we still need to catch them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm +1 for adding failing test cases as @massich suggests. Like I said, I think we should return all events not just the ones relevant for epoching (users can and should filter out events relevant for them after we return everything, which will be easy because they can filter by value, i.e. between 0-255 and 1000-1255 if they only want stimulus and response markers).

Copy link
Copy Markdown
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work with @agramfort and @massich to satisfy their desire for simplicity of design, but I'm on board with the (updated) description as the user-facing API at least

i.e., RawBrainVision - are supported; in this case, Stimulus events
will be stripped to their integer part, Response events will be mapped
to their integer part + 1000, and all other events dropped); otherwise
as None.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be way too long / hard to parse. I'd do:

    event_id : dict | callable | None | 'auto'
        Can be:

        - **dict**: string keys mapping to and integer values as used in mne.Epochs
          to map annotation descriptions to integer event codes. Only the
          keys present will be mapped and the annotations with other descriptions
          will be ignored.
        - **callable**: takes a string input, and outputs an integer to use, or None
          to ignore.
        - **None**: All descriptions of annotations are mapped and assigned
          unique integer values starting at 1 based on the ``sorted`` annotation
          descriptions.
        - **'auto'**: use an appropriate parser if the raw instance corresponds to:

          - Brainvision annotations: stimulus events stripped to their integer part;
            Response events mapped to their integer part + 1000; all other events
            dropped.
          - All others: Behaves like None.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't drop other events please.

if event_type == "Response/R":
code += add_for_r
return code
except ValueError: # probably not coercable to int
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for the record, this function has no return value. We should try to avoid multiple return points. (or make sure that all branches do return)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BrainVision annotation parsing is needlessly complicated

5 participants