Skip to content

MRG: Don't guess date when writing info#4990

Merged
agramfort merged 3 commits intomne-tools:masterfrom
jasmainak:dont_guess_date
Apr 13, 2018
Merged

MRG: Don't guess date when writing info#4990
agramfort merged 3 commits intomne-tools:masterfrom
jasmainak:dont_guess_date

Conversation

@jasmainak
Copy link
Copy Markdown
Member

closes #4949

@jasmainak jasmainak requested a review from teonbrooks March 6, 2018 15:50
mne/io/write.py Outdated
id_['version'] = FIFF.FIFFC_VERSION
id_['machid'] = get_machid()
id_['secs'], id_['usecs'] = _date_now()
id_['secs'], id_['usecs'] = np.array([0, 0], dtype='int32')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you prefer to put 0 rather than omitting to write the tag?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we agreed to not write the tag, set as None when reading if not present, and in functions that require a meas_date (like annotations) treat None as the same as [0, 0]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this tag cannot be omitted. C.f., also the anonymization discussions we had with Matti.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is where we should change this. This will set meas_id to (0, 0) for all newly written files.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I see. Let me see if I can come up with a better fix

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I am not sure anymore of the right way to do it. We have to write an ID tag in the fif file and this has to contain 8 bytes for the date. @larsoner what do you think it the right approach?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we currently "time stamp" the file creation time. How does this break BIDS / why do they forbid this?

If there is a good reason, maybe we should abandon this practice. Using (0, 0) effectively does this.

@agramfort
Copy link
Copy Markdown
Member

agramfort commented Apr 4, 2018 via email

@jasmainak
Copy link
Copy Markdown
Member Author

I'll try to hack on this tomorrow

@agramfort
Copy link
Copy Markdown
Member

agramfort commented Apr 9, 2018 via email

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Apr 9, 2018

BIDS forces us to use dates that are in theory older than 1900 as date in now considered PHI

Okay then I'd say let's make an API change that we will always use some date (1900 exactly?) when writing files from now on. People will still need to raw.anonymize() files from their acquisition system or use mne anonymize which could go through and 1900-ize these tags. It's an API change but 1) I think the likelihood of the auto-populated meas_date being used is fairly low, and 2) it pushes people toward standardization / adoption.

Something like this?

from datetime import datetime
dt = datetime.strptime('1900/01/01', '%Y/%m/%d')
meas_date = [(dt - datetime(1970, 1, 1)).days * 86400, 0]  # no sec or μs  here
# can hard-code this as [-2208988800, 0]

meas_date[0] will be negative but I think it should write fine, we use write_int with it. I can make a PR if you want, we just need to choose a date. If BIDS says <=1900 let's use 1900, if they say < then maybe 1800? Do the BIDS people recommend a specific date, or just say < 1900? If they don't say anything, we should check what other packages are doing because consistency would be nice.

@larsoner larsoner added this to the 0.16 milestone Apr 9, 2018
@agramfort
Copy link
Copy Markdown
Member

agramfort commented Apr 9, 2018 via email

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Apr 9, 2018

I don't think defaulting to 1900 is the least surprising behavior.

so thinking about it maybe the current PR is good as is...

Defaulting to [0, 0] is defaulting to Jan 1 1970. Why is this a better default than the latest one that would be BIDS compatible...?

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Apr 9, 2018

I guess we could add an anonymize(date=0. | '1900/01/01') and default to the latter (I would argue) or something. If float assume UTC time code, if str assume yyyy/mm/dd format.

@jasmainak
Copy link
Copy Markdown
Member Author

I am not sure we are talking about the same thing any more :-) The problem this PR was meant to solve was the following. If raw.info['meas_date'] = None and you do raw.save(fname), when you read back this file raw.info['meas_date'] is not None anymore. That is to say, the state of the object changes when you do I/O.

Anonymizing the files is of course important, and BIDS recommends the following:

Dates can be shifted by a random number of days for privacy protection reasons. To distinguish real dates from shifted dates always use year 1900 or earlier when including shifted years. For longitudinal studies please remember to shift dates within one subject by the same number of days to maintain the interval information. Example: ​ 1867-06-15T13:45:30

I think the crucial information here is that you often want to maintain interval information in studies, but using a fixed date will not work in that case.

@larsoner
Copy link
Copy Markdown
Member

Okay I see what you mean.

I still don't like using [0, 0] because it is an actual date. Especially when combined with this BIDS offset suggestion, you could land on it while doing a random date offset (esp. for formats that don't include microseconds). Maybe the possibility is vanishingly small?

However, the date is stored as two signed integers, secs for seconds and usecs for microseconds. This means that a meas_date like [0, int(1e10)] is something that would never be written out in practice when splitting by date, because the largest microsecond value you should store is int(1e6) - 1, otherwise it should be added to the secs entry. Thus:

  1. We should safely be able to encode date=None as the magic pair [0, 2 ** 31 - 1].
  2. We could also allow negative usecs values to be re-interpreted as "negative years". If we do this, we can add an arbitrary negative offset beyond the 24855-day limit imposed by 24855 == (2 ** 31 - 1) // 3600 // 24 (currently in MNE-C).

I don't think that either of these would break existing software that does something like time = sec + usec / 1e6, either.

@jasmainak
Copy link
Copy Markdown
Member Author

okay I see what you mean. That makes sense.

What do you mean by negative year? Is the reference year 1970 relative to which it is negative?

@larsoner
Copy link
Copy Markdown
Member

Yep, so to get to < 1900 we need at least -70 years. The sec field only permits a minimum of -24855 days which is -68 years.

@agramfort
Copy link
Copy Markdown
Member

so what do we do here?

@larsoner
Copy link
Copy Markdown
Member

My proposal is points 1 and 2 above. Pick [0, 2 ** 31 - 1] as the "magic pair" instead of [0, 0] is step 1.

Step 2 is more drastic, use negative usecs to mean "negative years" so we can get arbitrary dates backward.

@agramfort
Copy link
Copy Markdown
Member

agramfort commented Apr 11, 2018 via email

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 12, 2018

Codecov Report

Merging #4990 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4990      +/-   ##
==========================================
+ Coverage   88.14%   88.14%   +<.01%     
==========================================
  Files         356      356              
  Lines       65762    65774      +12     
  Branches    11192    11193       +1     
==========================================
+ Hits        57966    57978      +12     
  Misses       4959     4959              
  Partials     2837     2837

@larsoner larsoner changed the title Don't guess date when writing info MRG: Don't guess date when writing info Apr 12, 2018
Copy link
Copy Markdown
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... other changes to tests are cosmetic for pytest (figured might as well get it done for this file).

The user-facing changes should be summarized in the whats_new.rst. Ready for review/merge from my end.

with open(temp_file_2, 'rb') as fid:
m2.update(fid.read())
m2 = m2.hexdigest()
assert m1 == m2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are new (should fail on master because we used date_now when writing, so this is lost information but arguably a good choice?)

assert_true(raw.info['file_id']['secs'] != orig_file_id)
assert_true(raw.info['meas_id']['secs'] != orig_meas_id)
assert raw.info['file_id']['secs'] != orig_file_id
assert raw.info['meas_id']['secs'] != orig_meas_id
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this test are relevant


- Channels with unknown locations are now assigned position ``[np.nan, np.nan, np.nan]`` instead of ``[0., 0., 0.]``, by `Eric Larson`_

- Unknown measurement dates are now stored as ``info['meas_date'] = None`` rather than ``[0, 0]``. ``None`` is also now used when anonymizing data and when setting the machine ID for writing files, by `Mainak Jas`_ and `Eric Larson`_
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think earlier it was just using the current date not [0, 0] ...

# to treat as meas_date=None. This one should be impossible for systems
# to write -- the second field is microseconds, so anything >= 1e6
# should be moved into the first field (seconds).
DATE_NONE = (0, 2 ** 31 - 1)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe overkill but do you think it makes sense to add a warning or error when reading a file if the microsecond field is >= 1e6 and not 2 ** 31 - 1 ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably overkill, if it happens nothing will really break anyway

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's true ... :)

@jasmainak
Copy link
Copy Markdown
Member Author

I'm missing a test which checks the following:

raw.info['meas_date'] = None
raw.save(fname)
raw2 = mne.io.read_raw_fif(fname)
assert raw2.info['meas_date'] == None

as this is how I discovered the bug. I'm not sure if the current tests cover this case. Otherwise LGTM

@jasmainak
Copy link
Copy Markdown
Member Author

Thanks @larsoner for taking over !

@larsoner
Copy link
Copy Markdown
Member

@jasmainak
Copy link
Copy Markdown
Member Author

okay that one works

@jasmainak
Copy link
Copy Markdown
Member Author

okay changes LGTM, at least for the meas_date part

@agramfort agramfort merged commit d77cc95 into mne-tools:master Apr 13, 2018
@bloyl bloyl mentioned this pull request Jun 4, 2018
britta-wstnr pushed a commit to britta-wstnr/mne-python that referenced this pull request Jul 5, 2018
* Don't guess date when writing info

* FIX: Set magic date

* FIX: Text
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

undead vampire meas date

3 participants