MRG: Don't guess date when writing info by jasmainak · Pull Request #4990 · mne-tools/mne-python

jasmainak · 2018-03-06T15:50:10Z

agramfort · 2018-03-06T16:07:15Z

mne/io/write.py

    id_['version'] = FIFF.FIFFC_VERSION
    id_['machid'] = get_machid()
-    id_['secs'], id_['usecs'] = _date_now()
+    id_['secs'], id_['usecs'] = np.array([0, 0], dtype='int32')


you prefer to put 0 rather than omitting to write the tag?

IIRC we agreed to not write the tag, set as None when reading if not present, and in functions that require a meas_date (like annotations) treat None as the same as [0, 0]

I think this tag cannot be omitted. C.f., also the anonymization discussions we had with Matti.

I don't think this is where we should change this. This will set meas_id to (0, 0) for all newly written files.

okay I see. Let me see if I can come up with a better fix

actually I am not sure anymore of the right way to do it. We have to write an ID tag in the fif file and this has to contain 8 bytes for the date. @larsoner what do you think it the right approach?

IIUC we currently "time stamp" the file creation time. How does this break BIDS / why do they forbid this?

If there is a good reason, maybe we should abandon this practice. Using (0, 0) effectively does this.

agramfort · 2018-04-04T14:23:19Z

@jasmainak any time to get back to this?

jasmainak · 2018-04-04T21:10:18Z

I'll try to hack on this tomorrow

agramfort · 2018-04-09T13:51:12Z

BIDS forces us to use dates that are in theory older than 1900 as date in now considered PHI that's also what motivated the issue related to mne_anonymize. See latests commit in mne-c on mne_anonymize

larsoner · 2018-04-09T14:27:33Z

BIDS forces us to use dates that are in theory older than 1900 as date in now considered PHI

Okay then I'd say let's make an API change that we will always use some date (1900 exactly?) when writing files from now on. People will still need to raw.anonymize() files from their acquisition system or use mne anonymize which could go through and 1900-ize these tags. It's an API change but 1) I think the likelihood of the auto-populated meas_date being used is fairly low, and 2) it pushes people toward standardization / adoption.

Something like this?

from datetime import datetime
dt = datetime.strptime('1900/01/01', '%Y/%m/%d')
meas_date = [(dt - datetime(1970, 1, 1)).days * 86400, 0]  # no sec or μs  here
# can hard-code this as [-2208988800, 0]

meas_date[0] will be negative but I think it should write fine, we use write_int with it. I can make a PR if you want, we just need to choose a date. If BIDS says <=1900 let's use 1900, if they say < then maybe 1800? Do the BIDS people recommend a specific date, or just say < 1900? If they don't say anything, we should check what other packages are doing because consistency would be nice.

agramfort · 2018-04-09T20:08:44Z

as people will need to run anonymize I think using [0, 0] as done here and have bids validator require people to run anonymize. I don't think defaulting to 1900 is the least surprising behavior. so thinking about it maybe the current PR is good as is...

larsoner · 2018-04-09T20:15:02Z

I don't think defaulting to 1900 is the least surprising behavior.

so thinking about it maybe the current PR is good as is...

Defaulting to [0, 0] is defaulting to Jan 1 1970. Why is this a better default than the latest one that would be BIDS compatible...?

larsoner · 2018-04-09T20:16:48Z

I guess we could add an anonymize(date=0. | '1900/01/01') and default to the latter (I would argue) or something. If float assume UTC time code, if str assume yyyy/mm/dd format.

jasmainak · 2018-04-09T21:49:42Z

I am not sure we are talking about the same thing any more :-) The problem this PR was meant to solve was the following. If raw.info['meas_date'] = None and you do raw.save(fname), when you read back this file raw.info['meas_date'] is not None anymore. That is to say, the state of the object changes when you do I/O.

Anonymizing the files is of course important, and BIDS recommends the following:

Dates can be shifted by a random number of days for privacy protection reasons. To distinguish real dates from shifted dates always use year 1900 or earlier when including shifted years. For longitudinal studies please remember to shift dates within one subject by the same number of days to maintain the interval information. Example: 1867-06-15T13:45:30

I think the crucial information here is that you often want to maintain interval information in studies, but using a fixed date will not work in that case.

larsoner · 2018-04-10T19:36:05Z

Okay I see what you mean.

I still don't like using [0, 0] because it is an actual date. Especially when combined with this BIDS offset suggestion, you could land on it while doing a random date offset (esp. for formats that don't include microseconds). Maybe the possibility is vanishingly small?

However, the date is stored as two signed integers, secs for seconds and usecs for microseconds. This means that a meas_date like [0, int(1e10)] is something that would never be written out in practice when splitting by date, because the largest microsecond value you should store is int(1e6) - 1, otherwise it should be added to the secs entry. Thus:

We should safely be able to encode date=None as the magic pair [0, 2 ** 31 - 1].
We could also allow negative usecs values to be re-interpreted as "negative years". If we do this, we can add an arbitrary negative offset beyond the 24855-day limit imposed by 24855 == (2 ** 31 - 1) // 3600 // 24 (currently in MNE-C).

I don't think that either of these would break existing software that does something like time = sec + usec / 1e6, either.

jasmainak · 2018-04-10T20:25:20Z

okay I see what you mean. That makes sense.

What do you mean by negative year? Is the reference year 1970 relative to which it is negative?

larsoner · 2018-04-10T20:27:00Z

Yep, so to get to < 1900 we need at least -70 years. The sec field only permits a minimum of -24855 days which is -68 years.

agramfort · 2018-04-11T18:17:58Z

so what do we do here?

larsoner · 2018-04-11T18:25:50Z

My proposal is points 1 and 2 above. Pick [0, 2 ** 31 - 1] as the "magic pair" instead of [0, 0] is step 1.

Step 2 is more drastic, use negative usecs to mean "negative years" so we can get arbitrary dates backward.

agramfort · 2018-04-11T18:51:58Z

go for 1 then

codecov · 2018-04-12T19:01:47Z

Codecov Report

Merging #4990 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4990      +/-   ##
==========================================
+ Coverage   88.14%   88.14%   +<.01%     
==========================================
  Files         356      356              
  Lines       65762    65774      +12     
  Branches    11192    11193       +1     
==========================================
+ Hits        57966    57978      +12     
  Misses       4959     4959              
  Partials     2837     2837

larsoner

... other changes to tests are cosmetic for pytest (figured might as well get it done for this file).

The user-facing changes should be summarized in the whats_new.rst. Ready for review/merge from my end.

larsoner · 2018-04-12T19:08:58Z

mne/io/tests/test_meas_info.py

+    with open(temp_file_2, 'rb') as fid:
+        m2.update(fid.read())
+    m2 = m2.hexdigest()
+    assert m1 == m2


These changes are new (should fail on master because we used date_now when writing, so this is lost information but arguably a good choice?)

larsoner · 2018-04-12T19:09:20Z

mne/io/tests/test_meas_info.py

-    assert_true(raw.info['file_id']['secs'] != orig_file_id)
-    assert_true(raw.info['meas_id']['secs'] != orig_meas_id)
+    assert raw.info['file_id']['secs'] != orig_file_id
+    assert raw.info['meas_id']['secs'] != orig_meas_id


Changes to this test are relevant

jasmainak · 2018-04-13T00:35:21Z

doc/whats_new.rst


 - Channels with unknown locations are now assigned position ``[np.nan, np.nan, np.nan]`` instead of ``[0., 0., 0.]``, by `Eric Larson`_

+- Unknown measurement dates are now stored as ``info['meas_date'] = None`` rather than ``[0, 0]``. ``None`` is also now used when anonymizing data and when setting the machine ID for writing files, by `Mainak Jas`_ and `Eric Larson`_


I think earlier it was just using the current date not [0, 0] ...

jasmainak · 2018-04-13T00:37:39Z

mne/io/write.py

+# to treat as meas_date=None. This one should be impossible for systems
+# to write -- the second field is microseconds, so anything >= 1e6
+# should be moved into the first field (seconds).
+DATE_NONE = (0, 2 ** 31 - 1)


maybe overkill but do you think it makes sense to add a warning or error when reading a file if the microsecond field is >= 1e6 and not 2 ** 31 - 1 ?

Probably overkill, if it happens nothing will really break anyway

yes, that's true ... :)

jasmainak · 2018-04-13T01:09:08Z

I'm missing a test which checks the following:

raw.info['meas_date'] = None
raw.save(fname)
raw2 = mne.io.read_raw_fif(fname)
assert raw2.info['meas_date'] == None

as this is how I discovered the bug. I'm not sure if the current tests cover this case. Otherwise LGTM

jasmainak · 2018-04-13T01:09:47Z

Thanks @larsoner for taking over !

larsoner · 2018-04-13T01:11:51Z

You mean this?

https://github.com/mne-tools/mne-python/pull/4990/files#diff-8c219726d300a83a91c5e38cdb8fe21aR419

jasmainak · 2018-04-13T01:21:25Z

okay that one works

jasmainak · 2018-04-13T01:24:53Z

okay changes LGTM, at least for the meas_date part

* Don't guess date when writing info * FIX: Set magic date * FIX: Text

jasmainak requested a review from teonbrooks March 6, 2018 15:50

agramfort reviewed Mar 6, 2018

View reviewed changes

larsoner added this to the 0.16 milestone Apr 9, 2018

Don't guess date when writing info

81a337b

larsoner force-pushed the dont_guess_date branch from 413967d to a201cdb Compare April 12, 2018 19:01

larsoner force-pushed the dont_guess_date branch from a201cdb to 48e3f01 Compare April 12, 2018 19:07

larsoner changed the title ~~Don't guess date when writing info~~ MRG: Don't guess date when writing info Apr 12, 2018

FIX: Set magic date

39af725

larsoner force-pushed the dont_guess_date branch from 48e3f01 to 39af725 Compare April 12, 2018 19:08

larsoner reviewed Apr 12, 2018

View reviewed changes

jasmainak commented Apr 13, 2018

View reviewed changes

FIX: Text

9fd3e34

agramfort merged commit d77cc95 into mne-tools:master Apr 13, 2018

bloyl mentioned this pull request Jun 4, 2018

CTF meas_id #5258

Closed

britta-wstnr pushed a commit to britta-wstnr/mne-python that referenced this pull request Jul 5, 2018

MRG: Don't guess date when writing info (mne-tools#4990)

0195397

* Don't guess date when writing info * FIX: Set magic date * FIX: Text

jasmainak mentioned this pull request Nov 17, 2019

meas_date is guessed during I/O when it is None #7067

Closed


		- Channels with unknown locations are now assigned position ``[np.nan, np.nan, np.nan]`` instead of ``[0., 0., 0.]``, by `Eric Larson`_

		- Unknown measurement dates are now stored as ``info['meas_date'] = None`` rather than ``[0, 0]``. ``None`` is also now used when anonymizing data and when setting the machine ID for writing files, by `Mainak Jas`_ and `Eric Larson`_

Uh oh!

Conversation

jasmainak commented Mar 6, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agramfort commented Apr 4, 2018 via email

Uh oh!

jasmainak commented Apr 4, 2018

Uh oh!

agramfort commented Apr 9, 2018 via email

Uh oh!

larsoner commented Apr 9, 2018

Uh oh!

agramfort commented Apr 9, 2018 via email

Uh oh!

larsoner commented Apr 9, 2018

Uh oh!

larsoner commented Apr 9, 2018

Uh oh!

jasmainak commented Apr 9, 2018

Uh oh!

larsoner commented Apr 10, 2018

Uh oh!

jasmainak commented Apr 10, 2018

Uh oh!

larsoner commented Apr 10, 2018

Uh oh!

agramfort commented Apr 11, 2018

Uh oh!

larsoner commented Apr 11, 2018

Uh oh!

agramfort commented Apr 11, 2018 via email

Uh oh!

codecov bot commented Apr 12, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

larsoner left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jasmainak commented Apr 13, 2018

Uh oh!

jasmainak commented Apr 13, 2018

Uh oh!

larsoner commented Apr 13, 2018

Uh oh!

jasmainak commented Apr 13, 2018

Uh oh!

jasmainak commented Apr 13, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

codecov bot commented Apr 12, 2018 •

edited

Loading