MRG: Don't guess date when writing info#4990
Conversation
mne/io/write.py
Outdated
| id_['version'] = FIFF.FIFFC_VERSION | ||
| id_['machid'] = get_machid() | ||
| id_['secs'], id_['usecs'] = _date_now() | ||
| id_['secs'], id_['usecs'] = np.array([0, 0], dtype='int32') |
There was a problem hiding this comment.
you prefer to put 0 rather than omitting to write the tag?
There was a problem hiding this comment.
IIRC we agreed to not write the tag, set as None when reading if not present, and in functions that require a meas_date (like annotations) treat None as the same as [0, 0]
There was a problem hiding this comment.
I think this tag cannot be omitted. C.f., also the anonymization discussions we had with Matti.
There was a problem hiding this comment.
I don't think this is where we should change this. This will set meas_id to (0, 0) for all newly written files.
There was a problem hiding this comment.
okay I see. Let me see if I can come up with a better fix
There was a problem hiding this comment.
actually I am not sure anymore of the right way to do it. We have to write an ID tag in the fif file and this has to contain 8 bytes for the date. @larsoner what do you think it the right approach?
There was a problem hiding this comment.
IIUC we currently "time stamp" the file creation time. How does this break BIDS / why do they forbid this?
If there is a good reason, maybe we should abandon this practice. Using (0, 0) effectively does this.
|
@jasmainak any time to get back to this?
|
|
I'll try to hack on this tomorrow |
|
BIDS forces us to use dates that are in theory older than 1900 as date in
now considered PHI
that's also what motivated the issue related to mne_anonymize. See latests
commit in mne-c on mne_anonymize
|
Okay then I'd say let's make an API change that we will always use some date (1900 exactly?) when writing files from now on. People will still need to Something like this?
|
|
as people will need to run anonymize I think using [0, 0] as done here and
have bids validator require people to run anonymize.
I don't think defaulting to 1900 is the least surprising behavior.
so thinking about it maybe the current PR is good as is...
|
Defaulting to |
|
I guess we could add an |
|
I am not sure we are talking about the same thing any more :-) The problem this PR was meant to solve was the following. If Anonymizing the files is of course important, and BIDS recommends the following:
I think the crucial information here is that you often want to maintain interval information in studies, but using a fixed date will not work in that case. |
|
Okay I see what you mean. I still don't like using However, the date is stored as two signed integers,
I don't think that either of these would break existing software that does something like |
|
okay I see what you mean. That makes sense. What do you mean by negative year? Is the reference year 1970 relative to which it is negative? |
|
Yep, so to get to < 1900 we need at least -70 years. The |
|
so what do we do here? |
|
My proposal is points 1 and 2 above. Pick Step 2 is more drastic, use negative |
|
go for 1 then
|
413967d to
a201cdb
Compare
Codecov Report
@@ Coverage Diff @@
## master #4990 +/- ##
==========================================
+ Coverage 88.14% 88.14% +<.01%
==========================================
Files 356 356
Lines 65762 65774 +12
Branches 11192 11193 +1
==========================================
+ Hits 57966 57978 +12
Misses 4959 4959
Partials 2837 2837 |
a201cdb to
48e3f01
Compare
48e3f01 to
39af725
Compare
larsoner
left a comment
There was a problem hiding this comment.
... other changes to tests are cosmetic for pytest (figured might as well get it done for this file).
The user-facing changes should be summarized in the whats_new.rst. Ready for review/merge from my end.
| with open(temp_file_2, 'rb') as fid: | ||
| m2.update(fid.read()) | ||
| m2 = m2.hexdigest() | ||
| assert m1 == m2 |
There was a problem hiding this comment.
These changes are new (should fail on master because we used date_now when writing, so this is lost information but arguably a good choice?)
| assert_true(raw.info['file_id']['secs'] != orig_file_id) | ||
| assert_true(raw.info['meas_id']['secs'] != orig_meas_id) | ||
| assert raw.info['file_id']['secs'] != orig_file_id | ||
| assert raw.info['meas_id']['secs'] != orig_meas_id |
There was a problem hiding this comment.
Changes to this test are relevant
doc/whats_new.rst
Outdated
|
|
||
| - Channels with unknown locations are now assigned position ``[np.nan, np.nan, np.nan]`` instead of ``[0., 0., 0.]``, by `Eric Larson`_ | ||
|
|
||
| - Unknown measurement dates are now stored as ``info['meas_date'] = None`` rather than ``[0, 0]``. ``None`` is also now used when anonymizing data and when setting the machine ID for writing files, by `Mainak Jas`_ and `Eric Larson`_ |
There was a problem hiding this comment.
I think earlier it was just using the current date not [0, 0] ...
| # to treat as meas_date=None. This one should be impossible for systems | ||
| # to write -- the second field is microseconds, so anything >= 1e6 | ||
| # should be moved into the first field (seconds). | ||
| DATE_NONE = (0, 2 ** 31 - 1) |
There was a problem hiding this comment.
maybe overkill but do you think it makes sense to add a warning or error when reading a file if the microsecond field is >= 1e6 and not 2 ** 31 - 1 ?
There was a problem hiding this comment.
Probably overkill, if it happens nothing will really break anyway
There was a problem hiding this comment.
yes, that's true ... :)
|
I'm missing a test which checks the following: raw.info['meas_date'] = None
raw.save(fname)
raw2 = mne.io.read_raw_fif(fname)
assert raw2.info['meas_date'] == Noneas this is how I discovered the bug. I'm not sure if the current tests cover this case. Otherwise LGTM |
|
Thanks @larsoner for taking over ! |
|
okay that one works |
|
okay changes LGTM, at least for the |
* Don't guess date when writing info * FIX: Set magic date * FIX: Text
closes #4949