
MGH / MGZ formats: Always big-endian #956

Closed
Lestropie wants to merge 13 commits into master from mgh_mgz_write_fix

Conversation

@Lestropie
Member

Creating .mgh / .mgz images with little-endian encoding causes faults in other software packages; therefore enforce big-endianness regardless of header datatype.

Closes #952.

Creating .mgh / .mgz images with little-endian encoding causes faults in other software packages; therefore enforce big-endianness regardless of header datatype.
Consistently read the flip angle in radians and convert to degrees for header storage, and convert from degrees back to radians when writing from the header into MGHO.
@Lestropie Lestropie requested a review from jdtournier April 7, 2017 02:30
@jdtournier
Member

OK, had a look at this - not fun...

First off, the docs state clearly that it's supposed to be stored big-endian - I think we can safely skip all the endian-ness checks and just assume big-endian.

Otherwise, I had a go at understanding their storage conventions, and I think it's not too difficult. Obscure to figure out, but otherwise pretty easy. The relevant sources are their tags.h and tags.c files. But essentially, it boils down to:

  • a 32 bit int (signed as far as I can tell) with a tag 'ID' (3 corresponds to a command-line history entry, I guess, etc);
  • a 64 bit int (also signed as best I can figure out), to store the size of the entry (no idea why they'd need a 64 bit integer here, seems a bit OTT...);
  • the entry itself (which it looks like we can safely interpret as a string).
  • next tag starts at (entry contents offset) + size.
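As a hedged illustration, the layout described in the list above could be parsed along these lines (the helper and struct names here are made up for illustration; they are not MRtrix3's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative helper: read an nbytes-wide big-endian unsigned integer.
static uint64_t read_be (const unsigned char* p, size_t nbytes) {
  uint64_t value = 0;
  for (size_t i = 0; i < nbytes; ++i)
    value = (value << 8) | p[i];
  return value;
}

// One tag as described above: 32-bit signed ID, 64-bit signed size,
// then the entry contents themselves.
struct Tag {
  int32_t id;
  int64_t size;
  std::string content;
};

// Parse one tag starting at 'offset'; on return, 'offset' points at the
// next tag, i.e. (entry contents offset) + size.
static Tag parse_tag (const std::vector<unsigned char>& buf, size_t& offset) {
  Tag tag;
  tag.id = static_cast<int32_t> (read_be (&buf[offset], 4));
  tag.size = static_cast<int64_t> (read_be (&buf[offset + 4], 8));
  tag.content.assign (reinterpret_cast<const char*> (&buf[offset + 12]),
                      static_cast<size_t> (tag.size));
  offset += 12 + static_cast<size_t> (tag.size);
  return tag;
}
```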

I've had a quick go at getting this working for reading, check out branch mgh_mgz_try_reading_other, it seems to work fine for the dataset shared by @ansk (for reading .mgh only, this is just to verify operation). What I've done so far is store them all in comments with a [TAG N]: prefix, but maybe they could be stored under their own keys. For the file we have, most tags are not listed in their tags.h, only the CMDLINE one. Maybe we could store that in the header under the key CMD or something. For the others, maybe a key name like MGH_TAG_N or similar would do the trick...?

As to supporting these entries for .mgz, I guess we could add a lead_out member variable to the ImageIO::GZ handler - there's already a lead_in to store whatever comes before the data (i.e. the header itself), so it should be trivial to add the equivalent lead_out variant and write it out if non-empty. What do you think?

@ansk

ansk commented Apr 8, 2017

Here is the relevant up-to-date excerpt from tags.h, containing the definitions of the remaining tags:

#define TAG_OLD_COLORTABLE          1
#define TAG_OLD_USEREALRAS          2
#define TAG_CMDLINE                 3
#define TAG_USEREALRAS              4
#define TAG_COLORTABLE              5

#define TAG_GCAMORPH_GEOM           10
#define TAG_GCAMORPH_TYPE           11
#define TAG_GCAMORPH_LABELS         12

#define TAG_OLD_SURF_GEOM           20
#define TAG_SURF_GEOM               21

#define TAG_OLD_MGH_XFORM           30
#define TAG_MGH_XFORM               31
#define TAG_GROUP_AVG_SURFACE_AREA  32

#define TAG_AUTO_ALIGN              33

#define TAG_SCALAR_DOUBLE           40
#define TAG_PEDIR                   41
#define TAG_MRI_FRAME               42
#define TAG_FIELDSTRENGTH           43

PEDIR means phase encoding direction.

I have tested the branch mgh_mgz_try_reading_other with other .mgh files and it worked in all cases.

@jdtournier
Member

OK, had a shot at getting full read/write support working for tags with both .mgh and .mgz formats. I've opted for keeping things as comments, to avoid any potential conflicts with keys in the header. I've also decided to skip tags that contain no data, even though they might have a non-zero size according to the tag size field. Finally, since we read the content of the tag as a string, these enormous tags will be shortened considerably, so the output file will not have bitwise identical content compared to the original. Would appreciate thorough review and testing before this gets merged in...

@ansk

ansk commented Apr 10, 2017

I have tested this on several files and it worked OK: the tags were correctly preserved and read by mri_info. The only thing that comes to mind is whether it is optimal to preserve the record of tag 3 (the command line that produced the file). Preserving the previous record could be misleading in this context. To achieve full consistency with FreeSurfer behaviour, this tag content should maybe be replaced by the actual MRtrix command line that produced the file. Another option could be to preserve existing instances of tag 3 and append an additional instance of tag 3 with the current command line. I have seen files with two instances of tag 3; not sure whether there is some limit on the number of occurrences of tag 3.

@jdtournier
Member

I have seen files with two instances of tag 3; not sure whether there is some limit on the number of occurrences of tag 3.

Well, the file you sent through has 3 such tags... 😉

To achieve full-consistency with FreeSurfer behavior, this tag content should be maybe replaced by actual mrtrix command line which produced the file.

We have discussed including such an option within MRtrix3 quite a few times in the past. Maybe now is the time... Part of the reason we haven't been too bothered about it so far is that the command history won't survive conversion to other formats, notably NIfTI. So passing data through e.g. dwipreproc, which invokes FSL commands with data converted to NIfTI, will lose that history. But it does look like we might be able to preserve it across MRtrix and FreeSurfer formats, so it's probably worth doing.

Note that if we do go along with it, it'll probably be as part of the next release, which shouldn't be too far off now. In this case, we'd need to handle that MGH tag differently - not a problem, it just needs to be handled properly.

On the other hand, I'm not too sure how to handle that tag 41 TAG_PEDIR correctly. The example data you sent through just shows UNKNOWN, but we'd have to see what format the information might come in if present, and whether it's compatible with what @Lestropie has been up to on that front (#747). So for now, the safest thing is to keep them all as specially crafted comments, so that they can be pushed back to the output file without loss of information...

@ansk

ansk commented Apr 11, 2017

not sure whether there is some limit on the number of occurrences of tag 3

Looking at the code, the maximum number of command-line tags is limited by the MAX_CMDS constant, which is set to 1000.

On the other hand, I'm not too sure how to handle that tag 41 TAG_PEDIR correctly. The example data you sent through just shows UNKNOWN, but we'd have to see what format the information might come in if present, and whether it's compatible with what @Lestropie has been up to on that front (#747)

TAG_PEDIR is not as informative as the scheme in #747. As far as I can tell, the workflow for this tag's content is as follows: during DICOM conversion, the content of DICOM tag 0018,1312

http://dicomlookup.com/lookup.asp?sw=Tnumber&q=(0018,1312)

is copied into TAG_PEDIR. Therefore the possible values are only ROW, COL and UNKNOWN (the latter in case this info is not present or relevant).

During mri_convert -conform, which is a very early step of recon-all, TAG_PEDIR is set to UNKNOWN.

https://surfer.nmr.mgh.harvard.edu/fswiki/ReconAllDevTable

@Lestropie
Member Author

So passing data through e.g. dwipreproc, which invokes FSL commands with data converted to NIfTI, will lose that history.

Such cases could (should!) be dealt with explicitly.

Ideally each script would grab the command_history contents of the primary input file, append a string for the script invoked (i.e. not include all of the MRtrix3 commands invoked within the script, but the script invocation), and overwrite the command_history key of any output images with that. As it stands now, image outputs from these scripts would have a long command history corresponding to the internals of the Python script subsequent to the first .nii -> .mif conversion.

@jdtournier
Member

@Lestropie: are you happy to merge this as-is to master? The discussion about command-line history handling can carry on in #962, which is due to merge into tag_0.3.16, not master. We'd need to modify the handling of this MGH tag anyway if we want to support it transparently alongside our own history, and that should probably be reserved for the next release anyway...

@thijsdhollander
Contributor

Sounds good to me 👍 . Administrative announcement: @Lestropie will probably not respond until after the long Easter weekend (he left to go camping with only pen and paper as a means of doing science).

@Lestropie
Member Author

Man, writing TCP-IP packets using pen and paper is hard... :-P
(But yes, I'm radio silent until probably Tuesday night; y'all go on without me)

Don't have time to muck around with this extensively. If the addition is to properly read the MGHO "tags" field, and it's tested for both .mgh and .mgz, and it's just reading them into header key-value fields / writing to the output image file lead-out as-is, that's fine by me. The only things I would suggest:

  • Double-check what's been committed to mgh_mgz_try_reading_other vs. this branch.

  • Copy the #define table posted by @ansk and use that to define appropriate keys for the imported header entries, rather than just enumerating them?
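For illustration, the #define table posted by @ansk could be turned into a lookup along these lines, with an enumerated MGH_TAG_N fallback for unknown IDs (the function name is illustrative only, not necessarily what ends up in the codebase):

```cpp
#include <map>
#include <string>

// Known FreeSurfer tag IDs, copied from the tags.h excerpt above;
// anything unrecognised falls back to an enumerated "MGH_TAG_<N>" key.
static std::string tag_name (int id) {
  static const std::map<int, std::string> known {
    {  1, "TAG_OLD_COLORTABLE" },
    {  2, "TAG_OLD_USEREALRAS" },
    {  3, "TAG_CMDLINE" },
    {  4, "TAG_USEREALRAS" },
    {  5, "TAG_COLORTABLE" },
    { 10, "TAG_GCAMORPH_GEOM" },
    { 11, "TAG_GCAMORPH_TYPE" },
    { 12, "TAG_GCAMORPH_LABELS" },
    { 20, "TAG_OLD_SURF_GEOM" },
    { 21, "TAG_SURF_GEOM" },
    { 30, "TAG_OLD_MGH_XFORM" },
    { 31, "TAG_MGH_XFORM" },
    { 32, "TAG_GROUP_AVG_SURFACE_AREA" },
    { 33, "TAG_AUTO_ALIGN" },
    { 40, "TAG_SCALAR_DOUBLE" },
    { 41, "TAG_PEDIR" },
    { 42, "TAG_MRI_FRAME" },
    { 43, "TAG_FIELDSTRENGTH" }
  };
  const auto it = known.find (id);
  return it != known.end() ? it->second : "MGH_TAG_" + std::to_string (id);
}
```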

- remove detection of byte order, now hard-coded to Big-endian as per
  documentation;
- use sized types throughout in MGH header structures to ensure correct
  byte swapping, etc.;
- convert tag IDs to text and vice-versa for known tags, and substitute
  with raw number if unknown;
- general cleanup to maximise code reuse between .mgh and .mgz formats.
@jdtournier
Member

OK, had a shot at cleaning things up a bit more, should be good to go. Tags are now named according to the table supplied by @ansk, and convert back and forth OK as far as I can tell. The mgh_mgz_try_reading_other branch is gone; it got merged into this one quite early on anyway. Wouldn't mind another pair of eyes to make sure everything works as expected before merging.

After that, I guess we can start thinking about #962...

Note that the contents of struct mgh_tag will now appear as big-endian in all circumstances, and therefore conversion is necessary whenever reading or writing; this is to be consistent with the other MGH header structures.
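As a sketch of what that conversion involves, big-endian values can be serialised and deserialised portably along these lines (hypothetical helpers, not MRtrix3's actual ByteOrder API):

```cpp
#include <cstdint>

// Serialise a 32-bit value to big-endian bytes, independent of host
// endianness.
static void store_be32 (uint32_t v, unsigned char* out) {
  out[0] = static_cast<unsigned char> (v >> 24);
  out[1] = static_cast<unsigned char> (v >> 16);
  out[2] = static_cast<unsigned char> (v >> 8);
  out[3] = static_cast<unsigned char> (v);
}

// Reassemble a native-endian value from big-endian bytes.
static uint32_t load_be32 (const unsigned char* p) {
  return (uint32_t (p[0]) << 24) | (uint32_t (p[1]) << 16)
       | (uint32_t (p[2]) << 8)  |  uint32_t (p[3]);
}
```

Doing this at read/write time is what allows the in-memory structures to hold a single, well-defined byte order regardless of the host.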
@Lestropie
Member Author

Have had a look and a tweak, but it's still not ready to go. MRtrix3 will now happily read .mgh and .mgz created by either FreeSurfer or itself, but FreeSurfer will still not open files created by MRtrix3. If I had to take a guess, I'd say that all that empty null space is doing something. Regardless, the FreeSurfer source will need a closer look.

@jdtournier
Member

jdtournier commented Apr 19, 2017

OK, I see you didn't like my prepare_tag() idea... Oh well.

FreeSurfer will still not open files created by MRtrix3

That's odd, I thought this had previously been confirmed as working by @ansk? Maybe the latest commits have mangled things a bit? Does it not work with e19dabe?

@Lestropie
Member Author

I got the idea of prepare_tag(), but in the process of trying to figure out what was going wrong, it disappeared. Partly because in mgh_header and mgh_other, the stored values always remain as big-endian and have to be read/written accordingly. It looks like you tried to do that, but then there were missing ByteOrder::BE() calls around the place, so I decided to remove that function and validate every read / write functionality individually.

Yes, it was confirmed working at one point; but there were changes subsequent to that. I've also been looking at null terminations (or lack thereof), wondering if that's causing issues. Also not sure if all combinations were tested, i.e. FreeSurfer export to MRtrix import, MRtrix export to FreeSurfer import, MRtrix export to MRtrix import, zipped vs. unzipped.

for (const auto& tag : MGHO.tags) {
  out.write (reinterpret_cast<const char*> (&tag.id), sizeof(tag.id));
  out.write (reinterpret_cast<const char*> (&tag.size), sizeof(tag.size));
  assert (size_t(ByteOrder::BE (tag.size)) == tag.content.size());
Member


That assert will fail routinely, because of the null termination on the tag.content string. For example, in the data sent through by @ansk:

...
mrinfo [lib/formats/mgh.cpp: 63]: tag.id = 33
mrinfo [lib/formats/mgh.cpp: 64]: tag.size = 1600
mrinfo [lib/formats/mgh.cpp: 65]: tag.content.size() = 185
mrinfo [lib/formats/mgh.cpp: 66]: tag.content = AutoAlign   1.000000   0.000000   0.000000   0.000000   0.000000   1.000000   0.000000   0.000000   0.000000   0.000000   1.000000   0.000000   0.000000   0.000000   0.000000   1.000000
...

Not sure whether this has anything to do with FreeSurfer not accepting these data, but that's how it worked before when it was confirmed as working. I'm guessing it'll be something else...

Member Author


That specific case is a write, so it's only testing that MRtrix has set the tag.size value appropriately. Nothing to do with input data.

I'm pretty sure that FreeSurfer does not include null terminators (at least not always) in the length calculation: the phase encode direction tag gave a size of 7 for the string UNKNOWN.
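To illustrate that sizing convention, a hypothetical tag writer that records the raw content byte count (no null terminator) in the size field might look like this (illustrative only, not the actual PR code):

```cpp
#include <cstdint>
#include <string>

// Hypothetical tag writer: the size field records the raw content byte
// count, with no null terminator included, so that e.g. "UNKNOWN" is
// written with a size of 7.
static std::string serialise_tag (int32_t id, const std::string& content) {
  std::string out;
  auto append_be = [&out] (uint64_t v, int nbytes) {
    for (int i = nbytes - 1; i >= 0; --i)
      out.push_back (static_cast<char> ((v >> (8 * i)) & 0xff));
  };
  append_be (static_cast<uint32_t> (id), 4);  // 32-bit big-endian tag ID
  append_be (content.size(), 8);              // 64-bit big-endian size
  out += content;                             // raw contents, no terminator
  return out;
}
```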

Member


Ah, OK... please ignore me.

@jdtournier
Member

I got the idea of prepare_tag(), but in the process of trying to figure out what was going wrong, it disappeared. Partly because in mgh_header and mgh_other, the stored values always remain as big-endian and have to be read/written accordingly. It looks like you tried to do that, but then there were missing ByteOrder::BE() calls around the place, so I decided to remove that function and validate every read / write functionality individually.

I figured it might have been something like that...

My only concern with your changes is that it looks like you're keeping big-endian versions of the tags once read in RAM. I think it would be much cleaner to convert at the read or write time, and maintain native-endian versions in RAM at all times otherwise. I'll have a go at that.

Yes, it was confirmed working at one point; but there were changes subsequent to that. I've also been looking at null terminations (or lack thereof), wondering if that's causing issues. Also not sure if all combinations were tested, i.e. FreeSurfer export to MRtrix import, MRtrix export to FreeSurfer import, MRtrix export to MRtrix import, zipped vs. unzipped.

Pretty sure the testing wouldn't have been extensive, at least not on my side - I don't have FreeSurfer installed... It will need some more thorough testing before merging.

@Lestropie
Member Author

I think it would be much cleaner to convert at the read or write time, and maintain native-endian versions in RAM at all times otherwise.

Trouble then is that you have these tags being stored in RAM with native endianness, but the header & other structs being represented as big-endian at all times. Was just another source of confusion when I was looking through it all.

@jdtournier
Member

Trouble then is that you have these tags being stored in RAM with native endianness, but the header & other structs being represented as big-endian at all times.

OK, I see. It might be that we'd be better off not using memory-mapping for the read then. Would also avoid this issue with alignment and the #pragma pack directive...? Would be a bit more work, though...

@jdtournier
Member

Canned in favour of #970

@jdtournier jdtournier closed this Apr 21, 2017
@Lestropie
Member Author

@ansk: Where did you get that up-to-date tags.h? I've checked downloads of FreeSurfer 5.3 and 6.0 beta, and neither seems to come with the relevant source code. I'm pretty sure that the source files linked on the MGH format page are old; they contain fewer #defines than what you provided, and I can see some potential bugs in the image file read/write functions that may since have been fixed. If I could get my hands on fully up-to-date source code, I might be able to mimic their functions to provide more reliable support in MRtrix3...

@ansk

ansk commented May 10, 2017

@Lestropie,
I requested access to their git repository. But it has recently become much easier, since the FreeSurfer sources are now also on GitHub: https://github.com/freesurfer/freesurfer
So tags.h for development version of FreeSurfer is:
https://github.com/freesurfer/freesurfer/blob/dev/include/tags.h

@Lestropie
Member Author

OK thanks @ansk; that looks more up-to-date than a forked version of the repository I found after sending that message.

For reference for future me: Implementation code is found in /utils/ rather than /include/.

@Lestropie
Member Author

@ansk (and anybody else interested): I've mimicked much of the FreeSurfer code in MRtrix3 to further improve the handling of MGH / MGZ files in #970. If anybody has image files that have consistently caused problems in MRtrix3, it would be appreciated if you could test them against the mgh_mgz_write_fix_take2 branch before those changes get merged to master. It certainly worked with the example problematic files I've been provided with.

@Lestropie Lestropie deleted the mgh_mgz_write_fix branch June 5, 2017 02:36