
MGH / MGZ formats: Always big-endian #956

Closed
Lestropie wants to merge 13 commits into master from mgh_mgz_write_fix

Conversation

@Lestropie
Member

Creating .mgh / .mgz images with little-endian encoding causes faults in other software packages; therefore enforce big-endianness regardless of header datatype.

Closes #952.

Creating .mgh / .mgz images with little-endian encoding causes faults in other software packages; therefore enforce big-endianness regardless of header datatype.
Consistently read the flip angle in radians and convert to degrees for header storage, and convert from degrees back to radians when writing from the header into MGHO.
@Lestropie Lestropie requested a review from jdtournier April 7, 2017 02:30
@jdtournier
Member

OK, had a look at this - not fun...

First off, the docs state clearly that it's supposed to be stored big-endian - I think we can safely skip all the endian-ness checks and just assume big-endian.

Otherwise, I had a go at understanding their storage conventions, and I think it's not too difficult. Obscure to figure out, but otherwise pretty easy. The relevant sources are their tags.h and tags.c files. But essentially, it boils down to:

  • a 32 bit int (signed as far as I can tell) with a tag 'ID' (3 corresponds to a command-line history entry, I guess, etc);
  • a 64 bit int (also signed as best I can figure out), to store the size of the entry (no idea why they'd need a 64 bit integer here, seems a bit OTT...);
  • the entry itself (which it looks like we can safely interpret as a string).
  • next tag starts at (entry contents offset) + size.
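As a hedged illustration, the layout described in the list above could be parsed along these lines (the helper and struct names here are made up for illustration; they are not MRtrix3's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative helper: read an nbytes-wide big-endian unsigned integer.
static uint64_t read_be (const unsigned char* p, size_t nbytes) {
  uint64_t value = 0;
  for (size_t i = 0; i < nbytes; ++i)
    value = (value << 8) | p[i];
  return value;
}

// One tag as described above: 32-bit signed ID, 64-bit signed size,
// then the entry contents themselves.
struct Tag {
  int32_t id;
  int64_t size;
  std::string content;
};

// Parse one tag starting at 'offset'; on return, 'offset' points at the
// next tag, i.e. (entry contents offset) + size.
static Tag parse_tag (const std::vector<unsigned char>& buf, size_t& offset) {
  Tag tag;
  tag.id = static_cast<int32_t> (read_be (&buf[offset], 4));
  tag.size = static_cast<int64_t> (read_be (&buf[offset + 4], 8));
  tag.content.assign (reinterpret_cast<const char*> (&buf[offset + 12]),
                      static_cast<size_t> (tag.size));
  offset += 12 + static_cast<size_t> (tag.size);
  return tag;
}
```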

I've had a quick go at getting this working for reading, check out branch mgh_mgz_try_reading_other, it seems to work fine for the dataset shared by @ansk (for reading .mgh only, this is just to verify operation). What I've done so far is store them all in comments with a [TAG N]: prefix, but maybe they could be stored under their own keys. For the file we have, most tags are not listed in their tags.h, only the CMDLINE one. Maybe we could store that in the header under the key CMD or something. For the others, maybe a key name like MGH_TAG_N or similar would do the trick...?

As to supporting these entries for .mgz, I guess we could add a lead_out member variable to the ImageIO::GZ handler - there's already a lead_in to store whatever comes before the data (i.e. the header itself), so it should be trivial to add the equivalent lead_out variant and write it out if non-empty. What do you think?

@ansk

ansk commented Apr 8, 2017

Here is the relevant up-to-date excerpt from tags.h, containing the definitions of the remaining tags:

#define TAG_OLD_COLORTABLE          1
#define TAG_OLD_USEREALRAS          2
#define TAG_CMDLINE                 3
#define TAG_USEREALRAS              4
#define TAG_COLORTABLE              5

#define TAG_GCAMORPH_GEOM           10
#define TAG_GCAMORPH_TYPE           11
#define TAG_GCAMORPH_LABELS         12

#define TAG_OLD_SURF_GEOM           20
#define TAG_SURF_GEOM               21

#define TAG_OLD_MGH_XFORM           30
#define TAG_MGH_XFORM               31
#define TAG_GROUP_AVG_SURFACE_AREA  32

#define TAG_AUTO_ALIGN              33

#define TAG_SCALAR_DOUBLE           40
#define TAG_PEDIR                   41
#define TAG_MRI_FRAME               42
#define TAG_FIELDSTRENGTH           43

PEDIR means phase encoding direction.

I have tested the branch mgh_mgz_try_reading_other with other .mgh files and it worked in all cases.

@jdtournier
Member

OK, had a shot at getting full read/write support working for tags with both .mgh and .mgz formats. I've opted for keeping things as comments, to avoid any potential conflicts with keys in the header. I've also decided to skip tags that contain no data, even though they might have a non-zero size according to the tag size field. Finally, since we read the content of the tag as a string, these enormous tags will be shortened considerably, so the output file will not have bitwise identical content compared to the original. Would appreciate thorough review and testing before this gets merged in...

@ansk

ansk commented Apr 10, 2017

I have tested this on several files and it worked OK: the tags were correctly preserved and read by mri_info. The only thing that comes to mind is whether it is optimal to preserve the record of tag 3 (the command line that produced the file). Preserving the previous record could be misleading in this context. To achieve full consistency with FreeSurfer behaviour, this tag content should maybe be replaced by the actual MRtrix command line that produced the file. Another option could be to preserve existing instances of tag 3 and append an additional instance of tag 3 with the current command line. I have seen files with two instances of tag 3; not sure whether there is some limit on the number of occurrences of tag 3.

@jdtournier
Member

I have seen files with two instances of tag 3; not sure whether there is some limit on the number of occurrences of tag 3.

Well, the file you sent through has 3 such tags... 😉

To achieve full-consistency with FreeSurfer behavior, this tag content should be maybe replaced by actual mrtrix command line which produced the file.

We have discussed including such an option within MRtrix3 quite a few times in the past. Maybe now is the time... Part of the reason we haven't been too bothered about it so far is that the command history won't survive conversion to other formats, notably NIfTI. So passing data through e.g. dwipreproc, which invokes FSL commands with data converted to NIfTI, will lose that history. But it does look like we might be able to preserve it across MRtrix and FreeSurfer formats, so it's probably worth doing.

Note that if we do go along with it, it'll probably be as part of the next release, which shouldn't be too far off now. In this case, we'd need to handle that MGH tag differently - not a problem, it just needs to be handled properly.

On the other hand, I'm not too sure how to handle that tag 41 TAG_PEDIR correctly. The example data you sent through just shows UNKNOWN, but we'd have to see what format the information might come in if present, and whether it's compatible with what @Lestropie has been up to on that front (#747). So for now, the safest thing is to keep them all as specially crafted comments, so that they can be pushed back to the output file without loss of information...

@ansk

ansk commented Apr 11, 2017

not sure whether there is some limit on the number of occurrences of tag 3

Looking at the code, the maximum number of command-line tags is limited by the MAX_CMDS constant, which is set to 1000.

On the other hand, I'm not too sure how to handle that tag 41 TAG_PEDIR correctly. The example data you sent through just shows UNKNOWN, but we'd have to see what format the information might come in if present, and whether it's compatible with what @Lestropie has been up to on that front (#747)

TAG_PEDIR is not as informative as the scheme in #747. As far as I can tell, the workflow for this tag's content is as follows: during DICOM conversion, the content of DICOM tag 0018,1312

http://dicomlookup.com/lookup.asp?sw=Tnumber&q=(0018,1312)

is copied into TAG_PEDIR. Therefore the possible values are only ROW, COL and UNKNOWN (the latter in case this info is not present or relevant).

During mri_convert -conform, which is a very early step of recon-all, TAG_PEDIR is set to UNKNOWN.

https://surfer.nmr.mgh.harvard.edu/fswiki/ReconAllDevTable

@Lestropie
Member Author

So passing data through e.g. dwipreproc, which invokes FSL commands with data converted to NIfTI, will lose that history.

Such cases could (should!) be dealt with explicitly.

Ideally each script would grab the command_history contents of the primary input file, append a string for the script invoked (i.e. not include all of the MRtrix3 commands invoked within the script, but the script invocation), and overwrite the command_history key of any output images with that. As it stands now, image outputs from these scripts would have a long command history corresponding to the internals of the Python script subsequent to the first .nii -> .mif conversion.

@jdtournier
Member

@Lestropie: are you happy to merge this as-is to master? The discussion about command-line history handling can carry on in #962, which is due to merge into tag_0.3.16, not master. We'd need to modify the handling of this MGH tag anyway if we want to support it transparently alongside our own history, and that should probably be reserved for the next release anyway...

@thijsdhollander
Contributor

Sounds good to me 👍 . Administrative announcement: @Lestropie will probably not respond until after the long Easter weekend (he left to go camping with only pen and paper as a means of doing science).

@Lestropie
Member Author

Man, writing TCP-IP packets using pen and paper is hard... :-P
(But yes, I'm radio silent until probably Tuesday night; y'all go on without me)

Don't have time to muck around with this extensively. If the addition is to properly read the MGHO "tags" field, and it's tested for both .mgh and .mgz, and it's just reading them into header key-value fields / writing to the output image file lead-out as-is, that's fine by me. The only things I would suggest:

  • Double-check what's been committed to mgh_mgz_try_reading_other vs. this branch.

  • Copy the #define table posted by @ansk and use that to define appropriate keys for the imported header entries, rather than just enumerating them?
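For illustration, the #define table posted by @ansk could be turned into a lookup along these lines, with an enumerated MGH_TAG_N fallback for unknown IDs (the function name is illustrative only, not necessarily what ends up in the codebase):

```cpp
#include <map>
#include <string>

// Known FreeSurfer tag IDs, copied from the tags.h excerpt above;
// anything unrecognised falls back to an enumerated "MGH_TAG_<N>" key.
static std::string tag_name (int id) {
  static const std::map<int, std::string> known {
    {  1, "TAG_OLD_COLORTABLE" },
    {  2, "TAG_OLD_USEREALRAS" },
    {  3, "TAG_CMDLINE" },
    {  4, "TAG_USEREALRAS" },
    {  5, "TAG_COLORTABLE" },
    { 10, "TAG_GCAMORPH_GEOM" },
    { 11, "TAG_GCAMORPH_TYPE" },
    { 12, "TAG_GCAMORPH_LABELS" },
    { 20, "TAG_OLD_SURF_GEOM" },
    { 21, "TAG_SURF_GEOM" },
    { 30, "TAG_OLD_MGH_XFORM" },
    { 31, "TAG_MGH_XFORM" },
    { 32, "TAG_GROUP_AVG_SURFACE_AREA" },
    { 33, "TAG_AUTO_ALIGN" },
    { 40, "TAG_SCALAR_DOUBLE" },
    { 41, "TAG_PEDIR" },
    { 42, "TAG_MRI_FRAME" },
    { 43, "TAG_FIELDSTRENGTH" }
  };
  const auto it = known.find (id);
  return it != known.end() ? it->second : "MGH_TAG_" + std::to_string (id);
}
```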

- remove detection of byte order, now hard-coded to Big-endian as per
  documentation;
- use sized types throughout in MGH header structures to ensure correct
  byte swapping, etc.;
- convert tag IDs to text and vice-versa for known tags, and substitute
  with raw number if unknown;
- general cleanup to maximise code reuse between .mgh and .mgz formats.
@jdtournier
Member

OK, had a shot at cleaning things up a bit more, should be good to go. Tags are now named according to the table supplied by @ansk, and convert back and forth OK as far as I can tell. The mgh_mgz_try_reading_other branch is gone; it got merged into this one quite early on anyway. Wouldn't mind another pair of eyes to make sure everything works as expected before merging.

After that, I guess we can start thinking about #962...

Note that the contents of struct mgh_tag will now appear as big-endian in all circumstances, and therefore conversion is necessary whenever reading or writing; this is to be consistent with the other MGH header structures.
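As a sketch of what that conversion involves, big-endian values can be serialised and deserialised portably along these lines (hypothetical helpers, not MRtrix3's actual ByteOrder API):

```cpp
#include <cstdint>

// Serialise a 32-bit value to big-endian bytes, independent of host
// endianness.
static void store_be32 (uint32_t v, unsigned char* out) {
  out[0] = static_cast<unsigned char> (v >> 24);
  out[1] = static_cast<unsigned char> (v >> 16);
  out[2] = static_cast<unsigned char> (v >> 8);
  out[3] = static_cast<unsigned char> (v);
}

// Reassemble a native-endian value from big-endian bytes.
static uint32_t load_be32 (const unsigned char* p) {
  return (uint32_t (p[0]) << 24) | (uint32_t (p[1]) << 16)
       | (uint32_t (p[2]) << 8)  |  uint32_t (p[3]);
}
```

Doing this at read/write time is what allows the in-memory structures to hold a single, well-defined byte order regardless of the host.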
@Lestropie
Member Author

Have had a look and a tweak, but it's still not ready to go. MRtrix3 will now happily read .mgh and .mgz created by either FreeSurfer or itself, but FreeSurfer will still not open files created by MRtrix3. If I had to take a guess, I'd say that all that empty null space is doing something. Regardless, the FreeSurfer source will need a closer look.

@jdtournier
Member

jdtournier commented Apr 19, 2017

OK, I see you didn't like my prepare_tag() idea... Oh well.

FreeSurfer will still not open files created by MRtrix3

That's odd, I thought this had previously been confirmed as working by @ansk? Maybe the latest commits have mangled things a bit? Does it not work with e19dabe?

@Lestropie
Member Author

I got the idea of prepare_tag(), but in the process of trying to figure out what was going wrong, it disappeared. Partly because in mgh_header and mgh_other, the stored values always remain as big-endian and have to be read/written accordingly. It looks like you tried to do that, but then there were missing ByteOrder::BE() calls around the place, so I decided to remove that function and validate every read / write functionality individually.

Yes, it was confirmed working at one point; but there were changes subsequent to that. I've also been looking at null terminations (or lack thereof), wondering if that's causing issues. Also not sure if all combinations were tested, i.e. FreeSurfer export to MRtrix import, MRtrix export to FreeSurfer import, MRtrix export to MRtrix import, zipped vs. unzipped.

for (const auto& tag : MGHO.tags) {
  out.write (reinterpret_cast<const char*> (&tag.id), sizeof(tag.id));
  out.write (reinterpret_cast<const char*> (&tag.size), sizeof(tag.size));
  assert (size_t(ByteOrder::BE (tag.size)) == tag.content.size());
Member


That assert will fail routinely, because of the null termination on the tag.content string. For example, in the data sent through by @ansk:

...
mrinfo [lib/formats/mgh.cpp: 63]: tag.id = 33
mrinfo [lib/formats/mgh.cpp: 64]: tag.size = 1600
mrinfo [lib/formats/mgh.cpp: 65]: tag.content.size() = 185
mrinfo [lib/formats/mgh.cpp: 66]: tag.content = AutoAlign   1.000000   0.000000   0.000000   0.000000   0.000000   1.000000   0.000000   0.000000   0.000000   0.000000   1.000000   0.000000   0.000000   0.000000   0.000000   1.000000
...

Not sure whether this has anything to do with FreeSurfer not accepting these data, but that's how it worked before when it was confirmed as working. I'm guessing it'll be something else...

Member Author


That specific case is a write, so it's only testing that MRtrix has set the tag.size value appropriately. Nothing to do with input data.

I'm pretty sure that FreeSurfer does not include null terminators (at least not always) in the length calculation: the phase encode direction tag gave a size of 7 for the string UNKNOWN.
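To illustrate that sizing convention, a hypothetical tag writer that records the raw content byte count (no null terminator) in the size field might look like this (illustrative only, not the actual PR code):

```cpp
#include <cstdint>
#include <string>

// Hypothetical tag writer: the size field records the raw content byte
// count, with no null terminator included, so that e.g. "UNKNOWN" is
// written with a size of 7.
static std::string serialise_tag (int32_t id, const std::string& content) {
  std::string out;
  auto append_be = [&out] (uint64_t v, int nbytes) {
    for (int i = nbytes - 1; i >= 0; --i)
      out.push_back (static_cast<char> ((v >> (8 * i)) & 0xff));
  };
  append_be (static_cast<uint32_t> (id), 4);  // 32-bit big-endian tag ID
  append_be (content.size(), 8);              // 64-bit big-endian size
  out += content;                             // raw contents, no terminator
  return out;
}
```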

Member


Ah, OK... please ignore me.

@jdtournier
Member

I got the idea of prepare_tag(), but in the process of trying to figure out what was going wrong, it disappeared. Partly because in mgh_header and mgh_other, the stored values always remain as big-endian and have to be read/written accordingly. It looks like you tried to do that, but then there were missing ByteOrder::BE() calls around the place, so I decided to remove that function and validate every read / write functionality individually.

I figured it might have been something like that...

My only concern with your changes is that it looks like you're keeping big-endian versions of the tags once read in RAM. I think it would be much cleaner to convert at the read or write time, and maintain native-endian versions in RAM at all times otherwise. I'll have a go at that.

Yes, it was confirmed working at one point; but there were changes subsequent to that. I've also been looking at null terminations (or lack thereof), wondering if that's causing issues. Also not sure if all combinations were tested, i.e. FreeSurfer export to MRtrix import, MRtrix export to FreeSurfer import, MRtrix export to MRtrix import, zipped vs. unzipped.

Pretty sure the testing wouldn't have been extensive, at least not on my side - I don't have FreeSurfer installed... It will need some more thorough testing before merging.

@Lestropie
Member Author

I think it would be much cleaner to convert at the read or write time, and maintain native-endian versions in RAM at all times otherwise.

Trouble then is that you have these tags being stored in RAM with native endianness, but the header & other structs being represented as big-endian at all times. Was just another source of confusion when I was looking through it all.

@jdtournier
Member

Trouble then is that you have these tags being stored in RAM with native endianness, but the header & other structs being represented as big-endian at all times.

OK, I see. It might be that we'd be better off not using memory-mapping for the read then. Would also avoid this issue with alignment and the #pragma pack directive...? Would be a bit more work, though...

@jdtournier
Member

Canned in favour of #970

@jdtournier jdtournier closed this Apr 21, 2017
@Lestropie
Member Author

@ansk: Where did you get that up-to-date tags.h? I've checked downloads of FreeSurfer 5.3 and 6.0 beta, and neither seems to come with the relevant source code. I'm pretty sure that the source files linked on the MGH format page are old; they contain fewer #defines than what you provided, and I can see some potential bugs in the image file read/write functions that may since have been fixed. If I could get my hands on fully up-to-date source code, I might be able to mimic their functions to provide more reliable support in MRtrix3...

@ansk

ansk commented May 10, 2017

@Lestropie,
I requested access to their git repository. But it has recently become much easier, since the FreeSurfer sources are now also on GitHub: https://github.com/freesurfer/freesurfer
So tags.h for development version of FreeSurfer is:
https://github.com/freesurfer/freesurfer/blob/dev/include/tags.h

@Lestropie
Member Author

OK thanks @ansk; that looks more up-to-date than a forked version of the repository I found after sending that message.

For reference for future me: Implementation code is found in /utils/ rather than /include/.

@Lestropie
Member Author

@ansk (and anybody else interested): I've mimicked much of the FreeSurfer code in MRtrix3 to further improve the handling of MGH / MGZ files in #970. If anybody has image files that have consistently caused problems in MRtrix3, it would be appreciated if you could test them against the mgh_mgz_write_fix_take2 branch before those changes get merged to master. It certainly worked with the example problematic files I've been provided with.

@Lestropie Lestropie deleted the mgh_mgz_write_fix branch June 5, 2017 02:36