Skip to content

fixelcfestats: New option -mask#1030

Merged
Lestropie merged 13 commits intotag_3.0_RC2from
fixelcfestats_mask
Jul 14, 2017
Merged

fixelcfestats: New option -mask#1030
Lestropie merged 13 commits intotag_3.0_RC2from
fixelcfestats_mask

Conversation

@Lestropie
Copy link
Copy Markdown
Member

Had a go since there were multiple requests. Untested, but it's easier for me to get it installed on a server to test if I push to the online repo, so might as well post here straight away in case anybody is curious.

Wasn't a single unique solution. What I've gone with is:

  • Despite using -mask option, output fixel image will still contain the same set of fixels as the input fixel template.

  • When constructing the fixel-fixel connectivity matrix, streamlines may still be assigned to a fixel that is outside the mask, if the streamline is most collinear with that fixel. However connectivity to that fixel will not contribute in any way to the analysis.

  • Only fixels within the mask are involved in data smoothing, GLM, CFE.

Benefit of keeping all fixels is that you keep fixel correspondence across different experiments from the same template. You can still see those fixels, but they should disappear as soon as any quantitative value is loaded (should; need to check). Cropping the output would have made the set of fixels within the analysis mask more immediately obvious; but you can just fixelcrop after the fact if you want.

Closes #987.

This option restricts the statistical analysis to a subset of fixels, as defined by a boolean fixel data file.
Calculation of the fixel-fixel connectivity matrix, data smoothing, t-statistic calculation, and stiatistical enhancement, are all only performed on those fixels within the mask.
However, the output fixel directory contains all of the fixels from the input template fixel image, not just those within the mask; this retains fixel-fixel correspondence between the input template image and the output results.
For fixels outside the mask, a value of NaN is written for all fixel data file outputs.
@Lestropie Lestropie self-assigned this Jun 27, 2017
Also includes a bug fix where opting to not smooth the input data would likely have resulted in an error.
On testing prior changes to fixelcfestats to support the -mask option, it was found to run extremely slowly, even when not using the -mask option. The suspected cause was that during the matrix normalisation, each row of the connectivity matrix (a map<uint32_t, float>) was being re-constructed from scratch (due to the possibility of fixel indices changing), and hence what was a balanced binary search tree would become essentially a sorted doubly-linked-list. Now, each row of the connectivity matrix is instead converted to essentially a vector<pair<uint32_t,float>>, since it is no longer necessary to have fast lookup for an arbitrary fixel index. This should reduce memory requirement after matrix construction has completed, and will also hopefully improve cache performance since the connectivity data for each fixel will be serialised.
There are also various little fixes and tweaks to polish off additions for supporting the -mask option.
@Lestropie
Copy link
Copy Markdown
Member Author

OK, looks like this is now working as it should. I'll see if I can get one of the users to test as well. Then need to decide whether to go to master or a generic dev branch (which I've been meaning to create for other things anyway).

Comes with a nice execution speedup thanks to better cache hitrate: Independently of masking, my test dataset went from 8h to 2h 😈

@Lestropie Lestropie requested a review from jdtournier July 5, 2017 05:59
@jdtournier
Copy link
Copy Markdown
Member

These changes sound good - but I'm not sure I'm in a position to review them properly at the moment... I've added a couple of other reviewers, hopefully someone can just have a look and approve these changes more rapidly than me.

Impressed and somewhat puzzled by this:

Independently of masking, my test dataset went from 8h to 2h

Not sure I fully understand how that worked out...

@Lestropie
Copy link
Copy Markdown
Member Author

Not sure I fully understand how that worked out...

The connectivity matrix is built using a std::map<uint32_t, float> per fixel, so that existing connections can be found efficiently. But if std::map is implemented using a binary search tree, that's 2 64-bit pointers per 8-byte connection stored; and the memory allocations could end up scattered, depending on how STL manages memory allocation for a std::map (or ~500,000 of them at once).

With these changes, during the normalisation / thresholding step the connectivity matrix is converted to a vector<std::pair<uint32_t, float>> per fixel, making the data per fixel contiguous and reducing its total size. For the statistical enhancement you only need to loop over the data per fixel; no need for efficient lookup of a particular fixel index.

Time will tell if the speedup is reproducible, I'm only reporting a single timed run; but it makes sense.

Copy link
Copy Markdown
Member

@draffelt draffelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a look over these changes and it all looks good.

Sadly I never got around to adding tests for fixelcfestats. Would be good to add at least one test where it tests the output when using the -notest option (i.e. only checks the t-stat, cfe, effect size etc). Actually, with the relatively new -permutations option, we could also test the pvalue output with a handful of set permutations.

Happy to add these tests, just won't be in the next week or so.

@jdtournier
Copy link
Copy Markdown
Member

Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests...

@jdtournier
Copy link
Copy Markdown
Member

@draffelt - you might want to actually approve these changes if you're happy with them, otherwise Rob can't merge...

@draffelt
Copy link
Copy Markdown
Member

draffelt commented Jul 5, 2017

I'll approve for now. But was thinking it might be good to check we at least get the same result without the mask as current master.

draffelt
draffelt previously approved these changes Jul 5, 2017
@draffelt
Copy link
Copy Markdown
Member

draffelt commented Jul 5, 2017

Or have you done that already Rob?

That is an impressive speed up.

@jdtournier
Copy link
Copy Markdown
Member

@draffelt: OK, gotcha. That's a really good suggestion, we should set up some testing now to verify equivalent results pre & post. @Lestropie, any chance you can find the time to generate a few tests?

@Lestropie
Copy link
Copy Markdown
Member Author

I did a pre & post without mask, it wasn't bitwise identical (presumably due to random permutations) but it was only edge fixels here and there that changed for p<0.05. I could run again with pre-generated permutations if you want.

One thing I did change was this hard-coded value, to use the same threshold for the smoothing matrix as the connectivity matrix. For default parameters the change has no net effect though; it just made sense to me that if the user expands or contracts the matrix by tuning the connectivity threshold up or down, the extent of the smoothing matrix should do the same thing.

@Lestropie
Copy link
Copy Markdown
Member Author

With pre-calculated permutations, fwe_pvalue images vary by no more than one permutation's worth; p<0.05 masks are bitwise equivalent.

$ time /software/mrtrix3/bin/fixelcfestats ...
real    518m12.199s
user    13251m53.299s
sys     195m55.055s

$ time /software/mrtrix_testing/bin/fixelcfestats ...
real    137m26.918s
user    1562m37.928s
sys     83m32.612s

💪

I'm veering away from pushing anything other than fixes to master, but if we're tagging 3.0_RC2 for #1036 (which I think we should), this can possibly be carried along with that?

@jdtournier
Copy link
Copy Markdown
Member

Good to hear. Any chance of running tests to verify operation that don't take 1½ hours...?

I'm veering away from pushing anything other than fixes to master, but if we're tagging 3.0_RC2 for #1036 (which I think we should), this can possibly be carried along with that?

My thoughts exactly. A new 3.0_RC2 tag would be a good opportunity to add/modify things that we might otherwise have delayed for a future version. I reckon now is the time to create a 3.0_pre_RC2 branch, with a view to merging these changes along with the mtlognorm branch and some of the other pull requests into it, and we'll then push the lot out to master when ready and tag accordingly.

@jdtournier jdtournier changed the base branch from master to dev July 12, 2017 13:47
@Lestropie
Copy link
Copy Markdown
Member Author

Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests...

As an aside: I've been working on various aspects of the stats code lately (got multiple contrasts working!), so one thing I did just today was write a script that generates random dummy data to feed to vectorstats. That's the cleanest way to test the core GLM functionality. Testing the higher-order stats commands gets a bit trickier... but I figured it'd be easier to start from the bottom and work my way up.

@Lestropie Lestropie changed the base branch from dev to tag_3.0_RC2 July 14, 2017 06:42
@Lestropie Lestropie merged commit fc9d9b1 into tag_3.0_RC2 Jul 14, 2017
@Lestropie Lestropie deleted the fixelcfestats_mask branch July 14, 2017 06:51
@Lestropie Lestropie mentioned this pull request Jul 14, 2017
Lestropie added a commit that referenced this pull request Dec 14, 2017
With the introduction of the -mask option in fixelcfestats (#1030), fixel data files may contain the value NaN to indicate fixels that were outside the processing mask. However if such values were read within fixel2tsf, and hence propagated to the output track scalar file, these would be erroneously interpreted as track delimiters. This fix replaces non-finite values with 0.0 in fixel2tsf to overcome this limitation of the .tck / .tsf formats.
Some assert() calls have also been added to the relevant code in mrview to ensure that any loaded .tsf file does indeed align perfectly with the corresponding track file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants