fixelcfestats: New option -mask by Lestropie · Pull Request #1030 · MRtrix3/mrtrix3

Lestropie · 2017-06-27T05:39:10Z

Had a go since there were multiple requests. Untested, but it's easier for me to get it installed on a server to test if I push to the online repo, so might as well post here straight away in case anybody is curious.

Wasn't a single unique solution. What I've gone with is:

Despite using -mask option, output fixel image will still contain the same set of fixels as the input fixel template.
When constructing the fixel-fixel connectivity matrix, streamlines may still be assigned to a fixel that is outside the mask, if the streamline is most collinear with that fixel. However connectivity to that fixel will not contribute in any way to the analysis.
Only fixels within the mask are involved in data smoothing, GLM, CFE.

Benefit of keeping all fixels is that you keep fixel correspondence across different experiments from the same template. You can still see those fixels, but they should disappear as soon as any quantitative value is loaded (should; need to check). Cropping the output would have made the set of fixels within the analysis mask more immediately obvious; but you can just fixelcrop after the fact if you want.

Closes #987.

This option restricts the statistical analysis to a subset of fixels, as defined by a boolean fixel data file. Calculation of the fixel-fixel connectivity matrix, data smoothing, t-statistic calculation, and stiatistical enhancement, are all only performed on those fixels within the mask. However, the output fixel directory contains all of the fixels from the input template fixel image, not just those within the mask; this retains fixel-fixel correspondence between the input template image and the output results. For fixels outside the mask, a value of NaN is written for all fixel data file outputs.

Also includes a bug fix where opting to not smooth the input data would likely have resulted in an error.

On testing prior changes to fixelcfestats to support the -mask option, it was found to run extremely slowly, even when not using the -mask option. The suspected cause was that during the matrix normalisation, each row of the connectivity matrix (a map<uint32_t, float>) was being re-constructed from scratch (due to the possibility of fixel indices changing), and hence what was a balanced binary search tree would become essentially a sorted doubly-linked-list. Now, each row of the connectivity matrix is instead converted to essentially a vector<pair<uint32_t,float>>, since it is no longer necessary to have fast lookup for an arbitrary fixel index. This should reduce memory requirement after matrix construction has completed, and will also hopefully improve cache performance since the connectivity data for each fixel will be serialised. There are also various little fixes and tweaks to polish off additions for supporting the -mask option.

Lestropie · 2017-07-05T03:58:35Z

OK, looks like this is now working as it should. I'll see if I can get one of the users to test as well. Then need to decide whether to go to master or a generic dev branch (which I've been meaning to create for other things anyway).

Comes with a nice execution speedup thanks to better cache hitrate: Independently of masking, my test dataset went from 8h to 2h 😈

jdtournier · 2017-07-05T09:21:28Z

These changes sound good - but I'm not sure I'm in a position to review them properly at the moment... I've added a couple of other reviewers, hopefully someone can just have a look and approve these changes more rapidly than me.

Impressed and somewhat puzzled by this:

Independently of masking, my test dataset went from 8h to 2h

Not sure I fully understand how that worked out...

Lestropie · 2017-07-05T10:27:02Z

Not sure I fully understand how that worked out...

The connectivity matrix is built using a std::map<uint32_t, float> per fixel, so that existing connections can be found efficiently. But if std::map is implemented using a binary search tree, that's 2 64-bit pointers per 8-byte connection stored; and the memory allocations could end up scattered, depending on how STL manages memory allocation for a std::map (or ~500,000 of them at once).

With these changes, during the normalisation / thresholding step the connectivity matrix is converted to a vector<std::pair<uint32_t, float>> per fixel, making the data per fixel contiguous and reducing its total size. For the statistical enhancement you only need to loop over the data per fixel; no need for efficient lookup of a particular fixel index.

Time will tell if the speedup is reproducible, I'm only reporting a single timed run; but it makes sense.

draffelt

Just had a look over these changes and it all looks good.

Sadly I never got around to adding tests for fixelcfestats. Would be good to add at least one test where it tests the output when using the -notest option (i.e. only checks the t-stat, cfe, effect size etc). Actually, with the relatively new -permutations option, we could also test the pvalue output with a handful of set permutations.

Happy to add these tests, just won't be in the next week or so.

jdtournier · 2017-07-05T11:31:39Z

Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests...

jdtournier · 2017-07-05T11:33:06Z

@draffelt - you might want to actually approve these changes if you're happy with them, otherwise Rob can't merge...

draffelt · 2017-07-05T11:50:08Z

I'll approve for now. But was thinking it might be good to check we at least get the same result without the mask as current master.

draffelt · 2017-07-05T11:52:02Z

Or have you done that already Rob?

That is an impressive speed up.

jdtournier · 2017-07-05T12:49:04Z

@draffelt: OK, gotcha. That's a really good suggestion, we should set up some testing now to verify equivalent results pre & post. @Lestropie, any chance you can find the time to generate a few tests?

Lestropie · 2017-07-05T13:40:58Z

I did a pre & post without mask, it wasn't bitwise identical (presumably due to random permutations) but it was only edge fixels here and there that changed for p<0.05. I could run again with pre-generated permutations if you want.

One thing I did change was this hard-coded value, to use the same threshold for the smoothing matrix as the connectivity matrix. For default parameters the change has no net effect though; it just made sense to me that if the user expands or contracts the matrix by tuning the connectivity threshold up or down, the extent of the smoothing matrix should do the same thing.

Lestropie · 2017-07-07T01:07:07Z

With pre-calculated permutations, fwe_pvalue images vary by no more than one permutation's worth; p<0.05 masks are bitwise equivalent.

$ time /software/mrtrix3/bin/fixelcfestats ...
real    518m12.199s
user    13251m53.299s
sys     195m55.055s

$ time /software/mrtrix_testing/bin/fixelcfestats ...
real    137m26.918s
user    1562m37.928s
sys     83m32.612s

💪

I'm veering away from pushing anything other than fixes to master, but if we're tagging 3.0_RC2 for #1036 (which I think we should), this can possibly be carried along with that?

…option

jdtournier · 2017-07-07T12:31:34Z

Good to hear. Any chance of running tests to verify operation that don't take 1½ hours...?

I'm veering away from pushing anything other than fixes to master, but if we're tagging 3.0_RC2 for #1036 (which I think we should), this can possibly be carried along with that?

My thoughts exactly. A new 3.0_RC2 tag would be a good opportunity to add/modify things that we might otherwise have delayed for a future version. I reckon now is the time to create a 3.0_pre_RC2 branch, with a view to merging these changes along with the mtlognorm branch and some of the other pull requests into it, and we'll then push the lot out to master when ready and tag accordingly.

Lestropie · 2017-07-13T08:29:42Z

Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests...

As an aside: I've been working on various aspects of the stats code lately (got multiple contrasts working!), so one thing I did just today was write a script that generates random dummy data to feed to vectorstats. That's the cleanest way to test the core GLM functionality. Testing the higher-order stats commands gets a bit trickier... but I figured it'd be easier to start from the bottom and work my way up.

With the introduction of the -mask option in fixelcfestats (#1030), fixel data files may contain the value NaN to indicate fixels that were outside the processing mask. However if such values were read within fixel2tsf, and hence propagated to the output track scalar file, these would be erroneously interpreted as track delimiters. This fix replaces non-finite values with 0.0 in fixel2tsf to overcome this limitation of the .tck / .tsf formats. Some assert() calls have also been added to the relevant code in mrview to ensure that any loaded .tsf file does indeed align perfectly with the corresponding track file.

Lestropie added 2 commits June 27, 2017 15:17

Docs update

11911d5

Lestropie added the enhancement label Jun 27, 2017

Lestropie self-assigned this Jun 27, 2017

Lestropie added 7 commits June 27, 2017 15:45

fixelcfestats: Fix compilation

7c58e1c

fixelcfestats: Various fixels for -mask option

3a6aded

Also includes a bug fix where opting to not smooth the input data would likely have resulted in an error.

fixelcfestats: Make sure connectivity matrix access functions are inline

3769f48

fixelcfestats: Minor tweaks

840beac

mrview Fixel plot: Compatibility with fixel data files containing NaNs

f746055

Stats::CFE::NormMatrixElement class: Eigen 3.3 compatibility

07289db

Lestropie requested a review from jdtournier July 5, 2017 05:59

fixelcfestats: Remove vestigial debugging call

0f13690

jdtournier requested review from draffelt and thijsdhollander July 5, 2017 09:19

draffelt reviewed Jul 5, 2017

View reviewed changes

Merge branch 'master' into fixelcfestats_mask

ca549ce

draffelt previously approved these changes Jul 5, 2017

View reviewed changes

fixelcfestats: Add doc information regarding output when using -mask …

9866913

…option

Lestropie dismissed draffelt’s stale review via 9866913 July 7, 2017 07:49

jdtournier changed the base branch from master to dev July 12, 2017 13:47

Merge branch 'dev' into fixelcfestats_mask

dbf9e1e

Lestropie changed the base branch from dev to tag_3.0_RC2 July 14, 2017 06:42

Lestropie merged commit fc9d9b1 into tag_3.0_RC2 Jul 14, 2017

Lestropie deleted the fixelcfestats_mask branch July 14, 2017 06:51

Lestropie mentioned this pull request Jul 14, 2017

Tag 3.0 RC2 #1052

Merged

This was referenced Sep 15, 2017

fixelcfestats: -mask option #987

Closed

fixelcfestats: fix import of file list #1125

Merged

Conversation

Lestropie commented Jun 27, 2017

Uh oh!

Lestropie commented Jul 5, 2017

Uh oh!

jdtournier commented Jul 5, 2017

Uh oh!

Lestropie commented Jul 5, 2017

Uh oh!

draffelt left a comment

Choose a reason for hiding this comment

Uh oh!

jdtournier commented Jul 5, 2017

Uh oh!

jdtournier commented Jul 5, 2017

Uh oh!

draffelt commented Jul 5, 2017

Uh oh!

draffelt commented Jul 5, 2017

Uh oh!

jdtournier commented Jul 5, 2017

Uh oh!

Lestropie commented Jul 5, 2017

Uh oh!

Lestropie commented Jul 7, 2017

Uh oh!

jdtournier commented Jul 7, 2017

Uh oh!

Lestropie commented Jul 13, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants