fixelcfestats: New option -mask#1030
Conversation
This option restricts the statistical analysis to a subset of fixels, as defined by a boolean fixel data file. Calculation of the fixel-fixel connectivity matrix, data smoothing, t-statistic calculation, and stiatistical enhancement, are all only performed on those fixels within the mask. However, the output fixel directory contains all of the fixels from the input template fixel image, not just those within the mask; this retains fixel-fixel correspondence between the input template image and the output results. For fixels outside the mask, a value of NaN is written for all fixel data file outputs.
Also includes a bug fix where opting to not smooth the input data would likely have resulted in an error.
On testing prior changes to fixelcfestats to support the -mask option, it was found to run extremely slowly, even when not using the -mask option. The suspected cause was that during the matrix normalisation, each row of the connectivity matrix (a map<uint32_t, float>) was being re-constructed from scratch (due to the possibility of fixel indices changing), and hence what was a balanced binary search tree would become essentially a sorted doubly-linked-list. Now, each row of the connectivity matrix is instead converted to essentially a vector<pair<uint32_t,float>>, since it is no longer necessary to have fast lookup for an arbitrary fixel index. This should reduce memory requirement after matrix construction has completed, and will also hopefully improve cache performance since the connectivity data for each fixel will be serialised. There are also various little fixes and tweaks to polish off additions for supporting the -mask option.
|
OK, looks like this is now working as it should. I'll see if I can get one of the users to test as well. Then need to decide whether to go to Comes with a nice execution speedup thanks to better cache hitrate: Independently of masking, my test dataset went from 8h to 2h 😈 |
|
These changes sound good - but I'm not sure I'm in a position to review them properly at the moment... I've added a couple of other reviewers, hopefully someone can just have a look and approve these changes more rapidly than me. Impressed and somewhat puzzled by this:
Not sure I fully understand how that worked out... |
The connectivity matrix is built using a With these changes, during the normalisation / thresholding step the connectivity matrix is converted to a Time will tell if the speedup is reproducible, I'm only reporting a single timed run; but it makes sense. |
draffelt
left a comment
There was a problem hiding this comment.
Just had a look over these changes and it all looks good.
Sadly I never got around to adding tests for fixelcfestats. Would be good to add at least one test where it tests the output when using the -notest option (i.e. only checks the t-stat, cfe, effect size etc). Actually, with the relatively new -permutations option, we could also test the pvalue output with a handful of set permutations.
Happy to add these tests, just won't be in the next week or so.
|
Good point re the tests - this is a pretty central command, it really does need to be protected with a few tests... |
|
@draffelt - you might want to actually approve these changes if you're happy with them, otherwise Rob can't merge... |
|
I'll approve for now. But was thinking it might be good to check we at least get the same result without the mask as current master. |
|
Or have you done that already Rob? That is an impressive speed up. |
|
@draffelt: OK, gotcha. That's a really good suggestion, we should set up some testing now to verify equivalent results pre & post. @Lestropie, any chance you can find the time to generate a few tests? |
|
I did a pre & post without mask, it wasn't bitwise identical (presumably due to random permutations) but it was only edge fixels here and there that changed for p<0.05. I could run again with pre-generated permutations if you want. One thing I did change was this hard-coded value, to use the same threshold for the smoothing matrix as the connectivity matrix. For default parameters the change has no net effect though; it just made sense to me that if the user expands or contracts the matrix by tuning the connectivity threshold up or down, the extent of the smoothing matrix should do the same thing. |
|
With pre-calculated permutations, 💪 I'm veering away from pushing anything other than fixes to |
|
Good to hear. Any chance of running tests to verify operation that don't take 1½ hours...?
My thoughts exactly. A new |
As an aside: I've been working on various aspects of the stats code lately (got multiple contrasts working!), so one thing I did just today was write a script that generates random dummy data to feed to |
With the introduction of the -mask option in fixelcfestats (#1030), fixel data files may contain the value NaN to indicate fixels that were outside the processing mask. However if such values were read within fixel2tsf, and hence propagated to the output track scalar file, these would be erroneously interpreted as track delimiters. This fix replaces non-finite values with 0.0 in fixel2tsf to overcome this limitation of the .tck / .tsf formats. Some assert() calls have also been added to the relevant code in mrview to ensure that any loaded .tsf file does indeed align perfectly with the corresponding track file.
Had a go since there were multiple requests. Untested, but it's easier for me to get it installed on a server to test if I push to the online repo, so might as well post here straight away in case anybody is curious.
Wasn't a single unique solution. What I've gone with is:
Despite using
-maskoption, output fixel image will still contain the same set of fixels as the input fixel template.When constructing the fixel-fixel connectivity matrix, streamlines may still be assigned to a fixel that is outside the mask, if the streamline is most collinear with that fixel. However connectivity to that fixel will not contribute in any way to the analysis.
Only fixels within the mask are involved in data smoothing, GLM, CFE.
Benefit of keeping all fixels is that you keep fixel correspondence across different experiments from the same template. You can still see those fixels, but they should disappear as soon as any quantitative value is loaded (should; need to check). Cropping the output would have made the set of fixels within the analysis mask more immediately obvious; but you can just
fixelcropafter the fact if you want.Closes #987.