GLM enhancements & surrounding code by Lestropie · Pull Request #1543 · MRtrix3/mrtrix3

Lestropie · 2019-02-01T07:02:48Z

Since there's plenty of demand for it, I'll see how I go getting this incorporated into RC4.

Currently there are new unit tests for vectorstats, which primarily test the internals of the GLM. Generating CI tests for connectomestats / fixelcfestats / mrclusterstats is feasibly possible, just haven't put the energy into them; unlike the GLM internals the relevant code hasn't changed a great deal from old code anyway.

Major things that are in here:

GLM:

Freedman-Lane method rather than ~~Shuffle-X~~ Shuffle-Y / Manly.
(Edit: While in the code it was the design matrix that was shuffled, the effect of interest vs. nuisance regressors were not separated, and hence the empirical behaviour was equivalent to instead shuffling the data, which makes it Shuffle-Y).
Multiple hypothesis testing (i.e. contrast matrix as opposed to vector).
F-tests.
Per-element design matrices; can have:
- Nuisance regressors, or indeed effects of interest, where the value is unique for each fixel / voxel / connectome edge tested;
- Non-finite subject data, in which case the relevant design matrix rows are scrubbed prior to inverting the model.
Sign-flipping in addition to / instead of permutations; enables one-sample t-tests.

`fixelcfestats`:

Normalised form of CFE equation. Provides effect comparable to non-stationarity correction without computational penalty. Note however that the recommended FBA processing may need to change here: both this mechanism and permutation-based non-stationarity correction are sensitive to masking.
Fixel-fixel connectivity matrix generation ~ 4-5 times faster and requires ~ 4-5 times less RAM.
Multi-threaded other stages of processing.

New commands:

fixelconnectivity: Generate a fixel-fixel connectivity matrix (and optionally a second matrix intended for data smoothing), and write to file. Current file format used does however incur decent performance hit on load/save.
fixelfilter: Perform filtering operations on fixel data. This includes smoothing, and a preliminary connected-component algorithm. Can operate on individual fixel data files, or all data files within a fixel format directory.

Use of the two commands above can be followed by running fixelcfestats but providing pre-smoothed fixel data and a pre-calculated fixel-fixel connectivity matrix. If fixelcfestats is provided with a tractogram file as per current usage, it will generate the connectivity matrices and perform fixel data smoothing during data import as per current behaviour.

Manually ported from fixel_twi_0.3.15 branch.

Previously, all images in the fixel directory were opened, and then is_directions_file() was called, which checked both the image properties and the file name. Apart from being inefficient, this also clogs -info output as the directory is scanned. This change validates that the name of a file is a plausible directions file before instantiating a Header and checking the image properties.

Functions for T-tests were moved into individual GLM classes, to reduce confusion with respect to pre-scaling of contrasts and transposition of matrices.

…e design matrix experiments

See #957.

- Incorrect definition of non-const references to global matrices during calculation of default statistics resulting in zero stdev and std_effect_size. - Incorrect calculation of degrees of freedom in GLMTTestVariable::ttest() resulting in infinite variance and zero t-values.

… calculation

NOTE: Not yet tested Detects the presence of non-finite values in either the input data, or element-wise design matrix columns, and omits the relevant rows from the linear regression. Currently only implemented for fixelcfestats.

Provides functionality for edge-wise explanatory variables, implemented in the same manner as was done for fixelcfestats.

Performing substantial re-write of connected components filter in order to make roles and functionalities of different code more clear.

The roles of different parts of the code have been made more clear by encapsulating them within appropriately-named classes. The functionality of mapping each voxel position to an index in a 1D vector of data now resides in the Voxel2Vector class, placed in core/misc since it is used both by the connected components filter, and the statistics code. core/bitset.h has also been moved into this new core/misc/ directory, since it's just a utility class that isn't as fundamental to the compilation of MRtrix3 as the other contents of core/.

Functionality for voxel-wise explanatory variables, just as has already been implemented for fixelcfestats and connectomestats.

Note: None of this code has been tested whatsoever.

… into stats_elementwise_design_matrix Some tricky merging happening here, since adding multiple contrast matrix row support conflicted quite heavily with other changes that had happened in this branch e.g. support of NaN values in the data.

- Provide a text string describing the fact that a column of ones is not automatically added to GLM design matrices, and add this string to the DESCRIPTION field of all statistical inference commands. Also fix up some code comments regarding the purpose of this column. - Provide a test for vectorstats; its output is not yet tested however.

- Modify how multiple contrasts are handled. Initial support for this was previously implemented for vectorstats only, based on the contrast being a matrix rather than a vector. However, this framework does not naturally extend to F-tests. This new approach stores each within a GLM::Contrast class, and these are explicitly looped over whenever required. GLM t-test code has been replaced with the F-test (i.e. without variance groups) as presented in Winkler et al., 2014. This has removed many optimisations, but was necessary in order to sufficiently generalise the framework for upcoming enhancement. Changed stdev calculations in statistical inference commands to a vector, since this does not change betwee contrasts. - Removed apparent redundant creation of output images related to the default permutation in mrclusterstats. - Model is currently partitioned based on null columns of the contrast matrix only. Note that this is merely the first COMPILING version of this code.

Conflicts: cmd/connectomestats.cpp docs/reference/commands/connectomestats.rst testing/data

Fixes operation of vectorstats test number 3, where the model is "partitioned" despite possessing no nuisance regressors.

Rather than randomly generating test data for vectorstats input, use pre-generated test data known to pass tests for statistical significance. Additionally perform precise tests of non-stochastic outputs against pre-generated output data.

Lestropie · 2019-11-28T00:38:24Z

🤞

diagiraldo · 2020-03-11T17:16:58Z

Hi @Lestropie
I am trying to run the new fixelcfestats in the dev branch but it produced an error I consider a bit weird:

The directory with subject fixel values (already smoothed) is something like PATH_1/smooth_log_fc
The text file listing the subjects is in another directory, something like PATH_2/list_of_subjects.txt
When I pass all the inputs to fixelcfestats, It produces the following error:

fixelcfestats: Number of fixels in template: 1015842
fixelcfestats: Number of fixels in mask: 874231
fixelcfestats: [done] Importing data from files listed in "list_of_subjects.txt"
fixelcfestats: [ERROR] Unable to find subject image "PATH_2/subject_01.mif" either in input fixel diretory "PATH_1/smooth_log_fc" or in current working directory
fixelcfestats: [ERROR] Reading text file "list_of_subjects.txt": input image data file not found: "PATH_2/subject_01.mif"

I checked the text file list_of_subjects.txt and it only contains the identifiers with the .mif extension, so the error is not because of the content inside that file.
It gives me the impression that somewhere the code is concatenating the list path with the file identifier when it initializes the fixel importing.

At this point I am not sure if this is the expected behavior and the list file is supposed to be inside the input fixel directory or if this is some kind of bug.

Lestropie · 2020-03-12T02:11:08Z

I believe the problem is that the directory location of the subject list text file is being used as the reference location, rather than the current working directory. The bit of the error message that says "... or in current working directory" is maybe not accurate.

Moving list_of_subjects.txt from PATH_2/ into the parent directory should get it to work.

I probably want to remove this line of code entirely.

diagiraldo · 2020-04-14T15:14:33Z

Hi again!
Yep, I tried moving the list file and then updating the repository and the command moved forward... to another error:

fixelcfestats: [DEBUG] initialising threads...
fixelcfestats: [DEBUG] launching 10 threads "loop threads"...
fixelcfestats: [DEBUG] waiting for completion of threads "loop threads"...
fixelcfestats: [DEBUG] threads "loop threads" completed OK
fixelcfestats: [DEBUG] unmapping file "results_log_fc/directions.mif"
fixelcfestats: [DEBUG] image "results_log_fc/directions.mif" unloaded
fixelcfestats: [DEBUG] unmapping file "out_smooth_log_fc/directions.mif"
fixelcfestats: [DEBUG] image "out_smooth_log_fc/directions.mif" unloaded

fixelcfestats: [SYSTEM FATAL CODE: SIGSEGV (11)] Segmentation fault: Invalid memory access

This happens when I run the fixelcfestats command with a conservative mask containing 1896 fixels with the -notest flag. First, I thought it was a memory issue but it happens exactly the same way in a machine with 256 GB of available memory.
Then I tried the Advanced debugging instructions with these results:

MRtrix build type requested: debug version with asserts, nooptim

Detecting OS: linux
Looking for compiler [clang++]: not found
Looking for compiler [g++]: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Checking for C++11 compliance: ok
Checking shared library generation: ok
Detecting pointer size: 64 bit
Detecting byte order: little-endian
Checking for variable-length array support: ok
Checking for non-POD variable-length array support: ok
Checking for ::max_align_t: 16 bytes
Checking for std::max_align_t: 16 bytes
Checking for Eigen3 library: 3.3.4
Checking for Eigen3 Unsupported: Present
Checking for zlib compression library: 1.2.11
Checking for "JSON for Modern C++" requirements: ok
Checking for TIFF library: LIBTIFF, Version 4.0.9
Checking for PNG library: Header: 1.6.34; library: 1.6.34
Checking for FFTW library: fftw-3.3.7-sse2-avx
Checking for Qt moc: moc (version 4.8.7)
Checking for Qt qmake: qmake (version 4.8.7)
Checking for Qt rcc: rcc (version 4.8.7)
Checking for Qt: 4.8.7
writing configuration to file './config': ok

when I ran the command with gdb:

diana@expe:/mnt/md0/mrtrix3$ gdb --args bin/fixelcfestats -force out_smooth_log_fc list_of_subjects.txt design.txt contrast.txt template/test_fixel_connectivity results_log_fc -mask test_fixel_mask/th_1.mif -info -notest -debug
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/fixelcfestats...done.

and nothing else happens, so I tried it with valgrind:

fixelcfestats: /usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:408: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::Index = long int]: Assertion index >= 0 && index < size()' failed.
==31391== 
==31391== Process terminating with default action of signal 6 (SIGABRT)
==31391==    at 0x7138E97: raise (raise.c:51)
==31391==    by 0x713A800: abort (abort.c:79)
==31391==    by 0x712A399: __assert_fail_base (assert.c:92)
==31391==    by 0x712A411: __assert_fail (assert.c:101)
==31391==    by 0x4C4C4B: Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[](long) (DenseCoeffsBase.h:408)
==31391==    by 0x4C2295: SubjectFixelImport::operator()(Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>) const (fixelcfestats.cpp:192)
==31391==    by 0x47BA24: run() (fixelcfestats.cpp:368)
==31391==    by 0x477E17: main (command.h:101)
==31391== 
==31391== HEAP SUMMARY:
==31391==     in use at exit: 1,199,383 bytes in 1,873 blocks
==31391==   total heap usage: 10,365,194 allocs, 10,363,321 frees, 196,577,044 bytes allocated
==31391== 
==31391== LEAK SUMMARY:
==31391==    definitely lost: 0 bytes in 0 blocks
==31391==    indirectly lost: 0 bytes in 0 blocks
==31391==      possibly lost: 0 bytes in 0 blocks
==31391==    still reachable: 1,199,383 bytes in 1,873 blocks
==31391==         suppressed: 0 bytes in 0 blocks
==31391== Rerun with --leak-check=full to see details of leaked memory
==31391== 
==31391== For counts of detected and suppressed errors, rerun with: -v
==31391== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)

But honestly, I have no idea if this error is due to insufficient memory or something related with the Eigen library
I did all this on Ubuntu 18.04.2.

Lestropie · 2020-04-15T00:31:19Z

when I ran the command with gdb: ... and nothing else happens

Did you type "r" and hit ENTER to run the command as per point 6?

One possibility is that the number of fixels is not equivalent between the various input fixel data files. Regardless of which fixels are or are not included in the mask, the sizes of the fixel data files still need to be equivalent to both the number of fixels in the index image and the fixel mask image. If this is the case I can add a check so that the error message is more informative.

diagiraldo · 2020-04-15T03:57:31Z

Did you type "r" and hit ENTER to run the command as per point 6?

No, I didn't 🤦‍♀️
When I do it, this is the message:

fixelcfestats: /usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:408: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.

Thread 1 "fixelcfestats" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff5766801 in __GI_abort () at abort.c:79
#2  0x00007ffff575639a in __assert_fail_base (fmt=0x7ffff58dd7d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x555555a916cd "index >= 0 && index < size()", file=file@entry=0x555555a91698 "/usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h", 
    line=line@entry=408, 
    function=function@entry=0x555555a9d2e0 <Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[](long)::__PRETTY_FUNCTION__> "Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBa"...) at assert.c:92
#3  0x00007ffff5756412 in __GI___assert_fail (assertion=0x555555a916cd "index >= 0 && index < size()", file=0x555555a91698 "/usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h", 
    line=408, 
    function=0x555555a9d2e0 <Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[](long)::__PRETTY_FUNCTION__> "Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBa"...) at assert.c:101
#4  0x0000555555910c4c in Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[] (this=0x7fffffffd390, index=1896)
    at /usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:408
#5  0x000055555590e296 in SubjectFixelImport::operator() (this=0x555555e7c240, row=...) at cmd/fixelcfestats.cpp:192
#6  0x00005555558c7a25 in run () at cmd/fixelcfestats.cpp:368
#7  0x00005555558c3e18 in main (cmdline_argc=15, cmdline_argv=0x7fffffffdc28) at ./core/command.h:101
(gdb)

One possibility is that the number of fixels is not equivalent between the various input fixel data files.

I checked the number of fixels and it is the same for the subjects' input data and for the fixel mask, I mean, when I run mrinfo with the corresponding index.mif files, the output of nfixels is exactly the same, So I don't think this is the case.

As part of #1693, fixelcfestats was modified to construct the subject data matrix based on all fixels, and then restrict processing to only those fixels inside the mask by filling subject data outside of the mask with NaNs and propagating the mask information to various functions. Previously, an index remapping was performed so that the subject data matrix would contain as many columns as there were fixels in the mask, and so input fixel indices would need to be projected to internal fixel indices. It appears as though during this change the allocation of the subject fixel data matrix was not properly updated to reflect its requisite larger size when the -mask option is used. Reported in #1543.

Lestropie · 2020-04-15T04:43:58Z

@diagiraldo: Please see #2022.

Similarly to how FWE-corrected p-value files had their names changed from "fwe_pvalue" to "fwe_1mpvalue" in d3f2ad3 / #1543, the filename of the uncorrected p-value data should have the same modification in order to better reflect the contents of the file.

As raised in #2696. Files should have been removed as part of #1543.

Lestropie and others added 30 commits April 3, 2017 14:35

Stats: First steps toward element-wise design matrices

6ee275c

Manually ported from fixel_twi_0.3.15 branch.

fixelcfestats: First compilable version using data import class

be04e40

fixelcfestats: First compilable version with per-fixel design matrix

bafaf70

mrconvert: Neaten template functions

49966ea

fixelcfestats: Various fixes for per-fixel design matrices

69b0417

fixelcfestats: Various changes and fixes

437a3aa

Functions for T-tests were moved into individual GLM classes, to reduce confusion with respect to pre-scaling of contrasts and transposition of matrices.

fixelcfestats: Add default permutation statistics outputs for variabl…

f41f186

…e design matrix experiments

fixelcfestats: Fix and speedup GLM statistics for default permutation

03d6a6f

fixelcfestats: Some code cleanup

d2dde19

fixelcfestats: Playing with memory alignment issues

c14c11e

fixelcfestats: Avoid use of make_shared due to memory alignment

672c0fa

See #957.

fixelcffestats: Forgot to multi-thread default permutation statistics…

6dd4e54

… calculation

mrstats: Fix median value string width

e410c6b

fixelcfestats: Fix memory allocation for nonstationarity correction

4294f2a

GLM: Permit non-finite numbers

2af1e92

NOTE: Not yet tested Detects the presence of non-finite values in either the input data, or element-wise design matrix columns, and omits the relevant rows from the linear regression. Currently only implemented for fixelcfestats.

connectomestats: New option -column

eaac89e

Provides functionality for edge-wise explanatory variables, implemented in the same manner as was done for fixelcfestats.

mrclusterstats: Initial work towards -column option

e3e24b2

Performing substantial re-write of connected components filter in order to make roles and functionalities of different code more clear.

mrclusterstats: -column option

c2e3a6f

Functionality for voxel-wise explanatory variables, just as has already been implemented for fixelcfestats and connectomestats.

Stats: First compilation of multi-contrast support

ca19ceb

Note: None of this code has been tested whatsoever.

vectorstats: Revert outputting to matrix files

eec7e18

vectorstats: First working version with multiple contrasts

3a3af1d

vectorstats: Use standardised stats data import interface

dadea22

vectorstats: Add -column option

ae46e2d

Stats GLM: Fixes for compilation in debug mode

3d56d8d

Lestropie added 10 commits November 27, 2019 09:00

FBA docs: Tweak filesystem paths of final steps

0caabcb

Docs: Add page on FBA mitigating brain cropping

0f72344

Merge branch 'dev' into stats_enhancements

4066dc3

Conflicts: cmd/connectomestats.cpp docs/reference/commands/connectomestats.rst testing/data

Move vectorstats test to appropriate location

3d21ee9

Unit tests: Split across individual test files

5bd2516

GLM: Fix model partitioning for no nuisance

c667ff4

Fixes operation of vectorstats test number 3, where the model is "partitioned" despite possessing no nuisance regressors.

GLM: Change solution for model partitioning with no nuisance regressors

96c31e7

CI: vectorstats: Use pre-generated data

8bdfd65

Rather than randomly generating test data for vectorstats input, use pre-generated test data known to pass tests for statistical significance. Additionally perform precise tests of non-stochastic outputs against pre-generated output data.

Docs: Fix reference for NaN masking in FBA

80b7677

CI: Add tolerance to vectorstats tests

f1dc3ed

Lestropie merged commit f524782 into dev Nov 28, 2019

Lestropie deleted the stats_enhancements branch November 28, 2019 00:38

This was referenced Jan 25, 2020

Stats: Handle NaNs #817

Closed

Support one-sampled t-tests in fixelcfestats and mrclusterstats #690

Closed

Lestropie mentioned this pull request Mar 12, 2020

Stats: Modify import of cohort data #1977

Merged

Lestropie mentioned this pull request Apr 17, 2020

[BUG] fixelcfestats: -mask fix #2022

Merged

Lestropie mentioned this pull request Jan 27, 2021

Refactoring of statistical inference code #2269

Draft

Lestropie mentioned this pull request Aug 29, 2023

Unused files in the codebase #2696

Closed

Lestropie added a commit that referenced this pull request Aug 30, 2023

Remove files src/stats/permstack.*

8915aab

As raised in #2696. Files should have been removed as part of #1543.

Lestropie mentioned this pull request Aug 30, 2023

Remove files src/stats/permstack.* #2699

Merged

Lestropie added a commit that referenced this pull request Aug 30, 2023

Remove files src/stats/permstack.*

86ab4ba

As raised in #2696. Files should have been removed as part of #1543.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GLM enhancements & surrounding code#1543

GLM enhancements & surrounding code#1543
Lestropie merged 211 commits intodevfrom
stats_enhancements

Lestropie commented Feb 1, 2019 •

edited

Loading

Uh oh!

Lestropie commented Nov 28, 2019

Uh oh!

diagiraldo commented Mar 11, 2020 •

edited

Loading

Uh oh!

Lestropie commented Mar 12, 2020

Uh oh!

diagiraldo commented Apr 14, 2020 •

edited

Loading

Uh oh!

Lestropie commented Apr 15, 2020

Uh oh!

diagiraldo commented Apr 15, 2020

Uh oh!

Lestropie commented Apr 15, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

Lestropie commented Feb 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

GLM:

fixelcfestats:

New commands:

Uh oh!

Lestropie commented Nov 28, 2019

Uh oh!

diagiraldo commented Mar 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Lestropie commented Mar 12, 2020

Uh oh!

diagiraldo commented Apr 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Lestropie commented Apr 15, 2020

Uh oh!

diagiraldo commented Apr 15, 2020

Uh oh!

Lestropie commented Apr 15, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Lestropie commented Feb 1, 2019 •

edited

Loading

`fixelcfestats`:

diagiraldo commented Mar 11, 2020 •

edited

Loading

diagiraldo commented Apr 14, 2020 •

edited

Loading