Skip to content

GLM enhancements & surrounding code#1543

Merged
Lestropie merged 211 commits intodevfrom
stats_enhancements
Nov 28, 2019
Merged

GLM enhancements & surrounding code#1543
Lestropie merged 211 commits intodevfrom
stats_enhancements

Conversation

@Lestropie
Copy link
Copy Markdown
Member

@Lestropie Lestropie commented Feb 1, 2019

Since there's plenty of demand for it, I'll see how I go getting this incorporated into RC4.

Currently there are new unit tests for vectorstats, which primarily test the internals of the GLM. Generating CI tests for connectomestats / fixelcfestats / mrclusterstats is feasibly possible, just haven't put the energy into them; unlike the GLM internals the relevant code hasn't changed a great deal from old code anyway.


Major things that are in here:

GLM:

  • Freedman-Lane method rather than Shuffle-X Shuffle-Y / Manly.
    (Edit: While in the code it was the design matrix that was shuffled, the effect of interest vs. nuisance regressors were not separated, and hence the empirical behaviour was equivalent to instead shuffling the data, which makes it Shuffle-Y).

  • Multiple hypothesis testing (i.e. contrast matrix as opposed to vector).

  • F-tests.

  • Per-element design matrices; can have:

    • Nuisance regressors, or indeed effects of interest, where the value is unique for each fixel / voxel / connectome edge tested;

    • Non-finite subject data, in which case the relevant design matrix rows are scrubbed prior to inverting the model.

  • Sign-flipping in addition to / instead of permutations; enables one-sample t-tests.

fixelcfestats:

  • Normalised form of CFE equation. Provides effect comparable to non-stationarity correction without computational penalty. Note however that the recommended FBA processing may need to change here: both this mechanism and permutation-based non-stationarity correction are sensitive to masking.

  • Fixel-fixel connectivity matrix generation ~ 4-5 times faster and requires ~ 4-5 times less RAM.

  • Multi-threaded other stages of processing.

New commands:

  • fixelconnectivity: Generate a fixel-fixel connectivity matrix (and optionally a second matrix intended for data smoothing), and write to file. Current file format used does however incur decent performance hit on load/save.

  • fixelfilter: Perform filtering operations on fixel data. This includes smoothing, and a preliminary connected-component algorithm. Can operate on individual fixel data files, or all data files within a fixel format directory.

Use of the two commands above can be followed by running fixelcfestats but providing pre-smoothed fixel data and a pre-calculated fixel-fixel connectivity matrix. If fixelcfestats is provided with a tractogram file as per current usage, it will generate the connectivity matrices and perform fixel data smoothing during data import as per current behaviour.

Lestropie and others added 30 commits April 3, 2017 14:35
Manually ported from fixel_twi_0.3.15 branch.
Previously, all images in the fixel directory were opened, and then is_directions_file() was called, which checked both the image properties and the file name.
Apart from being inefficient, this also clogs -info output as the directory is scanned.
This change validates that the name of a file is a plausible directions file before instantiating a Header and checking the image properties.
Functions for T-tests were moved into individual GLM classes, to reduce confusion with respect to pre-scaling of contrasts and transposition of matrices.
- Incorrect definition of non-const references to global matrices during calculation of default statistics resulting in zero stdev and std_effect_size.
- Incorrect calculation of degrees of freedom in GLMTTestVariable::ttest() resulting in infinite variance and zero t-values.
NOTE: Not yet tested
Detects the presence of non-finite values in either the input data, or element-wise design matrix columns, and omits the relevant rows from the linear regression.
Currently only implemented for fixelcfestats.
Provides functionality for edge-wise explanatory variables, implemented in the same manner as was done for fixelcfestats.
Performing substantial re-write of connected components filter in order to make roles and functionalities of different code more clear.
The roles of different parts of the code have been made more clear by encapsulating them within appropriately-named classes.
The functionality of mapping each voxel position to an index in a 1D vector of data now resides in the Voxel2Vector class, placed in core/misc since it is used both by the connected components filter, and the statistics code.
core/bitset.h has also been moved into this new core/misc/ directory, since it's just a utility class that isn't as fundamental to the compilation of MRtrix3 as the other contents of core/.
Functionality for voxel-wise explanatory variables, just as has already been implemented for fixelcfestats and connectomestats.
Note: None of this code has been tested whatsoever.
… into stats_elementwise_design_matrix

Some tricky merging happening here, since adding multiple contrast
matrix row support conflicted quite heavily with other changes that had
happened in this branch e.g. support of NaN values in the data.
- Provide a text string describing the fact that a column of ones is not automatically added to GLM design matrices, and add this string to the DESCRIPTION field of all statistical inference commands. Also fix up some code comments regarding the purpose of this column.
- Provide a test for vectorstats; its output is not yet tested however.
- Modify how multiple contrasts are handled. Initial support for this was previously implemented for vectorstats only, based on the contrast being a matrix rather than a vector. However, this framework does not naturally extend to F-tests. This new approach stores each within a GLM::Contrast class, and these are explicitly looped over whenever required.
GLM t-test code has been replaced with the F-test (i.e. without variance groups) as presented in Winkler et al., 2014. This has removed many optimisations, but was necessary in order to sufficiently generalise the framework for upcoming enhancement.
Changed stdev calculations in statistical inference commands to a vector, since this does not change betwee contrasts.
- Removed apparent redundant creation of output images related to the default permutation in mrclusterstats.
- Model is currently partitioned based on null columns of the contrast matrix only.
Note that this is merely the first COMPILING version of this code.
Conflicts:
	cmd/connectomestats.cpp
	docs/reference/commands/connectomestats.rst
	testing/data
Fixes operation of vectorstats test number 3, where the model is "partitioned" despite possessing no nuisance regressors.
Rather than randomly generating test data for vectorstats input, use pre-generated test data known to pass tests for statistical significance. Additionally perform precise tests of non-stochastic outputs against pre-generated output data.
@Lestropie Lestropie merged commit f524782 into dev Nov 28, 2019
@Lestropie Lestropie deleted the stats_enhancements branch November 28, 2019 00:38
@Lestropie
Copy link
Copy Markdown
Member Author

🤞

@diagiraldo
Copy link
Copy Markdown

diagiraldo commented Mar 11, 2020

Hi @Lestropie
I am trying to run the new fixelcfestats in the dev branch but it produced an error I consider a bit weird:

  • The directory with subject fixel values (already smoothed) is something like PATH_1/smooth_log_fc
  • The text file listing the subjects is in another directory, something like PATH_2/list_of_subjects.txt
  • When I pass all the inputs to fixelcfestats, It produces the following error:
fixelcfestats: Number of fixels in template: 1015842
fixelcfestats: Number of fixels in mask: 874231
fixelcfestats: [done] Importing data from files listed in "list_of_subjects.txt"
fixelcfestats: [ERROR] Unable to find subject image "PATH_2/subject_01.mif" either in input fixel diretory "PATH_1/smooth_log_fc" or in current working directory
fixelcfestats: [ERROR] Reading text file "list_of_subjects.txt": input image data file not found: "PATH_2/subject_01.mif"

I checked the text file list_of_subjects.txt and it only contains the identifiers with the .mif extension, so the error is not because of the content inside that file.
It gives me the impression that somewhere the code is concatenating the list path with the file identifier when it initializes the fixel importing.

At this point I am not sure if this is the expected behavior and the list file is supposed to be inside the input fixel directory or if this is some kind of bug.

@Lestropie
Copy link
Copy Markdown
Member Author

I believe the problem is that the directory location of the subject list text file is being used as the reference location, rather than the current working directory. The bit of the error message that says "... or in current working directory" is maybe not accurate.

Moving list_of_subjects.txt from PATH_2/ into the parent directory should get it to work.

I probably want to remove this line of code entirely.

@diagiraldo
Copy link
Copy Markdown

diagiraldo commented Apr 14, 2020

Hi again!
Yep, I tried moving the list file and then updating the repository and the command moved forward... to another error:

fixelcfestats: [DEBUG] initialising threads...
fixelcfestats: [DEBUG] launching 10 threads "loop threads"...
fixelcfestats: [DEBUG] waiting for completion of threads "loop threads"...
fixelcfestats: [DEBUG] threads "loop threads" completed OK
fixelcfestats: [DEBUG] unmapping file "results_log_fc/directions.mif"
fixelcfestats: [DEBUG] image "results_log_fc/directions.mif" unloaded
fixelcfestats: [DEBUG] unmapping file "out_smooth_log_fc/directions.mif"
fixelcfestats: [DEBUG] image "out_smooth_log_fc/directions.mif" unloaded

fixelcfestats: [SYSTEM FATAL CODE: SIGSEGV (11)] Segmentation fault: Invalid memory access

This happens when I run the fixelcfestats command with a conservative mask containing 1896 fixels with the -notest flag. First, I thought it was a memory issue but it happens exactly the same way in a machine with 256 GB of available memory.
Then I tried the Advanced debugging instructions with these results:

MRtrix build type requested: debug version with asserts, nooptim

Detecting OS: linux
Looking for compiler [clang++]: not found
Looking for compiler [g++]: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Checking for C++11 compliance: ok
Checking shared library generation: ok
Detecting pointer size: 64 bit
Detecting byte order: little-endian
Checking for variable-length array support: ok
Checking for non-POD variable-length array support: ok
Checking for ::max_align_t: 16 bytes
Checking for std::max_align_t: 16 bytes
Checking for Eigen3 library: 3.3.4
Checking for Eigen3 Unsupported: Present
Checking for zlib compression library: 1.2.11
Checking for "JSON for Modern C++" requirements: ok
Checking for TIFF library: LIBTIFF, Version 4.0.9
Checking for PNG library: Header: 1.6.34; library: 1.6.34
Checking for FFTW library: fftw-3.3.7-sse2-avx
Checking for Qt moc: moc (version 4.8.7)
Checking for Qt qmake: qmake (version 4.8.7)
Checking for Qt rcc: rcc (version 4.8.7)
Checking for Qt: 4.8.7
writing configuration to file './config': ok

when I ran the command with gdb:

diana@expe:/mnt/md0/mrtrix3$ gdb --args bin/fixelcfestats -force out_smooth_log_fc list_of_subjects.txt design.txt contrast.txt template/test_fixel_connectivity results_log_fc -mask test_fixel_mask/th_1.mif -info -notest -debug
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/fixelcfestats...done.

and nothing else happens, so I tried it with valgrind:

fixelcfestats: /usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:408: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::Index = long int]: Assertion index >= 0 && index < size()' failed.
==31391== 
==31391== Process terminating with default action of signal 6 (SIGABRT)
==31391==    at 0x7138E97: raise (raise.c:51)
==31391==    by 0x713A800: abort (abort.c:79)
==31391==    by 0x712A399: __assert_fail_base (assert.c:92)
==31391==    by 0x712A411: __assert_fail (assert.c:101)
==31391==    by 0x4C4C4B: Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[](long) (DenseCoeffsBase.h:408)
==31391==    by 0x4C2295: SubjectFixelImport::operator()(Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>) const (fixelcfestats.cpp:192)
==31391==    by 0x47BA24: run() (fixelcfestats.cpp:368)
==31391==    by 0x477E17: main (command.h:101)
==31391== 
==31391== HEAP SUMMARY:
==31391==     in use at exit: 1,199,383 bytes in 1,873 blocks
==31391==   total heap usage: 10,365,194 allocs, 10,363,321 frees, 196,577,044 bytes allocated
==31391== 
==31391== LEAK SUMMARY:
==31391==    definitely lost: 0 bytes in 0 blocks
==31391==    indirectly lost: 0 bytes in 0 blocks
==31391==      possibly lost: 0 bytes in 0 blocks
==31391==    still reachable: 1,199,383 bytes in 1,873 blocks
==31391==         suppressed: 0 bytes in 0 blocks
==31391== Rerun with --leak-check=full to see details of leaked memory
==31391== 
==31391== For counts of detected and suppressed errors, rerun with: -v
==31391== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)

But honestly, I have no idea if this error is due to insufficient memory or something related with the Eigen library
I did all this on Ubuntu 18.04.2.

@Lestropie
Copy link
Copy Markdown
Member Author

when I ran the command with gdb: ... and nothing else happens

Did you type "r" and hit ENTER to run the command as per point 6?

One possibility is that the number of fixels is not equivalent between the various input fixel data files. Regardless of which fixels are or are not included in the mask, the sizes of the fixel data files still need to be equivalent to both the number of fixels in the index image and the fixel mask image. If this is the case I can add a check so that the error message is more informative.

@diagiraldo
Copy link
Copy Markdown

Did you type "r" and hit ENTER to run the command as per point 6?

No, I didn't 🤦‍♀️
When I do it, this is the message:

fixelcfestats: /usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:408: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.

Thread 1 "fixelcfestats" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff5766801 in __GI_abort () at abort.c:79
#2  0x00007ffff575639a in __assert_fail_base (fmt=0x7ffff58dd7d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x555555a916cd "index >= 0 && index < size()", file=file@entry=0x555555a91698 "/usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h", 
    line=line@entry=408, 
    function=function@entry=0x555555a9d2e0 <Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[](long)::__PRETTY_FUNCTION__> "Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBa"...) at assert.c:92
#3  0x00007ffff5756412 in __GI___assert_fail (assertion=0x555555a916cd "index >= 0 && index < size()", file=0x555555a91698 "/usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h", 
    line=408, 
    function=0x555555a9d2e0 <Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[](long)::__PRETTY_FUNCTION__> "Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator[](Eigen::Index) [with Derived = Eigen::Block<Eigen::Matrix<double, -1, -1>, 1, -1, false>; Eigen::DenseCoeffsBa"...) at assert.c:101
#4  0x0000555555910c4c in Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 1, -1, false>, 1>::operator[] (this=0x7fffffffd390, index=1896)
    at /usr/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:408
#5  0x000055555590e296 in SubjectFixelImport::operator() (this=0x555555e7c240, row=...) at cmd/fixelcfestats.cpp:192
#6  0x00005555558c7a25 in run () at cmd/fixelcfestats.cpp:368
#7  0x00005555558c3e18 in main (cmdline_argc=15, cmdline_argv=0x7fffffffdc28) at ./core/command.h:101
(gdb) 

One possibility is that the number of fixels is not equivalent between the various input fixel data files.

I checked the number of fixels and it is the same for the subjects' input data and for the fixel mask, I mean, when I run mrinfo with the corresponding index.mif files, the output of nfixels is exactly the same, So I don't think this is the case.

Lestropie added a commit that referenced this pull request Apr 15, 2020
As part of #1693, fixelcfestats was modified to construct the subject data matrix based on all fixels, and then restrict processing to only those fixels inside the mask by filling subject data outside of the mask with NaNs and propagating the mask information to various functions. Previously, an index remapping was performed so that the subject data matrix would contain as many columns as there were fixels in the mask, and so input fixel indices would need to be projected to internal fixel indices. It appears as though during this change the allocation of the subject fixel data matrix was not properly updated to reflect its requisite larger size when the -mask option is used.
Reported in #1543.
@Lestropie
Copy link
Copy Markdown
Member Author

@diagiraldo: Please see #2022.

Lestropie added a commit that referenced this pull request Sep 11, 2020
Similarly to how FWE-corrected p-value files had their names changed from "fwe_pvalue" to "fwe_1mpvalue" in d3f2ad3 / #1543, the filename of the uncorrected p-value data should have the same modification in order to better reflect the contents of the file.
Lestropie added a commit that referenced this pull request Aug 30, 2023
As raised in #2696.
Files should have been removed as part of #1543.
Lestropie added a commit that referenced this pull request Aug 30, 2023
As raised in #2696.
Files should have been removed as part of #1543.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants