dwi2response dhollander: add option to select version #1689
Conversation
There have been recent changes to the Dhollander algorithm that alter the results produced. This PR allows the previous behaviour to be selected, and defaults to it for now.
|
Wow, critical and everything, all the big guns. I sense all sorts of tension and excitement... Let's just keep our heads cool and look at this, right? This could basically be resolved with a < 30 min live (video/audio) chat, I reckon, which might save us hours of typing essays, stress, miscommunications of all sorts and the like. I'm equally limited for time, and I'm guessing you're having fun ironing out critical stuff to have a reliable code base to run a workshop; so everyone gains from efficiency.
Ok, let's get all the awkwardness just out of the way so we can focus on the rest. Nobody's perfect, we've all made some mistakes and now we're here at this time. To just get it off my chest and over with, so we can move on: what irks me a bit is more that you were already essentially bringing this up "internally" (for lack of a better word) all the way back in July. I know and understand that typing up such a long post and preparing for tensions is never a thing to look forward to, but now we're here with time limits which never make it any better. The long text wasn't really needed, I'd say: you can just chat to me and we could be done in half an hour. Just like others did at ISMRM after my talk when they still had questions, etc... I'd like to just now constructively look into this: if this is all effectively an honest attempt to resolve this efficiently (which I'll put my trust in) and make sure you understand / appreciate that the new algorithm is effectively robust (believe me, it is) and generalises very broadly (believe me, it does), then I'll just sum up the essence here now with the little time I have at the very moment (I'd really like to go home at this time), but then please be a bit more specific about e.g. what kind of data you might still wonder about; because you're otherwise asking me to produce basically several papers of data here, which is really not very realistic at the moment. Believe me, I've got nothing to hide, you can just ask, but please specify. Quickly walking through some of your bits, so I don't overlook the gist:
Just speak to me, really. It's not super complicated, but the talk always makes it slightly rushed for sure. Additionally (and to somewhat counter the fact that saying "it's not super complicated" might accidentally offend a bit; not meant to), I can't know for sure, but I can see you might be stuck in the same thinking initially that I was in for a long time too; in fact, we might all be, since we've been convinced by our earlier discussions you referred to, when we kind of tuned/reasoned the "final" version of the earlier approach.
I have to add though that it's not "truly" superseding. The setting here is fundamentally different, since the input data is now not restricted to just 1 b-value (while it was restricted to that in those earlier discussions; so the reasoning there only exists in a universe that's a priori restricted by that). This does make all the difference, since the 2-tissue WM-CSF thing that goes on in this new 2019 version is absolutely key, plus the fact that it's normalised to sum to 1 in the metric (which you can't really do like that with 1 tissue, due to 1 b-value). Well, actually, you can in a way, and that's what led me to the insight of where our reasoning before has a sneaky flaw.

I'll give you this as something essential to get you thinking here: you'll surely remember the whole discussion on whether to initialise from a strictly sharp(er than the final) response or rather a strictly fat(ter than the final) response, right? We ended up going for sharp(er), which makes sense (still does; not questioning that; my 2019 version also uses an "extreme" sharpest disk). Here comes the twist: it's the iterating aspect that doesn't make sense, or more specifically the iterating aspect that starts from a too-sharp response and then would iteratively converge to the true response. That's simply not what's actually happening. So how is this possible then; where does the theory blow up? Not sure if you followed all of that, but that's a key sort of insight to move beyond some aspects of the logic of those earlier discussions.

Alright, so I did accidentally already end up typing up a small essay... Anyhow, at this point I should further comment on:
No no, definitely not. You've actually picked a commit there at the point where I was still designing this multi-shell kind of extension. So 2 important things at this point:
Ok, so half of the essay is already above in any case. This can really be done far more efficiently in a simple 30 minute live chat, with that little triangle diagram from the presentation on which I can convince you of all reasoning right there, with a mouse pointer to, well, point things out. Apart from all the bits above, some random stuff, if it helps at all:
Once more about those inner workings themselves: the code is really simple here; maybe simpler than it appears at first sight. It's essentially just that one key metric; all the rest is bulk to make that happen in practice. The talk has it literally all on those 2 slides where it's explained; there's nothing more to it.
Well, I hope I have already provided some input above...
I was hoping to type less, but now that it sits there typed up in any case... well, at least that's done then.
Yeah, of course. But the abstract is really open about it for sure: there are no hidden mysteries. The code is simple, relatively short, completely open. Even the talk is available publicly. And at this point, this long post too. I don't often see this much material here, or anywhere in fact.
Yeah, even if I say so myself, I think you'd be providing the users with a good service by offering a more robust, more accurate (better fit), and as a bonus faster algorithm by default. And if users get a better MSMT-CSD fit because of that, it also reflects well on that algorithm of course.
Well, one final time, my 2 cents here: I think you can safely move to the new algorithm regardless, but I don't have the time, energy or desire to fight long battles over it. If you need more info, I'm still available for a 30 min chat; I'm long past caring about awkwardness. As I said, there are no secrets here; I can answer whatever questions you have (if by now all of the above doesn't already answer everything). |
|
One I forgot yesterday, just for completeness' sake: it also doesn't depend on spatial operations or proximity of stuff. I might've still missed highlighting other benefits or particularities; but as mentioned before, just ask whatever, or organise a chat if you're after a fast and constructive resolution. |
|
[EDIT: slip of the finger closed this issue - reopening and completing this post below]

You're quite right, I could and should have dealt with this when I first became aware of it. But by the same token, this discussion could and should have taken place prior to merging to `dev`.

As to your approach itself: thanks for taking the trouble to explain the logic behind it, it all makes a lot more sense to me now, and I agree with pretty much everything you say. I'm quite happy to support its inclusion into the next release, potentially as the new default - but there are a few issues to work out before we can do this. First off, if I can recap how I understand the method works (ignoring the second lmax=6 run):
I can see the advantages of this approach: the use of MSMT-CSD indeed means we can exploit information other than the orientation domain to identify the purest WM voxels - and going for the slowest decaying voxels indeed sounds like quite a good heuristic. Even as a measure of WM-ness, the ratio of WM_{l=0} / ( WM_{l=0} + CSF_{l=0} ) seems like a good indicator: it promotes voxels with the highest proportion of slow decaying signal (assuming WM here refers to the ODF estimated using the idealised response). On top of that, the use of the WM ODF peak in the numerator additionally promotes single-fibre voxels. And there is indeed no influence of bias field or other sources of intensity inhomogeneity (including susceptibility artefacts). So I can see the rationale and the appeal. There are still however a couple of issues I'd like to look into:
But hopefully these are relatively minor. I'll experiment a little bit to get a better feel for it. More to the point though: if we do decide to change the default behaviour to your new approach, what are the consequences for downstream applications? One I'm particularly concerned about is the suitability of the threshold for tractography: if the WM response comes out smaller, then the WM ODF ought to come out larger. Does this mean we have to revisit the default cutoff for tractography like we did in #1228? Are there any other unexpected consequences we haven't thought of...? |
Good to hear 👍
Looks all generally correct. I'll walk through each bit briefly just to tick them all off. Wrt the lmax's: it's basically 2 times the exact same thing, just with a different lmax indeed; first 2, then 6. The lmax=2 case is run only for the "safe WM" set of voxels at that point in the algo, whereas the lmax=6 case is run only for the best voxels already emerging from the lmax=2 case. This whole setup is just to further boost the robustness, find a good balance for the final set of voxels, and as another small side bonus boost speed a bit further (i.e. lmax=6 doesn't have to be run for all WM voxels). Now each step:
Yep, generally correct. Minor details:
That relative location is crucial, since the projection of all signals on that eWM-CSF spectrum should not be cut short at the eWM end of things: we'll need that range there to effectively be part of our metric. Note there are still no worries even if, in the sketch, the eWM-CSF line would sit slightly "below" the WM-CSF line, as long as eWM at least sits strictly to the top-left side of WM. The latter is ensured by the definition of the eWM response. Also, definitely note this is just a sketch for reasoning: you can't draw this exact thing with these straight lines and perpendicular anisotropy-SDM axes. The reality is at least a projective transform of the coordinates; but that's a different story. It doesn't break the theory or above reasoning though; so the sketch is a useful reasoning tool.
Yes, correct. The normalisation to total per-tissue signal is what gets rid of signal magnitude dependence (and bias fields, etc...); but also what allows the rest to be as simple as only the peak amplitude (and not needing a ratio of peaks). If the position on the eWM-CSF spectrum is the same for 2 voxels' signals, this normalisation allows the peak amplitude to be a direct assessment of single-fibre-ness. So we get combined non-CSF-ness (or "tissue-ness"), (e)WM-ness, and single-fibre-ness; all in one metric.
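To make that concrete, here's a rough numpy sketch of the metric as described (variable names and the peak-amplitude input are illustrative only; this is not the actual dwi2response code):

```python
import numpy as np

def sf_wm_metric(wm_l0, csf_l0, wm_peak):
    # wm_l0:   per-voxel l=0 SH coefficient of the (e)WM compartment
    # csf_l0:  per-voxel l=0 coefficient of the CSF compartment
    # wm_peak: per-voxel WM ODF peak amplitude (from some peak search)
    # Normalising by the summed per-tissue signal removes dependence on absolute
    # signal magnitude (bias fields etc.), so the peak amplitude itself then acts
    # as the combined non-CSF-ness / (e)WM-ness / single-fibre-ness score.
    return wm_peak / np.maximum(wm_l0 + csf_l0, 1e-12)

# toy example: three voxels (values are arbitrary)
wm_l0   = np.array([0.8, 0.5, 0.2])
csf_l0  = np.array([0.1, 0.4, 0.7])
wm_peak = np.array([0.6, 0.3, 0.1])
print(sf_wm_metric(wm_l0, csf_l0, wm_peak))
```

Higher values mean more tissue-like, further towards the eWM end of the spectrum, and more single-fibre, all in one number.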
Top 0.5%, computed as a percentage of (the number of voxels in) the safe WM mask, for the final lmax=6 case. For the first lmax=2 case, it's twice that, so 1% of safe WM. Users can control the final percentage (the one that's 0.5% by default); but just as with the percentages for the other tissues, there's no good reason to actually change from the default percentages, unless there's some evidence of e.g. a severe lack (percentage-wise) of good SF-WM voxels relative to all WM. For adult-sized brains and voxel sizes of 2.5 mm isotropic, 0.5% ends up being 150~250 or so voxels; and it scales accordingly for other resolutions or brain (WM) sizes.
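To put numbers on that (the safe-WM count below is made up, purely to show the scaling):

```python
# Hypothetical safe-WM mask size, only to illustrate the percentages described above:
n_safe_wm = 40000                               # voxels in the "safe WM" mask (made-up)
n_lmax2   = max(1, round(0.010 * n_safe_wm))    # first pass (lmax=2): top 1%   -> 400
n_lmax6   = max(1, round(0.005 * n_safe_wm))    # final pass (lmax=6): top 0.5% -> 200
print(n_lmax2, n_lmax6)
```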
In the version you're looking at, the orientation still comes from the tensor fit (principal eigenvector). I've since moved on, for distribution to collaborators and (SS3T-CSD, etc.) beta testers, to having the orientation come from the peak orientation of the final lmax=6 step. The difference is minor though; the main reason is the idea / thought of not using the tensor, "just because". Even if the lmax=6 peak orientation differs a small bit from the principal eigenvector orientation for a few voxels, it's still just a few voxels out of a couple of hundred. So just kind of the good feeling of not using the tensor, I guess.
Yep, as covered in each individual point above.
That's in line with what I'm observing too for the normal adult human cases; and sometimes different scales (but not too different) in other scenarios. All towards lower amplitude though. There are 2 main things at play (and maybe lots more minor ones too):
So then we arrive at that point indeed. The answer is the same as for the previous point: the combination of my 2 points above. It might differ between individuals, but the overall pattern is a prevalent one.
Yep, spot on. I'm already advising all people who use it to consider higher thresholds (again) for tractography and fixel segmentation (e.g. in FBA, on the template). Given the difference in magnitude you were observing as well, good initial "default" thresholds to consider land in the 0.06~0.07 range, depending on the exact case and the preference for a more relaxed or stricter outcome. A funny "beneficial" side effect is that this then comes at least a bit closer again to the older default. However, because of this amplitude trickiness between response function algorithms or estimation sources, and other steps too, I'm even further advising people to always sanity-check these thresholds on their own data. |
|
Just trying to get to grips with the difference in amplitude issue. Here's a screenshot of the voxels selected for the WM response estimation using the new version alongside the old one. Looks like the new version selects more of the genu of the corpus callosum in particular, and also selects fewer voxels near the edge of the corpus callosum. Neither version seems to pick much of the splenium in this case. In general, both approaches give reasonable voxel selections - I'd be hard pressed to pick one over the other on that basis alone.

But I'm still struggling to understand how that results in a lower amplitude WM response, across all b-values... Looking at the per-shell mean DW signal images though, I can see that the signal in the genu is indeed substantially lower than in the midbody (~280 vs. ~350), which might be enough to explain the bulk of the amplitude difference. I also note that the old version did pick a few voxels in the CC that were a bit too close to CSF for comfort, which could drive the b=0 signal further up - which I can observe in the output responses, above and beyond the overall difference in amplitude.

So from that limited dataset, the output of the new version looks perfectly reasonable. The fact that the old version picked a few voxels with potential CSF contamination might be enough to explain the poorer fit in the splenium and other regions: the estimated decay is not as rapid as it would need to be (due to CSF contamination), and since that's the slowest decaying component, there's no way to fit the signal in these voxels properly. All this argues for moving to the new version by default. So assuming we go with that, we need to discuss:
Any other thoughts welcome... |
Yes, that's generally in line with the pattern I'm seeing for both methods across different datasets. Sometimes the difference is a bit more pronounced, sometimes a bit less, but the relative trends are similar.
Yep, as above. How much of the splenium ends up being picked varies a bit from dataset to dataset in my experience.
True, at least in the traditional sense of thinking about single-fibre voxels, which does indeed tend to ignore the signal decay / diffusivity. We're used to looking for areas that have a single peak in the FOD; but that's not the complete picture if the (SF-)WM response needs to represent the "purest" (lowest signal decay) WM. It's part of the picture though. I'd also argue that, independent of voxel selection, there's nothing really wrong in the end with the single-b-value response that the older selection produces.
Well, I think most explanations are in what you describe here indeed. Note this also explains why the new algorithm is able to find a set of voxels with (sometimes a lot) lower signal decay: the metric explicitly rewards the slowest decaying (least CSF-contaminated) voxels, rather than relying on the orientation domain alone.
Yep, exactly. In terms of my "triangle" / simplex kind of sketch: it increases the range of signals that can be fit (with lower residuals). For an extreme example of what happens when the fit is bad due to too-high signal decay of the WM response, see the baby dataset result in the abstract. The gist is that it ends up underestimating the WM compartment size (or "AFD") in those lower signal decay voxels. The same happens in any similar case (but not always that extreme in effect).
What I'm distributing to collaborators / beta testers / ... is essentially this new version (with the minor orientation tweak mentioned above). But in any case, apart from interface details, what's mostly on my mind is to offer the newer version by default when people run dwi2response dhollander.
The gist (but this is worth thinking about well of course):
I'm definitely communicating it extensively wherever I can, in whatever I distribute. It's big bang time in any case.
Well, so probably not that many changes (FBA pipeline is fine, I reckon); mostly just the threshold considerations mentioned above. |
This also (almost) fixes handling of location with spaces in the filename (#1749)
|
OK, that last commit introduces an option along the lines discussed above. More importantly, it also introduces the fix for #1749 - but it still doesn't work as it should. Here is the relevant extract from the error dump: as you can see, the issue is that the argument passed on to the underlying command ends up quoted when it shouldn't be. |
This brings it in line with the fa algorithm
|
OK, that last commit renames the option to bring it in line with the fa algorithm. Can't use every candidate algorithm with it yet though. |
The problem is that path.from_user() quotes the path by default, which
is the right thing to do most of the time since its output is supposed
to be inserted into the command _string_. But when its output is used
as the mrconvert_keyval argument, it gets inserted into the command
_list_ (run.py: 315), and so shouldn't be quoted. Note this is only a
problem for paths that do need to be quoted, which is why this issue
hasn't come up so far.
More generally, there is a problem that:
quote (quote ('some problematic string'))
will not match:
quote ('some problematic string')
since it'll try to quote whatever additions the previous invocation did
to properly quote the original string. I'm not sure whether there's any
way around that...
Note that this will likely need replication in other commands too.
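For what it's worth, the non-idempotence is easy to demonstrate with Python's own shlex.quote (used here purely as a stand-in for the library's quoting helper; the string is arbitrary):

```python
from shlex import quote

s = "some problematic string with spaces and a ' quote"

once  = quote(s)       # safe to splice into a command *string*
twice = quote(once)    # re-quotes the already-quoted form

print(once)
print(twice)
print(once == twice)   # False: quoting is not idempotent
# When the value is appended to a command *list* instead (one argument per
# element), the raw unquoted string is what should be passed.
```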
This change means that if -number is set to a non-default value, the number of voxels considered at the next iteration scales with it. It was previously hard-wired to 3000, which clearly wouldn't be appropriate if -number was 3000 or larger. This also means that we can remove the explicit -iter_voxels option in the dwi2response dhollander call when invoking an alternative -wm_algo, without any change in its behaviour. This makes it possible to use the fa algorithm (it would previously fail due to the unknown -iter_voxels option). Note that it still can't use the tax algorithm, since that can't handle the -number option.
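In other words, something along these lines (the scaling factor below is an assumption for illustration only; the exact factor used in the script may differ):

```python
def iter_voxel_count(number, scale=10):
    # Hypothetical sketch: let the number of voxels re-considered at the next
    # iteration scale with -number, instead of being hard-wired to 3000.
    return scale * number

print(iter_voxel_count(300))    # 3000  (comparable to the old hard-wired value)
print(iter_voxel_count(5000))   # 50000 (no longer smaller than -number itself)
```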
|
OK, I reckon this is good to go - but required a few changes elsewhere to behave sensibly. See details in the commits. |
I'd rather catch that, so that it simply doesn't pass that option down if the selected -wm_algo can't handle it.
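As a rough sketch of that idea (hypothetical helper, not the actual script code; names are made up):

```python
def forwarded_options(wm_algo, number):
    # Only forward -number to single-fibre WM algorithms that understand it;
    # 'tax' has no -number option, so it is simply omitted in that case.
    opts = []
    if wm_algo != 'tax':
        opts += ['-number', str(number)]
    return opts

for algo in ('tournier', 'fa', 'tax'):
    print(algo, forwarded_options(algo, 300))
```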
- Whitelist of single-fibre WM response function estimation algorithms for "dhollander" and "msmt_5tt".
- Minor tweaks to documentation.
- Import new test data to verify dwi2response dhollander -wm_algo operation.
- Update dwi2response tournier tests to reflect 0585d50.
Conflicts:
  lib/mrtrix3/run.py
  testing/scripts/data


This is not supposed to be final, but we need to be able to use the previous implementation of `dwi2response dhollander`, and to default to the previous behaviour until we've had a chance to properly review and assess the recent improvements made by @thijsdhollander. Unfortunately, these changes made it onto `dev` without proper review, due to their inclusion into #1449 (the extensive changes to the Python scripting library), which was reviewed purely on the basis that the changes were necessary to port existing commands to the new API, rather than changing behaviour as such. My mistake on that one, I hadn't cottoned on to the fact that the improvements being proposed to this algorithm (on #1545) were more than merely cosmetic.

All this PR does is add a `-algo` option to allow choosing the particular version to be used (2016 or 2019, the dates of the respective abstracts), and set the default to 2016 for now. The default is not set in stone, and I'd like to review it properly ASAP: if these changes are theoretically sound and lead to genuine improvements in the response produced and the subsequent MSMT-CSD outputs, we should use them by default (I note the computation is also quicker, which would be an added bonus). I note the results produced are not identical to `master` (presumably due to other more minor refinements), but they're close enough that I wouldn't be worried about them.

These improvements were presented at ISMRM 2019, but I for one didn't feel I understood the rationale sufficiently to make an informed decision. I'd like to review the changes in line with the discussions we had back in #422 & #426 (which, incidentally, the changes under discussion here supersede). As far as I can tell, the changes are more or less entirely self-contained to 2e82541...?
While this PR can be merged as-is for the time being, there are two issues to discuss:
1. The interface to select the version of the algorithm to be used. I've added a `-algo` option with a default of 2016, but this can be done a number of ways. We could have a single `-improved` (or similar) option with no argument to switch to the new behaviour. Maybe more interesting would be to add the new single-fibre response estimator (which @thijsdhollander's changes effectively introduce) as a full-blown algorithm in its own right, and add a `-sfwm_estimator` option (or similar) to the `dhollander` algorithm, defaulting (for now) to `tournier`. Personally, I reckon this last approach would do the most justice to @thijsdhollander's work on this. (A rough sketch of the simple version-switch variant follows below.)
2. The actual inner workings of the new single-fibre response estimator, the rationale behind the decisions made, evidence that the responses obtained are indeed better, and evidence that it is at least as robust as the current approach on a wide range of inputs. This is the bulk of what I would have liked to have seen discussed prior to merging these changes in the first place.
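As a rough illustration of the first point only (plain argparse as a stand-in, not the actual MRtrix3 parser wrapper; the behaviour branches are placeholders):

```python
import argparse

parser = argparse.ArgumentParser(description='dhollander response selection (sketch only)')
parser.add_argument('-algo', choices=['2016', '2019'], default='2016',
                    help='version of the single-fibre WM selection to use '
                         '(dates of the respective abstracts)')
args = parser.parse_args(['-algo', '2019'])

if args.algo == '2019':
    pass   # run the new MSMT-CSD-based single-fibre WM selection
else:
    pass   # fall back to the 2016 behaviour
```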
@thijsdhollander, your input here is obviously critical. I expect you won't be ecstatic at the prospect of having to provide all this additional validation, but I really think it's important to provide as much evidence as is necessary so we can all be confident of the methodology and be happy to endorse it as the default (assuming that's the eventual outcome), and to document it openly for any interested users. It's a bit late in the day for this to be included as the default in the next release, but there is still scope for it if we can move on this quickly. We can certainly change the handling of the algorithm switch (as per the first issue to discuss above) if my proposal isn't deemed adequate.