
dwi2response dhollander: add option to select version#1689

Merged
jdtournier merged 9 commits into dev from dwi2response_dhollander_version_switch on Jan 20, 2020

Conversation

@jdtournier
Member

This is not supposed to be final, but we need to be able to use the previous implementation of dwi2response dhollander, and to default to the previous behaviour until we've had a chance to properly review and assess the recent improvements made by @thijsdhollander. Unfortunately, these changes made it onto dev without proper review, due to their inclusion into #1449 (the extensive changes to the Python scripting library), which was reviewed purely on the basis that the changes were necessary to port existing commands to the new API, rather than changing behaviour as such. My mistake on that one, I hadn't cottoned on to the fact that the improvements being proposed to this algorithm (on #1545) were more than merely cosmetic.

All this PR does is add a -algo option to allow choosing the particular version to be used (2016 or 2019, the dates of the respective abstracts), and sets the default to 2016 for now. The default is not set in stone, and I'd like to review it properly ASAP: if these changes are theoretically sound and lead to genuine improvements in the response produced and the subsequent MSMT-CSD outputs, we should use them by default (I note the computation is also quicker, which would be an added bonus). I note the results produced are not identical to master (presumably due to other more minor refinements), but they're close enough that I wouldn't be worried about them.

These improvements were presented at ISMRM 2019, but I for one didn't feel I understood the rationale sufficiently to make an informed decision. I'd like to review the changes in line with the discussions we had back in #422 & #426 (which incidentally, the changes under discussion here supersede). As far as I can tell, the changes are more or less entirely self-contained to 2e82541...?

While this PR can be merged as-is for the time being, there are two issues to discuss:

  • the interface to select the version of the algorithm to be used. I've added a -algo option with a default of 2016, but this can be done a number of ways. We could have a single -improved (or similar) option with no argument to switch to the new behaviour. Maybe more interesting would be to add the new single-fibre response estimator (which @thijsdhollander's changes effectively introduce) as a full-blown algorithm in its own right, and add a -sfwm_estimator option (or similar) to the dhollander algorithm, defaulting (for now) to tournier. Personally, I reckon this last approach would do the most justice to @thijsdhollander's work on this.

  • the actual inner workings of the new single-fibre response estimator, the rationale behind the decisions made, evidence that the responses obtained are indeed better, and evidence that it is at least as robust as the current approach on a wide range of inputs. This is the bulk of what I would have liked to have seen discussed prior to merging these changes in the first place.
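The first discussion point above can be made concrete with a small sketch. This uses plain `argparse` purely for illustration (the actual script uses MRtrix3's own Python CLI framework; only the option name and defaults mirror the PR's proposal):

```python
import argparse

def build_parser():
    # Hypothetical sketch of the proposed interface: a -algo option selecting
    # between the 2016 and 2019 versions, defaulting to 2016 for now.
    parser = argparse.ArgumentParser(prog='dwi2response dhollander')
    parser.add_argument('-algo', choices=['2016', '2019'], default='2016',
                        help='version of the algorithm to use '
                             '(dates of the respective ISMRM abstracts)')
    return parser
```

An `-sfwm_estimator`-style option, as floated above, would follow the same pattern with estimator names as the choices.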

@thijsdhollander, your input here is obviously critical. I expect you won't be ecstatic at the prospect of having to provide all this additional validation, but I really think it's important to provide as much evidence as is necessary so we can all be confident of the methodology and be happy to endorse it as the default (assuming that's the eventual outcome), and to document it openly for any interested users. It's a bit late in the day for this to be included as the default in the next release, but there is still scope for it if we can move on this quickly. We can certainly change the handling of the algorithm switch (as per the first issue to discuss above) if my proposal isn't deemed adequate.

There have been recent changes to the Dhollander algorithm that alter
the results produced. This allows the previous behaviour to be used, and
defaults to the old behaviour for now.
@jdtournier jdtournier added this to the MRtrix3 3.0 release milestone Aug 22, 2019
@jdtournier jdtournier requested a review from a team August 22, 2019 14:18
@jdtournier jdtournier self-assigned this Aug 22, 2019
@thijsdhollander
Contributor

Wow, critical and everything, all the big guns. I sense all sorts of tension and excitement... Let's just keep our heads cool and look at this, right. This could basically be resolved with a < 30 min live (video/audio) chat, I reckon, which might save us hours of typing essays, stress, miscommunications of all sorts and the like. I'm equally limited for time, and I'm guessing you're having fun ironing out critical stuff to have a reliable code base to run a workshop; so everyone gains from efficiency.

This is not supposed to be final, but we need to be able to use the previous implementation of dwi2response dhollander, and to default to the previous behaviour until we've had a chance to properly review and assess the recent improvements made by @thijsdhollander. Unfortunately, these changes made it onto dev without proper review, due to their inclusion into #1449 (the extensive changes to the Python scripting library), which was reviewed purely on the basis that the changes were necessary to port existing commands to the new API, rather than changing behaviour as such. My mistake on that one, I hadn't cottoned on to the fact that the improvements being proposed to this algorithm (on #1545) were more than merely cosmetic.

Ok, let's get all the awkwardness just out of the way so we can focus on the rest. Nobody's perfect, we've all made some mistakes and now we're here at this time. To just get it off my chest and over with, so we can move on: what irks me a bit is more that you were already essentially bringing this up "internally" (for lack of a better word) all the way back in July. I know and understand that typing up such a long post and preparing for tensions is never a thing to look forward to, but now we're here with time limits which never make it any better. The long text wasn't really needed, I'd say: you can just chat to me and we could be done in half an hour. Just like others did at ISMRM after my talk when they still had questions, etc...


I'd like to now just look into this constructively. If this is all effectively an honest attempt to resolve this efficiently (which I'll put my trust in), and to make sure you understand / appreciate that the new algorithm is effectively robust (believe me, it is) and generalises very broadly (believe me, it does), then I'll sum up the essence here now with the little time I have at the moment (I'd really like to go home at this time). But then please be a bit more specific about e.g. what kind of data you might still wonder about; otherwise you're asking me to produce basically several papers' worth of data here, which is really not very realistic at the moment. Believe me, I've got nothing to hide, you can just ask, but please specify.

Quickly walking through some of your bits, so I don't overlook the gist:

All this PR does is add a -algo option to allow choosing the particular version to be used (2016 or 2019, the dates of the respective abstracts), and sets the default to 2016 for now. The default is not set in stone, and I'd like to review it properly ASAP: if these changes are theoretically sound and lead to genuine improvements in the response produced and the subsequent MSMT-CSD outputs, we should use them by default (I note the computation is also quicker, which would be an added bonus). I note the results produced are not identical to master (presumably due to other more minor refinements), but they're close enough that I wouldn't be worried about them.

  1. -algo option: I personally don't think it's needed and, strictly speaking, you could just move on without it; but if it makes everyone more comfortable for whatever reason, I'm not going to go nuts over it
  2. Default for that option: yes, I do think you'll be better off with the 2019 version for about every kind of scenario that I've tested, and believe me, I've tested many since long before and all the time further since ISMRM. A few basic bits sit at the end of this post (so the essence sits there together); but again, if you're after something specific, please just specify.
  3. Quicker: that's definitely the case, but I also merely see that as a "bonus"; if it would've taken longer than the 2016 version and still produced the same 2019 results, I'd still go for it. But it being this way, at least that's not something to worry about. It also uses less (maximal) disk space and less moving (large) stuff around while running; again just a coincidental bonus.
  4. results not identical to master: I suppose you're referring to the "old" or original version. Yes, there are a few very minor refinements; but also the "dependency" on e.g. dwi2tensor for the FA map that's used for that initial threshold at the very start of the entire algorithm (and dwi2tensor now uses the WLS initialisation, so minor differences might exist indeed). But the gist is only the change to the SF-WM voxel selection in the very final step, just as presented at ISMRM; also still readily available on YouTube: https://youtu.be/7yPSFgLt8CA

These improvements were presented at ISMRM 2019, but I for one didn't feel I understood the rationale sufficiently to make an informed decision.

Just speak to me, really. It's not super complicated, but the talk always makes it slightly rushed for sure. Additionally (and to somewhat counter the fact that saying "it's not super complicated" might accidentally offend a bit; not meant to), I can't know for sure, but I can see you might be stuck in the same thinking initially that I was in for a long time too; in fact, we might all be, since we've been convinced by our earlier discussions you referred to, when we kind of tuned/reasoned the "final" version of the dwi2response tournier algo:

the discussions we had back in #422 & #426 (which incidentally, the changes under discussion here supersede)

I have to add though that it's not "truly" superseding. The setting here is fundamentally different since the input data is now not restricted to just 1 b-value (while it was restricted to that in those earlier discussions; so the reasoning there only exists in a universe that's a priori restricted by that). This does make all the difference; since the 2-tissue WM-CSF thing that goes on in this new 2019 version is absolutely key + the fact that it's normalised to sum to 1 in the metric (which you can't really do like that with 1 tissue, due to 1 b-value). Well, actually, you can in a way though, and that's what led me to the insight of where our earlier reasoning has a sneaky flaw. I'll give you this as something essential to get you thinking here: you'll surely remember the whole discussion on whether to initialise from a strictly sharp(er than the final) response or rather a strictly fat(ter than the final) response, right? We ended up going for sharp(er), which makes sense (still does; not questioning that; my 2019 version also uses an "extreme" sharpest disk). Here comes the twist: it's the iterating aspect that doesn't make sense, or more specifically the iterating aspect that comes from a too-sharp response and then iteratively would converge to the true response. That's simply not what's happening in the tournier algorithm: you might start with too-sharp, but the next iteration you'll either have your final solution or a too-fat response, because not having the final solution means you've got a sub-optimal selection of voxels with not-sharp-enough disks to be the final response. So the whole theory of coming from the "too sharp" end of the spectrum breaks down at iteration 2. You can also indeed observe that by looking at the response at each iteration: you'll see that at iteration 2 it's fatter than the final one, and the next few iterations (even though only gently) make the version at iteration 2 sharper.

So how is this possible then; where does the theory blow up? Or actually more so: how can it (tournier algo) still converge to a sharper disk anyway?? Well, the algorithm is actually not optimising what we might've been thinking it was: remember it also has (by design!) a bias for larger signals. It's not merely shape-based, but also amplitude-encouraged. That design was put in for another different good reason: the tournier algo has to run on whole brains, and only a single b-value, so going only off pure shape, you'd get small noisy signals e.g. in the ventricles and such also coming out. The "amplitude bias" dealt with that. But the surprising side effect is that it masks that the shape-thing in the algorithm is actually not the whole magical thing that's optimised. It essentially does find disks, and then via the iterations it converges to larger disks. But, and this is the big but, in a lot of common (e.g. in vivo adult human and such) cases, the larger disks just so happen to be the sharper ones. But that's just a coincidence. It also "helps" that the larger disks have just a tiny bit less Rician bias, which helps them "accidentally" be the sharper disks.

Not sure if you followed all of that, but that's a key sort of insight to move beyond some aspects of the logic of the tournier algorithm, in particular the fact that iteration is a good thing. It's actually not, if you intend to go just by shape. Another experiment that allows you to see this, is to manually solve the thing the amplitude bias in the tournier algo was designed for: mask to a very safe WM mask first. That then should in principle allow you to not need the amplitude bias (which you can take out by normalising each voxel's signal to a constant DC term). I did that experiment at some point on a lot of datasets; and what does it show: the tournier algo simply doesn't work without the amplitude-bias cue driving it to converge on the kinds of voxels we expect. It's not low signal (and thus SNR) voxels ruining it, it's just the idea of optimising pure shape iteratively that doesn't check out.
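The "constant DC term" normalisation mentioned above can be sketched in a few lines; this is a hypothetical illustration of that one step, not the actual experiment code:

```python
import numpy as np

def normalise_dc(signals, dc=1.0):
    """Rescale each voxel's DWI signal so all voxels share the same DC term.

    This removes the amplitude cue from any purely shape-based selection
    metric; `signals` has shape (n_voxels, n_directions), and the per-voxel
    mean is used here as a simple proxy for the l=0 (DC) component.
    """
    means = signals.mean(axis=1, keepdims=True)
    return dc * signals / means
```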

Alright, so I did accidentally already end up typing up a small essay... Anyhow, at this point I should further comment on:

As far as I can tell, the changes are more or less entirely self-contained to 2e82541...?

No no, definitely not. You've actually picked a commit there at the point I was designing this multi-shell kind of extension with the tournier algo as the model still. This is in fact the exact version where I started to realise something just didn't check out with the iterating + shape thing. I already here had the concept of 2 tissues (WM-CSF) in there, and normalising to sum to one. That got rid of some amplitude stuff (bias fields ; other mysterious sources of T2 variation), and at the same time dealt with the thing the tournier amplitude "bias" was designed for (CSF). The version you're picking out there starts with an initial response based off top-FA-valued voxels, and then ("iteratively", but just for 1 extra iteration) uses that response together with the CSF response in the 2nd iteration. And what happens? The 2nd iteration actually has a worse response than the first one (that naively comes from top FA voxels). I've run that at some point with extra iterations, and the response gradually gets worse. That's when I moved on to my story above about the tournier algo, and the corresponding experiments.

So 2 important things at this point:

  • Don't use that commit as a reference for anything wrt the 2019 version; if you want to compare, just do it basically off the version here and now versus the old version you've added here via the -algo option. That's the simplest way to compare the two. There's no single commit that has all the changes at once; I've kept the process in the commit history on purpose to refer to it and reconstruct reasoning (like we're doing here now).

  • Maybe this is a thing you didn't pick up in the abstract or presentation (or maybe you did, excuse me if so), so I'll highlight it to just make sure: the new 2019 version has no iterations that use a previous response from some previous iteration. Yes, there's 2 very similar steps, the first at lmax=2 and the second at lmax=6, but the second one does not use some kind of response from the first one. The first iteration at lmax=2 only narrows down the set of voxels (i.e. refines it already a great deal) and the lmax=6 step does exactly the same "from scratch" again, just limited to those voxels. So both "steps" use the (lmax=2 and then 6) version of that "artificial extreme SF WM response" mentioned in the abstract/talk together with the final CSF response from the data as already obtained earlier. The lmax is only a parameter in the SF-WM metric, you could say. In practice, the intuition of its effect is: higher lmax in the metric will give more preference for sharper disk(s) (for all separate b-values) versus "lower signal decay". The entire metric optimises both, but since disk sharpness and signal decay are in a way apples and oranges, there's always some kind of (by definition arbitrary) weighing between both going to come into play. Even more complicating is the fact that disk sharpness also exists for all shells separately. And all those shells also weigh (inherently) via their respective numbers of gradient directions.
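The two-pass structure described in the second point can be sketched generically. Here `metric` is a stand-in for the SF-WM scoring at a given lmax, and the retained fractions are placeholders, not the algorithm's actual values; the key property shown is that the lmax=6 pass re-scores the surviving voxels from scratch, with no response carried over from the first pass:

```python
import numpy as np

def two_stage_select(metric, voxels, first_frac=0.1, final_frac=0.005):
    # Pass 1: score every candidate voxel at lmax=2 and keep the best ones.
    scores2 = metric(voxels, lmax=2)
    n_keep = max(1, int(round(first_frac * len(voxels))))
    subset = voxels[np.argsort(scores2)[-n_keep:]]
    # Pass 2: score the surviving voxels again, from scratch, at lmax=6
    # (nothing from pass 1 is reused except the narrowed voxel set).
    scores6 = metric(subset, lmax=6)
    n_final = max(1, int(round(final_frac * len(voxels))))
    return subset[np.argsort(scores6)[-n_final:]]
```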

the interface to select the version of the algorithm to be used. I've added a -algo option with a default of 2016, but this can be done a number of ways. We could have a single -improved (or similar) option with no argument to switch to the new behaviour. Maybe more interesting would be to add the new single-fibre response estimator (which @thijsdhollander's changes effectively introduce) as a full-blown algorithm in its own right, and add a -sfwm_estimator option (or similar) to the dhollander algorithm, defaulting (for now) to tournier. Personally, I reckon this last approach would do the most justice to @thijsdhollander's work on this.

  1. -algo is just fine, but I'd still strongly urge to consider the 2019 default (see final bit at the bottom of the post). -improved is too confusing. I'd also, in any case, like to not see the default be 2016 without the user realising. For the workshop, do whatever you need to do as you're probably already out of time as I'm typing this. But for a release, at this point, users expect the 2019 version; so if it's not the default at that point, maybe the -algo has to be mandatory and the user makes a conscious choice. They're all clever people with a mind of their own to make a choice.

  2. New single-fibre response estimator: I'm flattered, but I don't think it's a good idea for a few pragmatic reasons actually. It's actually not really "useful" out of its context (context being 3-tissue response estimation): it needs at least 2 b-values (or indeed single-shell + b=0), and optimises for that. There's no use for "only" a single-fibre WM response with 2 b-values but no other tissue response(s). It does optimise inherently for low signal decay, which is one of its main purposes (and indeed a main reason why it leads to better multi-tissue CSD fit, as illustrated in abstract and talk). Also, it even needs the multi-tissue context to do its thing: the key metric uses the CSF response from the data itself. So it just gets messy and also not very useful. There's already so many response function algos as well, while everyone only uses the 2 or 3 or whatever main ones. I get you want to offer choice and all, but at some point some choices do become superseded, and offering stuff only adds confusion with no benefits (or even adverse side effects). After all my extensive testing all the way since end of last calendar year up to now still, and even providing beta test users with both algorithms so they could consistently compare for all their data as well (yes, yes I've done that), I've literally almost exhaustively convinced myself this is a strict step forward in terms of better fit, more robust behaviour in weird scenarios even, etc...

the actual inner workings of the new single-fibre response estimator, the rationale behind the decisions made, evidence that the responses obtained are indeed better, and evidence that it is at least as robust as the current approach on a wide range of inputs. This is the bulk of what I would have liked to have seen discussed prior to merging these changes in the first place.

Ok, so half of the essay is already above in any case. This can really be done far more efficiently in a simple 30 minute live chat, with that little triangle diagram from the presentation on which I can convince you of all reasoning right there, with a mouse pointer to, well, point things out.

Apart from all the bits above, some random stuff, if it helps at all:

  • The new metric is basically a direct SF-WM metric; no iterations. lmax=6 is already very robust, but lmax=2 is slightly more robust and errs more strongly towards lower signal decay. That's the rationale of first lmax=2, then lmax=6 on the subset. In the context of dwi2response dhollander, it's even more robust because at the lmax=2 stage, we're already in a refined WM mask. But even in absence of that, the lmax=2 whole brain is really already a very, very, very good direct SF-WM metric. lmax=6 just puts then a bit more emphasis on the sharper disk bit; but it's a balance indeed. I can visually draw you the area and shape of the "zone" in the top left WM "corner" in that triangle diagram that a first lmax=2 then lmax=6 as a combined strategy "carves" out for you in terms of signal selection. It generalises incredibly well across extremely different data.

  • The new metric is not sensitive to absolute signal amplitude, i.e. is also more robust against something you might not have considered in the tournier algo: bias fields. Change the bias field, change the tournier voxel selection. The 2019 metric is entirely unaffected by bias fields or any other intensity inhomogeneity.

  • Compared to the 2016 version, the win is basically in lower signal decay, but sometimes also in sharper disk shape for the lower b-values (which tournier doesn't consider of course). Both lead to better signal fits.

  • I've tested this from high to low b-values, high to (very) low angular resolution, multi-shell, single-shell + b=0, many b=0 images versus just 1 b=0 image, in vivo, ex-vivo, Alzheimer's with large lesions, stroke with large lesions due to ageing in addition to massive stroke lesions, MS lesions, brains with severe atrophy where relative "volume" of tissue types is thrown off balance, baby brains (see also abstract for impact there), neonatal data (dHCP as you know), too-low b-value ex-vivo excised samples of tubers with only a sliver of "healthy" WM still attached and a single b=0, fetal growth restricted lambs (yes, indeed) at 127 days gestation (so delivered by caesarean section at that age) versus twin-lamb controls with extensive multi-shell sampling, ... Well, in all scenarios, it performed perfectly robustly. Sometimes it led to responses that were similar to the ones the 2016 already provides, and most of the time it led to small to sometimes (e.g. that baby data) very large differences in response, resulting in small to sometimes very large improvements in fit. Those improvements generally follow a similar pattern: improvement (of fit) in the more diffusion restricted areas. For a human, expect the splenium and the middle bits of the corticospinal tracts to improve in fit. For babies, those too, but also (very much) the cerebellum around a certain age or age range.
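The bias-field invariance claimed above follows directly from the normalisation: the CSD fit is linear in the signal, so a multiplicative field scales the WM ODF peak and both l=0 density terms by the same factor, which cancels in the ratio. A minimal sketch (variable names are mine, not the code's):

```python
def sf_metric(wm_peak, wm_l0, csf_l0):
    # Sketch of the normalised single-fibre metric: WM ODF peak amplitude
    # divided by the summed l=0 (density) terms of WM and CSF. A bias field
    # multiplies all three inputs equally, so the value is unchanged.
    return wm_peak / (wm_l0 + csf_l0)
```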

Once more about those inner workings themselves: the code is really simple here; maybe simpler than it appears at first sight. It's essentially just that one key metric; all the rest is bulk to make that happen in practice. The talk has it literally all on those 2 slides where it's explained; there's nothing more to it.

@thijsdhollander, your input here is obviously critical.

Well, I hope I have already provided some input above...

I expect you won't be ecstatic at the prospect of having to provide all this additional validation,

I was hoping to type less, but now that it sits there typed up in any case... well, at least that's done then.

but I really think it's important to provide as much evidence as is necessary so we can all be confident of the methodology and be happy to endorse it as the default (assuming that's the eventual outcome), and to document it openly for any interested users.

Yeah of course. But the abstract is really open about it for sure though: there are no hidden mysteries. The code is simple, relatively short, completely open. Even the talk is available publicly. And at this point, this long post too. I don't often see this much material here or anywhere in fact.

It's a bit late in the day for this to be included as the default in the next release, but there is still scope for it if we can move on this quickly.

Yeah, even if I say so myself, I think you'd be providing the users with a good service by offering a more robust, accurate (better fit), and as a bonus faster algorithm by default. And if users get better MSMT-CSD fit due to that, it also reflects well on that algorithm of course.

We can certainly change the handling of the algorithm switch (as per the first issue to discuss above) if my proposal isn't deemed adequate.

Well, another final time my 2 cents here: I think you can safely move to the new algorithm regardless, but I don't have the time, energy or desire to fight long battles over it. -algo is simple, and it doesn't have to be complicated. But the one thing I think is at least a bit important here is that (as mentioned somewhere above) users don't run the 2016 version by default without realising. That's not what several of them will be expecting at this point; so I'd just like to avoid that at the very least.

If you need more info, I'm still available for a 30 min chat; I'm long past caring about awkwardness. As I said, there's no secrets here, I can answer whatever question (if by now all of the above doesn't already answer everything).

@thijsdhollander
Contributor

One I forgot yesterday, just for completeness' sake: it also doesn't depend on spatial operations or proximity of stuff, the way the tournier algorithm does with its dilation steps at each iteration. You could essentially feed the voxels in any way or scramble them, and it'd still give the same answer. The voxels are selected purely by their diffusion signal features; independent of intensity inhomogeneities or spatial proximity assumptions.

I might've still missed highlighting other benefits or particularities; but as mentioned before, just ask whatever or organise a chat if you're after a fast and constructive resolution.

@jdtournier jdtournier closed this Aug 27, 2019
@jdtournier jdtournier reopened this Aug 27, 2019
@jdtournier
Member Author

[EDIT: slip of the finger closed this issue - reopening and completing this post below]

You're quite right, I could and should have dealt with this when I first became aware of it. But by the same token, this discussion could and should have taken place prior to merging to dev. At least we're now talking about it, even if we could have done so earlier.


As to your approach itself: thanks for taking the trouble to explain the logic behind it, it all makes a lot more sense to me now, and I agree with pretty much everything you say. I'm quite happy to support its inclusion into the next release, potentially as the new default - but there's a few issues to work out before we can do this.

First off, if I can recap how I understand the method works (ignoring the second lmax=6 run):

  • perform 2 tissue lmax=2 MSMT-CSD within a safe WM mask (computed as FA>0.2) with a CSF response and an idealised WM response - idealised in the sense that its l=0 DC component does not decay as a function of b-value, and that its angular dependence is a flat disk for all l>0 SH bands.

  • form a voxel-wise metric of single-fibre-ness, computed as the peak amplitude of the WM ODF normalised to the overall density (sum of the CSF and WM ODF l=0 terms).

  • select the top 0.5% voxels from that metric, and use them as the final single-fibre mask.

  • Extract WM response from this mask using orientations obtained from a tensor fit.
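The selection step in this recap is easy to make concrete; a small sketch of a top-fraction mask (the 0.5% figure comes from the recap above, everything else is illustrative):

```python
import numpy as np

def select_single_fibre(metric_values, fraction=0.005):
    # Keep the top `fraction` of voxels by the single-fibre metric
    # (0.5% in the recap above); returns a boolean mask over the voxels.
    n = metric_values.size
    n_keep = max(1, int(round(fraction * n)))
    threshold = np.sort(metric_values)[-n_keep]
    return metric_values >= threshold
```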

I can see the advantages of this approach: the use of MSMT-CSD indeed means we can exploit other information than the orientation domain to identify the purest WM voxels - and the slowest decaying voxels indeed sounds like quite a good heuristic. Even as a measure of WM-ness, the ratio WM(l=0) / ( WM(l=0) + CSF(l=0) ) seems like a good indicator: it promotes voxels with the highest proportion of slow decaying signal (assuming WM here refers to the ODF estimated using the idealised response). On top of that, the use of the WM ODF peak in the numerator additionally promotes single-fibre voxels. And there is indeed no influence of bias field or other sources of intensity inhomogeneity (including susceptibility artefacts). So I can see the rationale and the appeal.

There are still however a couple of issues I'd like to look into:

  • how come the WM response that comes out is so much lower in amplitude - by almost a quarter? The last time we looked into this (in #605, "Default tckgen FOD amplitude threshold still appropriate?"), I'd more or less convinced myself that the response amplitude wasn't somehow over-estimated...

  • is there a pattern in the difference between the single-fibre masks produced by both approaches? Is it simply a matter of the tournier approach picking high amplitude voxels, or is there something more?

But hopefully these are relatively minor. I'll experiment a little bit to get a better feel for it.

More to the point though: if we do decide to change the default behaviour to your new approach, what are the consequences for downstream applications? One I'm particularly concerned about is the suitability of the threshold for tractography: if the WM response comes out smaller, then the WM ODF ought to come out larger. Does this mean we have to revisit the default cutoff for tractography like we did in #1228? Are there any other unexpected consequences we haven't thought of...?

@thijsdhollander
Contributor

As to your approach itself: thanks for taking the trouble to explain the logic behind it, it all makes a lot more sense to me now, and I agree with pretty much everything you say.

Good to hear 👍

First off, if I can recap how I understand the method works (ignoring the second lmax=6 run):

Looks all generally correct. I'll walk through each bit briefly just to tick them all off. Wrt the lmax's: it's basically 2 times the exact same thing, just with a different lmax indeed; first 2, then 6. The lmax=2 case is run only for the "safe WM" set of voxels at that point in the algo, whereas the lmax=6 case is run only for the best voxels already emerging from the lmax=2 case. This whole setup is just to further boost the robustness, find a good balance for the final set of voxels, and as another small side bonus boost speed a bit further (i.e. lmax=6 doesn't have to be run for all WM voxels). Now each step:

perform 2 tissue lmax=2 MSMT-CSD within a safe WM mask (computed as FA>0.2) with a CSF response and an idealised WM response - idealised in the sense that its l=0 DC component does not decay as a function of b-value, and that its angular dependence is a flat disk for all l>0 SH bands.

Yep, generally correct. Minor details:

  • safe WM is FA>0.2 (or what the user chooses), but also some outlier rejection of high signal decay metric (SDM) voxels. The latter isn't really relevant here, since those high SDM voxels would naturally score extremely low on the new SF-WM metric. In the broader algo, those high SDM voxels might partially be returned to the CSF pool (ultimately for the CSF response), but that's a separate matter.

  • Idealised WM response ("extreme WM" or eWM): description is correct. The "kind" of idealised is also important (similar to our old sharpest-or-fattest-disk discussions in a certain way): strictly sharper, or higher anisotropy, than we expect for the final response, as well as strictly less signal decay than we expect for the final response. This puts it, relative to the SF-WM we're after, but also relative to the GM and CSF responses and thus the entire "triangle", in this spot:

[Image: sfwm-metric_concept sketch]

That relative location is crucial, since the projection of all signals onto that eWM-CSF spectrum should not be cut short at the eWM end of things: we'll need that range there to effectively be part of our metric. Note there's still no cause for worry even if, in the sketch, the eWM-CSF line were to sit slightly "below" the WM-CSF line, as long as eWM at least sits strictly to the top-left side of WM. The latter is ensured by the definition of the eWM response. Also, definitely note this is just a sketch for reasoning: you can't draw this exact thing with these straight lines and perpendicular anisotropy-SDM axes. The reality is at least a projective transform of the coordinates; but that's a different story. It doesn't break the theory or the above reasoning though; so the sketch remains a useful reasoning tool.
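For the idealised eWM response itself, here's a conceptual numerical sketch (not the actual dwi2response dhollander code; the absolute scaling is arbitrary here). For even harmonic band l, the zonal SH profile of a flat disk is proportional to the Legendre polynomial value P_l(0), while the l=0 term is held constant across b-values (no signal decay):

```python
import math

def legendre_at_zero(l):
    """P_l(0) for even l: (-1)^(l/2) * (l-1)!! / l!! (zero for odd l)."""
    if l % 2:
        return 0.0
    sign = -1.0 if (l // 2) % 2 else 1.0
    num = math.prod(range(1, l, 2)) if l > 0 else 1      # (l-1)!!
    den = math.prod(range(2, l + 1, 2)) if l > 0 else 1  # l!!
    return sign * num / den

def idealised_wm_response(bvalues, lmax=2):
    """One row per b-value, columns l = 0, 2, ..., lmax.
    The l=0 term is constant (no decay with b); the l>0 bands follow
    the zonal SH profile of an idealised flat disk, proportional to
    P_l(0). Purely illustrative scaling."""
    row = [legendre_at_zero(l) for l in range(0, lmax + 1, 2)]
    return [list(row) for _ in bvalues]
```

This is what puts eWM strictly to the "top-left" of any realistic WM response in the sketch: it is both maximally anisotropic (disk-sharp) and maximally slow-decaying.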

form a voxel-wise metric of single-fibre-ness, computed as the peak amplitude of the WM ODF normalised to the overall density (sum of the CSF and WM ODF l=0 terms).

Yes, correct. The normalisation to total per-tissue signal is what gets rid of signal magnitude dependence (and bias fields, etc...); but also what allows the rest to be as simple as only the peak amplitude (and not needing a ratio of peaks). If the position on the eWM-CSF spectrum is the same for 2 voxels' signals, this normalisation allows the peak amplitude to be a direct assessment of single-fibre-ness. So we get combined non-CSF-ness (or "tissue-ness"), (e)WM-ness, and single-fibre-ness; all in one metric.
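As a sketch of the metric itself (array names are assumed for illustration; this is not the actual script code):

```python
import numpy as np

def sfwm_metric(wm_odf_peak, wm_l0, csf_l0, eps=1e-12):
    """Per-voxel single-fibre-WM score: the WM ODF peak amplitude
    normalised by the total density (sum of the WM and CSF ODF l=0
    terms). Inputs are arrays over voxels. The normalisation removes
    signal-magnitude dependence (bias fields, etc.)."""
    total = wm_l0 + csf_l0
    return np.where(total > eps, wm_odf_peak / total, 0.0)
```

Note the magnitude invariance: scaling all three inputs of a voxel by the same factor leaves its score unchanged, which is exactly the property described above.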

select the top 0.5% voxels from that metric, and use them as the final single-fibre mask.

Top 0.5%, computed as a percentage of (the number of voxels in) the safe WM mask, for the final lmax=6 case. For the first lmax=2 case, it's twice that, so 1% of safe WM. Users can control the final percentage (the one that's 0.5% by default); but just as with the percentages for the other tissues, there's no good reason to actually change from the default percentages, unless there's some evidence of e.g. a severe lack (percentage-wise) of good SF-WM voxels relative to all WM. For adult-sized brains and voxel sizes of 2.5 mm isotropic, 0.5% ends up being 150~250 or so voxels; and it scales accordingly for other resolutions or brain (WM) sizes.
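The selection step could be sketched like this (again a conceptual sketch, not the actual implementation; ties at the threshold may select slightly more than the requested count):

```python
import numpy as np

def top_fraction_mask(metric, safe_wm_mask, percent=0.5):
    """Select the highest-scoring voxels from `metric`, where the
    count is `percent` % of the number of voxels in the safe WM mask
    (as in the final lmax=6 pass; the lmax=2 pass would use 2x)."""
    n = max(1, int(round(safe_wm_mask.sum() * percent / 100.0)))
    # Exclude everything outside the safe WM mask from consideration:
    candidates = np.where(safe_wm_mask, metric, -np.inf)
    # Threshold at the n-th largest candidate value:
    thresh = np.partition(candidates.ravel(), -n)[-n]
    return candidates >= thresh
```

Because the count is derived from the safe WM mask size, the selection scales naturally with brain size and resolution, as described above.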

Extract WM response from this mask using orientations obtained from a tensor fit.

In the dev branch and this pull request, yes.

In what I've since distributed to collaborators and (SS3T-CSD, etc.) beta testers, I've moved on to having the orientation come from the peak orientation of the final lmax=6 step. The difference is minor though; the main reason is the idea / thought of not using the tensor, because "just because". Even if the lmax=6 peak orientation differs a small bit from the principal eigenvector orientation for a few voxels, it's still just a few voxels out of a couple of hundred. So just kind of the good feeling of not using the tensor, I guess.

I can see the advantages of this approach: the use of MSMT-CSD indeed means we can exploit other information than the orientation domain to identify the purest WM voxels - and the slowest decaying voxels indeed sounds like quite a good heuristic. Even as a measure of WM-ness, the ratio of WMl=0 / ( WMl=0 + CSFl=0 ) seems like a good indicator: it promotes voxels with the highest proportion of slow decaying signal (assuming WM here refers to the ODF estimated using the idealised response). On top of that, the use of the WM ODF peak in the numerator additionally promotes single-fibre voxels. And there is indeed no influence of bias field or other sources of intensity inhomogeneity (including susceptibility artefacts). So I can see the rationale and the appeal.

Yep, as covered in each individual point above.

how come the WM response that comes out is so much lower in amplitude - by almost a quarter? The last time we looked into this (in #605), I'd more or less convinced myself that the response amplitude wasn't somehow over-estimated...

That's in line with what I'm observing too for the normal adult human cases; and sometimes different scales (but not too different) in other scenarios. All towards lower amplitude though. There's 2 main things at play (and maybe lots of more minor ones too):

  1. The tournier algorithm actively optimises for larger amplitude. This is the main explanation. Even if both algos were to find a similar response shape, in the presence of a larger pool of good SF-WM voxels the tournier algorithm will pick the larger-amplitude ones from that pool.

  2. The 2019 algorithm doesn't care about amplitude directly (only relative amplitude between b-values, but that can also come from lower b=0 amplitude, relative to given b=... amplitude(s), etc...); but also by caring actively about a few other features, such as signal decay (and disk sharpness for intermediate b-values, if multi-shell data), it will naturally have its own "typical" set of voxels. A priori, there's no reason they'd be the same, or different, from the tournier ones. It seems they end up being different though in practice, in adult humans; hence the amplitude difference.

is there a pattern in the difference between the single-fibre masks produced by both approaches? Is it simply a matter of the tournier approach picking high amplitude voxels, or is there something more?

So then we arrive at that point indeed. The answer is the same as for the previous point: the combination of my 2 points above. It might differ between individuals, but a prevalent pattern of the tournier algorithm is erring towards the splenium and sometimes bits of the CST: structures that indeed stand out clearly in amplitude in an average b=3000 or similar image (but are also decent SF-WM voxels at the same time). That, and sometimes that then modulated a bit by bias fields, depending on their strength. A prevalent pattern of the dhollander 2019 version is more of the other bits of the mid-body of the corpus callosum; in favour of a bit less of the CST and/or less extent of the splenium. Interestingly, you might frequently see it avoid some voxels that sit "exactly" in the middle of the corpus callosum, i.e. where it curves a bit more strongly. Another not uncommon feature is a few (really only a few) voxels of the upper cerebellar peduncles. They're indeed surprisingly single-fibre WM; if partial voluming doesn't ruin it for these small structures.

More to the point though: if we do decide to change the default behaviour to your new approach, what are the consequences for downstream applications? One I'm particularly concerned about is the suitability of the threshold for tractography: if the WM response comes out smaller, then the WM ODF ought to come out larger. Does this mean we have to revisit the default cutoff for tractography like we did in #1228? Are there any other unexpected consequences we haven't thought of...?

Yep, spot on. I'm already advising all people that use it to consider higher thresholds (again) for tractography and fixel segmentation (e.g. in FBA, on the template). The magnitude of difference you were observing as well lands good initial "default" thresholds in the 0.06~0.07 range, depending on the exact case and preference for a slightly more relaxed or stricter outcome. A funny "beneficial" side effect is that this then comes at least a bit closer again to the "tournier + original CSD" combination / pipeline, because of how the response differences counteract some CSD differences in each pipeline.

However, because of this amplitude trickiness between response function algorithms or estimation sources, and other steps too, I'm even further advising people to always run mtnormalise immediately after any multi-tissue variant; because "normalisation" indeed. You'd want good bias field correction in any case if your subsequent step is going to rely on amplitude, e.g. because tractography amplitude threshold, etc... But normalisation becomes a handy thing to have in any scenario too. mtnormalise is fast in any case, and it seems someone somewhere over there not that far from you guys has been working on speeding up bits of the code even further. I'm mentioning this in general, because mtnormalise output amplitudes typically also differ slightly (but quite consistently) from the inputs for given fixed scenarios (combination of response function set derived from a given algo, and a given multi-tissue model (2-tissue, 3-tissue, ...)). Asking people to run mtnormalise and tune defaults to that might just be more "robust" over time, if response function amplitudes or FOD amplitudes can undergo changes due to e.g. what we're encountering here now, or due to other developments.

@jdtournier
Member Author

Just trying to get to grips with the difference in amplitude issue. Here's a screenshot of the voxels selected for the WM response estimation using the new dhollander algorithm in red, and the previous (current) version in green, overlaid on the FA map (all computed from the BATMAN data). Same number of voxels in both cases (167 vs 168).

[Image: screenshot0000 (selected voxels overlaid on the FA map)]

Looks like the new version selects more of the genu of the corpus callosum in particular, and also selects fewer voxels near the edge of the corpus callosum. Neither version seems to pick much of the splenium in this case. In general, both approaches give reasonable voxel selections - I'd be hard pressed to pick one over the other on that basis alone. But I'm still struggling to understand how that results in a lower amplitude WM response, across all b-values... Looking at the per-shell mean DW signal images though, I can see that the signal in the genu is indeed substantially lower than the midbody (~280 vs. ~350), which might be enough to explain the bulk of the amplitude difference. I also note that the old version did pick a few voxels in the CC that were a bit too close to CSF for comfort, which could drive the b=0 signal further up - which I can observe in the output responses, above and beyond the overall difference in amplitude.

So from that limited dataset, the output of the new version looks perfectly reasonable. The fact that the old version picked a few voxels with potential CSF contamination might be enough to explain the poorer fit in the splenium and other regions: the estimated decay is not as rapid as it would need to be (due to CSF contamination), and since that's the slowest decaying component, there's no way to perfectly fit the signal in these voxels properly. All this argues for moving to the new version by default.

So assuming we go with that, we need to discuss:

  • ways to make the previous version accessible, which I think is important. I'm happy to keep the -algo option that I've introduced here and make it default to 2019 if everyone is happy enough with that.

  • re-assessing the impact of the difference in the resulting ODF output on downstream applications, particular tractography, FBA threshold, and the like.

  • communicating these changes to users - not necessarily a big deal since the next release should be The Big One, and hence users are likely going to expect major differences.

  • updating recommendations in the docs to match - this will need a bit of work to make sure it's all up to date and self-consistent...

Any other thoughts welcome...

@thijsdhollander
Contributor

Here's a screenshot of the voxels selected for the WM response estimation using the new dhollander algorithm in red, and the previous (current) version in green, overlaid on the FA map (all computed from the BATMAN data). Same number of voxels in both cases (167 vs 168).

Yes, that's generally in line with the pattern I'm seeing for both methods across different datasets. Sometimes the difference is a bit more outspoken, sometimes a bit less, but the relative trends of patterns are similar.

Looks like the new version selects more of the genu of the corpus callosum in particular, and also selects fewer voxels near the edge of the corpus callosum. Neither version seems to pick much of the splenium in this case.

Yep, as above. How much of the splenium ends up being picked by the tournier algo sometimes depends on how much other higher-signal voxels are available (or not) in other structures, and their relative sizes. But the absolute numbers don't really matter here: it's more about the relative shift from what's selected by the one algo, to what's selected by the other.

In general, both approaches give reasonable voxel selections - I'd be hard pressed to pick one over the other on that basis alone.

True, at least in the traditional sense of thinking about single-fibre voxels; which indeed tends to ignore the signal decay / diffusivity. We're used to looking for areas that have a single peak in the FOD; but that's not the complete picture if the (SF-)WM response needs to represent the "purest" (lowest signal decay) WM. It's part of the picture though. I'd also argue that, independent of voxel selection, there's nothing really wrong in the end with the single-b-value response that the tournier algo comes up with: it's a nice and sharp disk indeed. But for multi-b-value response estimation, we indeed also care about other properties across those b-values.

But I'm still struggling to understand how that results in a lower amplitude WM response, across all b-values... Looking at the per-shell mean DW signal images though, I can see that the signal in the genu is indeed substantially lower than the midbody (~280 vs. ~350), which might be enough to explain the bulk of the amplitude difference. I also note that the old version did pick a few voxels in the CC that were a bit too close to CSF for comfort, which could drive the b=0 signal further up - which I can observe in the output responses, above and beyond the overall difference in amplitude.

Well, I think most explanations are in what you describe here indeed. Note that the tournier algo indeed pushes quite hard to still include voxels that are quite likely to have some CSF partial volume. Here it's "at least" already restricted a priori to that "safe WM" mask, which has the FA threshold but also the outlier rejection for high signal decay voxels (essentially voxels at the boundary of WM and CSF, or high-FA voxels within the ventricles for low-SNR datasets, such as those at high b-value). It's "even worse" when you run the tournier algo just within a whole-brain mask: it definitely tends to pick voxels with CSF partial volume. This kind of shows that, even though CSF partial volume typically means less WM, and that means less "AFD" and thus less amplitude, the higher signal amplitude of some single-fibre structures near the ventricles (e.g. the splenium) is to some extent even able to counter that, and make them still "win out" in the tournier algorithm.

Note this also explains why the new algorithm is able to find a set of voxels with (sometimes a lot) lower signal decay: the tournier algorithm's inclusion of CSF partial-volumed voxels definitely increases the signal decay on average in its set of voxels. And again, this is, in the context of the dhollander algo, even already in a "safe WM" mask where high signal decay outliers are already filtered out a priori.

So from that limited dataset, the output of the new version looks perfectly reasonable. The fact that the old version picked a few voxels with potential CSF contamination might be enough to explain the poorer fit in the splenium and other regions: the estimated decay is not as rapid as it would need to be (due to CSF contamination), and since that's the slowest decaying component, there's no way to perfectly fit the signal in these voxels properly. All this argues for moving to the new version by default.

Yep, exactly. In terms of my "triangle" / simplex kind of sketch: it increases the range of signals being able to be fit (with lower residuals). For an extreme example of what happens when the fit is bad due to too-high signal decay of the WM response: see the baby dataset result in the abstract. The gist is that it ends up underestimating the WM compartment size (of "AFD") in those lower signal decay voxels. The same happens in any similar case (but not always that extreme in effect).

ways to make the previous version accessible, which I think is important. I'm happy to keep the -algo option that I've introduced here and make it default to 2019 if everyone is happy enough with that.

What I'm distributing to collaborators / beta testers / ... is dhollander as the 2019 algorithm, and an additional dholl_old algorithm that is the 2016 algorithm. The latter is marked as deprecated, but it's still there for now for reference and to enable some comparisons (and to have a reference implementation for those comparisons). Both dhollander and dholl_old differ only in this final SF-WM step, where dholl_old uses the tournier algorithm, and dhollander the new strategy. I'm myself not fussed by the large bit of code that is duplicated between the two, because I regard dholl_old as deprecated and intend to remove it again eventually.

But in any case, apart from interface details, what's mostly on my mind is to offer the newer version by default when dwi2response dhollander is run. How exactly then the previous strategy is made available, matters less. If a -algo option is an easy solution here, then that's ok of course.

re-assessing the impact of the difference in the resulting ODF output on downstream applications, particular tractography, FBA threshold, and the like.

The gist (but this is worth thinking about well of course):

  • The FBA pipeline is actually almost not affected at all: mtnormalise is the great equaliser here. I even notice this in the lognorm_scale when comparing the old and new response estimation strategies. The FODs are indeed larger with the new responses, and thus the lognorm_scale differs accordingly. Not exactly though. This is a bit harder to explain easily; but the point is that the fixel threshold can be left very, very close to before. Note that, as I mentioned somewhere above, mtnormalise by default already shifts the amplitude range you can expect from WM FODs. With the old responses, this was easily 15~20% upwards for a scenario where the FODs come from responses from the subject itself. With the new responses, the difference is of course smaller: it now typically shifts less than 10% upwards, towards the same kind of end goal in terms of FOD amplitude range. So that's another kind of accidental bonus benefit of the new responses in a way. Finally, note that the threshold in the FBA pipeline is very explicitly flagged as only a "starting" kind of value, and users are urged to (always) experiment with this value. So I don't regard it as a default in a way, just a reasonable starting point.

  • Same story for the tractography threshold in the FBA pipeline, because of the same arguments.

  • Tractography itself in general (default in tckgen): yep, can (probably should) be reconsidered accordingly. Note my point in an earlier post about mtnormalise: might be a good idea to factor that into the story immediately, and then it can be "the great equaliser" in the long term.

communicating these changes to users - not necessarily a big deal since the next release should be The Big One, and hence users are likely going to expect major differences.

I'm definitely communicating it extensively wherever I can, in whatever I distribute. It's big bang time in any case.

updating recommendations in the docs to match - this will need a bit of work to make sure it's all up to date and self-consistent...

Well, so probably not that many changes (the FBA pipeline is fine, I reckon). Just tckgen probably needs to be looked at carefully. Note there's already the discussion about the curvature default anyway: might be an idea to consider both at once maybe.

This also (almost) fixes handling of locations with spaces in the
filename (#1749)
@jdtournier
Member Author

jdtournier commented Dec 11, 2019

OK, that last commit introduces a -wm_algo option to allow choosing whichever SFWM algorithm we might want. My only concern is that this passes the -sf_voxels option to specify the number of voxels desired; I'm not sure all possible estimators support that...? The fa one uses the -number option to do the same thing, so it might be worth homogenising that.

More importantly, this also introduces the fix for #1749 - but it still doesn't work as it should. Here is the relevant extract from the error dump:

              Command:  mrconvert voxels.mif '/home/jdt13/data/BATMAN/processed_data/folder with spaces/dwi2response-tmp-EM35B5/voxels_sfwm.mif'

              dwi2response: [ERROR] mrconvert voxels.mif '/home/jdt13/data/BATMAN/processed_data/folder with spaces/dwi2response-tmp-EM35B5/voxels_sfwm.mif' (tournier.py:140)
              dwi2response: [ERROR] Information from failed command:
              dwi2response:
                            mrconvert: [ERROR] Unable to obtain header key-value entries from spec "'/home/jdt13/data/BATMAN/processed_data/folder with spaces/dwi2response-tmp-EM35B5/dwi.mif'"

As you can see, the issue is that the argument passed to mrconvert's -copy_properties was quoted with single quotes (presumably by the shlex.quote() call), but these are then passed along with the argument itself to the command. I can fix this by adding an additional .strip ("'") to this line in run.py, so that it strips both single and double quotes. But this is starting to feel very hacky to me... This was kind of discussed in #928, but I do think that things would really be a lot easier if we passed lists of strings to the run.command() call rather than trying to tokenise a single command string, with all the gotchas involved in doing this...
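The gotcha described above (a pre-quoted argument ending up verbatim in an argument list) can be illustrated with Python's shlex; the command and names here are illustrative, not the actual run.py code:

```python
import shlex

path = "/data/folder with spaces/dwi.mif"

# Command built as a single *string*: quoting is required so that the
# tokeniser keeps the path together as one argument; the quote
# characters are then consumed correctly by shlex.split().
cmd = "mrconvert -copy_properties " + shlex.quote(path) + " out.mif"
print(shlex.split(cmd))

# But if the pre-quoted string is placed directly into an argument
# *list* (which bypasses tokenisation entirely), the literal quote
# characters travel with the argument and break the command:
bad = ["mrconvert", "-copy_properties", shlex.quote(path), "out.mif"]

# With a list of strings, no quoting is needed at all; each element is
# passed to the command as exactly one argument:
good = ["mrconvert", "-copy_properties", path, "out.mif"]
```

This is precisely why passing lists of strings to run.command() sidesteps the whole class of quoting gotchas.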

This brings it in line with the fa algorithm
@jdtournier
Member Author

OK, that last commit renames the -sf_voxels option in dwi2response tournier to -number, which is what it's called in the fa algorithm. That means we can at least use either the tournier or fa algorithms with dwi2response dhollander -wm_algo.

Can't use the tax algorithm though, since it doesn't have a -number option. Not sure how big a deal that is.

The problem is that path.from_user() quotes the path by default, which
is the right thing to do most of the time since its output is supposed
to be inserted into the command _string_. But when its output is used
as the mrconvert_keyval argument, it gets inserted into the command
_list_ (run.py: 315), and so shouldn't be quoted. Note this is only a
problem for paths that do need to be quoted, which is why this issue
hasn't come up so far.

More generally, there is a problem that:

     quote (quote ('some problematic string'))

will not match:

     quote ('some problematic string')

since it'll try to quote whatever additions the previous invocation did
to properly quote the original string. I'm not sure whether there's any
way around that...

Note that this will likely need replication in other commands too.
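That non-idempotence is easy to demonstrate with Python's shlex.quote() (assuming that's the quoting function in play, as suggested earlier in the thread):

```python
import shlex

s = "folder with spaces"
once = shlex.quote(s)    # adds single quotes: "'folder with spaces'"
twice = shlex.quote(once)  # quotes the quote characters themselves

# Double quoting is not a no-op...
assert once != twice
# ...and while a singly-quoted string round-trips back to the original,
# a doubly-quoted one round-trips to the quoted form instead:
assert shlex.split(once) == [s]
assert shlex.split(twice) == [once]
```

In other words, quoting is not idempotent, so whether a path has already been quoted must be tracked somewhere; hence the difficulty described in the commit message.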
This change means that if -number is set to a non-default value, the
number of voxels considered at the next iteration scales with it. It was
previously hard-wired to 3000, which clearly wouldn't be appropriate if
-number was 3000 or larger.

This also means that we can remove the explicit -iter_voxels option in
the dwi2response dhollander call when invoking an alternative -wm_algo,
without any change in its behaviour. This makes it possible to use the
fa algorithm (would previously fail due to unknown -iter_voxels option).

Note that it still can't use the tax algorithm, since it can't handle
the -number option.
@jdtournier
Member Author

OK, I reckon this is good to go - but required a few changes elsewhere to behave sensibly. See details in the commits.

@Lestropie
Member

Can't use the tax algorithm though, since it doesn't have a -number option. Not sure how big a deal that is.

I'd rather catch that so that it simply doesn't pass that option down if the tax algorithm is requested. Better than crashing out.
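A minimal sketch of that kind of guard (a hypothetical helper, not the actual dwi2response code; the option names are taken from the discussion above):

```python
def wm_algo_options(wm_algo, number=None):
    """Build the option list forwarded to the chosen SF-WM algorithm.
    The 'tax' algorithm has no -number option, so don't pass it down
    in that case, rather than crashing out with an unknown option."""
    opts = []
    if number is not None and wm_algo != 'tax':
        opts += ['-number', str(number)]
    return opts
```

Something along these lines keeps tax usable (it just falls back to its own behaviour) while tournier and fa still receive the voxel count.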

- Whitelist of single-fibre WM response function estimation algorithms for "dhollander" and "msmt_5tt".
- Minor tweaks to documentation.
- Import new test data to verify dwi2response dhollander -wm_algo operation.
- Update dwi2response tournier tests to reflect 0585d50.
Conflicts:
	lib/mrtrix3/run.py
	testing/scripts/data
@jdtournier jdtournier merged commit a46da5d into dev Jan 20, 2020
@jdtournier jdtournier deleted the dwi2response_dhollander_version_switch branch January 20, 2020 14:46