Conversation
I think you're referencing the wrong test. The test that's failing is on line 4. Indeed, appending
Line 4, not the fourth non-commented test; got it 😅
force-pushed from 0da8161 to a57c888
force-pushed from a57c888 to 3affd5a
Multiple occurrences of "-Wunused-but-set-variable" that appeared on Mac OSX GitHub CI.
force-pushed from 3affd5a to 6b9ed64
Two tests remain failing even with fairly relaxed tolerances:
Would be thankful if someone could generate these on an ARM64 machine and forward the data so that I can compare them to what mine generates; I'd prefer to see the differences on these rather than naively dialing back the tolerances further.
Ok, so I have spent a frustratingly long time trying to understand why these precision issues arise in the first place. After playing around with the code in
Other projects also seem to be affected by this problem, e.g. see here. @MRtrix3/mrtrix3-devs should we just turn off FMA operations for ARM64 Mac chips?
- For `mrcolour`, output images are clamped to the [0.0, 1.0] range, so use absolute tolerance rather than fractional.
- Increase tolerance on `tckresample -step_size` given the imprecision of the resampling operation itself.
Thanks @bjeurissen; very helpful.
Ooofffffff, race condition 🤪

Personally I'd say keep this PR. The tests should ideally pass on any system where the code has been compiled, which means both different hardware and different compiler flags, so some amount of tolerance to account for such is reasonable. We're not dealing with differences of a magnitude that is consequential for the respective operations to be performed on user data. If it were a bigger problem we could discuss generating test data with FFP on, and either using that as reference with tolerance for the errors with FFP off, or storing outputs with both configurations and comparing generated data to both references; but I don't think it's worth the trouble here.

What are the respective timelines on compiler support for FFP contractions vs. hardware support for such opcodes? If it's both potentially faster and more precise, there's an argument to be made to, rather than disabling such to have less inter-system variance, instead enable such across the board, including for precompiled binaries. But only if they're long-standing x86 operations that compilers are only now starting to utilise.
If we believe the new tolerances to be more sensible, I agree. My only concern here is that the differences in results between ARM64 and x86 chips could arbitrarily propagate to large values (e.g. in a for loop that involves floating-point operations). If this is the case, then our commands can give significantly different results depending on the platform. I guess we can deal with it when we encounter an issue of this nature.
Any modern x86 CPU (e.g. any CPU released in the last 10 years by Intel or AMD) should support FMA operations (C++ even provides implementations in the standard library). Both GCC and Clang now enable these optimisations by default.
Addresses #2885. Working off the linked CI errors there.
Targeting `master` here; will need to either merge from `master` back to `dev` or re-implement there.
- `dwidenoise` should hopefully only require a minor increase to tolerance.
- `mrcolour` tests had no tolerance whatsoever, so it would only take a minute difference in floating-point rounding to set them off.
- `tckresample` I was a little surprised by, given that there's a default tolerance of 1e-5 mm. It is however possible that, because of the operation of the two problematic resampling strategies (fixed step size and fixed number of points), floating-point imprecision could accumulate. Will have to wait and see if a 10x increase in tolerance is sufficient.
- `vectorstats` failure: the corresponding test very clearly shows use of `-frac 1e-6` for that call, but the error message claims an absolute tolerance of zero. Will see what gets generated from this PR.