
Screenshot tests: Magick.NET v14 ErrorMetric.Fuzz now detects previously-invisible image differences #34524

@mattleibow

Summary

After upgrading Magick.NET-Q8-AnyCPU from v13.5.0 to v14.10.4 (PR #34455), 1523 UI screenshot tests fail. The root cause is that ErrorMetric.Fuzz changed its calculation in v14 and now reports much higher distortion values for the same image pairs. This reveals that the CI screenshots have been genuinely different from the baselines all along — v13 just couldn't see it.

Root Cause Analysis

ErrorMetric.Fuzz behavioral change

For the same pair of images (e.g., VerifyEntryClearButtonVisibilitySetToWhileEditing.png baseline vs CI screenshot):

Metric                        v13               v14
ErrorMetric.Fuzz              0.00248 (0.25%)   0.08129 (8.13%)
ErrorMetric.RootMeanSquared   N/A               0.00248 (0.25%)

v14's RootMeanSquared produces the exact same values as v13's Fuzz. This means v14's Fuzz is now a fundamentally different (more sensitive) metric.

The images ARE different

The baseline and CI screenshots are not identical — they contain real visual differences. For example, in VerifyEntryClearButtonVisibilitySetToWhileEditing:

  • Baseline (in repo): Clear button X icon is grey
  • CI screenshot: Clear button X icon is black

Similar color shifts (grey→black, subtle color changes) exist across many test screenshots. v13's Fuzz metric was insensitive enough that these differences fell below the 0.5% threshold. v14's Fuzz now correctly detects them.
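Plugging the measured values into the comparer's threshold check shows the effect directly. This is a sketch; `IsWithinThreshold` is a hypothetical helper name, not the actual `MagickNetVisualComparer` API:

```csharp
// Illustrative sketch of the pass/fail decision; the real logic lives in
// MagickNetVisualComparer.cs. IsWithinThreshold is a hypothetical name.
using System;

static bool IsWithinThreshold(double distortion, double threshold = 0.005)
    => distortion <= threshold;

// Measured values for the same image pair (from the table above):
Console.WriteLine(IsWithinThreshold(0.00248)); // v13 Fuzz, 0.25%: passes
Console.WriteLine(IsWithinThreshold(0.08129)); // v14 Fuzz, 8.13%: fails
```

Same images, same 0.5% threshold; only the reported distortion changed.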

Current Workaround (PR #34455)

In PR #34455, we switched the default ErrorMetric from Fuzz to RootMeanSquared to preserve v13 behavior:

// MagickNetVisualComparer.cs - was ErrorMetric.Fuzz
public MagickNetVisualComparer(ErrorMetric errorMetric = ErrorMetric.RootMeanSquared, ...)

// MagickNetVisualDiffGenerator.cs - was ErrorMetric.Fuzz  
public MagickNetVisualDiffGenerator(ErrorMetric error = ErrorMetric.RootMeanSquared)

This makes all tests pass again but hides real differences that v14's Fuzz can now detect.

Recommendation: Switch back to ErrorMetric.Fuzz

We should take advantage of v14's improved sensitivity. This would require:

1. Revert the ErrorMetric workaround

// MagickNetVisualComparer.cs - switch back to Fuzz
public MagickNetVisualComparer(ErrorMetric errorMetric = ErrorMetric.Fuzz, double differenceThreshold = 0.005)

// MagickNetVisualDiffGenerator.cs - switch back to Fuzz
public MagickNetVisualDiffGenerator(ErrorMetric error = ErrorMetric.Fuzz)

2. Regenerate all baseline screenshots

Since the images ARE different, the baselines need updating to match what CI actually produces. This is a large batch operation:

# Find all snapshot directories
find src/Controls/tests -type d -name "snapshots"
# src/Controls/tests/TestCases.Android.Tests/snapshots/android/
# src/Controls/tests/TestCases.Android.Tests/snapshots/android-notch-36/
# src/Controls/tests/TestCases.iOS.Tests/snapshots/ios/
# src/Controls/tests/TestCases.iOS.Tests/snapshots/ios-26/
# src/Controls/tests/TestCases.Mac.Tests/snapshots/mac/
# src/Controls/tests/TestCases.WinUI.Tests/snapshots/windows/

The baseline regeneration needs to happen on CI infrastructure (not locally) since the baselines must match the CI environment exactly.
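Once a CI run has produced fresh screenshots, the batch update itself is a straightforward recursive copy. A minimal sketch, assuming the CI artifacts are downloaded into a local `ci-artifacts/` directory (that layout and the function name are assumptions, not part of the existing tooling):

```csharp
// Hypothetical sketch: overwrite the repo's snapshot baselines with the
// screenshots a CI run actually produced. The "ci-artifacts" layout is an
// assumption; adjust to match the real CI artifact structure.
using System;
using System.IO;

static void UpdateBaselines(string ciScreenshotsDir, string snapshotsDir)
{
    foreach (var produced in Directory.EnumerateFiles(
        ciScreenshotsDir, "*.png", SearchOption.AllDirectories))
    {
        // Mirror the relative layout of the CI output into the snapshots dir
        var relative = Path.GetRelativePath(ciScreenshotsDir, produced);
        var target = Path.Combine(snapshotsDir, relative);
        Directory.CreateDirectory(Path.GetDirectoryName(target)!);
        File.Copy(produced, target, overwrite: true);
    }
}

// Example invocation (paths are assumptions):
if (Directory.Exists("ci-artifacts/android"))
    UpdateBaselines("ci-artifacts/android",
        "src/Controls/tests/TestCases.Android.Tests/snapshots/android");
```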

3. Investigate why baselines differ from CI

The fact that baselines differ from CI screenshots means either:

  • Baselines were generated on different OS/device versions than CI currently uses
  • Rendering has subtly changed over time (Android API updates, iOS version changes)
  • Some baselines were generated locally with different DPI/scaling

This is worth investigating to prevent drift in the future.

4. Consider adjusting the threshold

If regenerating all baselines isn't practical immediately, the threshold could be increased from 0.5% to accommodate the new Fuzz metric. However, this reduces test sensitivity and is not recommended long-term.

To find the right threshold, analyze the v14 Fuzz values across all failing tests and pick a value that passes the "same but slightly different" cases while still catching real regressions.
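That analysis can be sketched as follows, assuming the per-test v14 Fuzz distortions for acceptable ("same but slightly different") cases have already been collected; the helper name and the safety margin are illustrative, not part of the test utils:

```csharp
// Hypothetical threshold analysis: given the v14 Fuzz distortions of tests
// whose differences are acceptable, suggest the smallest threshold that
// passes them all, plus a safety margin. Names and margin are illustrative.
using System;
using System.Linq;

static double SuggestThreshold(double[] acceptableDistortions, double margin = 1.1)
    => acceptableDistortions.Max() * margin;

// Example with the two measured v14 Fuzz values quoted in this issue:
double[] measured = { 0.00248, 0.08129 };
Console.WriteLine($"Suggested threshold: {SuggestThreshold(measured):P2}");
```

Any real regression would then need to exceed that suggested value, which is why a large margin-bearing threshold reduces sensitivity.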

Verification Script

To compare how v13 and v14 treat any pair of images:

using System;
using ImageMagick;

using var baseline = new MagickImage("baseline.png");
using var actual = new MagickImage("actual.png");

// v14 Fuzz (new, more sensitive); note only the red channel is compared here
double fuzz = baseline.Compare(actual, ErrorMetric.Fuzz, Channels.Red);

// v14 RootMeanSquared (equivalent to v13 Fuzz)
double rms = baseline.Compare(actual, ErrorMetric.RootMeanSquared, Channels.Red);

Console.WriteLine($"Fuzz (v14 behavior):           {fuzz:P4}");
Console.WriteLine($"RootMeanSquared (v13 compat):  {rms:P4}");

Files involved

  • src/TestUtils/src/VisualTestUtils.MagickNet/MagickNetVisualComparer.cs — comparison metric
  • src/TestUtils/src/VisualTestUtils.MagickNet/MagickNetVisualDiffGenerator.cs — diff generation metric
  • src/Controls/tests/TestCases.Shared.Tests/UITest.cs — test framework that uses these
  • src/Controls/tests/TestCases.*/snapshots/ — all baseline screenshot directories


Labels: area-testing (Unit tests, device tests), s/triaged (Issue has been reviewed)
