Summary
After upgrading Magick.NET-Q8-AnyCPU from v13.5.0 to v14.10.4 (PR #34455), 1523 UI screenshot tests fail. The root cause is that ErrorMetric.Fuzz changed its calculation in v14 and now reports much higher distortion values for the same image pairs. This reveals that the CI screenshots have been genuinely different from the baselines all along — v13 just couldn't see it.
Root Cause Analysis
ErrorMetric.Fuzz behavioral change
For the same pair of images (e.g., VerifyEntryClearButtonVisibilitySetToWhileEditing.png baseline vs CI screenshot):
| Metric | v13 | v14 |
| --- | --- | --- |
| ErrorMetric.Fuzz | 0.00248 (0.25%) | 0.08129 (8.13%) |
| ErrorMetric.RootMeanSquared | N/A | 0.00248 (0.25%) |
v14's RootMeanSquared produces the exact same values as v13's Fuzz. This means v14's Fuzz is now a fundamentally different (more sensitive) metric.
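As a sanity check on what these numbers mean, an RMS-style distortion can be reproduced outside Magick.NET. The sketch below is plain NumPy, not Magick.NET's actual implementation (which may differ in channel weighting and alpha handling), but it illustrates the scale of the reported values: identical images score 0.0, and a small changed region scores a small fraction.

```python
import numpy as np

def rms_distortion(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-squared error over pixel values normalized to [0, 1].

    Illustrative only -- identical images give 0.0, a fully inverted
    image gives 1.0, and small localized changes give small values.
    """
    a = a.astype(np.float64) / 255.0
    b = b.astype(np.float64) / 255.0
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Two 8-bit greyscale "images" differing only in a small region:
baseline = np.full((100, 100), 128, dtype=np.uint8)
actual = baseline.copy()
actual[:5, :5] = 0  # 25 of 10,000 pixels shifted grey -> black
print(rms_distortion(baseline, actual))  # small change -> small RMS (~0.025)
```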
The images ARE different
The baseline and CI screenshots are not identical — they contain real visual differences. For example, in VerifyEntryClearButtonVisibilitySetToWhileEditing:
- Baseline (in repo): Clear button X icon is grey
- CI screenshot: Clear button X icon is black
Similar color shifts (grey→black, subtle color changes) exist across many test screenshots. v13's Fuzz metric was insensitive enough that these differences fell below the 0.5% threshold. v14's Fuzz now correctly detects them.
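Some back-of-envelope arithmetic shows why such shifts slipped under the threshold. Assuming a uniform single-channel change over a fraction f of the pixels, an RMS-style error is roughly sqrt(f) times the per-pixel difference, so even a full grey-to-black flip stays below 0.5% unless it covers about one pixel in ten thousand:

```python
# How large (as a fraction of total pixels) can a grey(128) -> black(0)
# region be before an RMS-style error crosses the 0.5% threshold?
# Simplified model: RMS = sqrt(fraction_changed) * per_pixel_diff,
# for a uniform change on a single channel.
threshold = 0.005
per_pixel_diff = 128 / 255  # grey -> black, normalized to [0, 1]
max_fraction = (threshold / per_pixel_diff) ** 2
print(max_fraction)  # ~0.0001, i.e. about 0.01% of the image
```

A small icon like the clear button's X occupies far less of a full-screen screenshot than that, which is consistent with these color shifts going undetected under v13.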
Current Workaround (PR #34455)
In PR #34455, we switched the default ErrorMetric from Fuzz to RootMeanSquared to preserve v13 behavior:
```csharp
// MagickNetVisualComparer.cs - was ErrorMetric.Fuzz
public MagickNetVisualComparer(ErrorMetric errorMetric = ErrorMetric.RootMeanSquared, ...)

// MagickNetVisualDiffGenerator.cs - was ErrorMetric.Fuzz
public MagickNetVisualDiffGenerator(ErrorMetric error = ErrorMetric.RootMeanSquared)
```
This makes all tests pass again but hides real differences that v14's Fuzz can now detect.
Recommendation: Switch back to ErrorMetric.Fuzz
We should take advantage of v14's improved sensitivity. This would require:
1. Revert the ErrorMetric workaround
```csharp
// MagickNetVisualComparer.cs - switch back to Fuzz
public MagickNetVisualComparer(ErrorMetric errorMetric = ErrorMetric.Fuzz, double differenceThreshold = 0.005)

// MagickNetVisualDiffGenerator.cs - switch back to Fuzz
public MagickNetVisualDiffGenerator(ErrorMetric error = ErrorMetric.Fuzz)
```
2. Regenerate all baseline screenshots
Since the images ARE different, the baselines need updating to match what CI actually produces. This is a large batch operation:
```bash
# Find all snapshot directories
find src/Controls/tests -type d -name "snapshots"
# src/Controls/tests/TestCases.Android.Tests/snapshots/android/
# src/Controls/tests/TestCases.Android.Tests/snapshots/android-notch-36/
# src/Controls/tests/TestCases.iOS.Tests/snapshots/ios/
# src/Controls/tests/TestCases.iOS.Tests/snapshots/ios-26/
# src/Controls/tests/TestCases.Mac.Tests/snapshots/mac/
# src/Controls/tests/TestCases.WinUI.Tests/snapshots/windows/
```
The baseline regeneration needs to happen on CI infrastructure (not locally) since the baselines must match the CI environment exactly.
3. Investigate why baselines differ from CI
The fact that baselines differ from CI screenshots points to at least one of the following:
- Baselines were generated on different OS/device versions than CI currently uses
- Rendering has subtly changed over time (Android API updates, iOS version changes)
- Some baselines were generated locally with different DPI/scaling
This is worth investigating to prevent drift in the future.
4. Consider adjusting the threshold
If regenerating all baselines isn't practical immediately, the threshold could be increased from 0.5% to accommodate the new Fuzz metric. However, this reduces test sensitivity and is not recommended long-term.
To find the right threshold, analyze the v14 Fuzz values across all failing tests and pick a value that passes the "same but slightly different" cases while still catching real regressions.
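A minimal sketch of that selection, assuming the v14 Fuzz values for the visually-acceptable failures have been collected into a list (the numbers below are placeholders, not measured values):

```python
# Sketch: pick an interim Fuzz threshold from observed CI data.
# "observed" holds v14 Fuzz values from failing tests that were judged
# visually acceptable; headroom avoids flakiness right at the boundary.
def pick_threshold(observed: list[float], headroom: float = 1.25) -> float:
    """Smallest threshold that passes all acceptable diffs, plus headroom."""
    return max(observed) * headroom

observed = [0.081, 0.064, 0.073, 0.059]  # placeholder v14 Fuzz values
print(pick_threshold(observed))
```

The trade-off is explicit in the headroom factor: too little and acceptable diffs flake, too much and real regressions of similar magnitude pass unnoticed.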
Verification Script
To compare how v13 and v14 treat any pair of images:
```csharp
using ImageMagick;

var baseline = new MagickImage("baseline.png");
var actual = new MagickImage("actual.png");

// v14 Fuzz (new, more sensitive)
double fuzz = baseline.Compare(actual, ErrorMetric.Fuzz, Channels.Red);

// v14 RootMeanSquared (equivalent to v13 Fuzz)
double rms = baseline.Compare(actual, ErrorMetric.RootMeanSquared, Channels.Red);

Console.WriteLine($"Fuzz (v14 behavior): {fuzz:P4}");
Console.WriteLine($"RootMeanSquared (v13 compat): {rms:P4}");
```
Files involved
src/TestUtils/src/VisualTestUtils.MagickNet/MagickNetVisualComparer.cs — comparison metric
src/TestUtils/src/VisualTestUtils.MagickNet/MagickNetVisualDiffGenerator.cs — diff generation metric
src/Controls/tests/TestCases.Shared.Tests/UITest.cs — test framework that uses these
src/Controls/tests/TestCases.*/snapshots/ — all baseline screenshot directories
Related