Skip to content

Add comprehensive fuzz testing for all diff algorithms and operations#42

Merged
Byron merged 6 commits intomasterfrom
copilot/add-fuzz-testing-for-diff-algorithms
Dec 30, 2025
Merged

Add comprehensive fuzz testing for all diff algorithms and operations#42
Byron merged 6 commits intomasterfrom
copilot/add-fuzz-testing-for-diff-algorithms

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Dec 7, 2025

Implements cargo-fuzz based fuzz testing to verify robustness of all diff algorithms, postprocessing, and unified diff output against arbitrary inputs.

Changes

Fuzz Targets

  • comprehensive_diff: All three algorithms (Myers, Histogram, MyersMinimal) with postprocessing and unified diff output
  • diff_compute_with: Lower-level token-based API with direct token sequence manipulation
  • postprocess_heuristics: Postprocessing with no heuristic, line heuristic, and custom indent heuristic
  • unified_diff_printer: Unified diff output with varying context lengths and format validation

CI Integration

  • Added fuzz job running all targets for 45s each (~3min total)
  • Uses nightly toolchain with explicit contents: read permissions

Documentation

  • fuzz/README.md with target descriptions and local execution instructions
  • Updated main README with testing section

Each target validates invariants (e.g., hunk ranges, diff line prefixes) and exercises all major code paths to catch panics and undefined behavior.

Original prompt

Devise a fuzz test that uses cargo fuzz to fuzz every major algorithm exposed by this crate. This includes all diff algorithms, post-processing, and unidifed diff printing.
Make sure that CI also runs the fuzz test for no more than 3 minutes.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Add fuzz testing for major algorithms in the crate Add comprehensive fuzz testing for all diff algorithms and operations Dec 7, 2025
Copilot AI requested a review from Byron December 7, 2025 16:09
@Byron Byron force-pushed the copilot/add-fuzz-testing-for-diff-algorithms branch 2 times, most recently from b510e57 to cec3965 Compare December 7, 2025 19:18
@Byron
Copy link
Copy Markdown
Collaborator

Byron commented Dec 7, 2025

What do you think of this @pascalkuthe?

I wanted it to be able to quickly build a test-suite that runs the code through its paces. It already found two issues, one is fixed and the other one occurs when doing a unified diff with this input:

        Input {
            before: "\n\nA\n\n\n\n",
            after: "",
            context_len: 4294967295,
        }

With a short runtime and without state I don't think CI is going to find more than the roughest of issues, but in theory the local state could be stored.
Ideally, this is run locally for a couple of hours to be reasonable sure all code features are tested and won't panic.

@Byron Byron force-pushed the copilot/add-fuzz-testing-for-diff-algorithms branch from cec3965 to 5271f61 Compare December 7, 2025 19:27
@Byron Byron requested a review from pascalkuthe December 7, 2025 19:28
@Byron Byron added the question Further information is requested label Dec 7, 2025
@pascalkuthe
Copy link
Copy Markdown
Owner

looks nice good addition I think but we would have to get it to actually pass to merge. The one issue you found is regarding that minimum on the tab_width? That seems like a user error (defining tab width to be 0 is not meaningful). I think that is probably the type of issue fuzzing will find most often which you could argue are mainly documentation issue (maybe could use an assert) and then need a fix in the fuzzer definition (possibly similiar for the huge context width altoguh that one is probably a more realistic usecase). Will try to look at this sometime to try and get it to pass, thanks

@Byron
Copy link
Copy Markdown
Collaborator

Byron commented Dec 20, 2025

Thank you, that's great to hear!

Generally I think it's not desirable for a library to panic even on input that makes no sense. Imagine a usecase where clients pass in user data unchecked. They may at least expect an error, which is usually handled gracefully, but panics are then bringing down the whole application. Either it's prevented by the typesystem, or it must be a runtime error. Maybe this whole category could be called 'numerical safety'.

In any case, I hope you find some to fix the context_len related issue, which in the simplest case could be a capping or rejection. It probably doesn't have to be larger than X, which then would naturally be safe to use even on 32 bit platforms.

If you want me to tackle it that way, please let me know @pascalkuthe .

@Byron Byron mentioned this pull request Dec 20, 2025
Byron and others added 5 commits December 30, 2025 10:05
Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
That would panic, so we set it to at least 1.
It's a bit unclear why this really happens, but this is a first fix
to allow using fuzz tests.
@Byron Byron force-pushed the copilot/add-fuzz-testing-for-diff-algorithms branch from 5271f61 to ca8d48c Compare December 30, 2025 09:12
@Byron Byron marked this pull request as ready for review December 30, 2025 09:18
@Byron Byron merged commit 2824e45 into master Dec 30, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

question Further information is requested

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants