add support for inline diagnostic snapshotting for `mdtest`

## Summary

Add support for managing inline snapshots of Red Knot diagnostics to mdtests.

## Background

In astral-sh/ruff#15945, we added support for diagnostic snapshotting to [Red Knot's Markdown test framework](https://github.com/astral-sh/ruff/blob/main/crates/red_knot_test/README.md). Basically, the snapshotting support lets you write something like this:

``````
## Unresolvable module import



```py
import zqzqzqzqzqzqzq  # error: [unresolved-import] "Cannot resolve import `zqzqzqzqzqzqzq`"
```
``````

And then if you run:

```
$ cargo insta test -p red_knot_python_semantic -- mdtest__
```

the test runner will produce a snapshot of any diagnostics emitted by Red Knot on the Python code. This snapshot is then managed as an _external_ file by [`insta`](https://docs.rs/insta).

This works decently well, and it's better than _not_ having something like this, but it kinda inhibits the locality that is otherwise a huge benefit to mdtests. Instead of the diagnostic output being right there in the Markdown file, it's actually in a different snapshot file.

Now, `insta` does support inline snapshots. For example, in _Rust code_, you can do this:

```rust
insta::assert_snapshot!(
    some_op_that_returns_a_string(),
    @"",
);
```

And when you run `cargo insta test`, it will automatically manage the string on the right side of the assertion, _in your source file_, just like it does for external snapshots.

However, we want inline snapshots inside of Markdown files, which, AFAIK, `insta` does not support.

## Proposed design

### `mdtest` interaction

I don't think [this design](https://github.com/astral-sh/ruff/blob/main/crates/red_knot_test/README.md#asserting-on-full-diagnostic-output) (originally @carljm AIUI) has explicit consensus from everyone on the Red Knot team (although I'm not aware of any specific objections to it), so it might require some iteration.

The idea here is that each mdtest would support an optional `output` code fence block (at most 1). And when present, the contents of that `output` block would be treated as an inline diagnostic snapshot. So the above example might look like this:

``````
## Unresolvable module import

```py
import zqzqzqzqzqzqzq  # error: [unresolved-import] "Cannot resolve import `zqzqzqzqzqzqzq`"
```

```output
error: lint:unresolved-import
 --> /home/andrew/astral/ruff/play/diags/play/test.py:1:8
  |
1 | import zqzqzqzqzqzqzq  # error: [unresolved-import] "Cannot resolve import `zqzqzqzqzqzqzq`"
  |        ^^^^^^^^^^^^^^ Cannot resolve import `zqzqzqzqzqzqzq`
  |
```
``````

In contrast to the mdtest README, I propose starting with just one `output` code fence block that captures the full diagnostic output of _all_ preceding _and_ following Python files. This is how the existing external snapshotting mechanism works. We can consider adding more granular testing in the future, but I think starting with something that just always captures the full diagnostic output is simplest and most useful for now.

## Snapshot management

I think this is probably the hardest part of this task and has the least thought put into it. So this section is pretty vague. But a critical part of snapshot testing is the tooling that surrounds the management of the snapshots. `cargo insta` is a decent model to follow. Everyone here at Astral uses it, and AFAIK, folks generally like it.

Inline snapshots do have a bit of a conceptually simpler model than external snapshots. For example, you don't need to worry about "unreferenced" snapshots (snapshot files that exist but which aren't connected to any actual test). But I believe these are the minimal operations that need to be supported:

* An actual assertion needs to be made to check whether the expected output matches the generated output. Bonus points for making an assertion failure look nice with a pleasant diff.
* When an assertion fails, the snapshot tooling provides a convenient way to update the _expected output_ inside the mdtest file with the output that was generated.

With that said, having to use two different snapshot test tools is undeniably annoying. So I think the gold standard implementation here would be to find a way to make this use case work with `cargo insta` itself. Possibly by working with upstream. (cc @mitsuhiko) I wouldn't be surprised if this wasn't feasible though. My guess is that `cargo insta`'s inline snapshot mechanism is probably tightly coupled with Rust source code (which makes sense).

If we can't make integration with `cargo insta` work, then here is what I propose as a starting point:

* Assertions use the [`similar`](https://docs.rs/similar/) crate (or something like it) to produce nice output when a snapshot test fails.
* An environment variable (e.g., `MDTEST_SNAPSHOT_UPDATE`) specifies how to deal with snapshot test failures. It has two values: `no` and `always`. The default is `no`, which results in a test failure when the expected and generated outputs do not match. When set to `always`, the snapshot in the file is rewritten with whatever the generated output is and the test always passes.

Notably absent from this proposal is any sort of review mechanism. I think this is a _nice to have_. Instead, since these are inline snapshots only, we can treat the output of `cargo insta test -p red_knot_python_semantic -- mdtest_` as our review mechanism. And if we only want to accept a single snapshot, we can do something like, `MDTEST_SNAPSHOT_UPDATE=always cargo insta test -p red_knot_python_semantic -- mdtest__diagnostics_invalid_argument_type`. This workflow is _not_ as nice as `cargo insta` because it doesn't de-couple test execution from snapshot updating. And it also doesn't permit updating individual `output` blocks, since I believe all mdtests in a single file get grouped together into a single Rust `#[test]` execution. So... there are definitely some shortcomings, but I think this might be a good state for an MVP that we can iterate on.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for inline diagnostic snapshotting for `mdtest` #195

Summary

Background

Proposed design

`mdtest` interaction

Snapshot management

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

add support for inline diagnostic snapshotting for mdtest #195

Description

Summary

Background

Proposed design

mdtest interaction

Snapshot management

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

add support for inline diagnostic snapshotting for `mdtest` #195

`mdtest` interaction