Improve DebuggerDisplay of decision dags by RikkiGibson · Pull Request #53892 · dotnet/roslyn

RikkiGibson · 2021-06-04T22:56:16Z

To date it had been somewhat difficult to understand the meaning of a decision dag on-the-fly while debugging, especially from the context of visiting a specific dag node.

This PR tries to improve the situation by taking the behavior of decisionDag.Dump() and moving it into DebuggerDisplay for individual dag nodes.

It may help to review the commits individually.

6459ecf focuses on introducing DebuggerDisplay to the nodes while preserving the actual format of the dump.
95047f3 introduces the condensed format.

For the following switch statement:

void M(object obj)
{
    switch (obj)
    {
        case "a":
            Console.Write("b");
            break;
        case string { Length: 1 } s:
            Console.Write(s);
            break;
        case int and < 42:
            Console.Write(43);
            break;
        case int i when (i % 2) == 0:
            obj = i + 1;
            break;
        default:
            Console.Write(false);
            break;
    }
}

The original Dump on the bound decision dag looks like this:

State 0
  Test: ?DagTypeTest(t0 is string)
  WhenTrue: 1
  WhenFalse: 8
State 1
  Test: t1=DagTypeEvaluation(t1 as string)
  Next: 2
State 2
  Test: ?DagValueTest(t1 == ConstantValueString("a": String))
  WhenTrue: 3
  WhenFalse: 4
State 3
  Case: case "a":case "a":
State 4
  Test: t2=DagPropertyEvaluation(t2)
  Next: 5
State 5
  Test: ?DagValueTest(t2 == ConstantValueOne(1: Int32))
  WhenTrue: 6
  WhenFalse: 13
State 6
  WhenClause: 
  WhenTrue: 7
State 7
  Case: case string { Length: 1 } s:case string { Length: 1 } s:
State 8
  Test: ?DagTypeTest(t0 is int)
  WhenTrue: 9
  WhenFalse: 13
State 9
  Test: t3=DagTypeEvaluation(t3 as int)
  Next: 10
State 10
  Test: ?DagRelationalTest(t3 < ConstantValueI32(42: Int32))
  WhenTrue: 11
  WhenFalse: 12
State 11
  Case: case int and < 42:case int and < 42:
State 12
  WhenClause: (i % 2) == 0
  WhenTrue: 14
  WhenFalse: 13
State 13
  Case: defaultswitch (obj)
        {
            case "a":
                Console.Write("b");
                break;
            case string { Length: 1 } s:
                Console.Write(s);
                break;
            case int and < 42:
                Console.Write(43);
                break;
            case int i when (i % 2) == 0:
                obj = i + 1;
                break;
            default:
                Console.Write(false);
                break;
        }
State 14
  Case: case int i when (i % 2) == 0:case int i when (i % 2) == 0:

The new Dump looks like:

[0]: t0 is string ? [1] : [8]
[1]: t1 = (string)t0; [2]
[2]: t1 == "a" ? [3] : [4]
[3]: leaf `case "a":`
[4]: t2 = t1.Length; [5]
[5]: t2 == 1 ? [6] : [13]
[6]: when <true> ? [7] : <unreachable>
[7]: leaf `case string { Length: 1 } s:`
[8]: t0 is int ? [9] : [13]
[9]: t3 = (int)t0; [10]
[10]: t3 < 42 ? [11] : [12]
[11]: leaf `case int and < 42:`
[12]: when ((i % 2) == 0) ? [14] : [13]
[13]: leaf `default`
[14]: leaf `case int i when (i % 2) == 0:`

Note that in addition to being more compact it also fixes a few bugs with the visualization, notably that we correctly show the input temp to an evaluation node like t1 = (string)t0.

Hovering on the TopologicallySortedNodes in IDE:

Hovering on a specific dag node:

We may want to consider whether there is a better way to denote that a node's input is the original input besides using the identifier t0, but it seems not too difficult to catch on to that.

src/Compilers/CSharp/Portable/BoundTree/BoundDagTest.cs

src/Compilers/CSharp/Portable/BoundTree/BoundDecisionDag.cs

src/Compilers/CSharp/Portable/BoundTree/BoundDecisionDagNode.cs

src/Compilers/CSharp/Portable/Binder/DecisionDagBuilder.cs

jcouv · 2021-06-04T23:50:26Z

src/Compilers/CSharp/Portable/Binder/DecisionDagBuilder.cs

+                    evaluation.Id = tempIdentifier(evaluation);
+                    if (evaluation.Input.Source is { } source)
+                    {
+                        source.Id = tempIdentifier(source);


Why do we need to do this? I would have assumed (by virtue of topological sort) that all sources (ie. previous temps) would have already been assigned an identifier. Feels like we may be assigning identifiers more than once.
Consider adding assertions to prevent double set. #Closed

Have added a comment. Sometimes the evaluations seem to be the same object instance, other times not, and I'm not sure why, but I do expect these equivalent evaluations to get de-duped during lowering.

I think this needs more understanding. We should only have to process nodes forwards. I don't understand why we'd ever have to look back.
What breaks if this is removed?

Same question on case below (which labels its input on line 680).

Interesting effects from deleting the source.Id assignments here:

If we delete only the one for BoundDagEvaluation then all the tests actually pass

If we delete only the one for BoundDagTest then 1 of the tests fails

If we delete both then 3 of the tests fail

It feels like sometimes the "source" evaluation objects are shared, other times not. Perhaps a matter of convenience during dag construction. I did poke around for a while in binding and lowering of the dags to get a better grasp on this but don't feel 100% on how de-duping works. De-duping seems to sometimes be dependent on the presence/content of when clauses as well.

I would also emphasize that the assumption made here is the same as the assumption made in the original dump code, i.e. it always uses the evaluation object as a dictionary key to look up the id, relying on its equality implementation in order to get the same answer for different instances with equivalent values. I feel like both of the assignments should just be left there since there probably exists a scenario where the absence of just one of them results in uninitialized ids.

One scenario which breaks when we remove these assignments: DecisionDag_Dump_SwitchStatement_01

[1]: t1 = (string)t0; [2] [2]: t1 == ""a"" ? [3] : [4] [3]: leaf `case ""a"":` -[4]: t2 = t1.Length; [5] +[4]: t2 = <uninitialized>.Length; [5] [5]: t2 == 1 ? [6] : [13] [6]: when <true> ? [7] : <unreachable> [7]: leaf `case string { Length: 1 } s:`

I'll give this a try locally.

I stepped through DecisionDag_Dump_SwitchStatement_01 and I think your change is right.

src/Compilers/CSharp/Portable/BoundTree/BoundDecisionDagNode.cs

jcouv

Done with review pass (iteration 2)
The new format is much more readable. Love it! Having the identifiers baked in seems particularly useful.

jcouv

Done with review pass (iteration 8)

alrz · 2021-06-06T08:59:41Z

src/Compilers/CSharp/Test/Semantic/Semantics/PatternMatchingTests4.cs

+            var binder = model.GetEnclosingBinder(@switch.SpanStart);
+            var boundSwitch = (BoundSwitchStatement)binder.BindStatement(@switch, BindingDiagnosticBag.Discarded);
+            AssertEx.Equal(
+@"[0]: t0 == null ? [1] : [2]


Another thing I find really useful is the tree dump of the final dag using rootDecisionDagNode.Dump() after:

roslyn/src/Compilers/CSharp/Portable/Binder/DecisionDagBuilder.cs

Line 652 in 35ebde2

var rootDecisionDagNode = decisionDag.RootNode.Dag;

I usually find myself using that instead - it's easier to reason about and it reflects the final codegen better.
However, currently it uses the generic tree dumper with a lot of noise which makes it hard to see what's going on.
Can we possibly mix this two representations and have a custom dumper impl that only prints relevant fields?

Interesting, do you think you could mock up a before/after of the dump that contains the information you want? I guess I am wondering if it seems like we should, for example, repeatedly print certain nodes in the dump when there are multiple ways to reach that node. It also feels like we would want to cut out all the "source" properties in this representation since those basically always point "backwards".

Something like:

testDecisionDagNode ├─test: <debugger display> ├─whenTrue │ └─evaluationDecisionDagNode │ ├─evaluation: <debugger display> │ └─next | └─testDecisionDagNode | ├─test: <debugger display> | ├─whenTrue | | └─evaluationDecisionDagNode | | ├─evaluation: <debugger display> | | └─next | | └─testDecisionDagNode | | ├─test: <debugger display> | | ├─whenTrue: leaf | | └─whenFalse ...

Instead of:

testDecisionDagNode ├─test │ └─dagNonNullTest │ ├─isExplicitTest: True │ ├─input │ │ └─dagTemp │ │ ├─type: X │ │ ├─source │ │ ├─index: 0 │ │ └─hasErrors: False │ └─hasErrors: False ├─whenTrue │ └─evaluationDecisionDagNode ...

I think I won't have time to do this myself but would be happy to accept a change which changes the implementation of Dump() on dag nodes to work in this manner.

src/Compilers/CSharp/Portable/BoundTree/BoundDagEvaluation.cs

jcouv

Done with review pass (iteration 10)

cston · 2021-06-07T18:31:11Z

src/Compilers/CSharp/Portable/BoundTree/BoundDagTest.cs

+                        }
+                        first = false;
+                        result += $"Item{param.Ordinal + 1}";
+                    }


Minor: perhaps

var parameters = d.DeconstructMethod.Aggregate( "", (s, p) => $"{(s.Length == 0 ? "" : ", ")}Item{p.Ordinal + 1}");

cston · 2021-06-07T18:37:24Z

src/Compilers/CSharp/Portable/BoundTree/BoundDecisionDagNode.cs

+
+        internal new string GetDebuggerDisplay()
+        {
+            var pooledBuilder = PooledStringBuilder.GetInstance();


var pooledBuilder = PooledStringBuilder.GetInstance();

Perhaps just create a new StringBuilder() instead, to avoid the extra complexity evaluating in the debugger.

jcouv

LGTM Thanks (iteration 11)

RikkiGibson added 2 commits June 4, 2021 12:41

Use DebuggerDisplay with BoundDecisionDag.Dump

6459ecf

Condense debugger display of bound dag nodes

95047f3

RikkiGibson requested a review from a team as a code owner June 4, 2021 22:56

ghost added the Area-Compilers label Jun 4, 2021