Skip to content

Use Unicode for OutputEncoding#49721

Merged
tmat merged 3 commits intodotnet:masterfrom
tmat:IWEncondingWorkaround
Dec 4, 2020
Merged

Use Unicode for OutputEncoding#49721
tmat merged 3 commits intodotnet:masterfrom
tmat:IWEncondingWorkaround

Conversation

@tmat
Copy link
Member

@tmat tmat commented Dec 2, 2020

Fixes #47571, #48874 and https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1231577.

It's not clear why using UTF8 does not work (issue https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1253106 is tracking further investigation) but it seems using Unicode does work.
Except when the CLR reports stack overflow error message - it does not encode it properly (see dotnet/runtime#45503).
To avoid printing garbled text we detect known CLR messages and transcode the output back to ASCII.

@tmat tmat requested a review from a team as a code owner December 2, 2020 04:07
@tmat
Copy link
Member Author

tmat commented Dec 3, 2020

@CyrusNajmabadi @davidwengier I had to add another workaround (extra commit). PTAL.

Copy link
Member

@davidwengier davidwengier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's surprising to me that you have to write all of this code yourself. Is that just to avoid allocations that would happen if you just passed everything through System.Encoding like you do in EncodeMarker?

I'm approving though, as I trust that you're right with this stuff, more than I understand it :D

@tmat
Copy link
Member Author

tmat commented Dec 4, 2020

It's surprising to me that you have to write all of this code yourself. Is that just to avoid allocations that would happen if you just passed everything through System.Encoding like you do in EncodeMarker?

I was wondering the same thing. There are 2 operations:

  1. finding the marker in the buffer
    The outer loop needs to be (memory) efficient since the REPL can pump a lot of output to the Interactive Window. So we don't want to allocate too much there. That's why we reuse the buffer. Once we find the marker we don't care about memory since the marker indicates process crash, so we don't expect any more output than what CLR prints for stack overflow. When we find the first character of the marker in the buffer we need to match the whole marker in the buffer - that still needs to be efficient since it could be a valid input. We need to fill the buffer from the reader with the length of the marker because it might not have been all read in yet. ReadAll does that. Then we compare the buffer content with the marker (ArraysEqual).

  2. When we hit the marker we need to transcode the rest of the text coming out of the process. .NET Framework just prints out Process is terminated due to StackOverflowException but Core CLR prints out more info (including stack trace). All this output needs to be transcoded. The transcoding itself can't be done with existing Encoding class since no encoding exists that decodes 2 ASCII bytes stuffed into a UTF16 codepoint to an ASCII character. That's why we need custom Transcode impl.

…rvice.cs

Co-authored-by: David Wengier <david.wengier@microsoft.com>
@tmat tmat merged commit 9165a1d into dotnet:master Dec 4, 2020
@ghost ghost added this to the Next milestone Dec 4, 2020
@tmat tmat deleted the IWEncondingWorkaround branch December 4, 2020 23:09
333fred added a commit to 333fred/roslyn that referenced this pull request Dec 5, 2020
* upstream/master: (143 commits)
  Use Unicode for OutputEncoding (dotnet#49721)
  Fix tests
  Remove unnecessary parens when converting switch statement ot expression.
  Fix flaky test
  Simplify implementation of IsOrInGenericType helper and reduce code duplication. (dotnet#49763)
  Increase CoreCLR dump information
  NRT
  Be resilient to a null-document in the code-folding service.
  Change cache implementation from dictionary to list
  Fix APIs for retrieving solution snapshot from Razor, UT (dotnet#49591)
  Handles optional parameters
  Feature/disambiguate using keyword (dotnet#48898)
  Check all overloads when inferring first argument in an invocation with nothing typed
  Remove optional parameter
  lint
  Clean up option handling
  Remove double blank line
  Add + update tests
  Ensure completion and completion resolve handler always pass the same options
  Change name of telemetry
  ...
@tmat tmat modified the milestones: Next, 16.9.P2, 16.9 Dec 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IOException setting Console.OutputEncoding

4 participants