Skip to content

Code style to convert byte arrays to UTF8 strings#60647

Merged
davidwengier merged 47 commits intodotnet:mainfrom
davidwengier:UseUTF8Strings
May 3, 2022
Merged

Code style to convert byte arrays to UTF8 strings#60647
davidwengier merged 47 commits intodotnet:mainfrom
davidwengier:UseUTF8Strings

Conversation

@davidwengier
Copy link
Copy Markdown
Member

Relates to #58848
Part of #60582

I've never created an analyzer and fixer from scratch before, nor a code style analyzer, so please let me know where, not if, I've gone wrong.

@davidwengier davidwengier requested a review from a team as a code owner April 8, 2022 13:11
@ghost ghost added the Area-IDE label Apr 8, 2022
return null;
}

private static string GetStringLiteralRepresentation(Rune rune)
Copy link
Copy Markdown
Member

@Youssef1313 Youssef1313 Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static bool TryReplaceChar(char c, out string replaceWith)
{
replaceWith = null;
switch (c)
{
case '\\':
replaceWith = "\\\\";
break;
case '\0':
replaceWith = "\\0";
break;
case '\a':
replaceWith = "\\a";
break;
case '\b':
replaceWith = "\\b";
break;
case '\f':
replaceWith = "\\f";
break;
case '\n':
replaceWith = "\\n";
break;
case '\r':
replaceWith = "\\r";
break;
case '\t':
replaceWith = "\\t";
break;
case '\v':
replaceWith = "\\v";
break;
}
if (replaceWith != null)
{
return true;
}
if (NeedsEscaping(CharUnicodeInfo.GetUnicodeCategory(c)))
{
replaceWith = "\\u" + ((int)c).ToString("x4");
return true;
}
return false;
}

This is a similar method (not public from the compiler layer, so it cannot be reused). But it does more work:

     if (NeedsEscaping(CharUnicodeInfo.GetUnicodeCategory(c))) 
     { 
         replaceWith = "\\u" + ((int)c).ToString("x4"); 
         return true; 
     } 

So wondering if the extra work is necessary for this feature? I think the answer is no (I think difference is whether we output things like emojis as \uXXXX vs the character itself). I'm not sure so noting that you can confirm.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do think there is merit in using either \u or \U for things like control/formatting characters. Those will likely show up very weird once in string-literal form.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately didnt do anything (yet?) about \u because it's not that simple. My HalfEmoji test is a good example. That byte array should be "\uD83D", but whilst that is a valid string literal, "\uD83D"u8 is not a valid UTF8 string literal.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth mentioning similar code to this exists in IVirtualCharService.TryGetEscapeCharacter but it was slightly painful to use. We could definitely expand that system if we wanted to (allow it to take in strings, not just syntax, and produce escaped code as well as consuming it) and use it here instead of direct Rune usage potentially

Copy link
Copy Markdown
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes under Compilers LGTM (commit 45)

Copy link
Copy Markdown
Contributor

@cston cston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler changes LGTM.

# Conflicts:
#	src/Features/CSharp/Portable/CSharpFeaturesResources.resx
@davidwengier davidwengier enabled auto-merge (squash) May 3, 2022 03:06
@davidwengier davidwengier merged commit 424bd8e into dotnet:main May 3, 2022
@ghost ghost added this to the Next milestone May 3, 2022
@davidwengier davidwengier deleted the UseUTF8Strings branch May 3, 2022 05:51
333fred added a commit that referenced this pull request May 4, 2022
…ures/required-members

* upstream/main: (368 commits)
  Use new helpers
  Cleanup
  Update src/EditorFeatures/CSharpTest/StringCopyPaste/StringCopyPasteCommandHandlerTests.cs
  Restore
  Add docs
  Fix
  Update Tools|Options code ordering to match waht is in the UI
  Use correct directory for loghub logs (#61104)
  Move usings on paste off the UI thread (#61092)
  Create shared helper to handle finding extensions in analyzer references
  Code style to convert byte arrays to UTF8 strings (#60647)
  Fixup
  Enable semantic token LSP tests + re-enable quick info tests (#61098)
  Simplify initialization logic
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  ...
333fred added a commit to 333fred/roslyn that referenced this pull request May 13, 2022
…o poison

* upstream/features/required-members: (369 commits)
  Update tests after merge
  Use new helpers
  Cleanup
  Update src/EditorFeatures/CSharpTest/StringCopyPaste/StringCopyPasteCommandHandlerTests.cs
  Restore
  Add docs
  Fix
  Update Tools|Options code ordering to match waht is in the UI
  Use correct directory for loghub logs (dotnet#61104)
  Move usings on paste off the UI thread (dotnet#61092)
  Create shared helper to handle finding extensions in analyzer references
  Code style to convert byte arrays to UTF8 strings (dotnet#60647)
  Fixup
  Enable semantic token LSP tests + re-enable quick info tests (dotnet#61098)
  Simplify initialization logic
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  Simplify common CWT pattern
  ...
@Cosifne Cosifne modified the milestones: Next, 17.3 P2 May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.