Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses critical issues related to character encoding and symbol naming within the
Highlights
Code Review
This pull request effectively addresses issues with character encoding in semantic data positions and sanitizes internal symbol names for valid UTF-8 output. The conversion to UTF-16 code units for positions is a crucial fix for handling non-BMP characters correctly. The changes are consistently applied, and the new tests provide good coverage for the fixes. I have one suggestion regarding the implementation of the symbol name sanitization to improve clarity and align it more closely with its documented purpose.
…e symbol names
The AST encoder converts node positions to UTF-16 code units via positionMap.UTF8ToUTF16(), but the semantic data (node2sym, node2type) was using raw UTF-8 byte offsets. This caused symbol/type lookup failures for source files containing non-BMP characters (e.g. emoji), where UTF-8 and UTF-16 offsets diverge.
Also sanitize internal symbol names by replacing the \xFE prefix (InternalSymbolNamePrefix) with "__" to ensure valid UTF-8 output.
99452c4 to 36c8879
Summary
- Convert semantic data positions (node2sym, node2type) to UTF-16 code units via positionMap.UTF8ToUTF16(), matching the AST encoder's position encoding. This fixes symbol/type lookup failures for source files containing non-BMP characters (e.g. emoji 💀) where UTF-8 and UTF-16 offsets diverge.
- Sanitize internal symbol names by replacing the \xFE prefix (InternalSymbolNamePrefix) with __ to ensure valid UTF-8 output for consumers.
- Update the snapshotSemantic test helper to also use UTF-16 positions for lookups.
Related Links
- @rslint/tsgo 0.3.1 upgrade in web-infra-dev/llts
Checklist