Pull Request Overview
This PR adds UTF-16-aware column handling for source maps so that non-ASCII text is mapped correctly. Source maps use UTF-16 code units for column positions, so byte lengths and character counts give wrong columns for characters such as emoji, which take two UTF-16 code units, and CJK characters, which take several UTF-8 bytes but usually a single UTF-16 code unit.
- Added `utf16_len()` methods to `Rope` and the `SourceText` trait for calculating UTF-16 length
- Modified `WithIndices` to build index mappings based on UTF-16 code units instead of UTF-8 characters
- Updated column offset calculations throughout to use UTF-16 lengths instead of byte lengths
- Added comprehensive test coverage for UTF-16 handling in multiple modules
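
To make the difference between byte length and UTF-16 length concrete, here is a minimal sketch of how a UTF-16 length could be computed for a plain `&str`. The free function `utf16_len` below is hypothetical and only for illustration; the PR adds `utf16_len()` as methods on `Rope` and `SourceText`, whose actual implementations are not shown here and may differ.

```rust
/// Hypothetical stand-in for the `utf16_len()` methods described above.
/// Counts how many UTF-16 code units are needed to encode `text`.
fn utf16_len(text: &str) -> usize {
    // `char::len_utf16` returns 1 for characters in the Basic Multilingual
    // Plane and 2 for characters that need a surrogate pair (e.g. most emoji).
    text.chars().map(char::len_utf16).sum()
}
```

For the string `"a😀b"`, `str::len()` reports 6 bytes while `utf16_len` reports 4 code units, so a column offset derived from the byte length would land two positions too far to the right.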
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/with_indices.rs | Modified character indexing logic to handle UTF-16 code units and updated test to use emoji characters |
| src/rope.rs | Added utf16_len() method to calculate rope length in UTF-16 code units |
| src/replace_source.rs | Updated column offset calculations to use UTF-16 lengths and added UTF-16 handling test |
| src/helpers.rs | Added utf16_len() to SourceText trait, updated final column calculations, and added comprehensive UTF-16 tests |
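
The index mapping change in `src/with_indices.rs` can be pictured as translating a UTF-16 column into a byte offset that is safe to slice at. The sketch below is an assumption about the general technique, not the PR's code: the function name `utf16_col_to_byte` is hypothetical, and `WithIndices` may well precompute and cache these offsets rather than scan on every lookup.

```rust
/// Hypothetical helper: converts a UTF-16 column within `line` into a byte
/// offset. Returns `None` if the column is past the end of the line or
/// falls between the two halves of a surrogate pair.
fn utf16_col_to_byte(line: &str, col: usize) -> Option<usize> {
    let mut units = 0; // UTF-16 code units consumed so far
    for (byte_idx, ch) in line.char_indices() {
        if units == col {
            return Some(byte_idx);
        }
        if units > col {
            // The previous character spanned two code units and `col`
            // pointed at the second one.
            return None;
        }
        units += ch.len_utf16();
    }
    // A column equal to the total length maps to the end of the line.
    if units == col {
        Some(line.len())
    } else {
        None
    }
}
```

Scanning per lookup is linear in the line length; building the table once, as the description of `WithIndices` suggests, trades memory for constant-time lookups.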
CodSpeed Performance Report
Merging #198 will degrade performance by 11.21%.