Search results for "UTF-8 with BOM" files shifted on first line by a character #66189
roblourens merged 3 commits into master from unknown repository
If it's a UTF-8 with BOM file and _fullPreviewRange.startLineNumber is 0, _fullPreviewLines[0] will start with the BOM, so _fullPreviewLines will not match _fullPreviewRange.
But the BOM should be stripped already by ripgrepTextSearchEngine right?
I can move it to ripgrepTextSearchEngine.
But it's already there, right? That file should be stripping the BOM correctly in all cases with your change.
Why change this? The BOM is 3 bytes long.
Because toString() is called here: the three BOM bytes decode to a single character, so Buffer.from([0xEF, 0xBB, 0xBF]).toString().length is 1.
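To make the byte-versus-character distinction concrete, here is a minimal sketch, assuming a Node.js runtime where Buffer is available globally. The variable names are illustrative only and do not come from the PR.

```typescript
// The UTF-8 BOM occupies three bytes in the file...
const bomBytes = Buffer.from([0xEF, 0xBB, 0xBF]);

// ...but decoding it as UTF-8 (the default for toString()) yields
// a single UTF-16 code unit, U+FEFF.
const bomString = bomBytes.toString();

console.log(bomBytes.length);                      // number of bytes
console.log(bomString.length);                     // number of characters
console.log(bomString.charCodeAt(0).toString(16)); // code point in hex
```

So a check that runs on the decoded string has to skip one character, while a check on the raw buffer has to skip three bytes.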
I found another solution: Remove then ripgrep will not output BOM character in JSON. Or I can manually strip the BOM character in ripgrepTextSearchEngine. Which do you prefer?
I want to keep that because I found that search was faster in some cases without it. Also, you could probably search with a different encoding but still end up finding results in UTF8 files with BOMs. Also I don't trust ripgrep to keep that behavior forever. So, let's keep the check to strip the BOM on our end.
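For illustration, stripping the BOM on our end could look roughly like the sketch below. This is not the code merged in this PR: stripLeadingBOM and the sample line are hypothetical, and it assumes the text has already been decoded to a string.

```typescript
// Hypothetical helper, not the actual ripgrepTextSearchEngine code.
const BOM_CHAR_CODE = 0xFEFF;

function stripLeadingBOM(text: string): string {
    // Only a BOM at the very start of the text is removed.
    return text.charCodeAt(0) === BOM_CHAR_CODE ? text.slice(1) : text;
}

// Made-up first line of a "UTF-8 with BOM" file, as decoded text.
const rawFirstLine = '\uFEFFconst answer = 42;';
const cleanFirstLine = stripLeadingBOM(rawFirstLine);

// With the BOM left in, every match column on the first line is off by one.
const shiftedColumn = rawFirstLine.indexOf('answer');   // 7
const correctColumn = cleanFirstLine.indexOf('answer'); // 6
```

The one-column difference between shiftedColumn and correctColumn is the shift described in the issue title.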
I stripped
Looks great, thanks!
Fix #66188
This is my first pull request in this project. Please let me know if my solution is not good enough; I am willing to improve it.