Search results for "UTF-8 with BOM" files shifted on first line by a character · Pull Request #66189 · microsoft/vscode

ghost · 2019-01-08T00:53:02Z

This is my first pull request in this project. Please let me know if my solution is not good enough; I am willing to improve it.

msftclas · 2019-01-08T00:53:13Z

All CLA requirements met.

roblourens · 2019-01-08T17:49:57Z

src/vs/workbench/parts/search/common/searchModel.ts

If the file is UTF-8 with BOM and _fullPreviewRange.startLineNumber is 0, _fullPreviewLines[0] will start with the BOM, so _fullPreviewLines no longer matches _fullPreviewRange.

But the BOM should already be stripped by ripgrepTextSearchEngine, right?

I can move it to ripgrepTextSearchEngine.

But it's already there, right? That file should be stripping the BOM correctly in all cases with your change.
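To illustrate the mismatch described in this thread, here is a minimal sketch of how a leading BOM shifts a highlight by one character. It assumes the match range was computed against BOM-free text while the preview line still carries the BOM; `rawLine`, `visibleLine`, and `term` are hypothetical names, not identifiers from searchModel.ts.

```typescript
// Hypothetical first line of a "UTF-8 with BOM" file after decoding to a string.
const rawLine: string = '\uFEFFhello world';

// The same line as the user sees it, without the BOM.
const visibleLine: string = rawLine.slice(1);

// Assumption: the match range is computed against the BOM-free text.
const term = 'hello';
const start: number = visibleLine.indexOf(term);
const end: number = start + term.length;

// Applying that range to the line that still contains the BOM
// selects the invisible BOM plus "hell": off by one character.
const shiftedHighlight: string = rawLine.substring(start, end);

// Applying it to the BOM-free line selects the intended text.
const correctHighlight: string = visibleLine.substring(start, end);

console.log(JSON.stringify(shiftedHighlight), JSON.stringify(correctHighlight));
```

The range and the text only agree when both come from the same BOM-free string.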

roblourens · 2019-01-08T17:52:00Z

src/vs/workbench/services/search/node/ripgrepTextSearchEngine.ts

Why change this? The BOM is 3 bytes long.

https://github.com/Microsoft/vscode/blob/7c8361ef698d9ed491612cd786952aba2ab47c87/src/vs/workbench/services/search/node/ripgrepTextSearchEngine.ts#L258

Because toString() is called here, and Buffer.from([0xEF, 0xBB, 0xBF]).toString().length is 1.

Ok, that's correct thanks.
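For readers following along, the byte-versus-character distinction above can be checked with a small Node.js sketch. The variable names are illustrative only and do not come from ripgrepTextSearchEngine.ts.

```typescript
// The UTF-8 BOM occupies three bytes on disk.
const bomBytes = Buffer.from([0xEF, 0xBB, 0xBF]);

// Buffer#toString() decodes as UTF-8 by default and keeps the BOM,
// which becomes the single UTF-16 code unit U+FEFF.
const bomText: string = bomBytes.toString();

const byteLength: number = bomBytes.length;
const charLength: number = bomText.length;
const codeUnit: number = bomText.charCodeAt(0);

console.log(byteLength, charLength, codeUnit.toString(16));
```

So any offset arithmetic done on the decoded string has to treat the BOM as one character, not three.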

ghost · 2019-01-10T00:47:35Z

I found another solution: Remove && options.encoding !== 'utf8'

https://github.com/Microsoft/vscode/blob/f0f3b922bcb081c6488b7299dbef4076a1cfde82/src/vs/workbench/services/search/node/ripgrepTextSearchEngine.ts#L333

Then ripgrep will not output the BOM character in its JSON output.

Or I can manually strip the BOM character in ripgrepTextSearchEngine.

Which do you prefer?

roblourens · 2019-01-10T01:38:40Z

I want to keep that because I found that search was faster in some cases without it. Also, you could probably search with a different encoding but still end up finding results in UTF8 files with BOMs. Also I don't trust ripgrep to keep that behavior forever. So, let's keep the check to strip the BOM on our end.
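A minimal sketch of what stripping the BOM "on our end" could look like for a decoded string. `stripLeadingBOM` is a hypothetical helper written for illustration; it is not the code that was merged in this pull request.

```typescript
// U+FEFF is what the three-byte UTF-8 BOM decodes to.
const BOM_CODE_UNIT = 0xFEFF;

// Hypothetical helper: drop a single leading BOM character if present.
function stripLeadingBOM(text: string): string {
  return text.charCodeAt(0) === BOM_CODE_UNIT ? text.slice(1) : text;
}

const withBom: string = stripLeadingBOM('\uFEFFfirst line');
const withoutBom: string = stripLeadingBOM('first line');
const emptyInput: string = stripLeadingBOM('');

console.log(JSON.stringify([withBom, withoutBom, emptyInput]));
```

Checking the decoded text directly does not depend on the encoding option passed to ripgrep, which fits the concern above about results from BOM-prefixed files turning up under a different search encoding.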

ghost · 2019-01-10T06:40:27Z

I stripped the BOM from fullText at
https://github.com/Microsoft/vscode/blob/c22caf616a9d6baa90ab904292bddc14a7b09ffd/src/vs/workbench/services/search/node/ripgrepTextSearchEngine.ts#L285
So there is no need to edit searchModel.ts.

roblourens · 2019-01-10T17:04:41Z

Looks great, thanks!

RMacfarlane assigned roblourens Jan 8, 2019

roblourens reviewed Jan 8, 2019

Fix #66188

c22caf6

Victorique Ko added 2 commits January 10, 2019 16:19

Strip utf-8 bom from matchText

4f151da

Index can't be negative

0be5df1

roblourens mentioned this pull request Jan 10, 2019

Links overflow the description comment box microsoft/vscode-pull-request-github#806

Closed

roblourens merged commit 8c2cef9 into microsoft:master Jan 10, 2019

github-actions bot locked and limited conversation to collaborators Jul 25, 2020
