Improve NormalizedFilePath by kokobd · Pull Request #453 · haskell/lsp

kokobd · 2022-09-02T03:26:33Z

No description provided.

pepeiborra · 2022-09-02T05:04:34Z

lsp-types/src/Language/LSP/Types/Uri/OsPath.hs

+
+We always store UTF-8 encoded file paths in 'NormalizedFilePath'. This function first converts 'OsPath'
+to 'FilePath' using the current system encoding (see 'decodeFS'), then converts 'FilePath' to a UTF-8
+encoded 'ShortByteString'. Due to the encoding conversion, this function can fail. But DO NOTE THAT
+an encoding mismatch doesn't always mean an exception will be thrown. [Possibly your encoding simply won't
+throw an exception on failure](https://hackage.haskell.org/package/base-4.17.0.0/docs/src/GHC.IO.Encoding.html#initFileSystemEncoding).
+Possibly the conversion function can't find any invalid byte sequence, giving a successful but wrong result.
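
The "successful but wrong result" case in the doc comment above can be reproduced in miniature. The sketch below is only an illustration and does not use `OsPath` or `decodeFS`: it assumes Latin-1 as the mismatched encoding, and `misdecode` is a hypothetical helper name. Since Latin-1 assigns a code point to every byte, decoding UTF-8 bytes with it never fails, it just yields different characters.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Encode a 'String' as UTF-8, then (wrongly) decode the bytes as Latin-1.
-- Every byte is a valid Latin-1 character, so this step cannot throw.
-- Hypothetical helper, for illustration only.
misdecode :: String -> String
misdecode = T.unpack . TE.decodeLatin1 . TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  let original = "caf\233"   -- "café"; '\233' is U+00E9
      garbled  = misdecode original
  print (garbled == original)              -- False: no exception, but a different path
  print (length original, length garbled)  -- (4,5): U+00E9 became two characters
```

Pure-ASCII paths survive such a mismatch unchanged, which is part of why this kind of bug can go unnoticed.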


Why do we need OsPath at all?

Why do we not store an OsPath directly in NormalizedFilePath?

Storing OsPath in NormalizedFilePath would force the conversion functions from/to FilePath to carry a MonadThrow constraint. We would then need to modify a lot of code in HLS to adapt. And we cannot avoid encoding/decoding completely until we adopt OsPath everywhere in HLS, so using OsPath has no significant benefit.

Since conversion functions to/from OsPath are now provided in lsp-types, we can migrate HLS to OsPath gradually once GHC 9.6 is released (or maybe once 9.6 becomes the oldest version of GHC we support? Then we can avoid a lot of CPP).

> Storing OsPath in NormalizedFilePath would force the conversion functions from/to FilePath to carry a MonadThrow constraint.

I don't understand, why can't you just write something like the below that instantiates the MonadThrow constraint and immediately fails:

fromNormalizedFilePath (NormalizedFilePath _ osPath) = either (error . show) id (OsPath.decodeWith localEncoding osPath)

I don't see how this can ever error assuming that decoding is the inverse of encoding.

> And we cannot avoid encoding/decoding completely until we adopt OsPath everywhere in HLS, so using OsPath has no significant benefit.

Maybe it would help to clarify what the adoption plan for HLS is. Are you saying that this is an intermediate state? Then lay out the plan in the comment so that we can ensure it gets executed when the time comes!

> why can't you just write something like the below that instantiates the MonadThrow constraint and immediately fails:

The idea is to avoid partial functions, and report errors properly.

> I don't see how this can ever error assuming that decoding is the inverse of encoding.

Yes, but toNormalizedFilePath can fail if a given Char is not representable in the system encoding.
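
To make the two positions concrete, here is a minimal sketch of a total encoder next to its partial counterpart. It is not the lsp-types API: Latin-1 stands in for "the system encoding", and `EncodeError`, `encodeLatin1` and `encodeLatin1Partial` are hypothetical names chosen for illustration.

```haskell
import qualified Data.ByteString as BS
import Data.Char (ord)
import Data.List (find)

-- | Hypothetical error type: the first character that could not be encoded.
newtype EncodeError = Unrepresentable Char
  deriving (Show, Eq)

-- | Total encoder for a deliberately narrow encoding (Latin-1 is an
-- assumption standing in for the system encoding). Failure is a value.
encodeLatin1 :: String -> Either EncodeError BS.ByteString
encodeLatin1 s = case find (\c -> ord c > 0xFF) s of
  Just c  -> Left (Unrepresentable c)
  Nothing -> Right (BS.pack (map (fromIntegral . ord) s))

-- | Partial variant in the style of @either (error . show) id@:
-- same logic, but failure surfaces as a runtime exception.
encodeLatin1Partial :: String -> BS.ByteString
encodeLatin1Partial = either (error . show) id . encodeLatin1

main :: IO ()
main = do
  print (encodeLatin1 "src/Main.hs")   -- every character fits: a Right value
  print (encodeLatin1 "src/\955.hs")   -- '\955' (λ) does not fit: a Left value
```

With the total version the caller decides how to report the failure; with the partial version the decision is made at the point of conversion.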

> Then lay out the plan in the comment so that we can ensure it gets executed when the time comes!

Sure.

And note that OsPath is system-dependent. On Windows, it uses UTF-16, which consumes more memory. On Unix, it depends on the system locale. Using OsPath inside NormalizedFilePath will lead to significantly different memory footprints on different systems.
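
A rough way to see the size difference is to compare the encoded byte counts of the same path. This sketch measures only the payload bytes produced by the `text` encoders, not `OsPath` itself and not heap object overhead; `utf8Size` and `utf16Size` are hypothetical helper names and the example path is made up.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Number of payload bytes when the path is encoded as UTF-8 / UTF-16.
-- Hypothetical helpers, for illustration only.
utf8Size, utf16Size :: String -> Int
utf8Size  = BS.length . TE.encodeUtf8    . T.pack
utf16Size = BS.length . TE.encodeUtf16LE . T.pack

main :: IO ()
main = do
  let path = "C:\\Users\\dev\\project\\src\\Main.hs"
  -- For an ASCII-only path, UTF-16 needs exactly twice the bytes of UTF-8.
  print (utf8Size path, utf16Size path)
```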

In summary, OsPath doesn't provide enough benefits compared to UTF-8 encoded ShortByteString now for these reasons:

1. It can't avoid encoding/decoding, just like ShortByteString.
2. It consumes more memory on Windows and other non-UTF-8 systems.
3. It incurs partial functions.

Points 1 and 3 are solvable once OsPath becomes mainstream. Point 2 is worth discussing when we decide to use OsPath. So I'm really not sure whether this is an intermediate situation or not. It should become clearer in the future.

> In summary, OsPath doesn't provide enough benefits compared to UTF-8 encoded ShortByteString now for these reasons.

Thanks, that clarifies the current choices a little bit for me. So this change is not about migrating to OsPath, but to a compact encoding.

Then I would suggest simplifying and using just the text library. To encode, use Text.pack and then extract the Array from inside; to decode, wrap the Array in a Text constructor and call Text.unpack. Or simplify further and just store a Text value.

This is compact only when text-2.0 is used, and semi-compact in earlier versions of the text library. I think that's fine, still a massive improvement over Strings.

Interesting, the results are quite close. (SBS stands for ShortByteString). This is using `text` prior to 2.0 because some dependencies don't support the latest `text`, including stylish-haskell and floskell.

fendor

Drive-by comments :)

lsp-types/src/Language/LSP/Types/Uri/OsPath.hs

lsp-types/lsp-types.cabal

kokobd · 2022-09-02T11:47:40Z

lsp-types/src/Language/LSP/Types/Uri.hs

-encodeFilePath = BS.toShort . T.encodeUtf8 . T.pack
+-- | Convert 'FilePath' to a UTF-8 encoded 'ShortByteString'
+encodeFilePath :: FilePath -> ShortByteString
+encodeFilePath = BS.pack . UTF8.encode


My local benchmark results show that this takes more time than BS.toShort . T.encodeUtf8 . T.pack in most cases. Quite surprising. Waiting for benchmark results on CI. haskell/haskell-language-server#3067

Ok, maybe not that surprising given that text is likely much more optimised than utf8-string.

The best option here would be to leverage the fact that text-2.0 is internally isomorphic to a ShortByteString plus a pair of ints for slicing, and encoded in UTF-8. So you could just do:

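    #if MIN_VERSION_text(2,0,0)
    import qualified Data.ByteString.Short as SBS
    import Data.ByteString.Short.Internal (ShortByteString(SBS))
    import qualified Data.Text as Text
    import Data.Text.Array (Array(..))
    import Data.Text.Internal (Text(Text))
    ...
    -- Reuse the UTF-8 byte array inside the Text as the ShortByteString payload.
    encodeFilePath :: FilePath -> ShortByteString
    encodeFilePath fp = case Text.pack fp of
      Text (ByteArray sbs) _ _ -> SBS sbs

    -- Wrap the byte array back into a Text covering its whole length.
    decodeFilePath :: ShortByteString -> FilePath
    decodeFilePath sbs@(SBS fp) = Text.unpack $ Text (ByteArray fp) 0 (SBS.length sbs)
    #endif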
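
For comparison, the `BS.toShort . T.encodeUtf8 . T.pack` pipeline mentioned earlier in this thread works with any `text` version and needs no internal modules. The sketch below pairs it with a decoder to show the round trip. The `decodeFilePath` here is an assumption about what the inverse could look like, not necessarily what the PR merged. Note that `T.pack` replaces lone surrogate code points with U+FFFD, so the round trip is only the identity for strings without them.

```haskell
import Data.ByteString.Short (ShortByteString)
import qualified Data.ByteString.Short as SBS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Convert a 'FilePath' to a UTF-8 encoded 'ShortByteString'.
encodeFilePath :: FilePath -> ShortByteString
encodeFilePath = SBS.toShort . TE.encodeUtf8 . T.pack

-- | Assumed inverse. 'TE.decodeUtf8' throws on invalid UTF-8, so this is
-- only safe on values produced by 'encodeFilePath'.
decodeFilePath :: ShortByteString -> FilePath
decodeFilePath = T.unpack . TE.decodeUtf8 . SBS.fromShort

main :: IO ()
main = do
  let path = "src/Language/LSP/Types/Uri.hs"
  print (decodeFilePath (encodeFilePath path) == path)  -- True
  print (SBS.length (encodeFilePath path))              -- stored size in bytes
```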

michaelpj

Thanks everyone for the great discussion 👍

* improve NormalizedFilePath
* add adoption plan
* try Text
* finalize

pepeiborra reviewed Sep 2, 2022

fendor reviewed Sep 2, 2022

kokobd force-pushed the improve-os-path-doc branch from 7b38c20 to 4a009b3 Compare September 2, 2022 08:36

kokobd changed the title ~~add more docs to NormalizedFilePath~~ Improve NormalizedFilePath Sep 2, 2022

kokobd mentioned this pull request Sep 2, 2022

use OsPath in NormalizedFilePath #446

Merged

michaelpj reviewed Sep 2, 2022

improve NormalizedFilePath

77a4340

kokobd force-pushed the improve-os-path-doc branch from 4a009b3 to 77a4340 Compare September 2, 2022 08:44

kokobd added 3 commits September 3, 2022 13:03

add adoption plan

9350d8d

try Text

3b7c659

finalize

9ab88de

kokobd added merge me and removed merge me labels Sep 4, 2022

michaelpj approved these changes Sep 4, 2022

kokobd added the merge me label Sep 4, 2022

mergify bot merged commit b0f8596 into haskell:master Sep 4, 2022

kokobd mentioned this pull request Sep 8, 2022

Improve performance of NormalizedFilePath haskell/haskell-language-server#3067

Merged

thomasjm pushed a commit to codedownio/lsp that referenced this pull request Nov 3, 2022

Improve NormalizedFilePath (haskell#453)

5971e5f

* improve NormalizedFilePath
* add adoption plan
* try Text
* finalize

mergify[bot] merged 4 commits into haskell:master from kokobd:improve-os-path-doc
