use OsPath in NormalizedFilePath#446
Conversation
✅ Pull request refreshed |
|
@mergify refresh |
✅ Pull request refreshed |
|
I added a dimension to the test matrix, but Mergify is waiting for the old one. Strange. |
|
|
||
| toNormalizedFilePath :: FilePath -> NormalizedFilePath | ||
| toNormalizedFilePath fp = normalizedFilePath nuri nfp | ||
| toNormalizedFilePath :: MonadThrow m => OsPathCompat -> m NormalizedFilePath |
There was a problem hiding this comment.
@michaelpj This change leads to about 70 compile errors in HLS. Do we have to fix them one by one?
There was a problem hiding this comment.
So the failure here ultimately comes from decodeUtf, right? Which ultimately represents the fact that a filepath is bytes and a URI should be text. So this is a meaningful failure. We could offer a partial version that really throws, but that also seems like a footgun for us in future.
The only other alternative I can see is to have URI/NormalizedUri also contain an OsPath i.e. bytes rather than text. Unclear how bad that would be.
bea0b16 to
02fe4a8
Compare
|
@michaelpj After some rewrites, I'm finally happy with the current version. The most notable change is that now we always store a UTF-8 encoded And please only approve if you feel this is good. Let Mergify squash and merge it for us. |
michaelpj
left a comment
There was a problem hiding this comment.
Looking plausible! I worry that maybe this extra encoding/decoding might lose some of the benchmark improvements you were getting?
| - name: Cabal configure | ||
| shell: bash | ||
| run: | | ||
| if [ ${{ matrix.ospath }} = "true" ]; then |
There was a problem hiding this comment.
Maybe we should include this as a flag in the package definition? force-ospath or something?
| Right v' -> | ||
| return (NormalizedFilePath (internalNormalizedFilePathToUri v') v) | ||
|
|
||
| encodeFilePath :: String -> ShortByteString |
| fromNormalizedFilePath :: NormalizedFilePath -> FilePath | ||
| fromNormalizedFilePath (NormalizedFilePath _ fp) = fp | ||
| -- | Extracts 'FilePath' from 'NormalizedFilePath'. | ||
| -- The function is total. The 'HasCallStack' constraint is added for debugging purpose only. |
There was a problem hiding this comment.
This is a little confusing as it immediately calls error! I get the argument: the invariant of NFP is that it contains a correctly encoded string so this should never fail.
However, I also bet we call this a lot. What would happen if we also cached the decoded filepath in NormalizedFilePath? Would it blow out memory usage?
There was a problem hiding this comment.
Yes, memory usage is the point of using ShortByteString or OsPath instead of FilePath. Despite the encoding/decoding, both CPU time and memory usage decreased in benchmarks!
I saw most use sites of toNormalizedFilePath just extracts the FilePath and pass it to some IO function, so the CPU is not the bottleneck.
| normalizedFilePathToOsPath :: MonadThrow m => NormalizedFilePath -> m OsPath | ||
| normalizedFilePathToOsPath = unsafePerformIO' . encodeFS . fromNormalizedFilePath | ||
|
|
||
| unsafePerformIO' :: (MonadThrow m, NFData a) => IO a -> m a |
|
|
||
| normalizedFilePathSpec :: Spec | ||
| normalizedFilePathSpec = do | ||
| normalizedFilePathSpec = beforeAll (setFileSystemEncoding utf8) $ do |
There was a problem hiding this comment.
Does something actually fail if you don't do this?
There was a problem hiding this comment.
Yes, default file system encoding ignores the failure. See: https://hackage.haskell.org/package/base-4.17.0.0/docs/src/GHC.IO.Encoding.html#initFileSystemEncoding

So in the real world, it won't throw any exception. (But I want to test the exception)
| Constructs 'NormalizedFilePath' from 'OsPath'. Throws 'IOException' if the conversion fails. | ||
| -} | ||
| osPathToNormalizedFilePath :: MonadThrow m => OsPath -> m NormalizedFilePath | ||
| osPathToNormalizedFilePath = fmap toNormalizedFilePath . unsafePerformIO' . decodeFS |
There was a problem hiding this comment.
I'm getting a little confused, would you mind writing a comment somewhere explaining the deal with the various encodings? The idea is something like: we always decode OsPaths using the system encoding into a string, and then we re-encode them as UTF8 to store them in NFP?
There was a problem hiding this comment.
I have missed all the discussion in the PR, so I need a comment explaining why we don't do the obvious thing (which would be storing the OsPath inside NormalizedFilePath).
|
Oops, I missed the label! oh well, if you want to address any of the comments you can do a follow-up |
Does this still use old directory/base for file operations? Because all of them do encoding/decoding at the FFI boundary. The only thing you'll gain is lower memory footprint due to avoiding String. Can't say which one has the bigger impact. |
|
| normalizedFilePath :: NormalizedUri -> FilePath -> NormalizedFilePath | ||
| normalizedFilePath nuri nfp = NormalizedFilePath nuri nfp | ||
| decodeFilePath :: ShortByteString -> Either UnicodeException String | ||
| decodeFilePath = fmap T.unpack . T.decodeUtf8' . BS.fromShort |
There was a problem hiding this comment.
We can always speed this up with something like utf8-string or rolling our own utf-8 decoding
There was a problem hiding this comment.
| osPathToNormalizedFilePath :: MonadThrow m => OsPath -> m NormalizedFilePath | ||
| osPathToNormalizedFilePath = fmap toNormalizedFilePath . unsafePerformIO' . decodeFS |
There was a problem hiding this comment.
Instead of decodeFS you could use the pure decodeWith and isolate the unsafePerformIO in a single global:
osPathToNormalizedFilePath :: MonadThrow m => OsPath -> Either EncodingException NormalizedFilePath
osPathToNormalizedFilePath = fmap toNormalizedFilePath . decodeWith systemEnc
{-# NOINLINE systemEnc #-}
systemEnc :: TextEncoding
systemEnc = unsafePerformIO getFileSystemEncoding
* use filepath-compat * make NormalizedFilePath use OsPath * remove filepath-compat, use ShortByteString instead * fix ci * enable OS_PATH * upgrade ghc & don't set OS_PATH for ghc 8.6.5 in CI * remove outdated comment * skip a bad ci combination * extract OsPath related CPP to a standalone module * fix OsPath.Compat * add empty NormalizedUri and NormalizedFilePath * add partial function for compatibility & use decodeFS * run stylish-haskell * lock index-state * bump versions of lsp and lsp-types * fix self version bound * always use utf8 ShortByteString in NormalizedFilePath * revert test changes * remove OsPath.Compat * bump lsp-test version according to pvp * add test cases for OsPath * fix test case * add some doc comments
Closes #445. Current status:
filepath-compatto avoid CPP, and always useOsPathinNormalizedFilePathNormalizedFilePath. One forFilePath, another forOsPath