Some data formats (git pack files, for example) record not the size of the compressed data but the size of the data once uncompressed. One would suppose that the streaming-commons toolkit could handle this case easily, but it does not seem to.
Although it decompresses the correct amount, it does not hand back the unused input so that it can be read again, as the following example program demonstrates:
```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Bits
import Control.Monad.Trans
import qualified Codec.Compression.Zlib as Z
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Builder as BSB
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Conduit (($$), (=$))
import qualified Data.Conduit as C
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.Zlib as CZ

main :: IO ()
main = do
    let c = TE.encodeUtf8 "This data is stored compressed."
    let u = TE.encodeUtf8 "This data isn't."
    let encoded = writeExample c u
    (c', u') <- CB.sourceLbs encoded $$ readExample
    print (c, u)
    print (c', u')
    putStrLn $ "Input and output matched: " ++ show (c == c' && u == u')

readExample :: C.Sink BS.ByteString IO (BS.ByteString, BS.ByteString)
readExample = do
    sbs <- CB.take 4
    let size = case map fromIntegral . BSL.unpack $ sbs of
            [s0, s1, s2, s3] -> (s3 `shiftL` 24) .|. (s2 `shiftL` 16) .|. (s1 `shiftL` 8) .|. s0
            _ -> error "We really had to read 4 octets there."
    -- We know how large it should decompress to, but not how large it is compressed.
    -- We proceed to decompress until we have decompressed enough data.
    c <- (CZ.decompress CZ.defaultWindowBits) =$ (CB.take size)
    -- Immediately following the compressed stream is more data we need.
    u <- CB.sinkLbs
    return (BSL.toStrict c, BSL.toStrict u)

writeExample :: BS.ByteString -> BS.ByteString -> BSL.ByteString
writeExample cdata udata =
    let c = Z.compress . BSL.fromStrict $ cdata
    in BSB.toLazyByteString . mconcat $
        [ BSB.int32LE . fromIntegral . BS.length $ cdata -- We record the size of the uncompressed data
        , BSB.lazyByteString c                           -- but store it compressed.
        , BSB.byteString udata                           -- Then we store other important data with no delimiter.
        ]
```
Example output:

```
("This data is stored compressed.","This data isn't.")
("This data is stored compressed.","")
Input and output matched: False
```
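For comparison, here is a minimal sketch of the expected behaviour without conduit, using the incremental interface in the zlib package's `Codec.Compression.Zlib.Internal`. It assumes zlib >= 0.6, where `foldDecompressStreamWithInput` passes whatever input follows the end of the zlib stream to its second argument. `inflateWithRest`, `encodeExample` and `readExample'` are hypothetical names used only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Bits (shiftL, (.|.))
import qualified Codec.Compression.Zlib as Z
import qualified Codec.Compression.Zlib.Internal as ZI
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Lazy as BSL

-- Hypothetical helper (assumes zlib >= 0.6): inflate the single zlib stream
-- at the front of the input and also return the input that followed it.
inflateWithRest :: BSL.ByteString
                -> Either ZI.DecompressError (BSL.ByteString, BSL.ByteString)
inflateWithRest =
  ZI.foldDecompressStreamWithInput
    -- Prepend each decompressed chunk to the result of the rest of the fold.
    (\chunk acc -> fmap (\(out, rest) -> (BSL.fromStrict chunk <> out, rest)) acc)
    -- At stream end, zlib hands back the input it did not consume.
    (\rest -> Right (BSL.empty, rest))
    -- Any decompression error is reported as Left.
    Left
    (ZI.decompressST ZI.zlibFormat ZI.defaultDecompressParams)

-- Same layout as writeExample: 4-byte little-endian uncompressed size,
-- then the zlib stream, then the undelimited trailing data.
encodeExample :: BS.ByteString -> BS.ByteString -> BSL.ByteString
encodeExample cdata udata = BSB.toLazyByteString $ mconcat
  [ BSB.int32LE (fromIntegral (BS.length cdata))
  , BSB.lazyByteString (Z.compress (BSL.fromStrict cdata))
  , BSB.byteString udata
  ]

-- Hypothetical pure counterpart of readExample.
readExample' :: BSL.ByteString -> Either String (BS.ByteString, BS.ByteString)
readExample' input = do
  let (hdr, body) = BSL.splitAt 4 input
  size <- case map fromIntegral (BSL.unpack hdr) of
    [s0, s1, s2, s3] ->
      Right ((s3 `shiftL` 24) .|. (s2 `shiftL` 16) .|. (s1 `shiftL` 8) .|. s0 :: Int)
    _ -> Left "truncated size header"
  (c, rest) <- either (Left . show) Right (inflateWithRest body)
  -- The header is only used as a sanity check here; zlib itself finds the stream end.
  if BSL.length c /= fromIntegral size
    then Left "decompressed size does not match the header"
    else Right (BSL.toStrict c, BSL.toStrict rest)

main :: IO ()
main = print (readExample' (encodeExample "This data is stored compressed." "This data isn't."))
```

Here the decompressor itself reports where the compressed stream ended, so the trailing bytes are never lost; the recorded size merely serves as a check.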