Skip to content

Linear getByteString #76

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 28, 2015
Merged

Conversation

bitonic
Copy link
Contributor

@bitonic bitonic commented May 27, 2015

getByteString exhibited quadratic behaviour -- a classic case of bad "appending". This commit fixes it.

@bitonic bitonic force-pushed the linear-getByteString branch from bea0981 to b6cf044 Compare May 27, 2015 22:40
@bitonic
Copy link
Contributor Author

bitonic commented May 27, 2015

Note that this is probably better done with a Builder -- but it's getting late and I wanted to make sure to submit this :).

@bitonic bitonic force-pushed the linear-getByteString branch from b6cf044 to bf8eb3e Compare May 27, 2015 23:01
Chains of `B.append`s were being created by repeated calls to
`demandInput`.

Try the following program, which writes and read 100MB, to appreciate
the difference:

```
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Data.Binary (encode, decode)
import Data.Char (ord)

main :: IO ()
main = do

  let inBs = BS.replicate 100000000 (fromIntegral $ ord 'a')
  BSL.writeFile "bs.bin" (encode inBs)
  putStrLn "writing done"

  bin <- BSL.readFile "bs.bin"
  -- This takes around 30 seconds and causes more than 10GB to be
  -- allocated.
  let outBs = decode bin
  print $ inBs == outBs
```
@bitonic bitonic force-pushed the linear-getByteString branch from bf8eb3e to e46b7b6 Compare May 28, 2015 08:59
@kolmodin
Copy link
Member

Hey! Thanks for the fix. Indeed, there's a problem with ensureN if the chunks are small and a lot of data is requested.
It's essentially the same issue as #73 which was reported a little while ago. I've been working on a fix for that bug, but not quite ready to submit it.
Your patch fixes the problem, but makes ensureN recursive which we want to avoid since it no longer will get inlined.

I'm not sure why a Builder would be better to use, all you need is to concat some ByteStrings.

@bitonic
Copy link
Contributor Author

bitonic commented May 28, 2015

Wasn't the previous ensureN recursive too?

@kolmodin
Copy link
Member

Part of it was inlineable. That part is used whenever there is enough input (99% of the time).
The fallback when there isn't enough input was recursive.

...by making the common case (when the input is big enough) non-recursive.
@bitonic
Copy link
Contributor Author

bitonic commented May 28, 2015

I pushed a commit that makes ensureN inlinable in the way you mention.

@kolmodin
Copy link
Member

Thanks! I'll have a look later today.

@kolmodin kolmodin merged commit f947333 into haskell:master May 28, 2015
hvr added a commit to ghc/ghc that referenced this pull request Jun 1, 2015
Quoting the changelog, this pulls in the following fixes:

binary-0.7.5.0
--------------

- Fix performance bug that was noticable when you get a big strict ByteString
  and the input to the decoder consists of many small chunks.
    - haskell/binary#73
    - haskell/binary#76
- Fix memory leak when decoding Double and Float.
    - Commit 497a181c083fa9faf7fa3aa64d1d8deb9ac76ecb
- We now require QuickCheck >= 2.8. Remove our version of arbitrarySizedNatural.

binary-0.7.4.0
--------------

- Some invalid UTF-8 strings caused an exception when decoded. Those errors will
  now now fail in the Get monad instead. See issue 70.
  Patch contributed by @ttuegel.
hvr added a commit to ghc/ghc that referenced this pull request Jun 9, 2015
Quoting the changelog, this pulls in the following fixes:

binary-0.7.5.0
--------------

- Fix performance bug that was noticable when you get a big strict ByteString
  and the input to the decoder consists of many small chunks.
    - haskell/binary#73
    - haskell/binary#76
- Fix memory leak when decoding Double and Float.
    - Commit 497a181c083fa9faf7fa3aa64d1d8deb9ac76ecb
- We now require QuickCheck >= 2.8. Remove our version of arbitrarySizedNatural.

binary-0.7.4.0
--------------

- Some invalid UTF-8 strings caused an exception when decoded. Those errors will
  now now fail in the Get monad instead. See issue 70.
  Patch contributed by @ttuegel.

(cherry picked from commit 7dd0ea7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants