5
\$\begingroup\$

upd: I am very sorry about my mistake. The old version of data is download from luogu, but the website do NOT allow me to download the data of the worst case. In fact, when facing same amount of operations, the memory usage depends on the type of operations and their orders. I re-generate some data, and update the link and profiling logs.


I am new in Haskell, and I try to implement an AVL tree. To test my code, I submit it to an Online Judge, which has a problem of balanced-tree. However, I get a MLE (Memory Limit Exceeded).

In fact, when facing 10^5 times of operations, my code requires 58 MB memories, while native C++ code only need about 2 to 3 MB. The speed of C++ code is also 10x faster.

I wonder if haskell can deal with such large data structure while having performance close to native code. What should I do?

Here is my code:

type Info = (Int, Int)
data Tree a = Empty | Ord a => Node (Tree a) a Info (Tree a)

just :: Maybe a -> a
just (Just a) = a

height :: Tree a -> Int
height Empty = 0
height (Node _ _ (h,_) _) = h

size :: Tree a -> Int
size Empty = 0
size (Node _ _ (_,s) _) = s

maintainInfo :: Tree a -> Tree a
maintainInfo (Node ls v _ rs) = Node ls v (max (height ls) (height rs) + 1, size ls + size rs + 1) rs

rotateL :: Tree a -> Tree a
rotateL (Node (Node ll lv _ lr) v _ rs) = 
    maintainInfo (Node ll lv (0,0) (maintainInfo (Node lr v (0,0) rs)))

rotateR :: Tree a -> Tree a
rotateR (Node ls v _ (Node rl rv _ rr)) = 
    maintainInfo (Node (maintainInfo (Node ls v (0,0) rl)) rv (0,0) rr)

rotateL_ :: Tree a -> Tree a
rotateL_ (Node (Node ll lv _ lr) v _ rs)
    | height ll >= height lr  = rotateL (Node (Node ll lv (0,0) lr) v (0,0) rs)
    | otherwise               = rotateL (Node (rotateR (Node ll lv (0,0) lr)) v (0,0) rs)

rotateR_ :: Tree a -> Tree a
rotateR_ (Node ls v _ (Node rl rv _ rr))
    | height rl > height rr   = rotateR (Node ls v (0,0) (rotateL (Node rl rv (0,0) rr)))
    | otherwise               = rotateR (Node ls v (0,0) (Node rl rv (0,0) rr))

maintain :: Tree a -> Tree a
maintain (Node ls v info rs)
    | height ls > height rs + 1 = rotateL_ (Node ls v info rs)
    | height rs > height ls + 1 = rotateR_ (Node ls v info rs)
    | otherwise                 = maintainInfo (Node ls v info rs)

insert :: Ord a => a -> Tree a -> Tree a
insert val Empty = Node Empty val (1,1) Empty
insert val (Node ls v _ rs)
    | val < v     = maintain (Node (insert val ls) v (0,0) rs)
    | val >= v    = maintain (Node ls v (0,0) (insert val rs))

iter :: Tree a -> [a]
iter Empty = []
iter (Node ls v _ rs) = iter ls ++ v : iter rs

maxi :: Tree a -> Maybe a
maxi Empty                 = Nothing
maxi (Node _ v _ Empty)    = Just v
maxi (Node _ _ _ rs)     = maxi rs

mini :: Tree a -> Maybe a
mini Empty                 = Nothing
mini (Node Empty v _ _)    = Just v
mini (Node ls _ _ _)     = mini ls

eraseSwap :: Tree a -> Tree a
eraseSwap Empty = Empty
eraseSwap (Node ls _ _ Empty) = ls
eraseSwap (Node ls _ _ rs) = 
    let next = just (mini rs) in 
        maintain (Node ls next (0,0) (erase next rs))

erase :: a -> Tree a -> Tree a
erase _ Empty = Empty
erase val (Node ls v _ rs)
    | val < v   = maintain (Node (erase val ls) v (0,0) rs)
    | val > v   = maintain (Node ls v (0,0) (erase val rs))
    | val == v  = eraseSwap (Node ls v (0,0) rs)

next :: a -> Tree a -> Maybe a
next _ Empty = Nothing
next val (Node ls v _ rs) 
    | val < v   = Just (maybe v (min v) (next val ls))
    | val >= v  = next val rs

prev :: a -> Tree a -> Maybe a
prev _ Empty = Nothing
prev val (Node ls v _ rs)
    | val <= v  = prev val ls
    | val > v   = Just (maybe v (max v) (prev val rs))

rank :: a -> Tree a -> Int
rank _ Empty = 1
rank val (Node ls v _ rs)
    | val <= v  = rank val ls
    | val > v   = size ls + 1 + rank val rs

select :: Int -> Tree a -> Maybe a
select _ Empty = Nothing
select k (Node ls v _ rs)
    | size ls >= k      = select k ls
    | size ls == k - 1  = Just v
    | otherwise         = select (k - size ls - 1) rs

update :: [Int] -> Tree Int -> Tree Int
update [1,x] = insert x 
update [2,x] = erase x

ask :: [Int] -> Tree Int -> Int 
ask [3,x] tree = rank x tree
ask [4,x] tree = just (select x tree)
ask [5,x] tree = just (prev x tree)
ask [6,x] tree = just (next x tree)

solve :: Int -> Tree Int -> IO ()
solve 0 _ = return ()
solve n tree = do
    inputQuery <- getLine
    let l = map read (words inputQuery) :: [Int]
    if head l < 3 then do 
        solve (n-1) (update l tree)
    else do
        print (ask l tree)
        solve (n-1) tree
    return ()

main :: IO ()
main = do 
    inputN <- getLine
    let n = read inputN :: Int
    solve n Empty
    return ()

I humbly accept all suggestions, including those that are not aimed at performance but at coding style.


To test my code, I submit it to Luogu P3369, which is in Chinese. The simple translation is as follow:

You need to dynamically maintain a re-set M and provide the following operations:

  1. Insert a number x into M.
  2. Delete a number x from M (if there are multiple identical numbers, only delete one).
  3. Query how many numbers in M are smaller than x, and add one to the answer.
  4. Query the number that ranks in the x-th position after M is sorted from small to large.
  5. Query the predecessor of x in M (the predecessor is defined as the largest number that is smaller than x).
  6. Query the successor of x in M (the successor is defined as the smallest number that is greater than x).
  • For operations 3, 5, and 6, it is not guaranteed that the number x exists in the current re-set.
  • The first line is n, which indicates the number of operations. Each of the following n lines has two numbers opt and x. opt indicates the sequence number of the operation (1 <= opt <= 6)
  • For operations 3,4,5,6, each line outputs a number, indicating the corresponding answer.

If anyone has difficulty to login luogu, here is also some data files to download:

Here is the output of ghc -O2 -rtsopts main.hs && ./main < avl.in > tmp.out +RTS -s:

   1,579,041,840 bytes allocated in the heap
     309,624,880 bytes copied during GC
      19,305,064 bytes maximum residency (21 sample(s))
         186,640 bytes maximum slop
              58 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       360 colls,     0 par    0.324s   0.329s     0.0009s    0.0055s
  Gen  1        21 colls,     0 par    0.419s   0.425s     0.0202s    0.0500s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.897s  (  0.907s elapsed)
  GC      time    0.743s  (  0.754s elapsed)
  EXIT    time    0.000s  (  0.009s elapsed)
  Total   time    1.641s  (  1.671s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,759,782,364 bytes per MUT second

  Productivity  54.7% of total user, 54.3% of total elapsed
\$\endgroup\$
0

1 Answer 1

9
\$\begingroup\$

As I said in a comment, I can't verify that your program as written uses 40MB - your profiling data and my own runs of your program show 3MB, in the same range as the C++ performance you refer to. Even with -O0, I see only 6MB residency for your input file with 10k lines.

So I can't make any recommendations that are guaranteed to greatly reduce memory usage: it looks reduced enough to me already. But the first few things I would try are:

Improve strictness

  1. Add {-# LANGUAGE Strict #-} to the top of the file. You don't need any laziness, and this will help prevent space leaks. When I try this I see no change in memory usage, and only a 5% or so speedup (not really statistically significant, I imagine)
  2. Replace your Info tuple with a real data type. This will let the runtime avoid using the existing tuple type, which is of course lazy, and use a type that's equivalent but can be strict (thanks to the Strict language extension).
  3. Replace all of your (0,0) expression with a reference to a top-level constant defined as untracked = Info 0 0. I think you did this in the version of this question you posted to Stack Overflow, calling it nilInfo instead. It's a good idea, so you don't waste time building this same boring value over and over. When I combine these 3 improvements, I again see the same memory usage, and a larger but still not significant speedup.

Reduce use of dummy Info values

Next, I note that many of your uses of this untracked "dummy Info" value are followed immediately by a call to maintainInfo, which ignores the Info field and constructs its own. So you are building a Tree node just to tear it down and build a new one. Instead, define a function that takes all the other Tree fields as parameters, and constructs a Node with the appropriate Info:

mkNode :: Ord a => Tree a -> a -> Tree a -> Tree a
mkNode ls v rs = Node ls v (Info (max (height ls) (height rs) + 1)
                                 (size ls + size rs + 1))
                      rs

The call sites are easy to update: just replace maintainInfo (Node x y untracked z) with mkNode x y z.

You should be able to do something similar with the rotate family of functions, and maintain, all of which ignore Info on the nodes they are given, and construct new Info values based on their subtrees after rotation, but still waste time constructing Info metadata to pass around to each other. That's more surgery than I'm interested in doing myself for this answer, though.

Participate in Prelude typeclasses

Instead of bespoke functions iter, maxi, and mini, I suggest participating in Foldable. Its maximum and minimum functions don't match signature with your maxi and mini - you return a Maybe, in the modern style, while these functions with much older specifications simply crash when given an empty list. Fortunately, you never call maxi at all, and you assume that mini always returns a Just anyway. So you might as well make your Tree Foldable. This won't help performance at all, but it's just a good practice.

extremum :: String -> (forall a. Tree a -> Tree a) -> (forall a. Ord a => Tree a -> a)
extremum name next = impl
  where impl Empty = error (name <> ": empty Tree")
        impl n = go n
        go n = case next n of
          Empty -> let Node _ v _ _ = n
                   in v
          n' -> go n'

instance Foldable Tree where
  foldMap _ Empty = mempty
  foldMap f (Node ls v _ rs) = foldMap f ls <> f v <> foldMap f rs
  maximum = extremum "maximum" $ \(Node _ _ _ rs) -> rs
  minimum = extremum "minimum" $ \(Node ls _ _ _) -> ls
  length = size

I am being more "clever" than necessary with the extremum type signature and implementation. If you don't understand it or don't like it, a simpler way to implement maximum and minimum would be fine of course - it's nothing critical.

More idiomatic use of comparisons in guard clauses

In erase, you compare two values three times, checking for <, then for >, then for ==. Instead, use compare, which returns a type with 3 values, so you can simply fall into the right case directly.

erase :: a -> Tree a -> Tree a
erase _ Empty = Empty
erase val (Node ls v _ rs) = case compare val v of
  LT -> maintain (Node (erase val ls) v untracked rs)
  GT -> maintain (Node ls v untracked (erase val rs))
  EQ -> eraseSwap (Node ls v untracked rs)

In many other functions, you compare with two mutually exclusive conditions, such as >= and then <. That could be replaced with compare in a similar way (match LT in the first case and _ for the other case), but it is more common to use otherwise instead. Either way, you remove a redundant comparison and make the code easier to understand.

next :: a -> Tree a -> Maybe a
next _ Empty = Nothing
next val (Node ls v _ rs)
    | val < v   = Just (maybe v (min v) (next val ls))
    | otherwise = next val rs

Your select implementation can avoid some redundant function calls by using a where clause to span all the guards:

select :: Int -> Tree a -> Maybe a
select _ Empty = Nothing
select k (Node ls v _ rs)
    | lSize >= k      = select k ls
    | lSize == k - 1  = Just v
    | otherwise       = select (k - lSize - 1) rs
  where lSize = size ls

Consider a different data structure

AVL trees are a fine data structure to implement for an exercise like this. But if you are worried about performance, they are not famous for being the fastest tree to keep balanced. Their invariant is very strict, and so a lot of time is spent keeping them perfectly balanced, while a data structure with a more relaxed invariant can let the trees get a little lopsided without letting them get too unbalanced. Red/black trees are another classic, simple example, and will spend less time maintaining balance at the cost of a slightly deeper average tree.

Or you could use some types from the standard library. Data.IntMap should support all the features you need, for example, and I imagine it's more efficient than anything you could do yourself. Or there's probably a fun way to build this yourself out of a FingerTree, which would be somewhere in between doing it all yourself and having it all done for you. It depends on how much you want to do by hand for the exercise, and how much you're happy to have handled for you.

\$\endgroup\$
1
  • \$\begingroup\$ Thanks a lot! I also realized the problem of my "Info" tuple, but I cannot find out a better solution. BTW, I have updated the new data, which has following feature: first inserting 50000 numbers, and then making queries randomly. After applying all of your optimization, the memory usage reduced from 58M to 46M. That is a great step, but the performance is still far from native code. I am going to check Data.IntMap and FingerTree. Maybe rotate-based tree just not fit haskell. \$\endgroup\$
    – Haowen Shi
    Commented Apr 2 at 1:45

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.