Jun 25, 2011
 

This question has been bugging me for the last few days: why would anyone use a linked list instead of an array? I argue that linked lists have become irrelevant in the context of a modern computer system. I asked a few colleagues, and I include their counter-arguments and my rebuttals below for your reference. Please point out if I am missing something.

Linked lists have several disadvantages on modern hardware that did not exist in traditional systems:

1. They reduce the benefit of out-of-order execution. Out-of-order execution gains much of its benefit by parallelizing independent memory accesses. Accesses in linked lists are dependent: the address of each node is only known after the current node has been loaded into the cache. This effectively eliminates the benefit of out-of-order execution when traversing the list.
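To make the dependence concrete, here is a minimal C sketch (the node type and loop bodies are illustrative, not from any particular codebase): the list traversal forms a chain of dependent loads, while the array traversal issues independent loads that the core can overlap.

```c
#include <stddef.h>

/* Hypothetical node type, just for illustration. */
struct node {
    int val;
    struct node *next;
};

/* Each load of n->next depends on the previous one having completed,
 * so the out-of-order core cannot overlap the cache misses. */
long sum_list(const struct node *n)
{
    long sum = 0;
    for (; n != NULL; n = n->next)
        sum += n->val;
    return sum;
}

/* The address of a[i+1] is known without loading a[i], so the core
 * can have many independent loads in flight at once. */
long sum_array(const int *a, size_t len)
{
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += a[i];
    return sum;
}
```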

2. They throw off hardware prefetching. Most processors today employ hardware prefetchers, which are good at prefetching regular access patterns such as array traversals but are unable to follow the pointer chains of linked data structures.

3. They reduce DRAM and TLB locality. Linked lists spread data elements all over memory, so consecutive elements can land in different DRAM pages. Since modern DRAMs take about 3x more cycles to fetch random memory locations than sequential ones, each memory access of a linked list is potentially more expensive (explained in my post on what every programmer should know about DRAM memory). For the same reason, a more dispersed memory footprint requires more pages to be resident, which increases the usage of physical memory. Lastly, if the list is dispersed across more pages, traversing it incurs a larger number of TLB misses.

4. They cannot leverage SIMD. In operations like lookup, multiple elements of an array can be compared against a value using a single SIMD instruction. Since elements in a linked list have to be fetched one at a time, lists cannot use the power-efficient SIMD model.
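As an illustration, here is a hedged sketch of a SIMD lookup using SSE2 intrinsics (assumes an x86 target and a GCC/Clang-style compiler; the function name `simd_find` is made up). Four array elements are compared per instruction; there is no linked-list equivalent because node addresses are not known in advance.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Compare four array elements against `key` per iteration with one
 * SIMD compare; a list would have to fetch and test nodes one by one.
 * Returns the index of the first match, or -1 (tail handled scalar). */
long simd_find(const int *a, size_t len, int key)
{
    __m128i vkey = _mm_set1_epi32(key);
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(v, vkey));
        if (mask)  /* one bit per matching byte; 4 bytes per element */
            return (long)(i + (size_t)(__builtin_ctz(mask) / 4));
    }
    for (; i < len; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}
```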

5. They are harder to send to GPUs. Today's GPUs have a different memory address space than the CPU. Thus, when sending data over to the GPU, all pointers have to be converted from one memory space to the other. Linked lists contain lots of pointers, which makes it considerably more expensive to send them to the GPU. Note: this disadvantage will go away once GPUs share virtual memory with the CPU.

When I gave the above reasons to my colleagues, they gave me some counter arguments about why linked lists are better. Below is their list and my rebuttal to each point.

1.  Resizable

Linked lists can be sized dynamically.
My argument: Arrays with amortized doubling have solved this problem.
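For reference, a minimal sketch of amortized doubling in C (struct and function names are illustrative): each push is O(1) amortized because the backing array doubles only when full.

```c
#include <stdlib.h>

/* Minimal growable int array with amortized doubling (a sketch). */
struct vec {
    int *data;
    size_t len, cap;
};

/* Returns 0 on success, -1 on allocation failure. */
int vec_push(struct vec *v, int x)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 8;  /* double, start at 8 */
        int *p = realloc(v->data, ncap * sizeof *p);
        if (!p)
            return -1;  /* allocation failed; vec left unchanged */
        v->data = p;
        v->cap = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}
```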

2. Different element types

Linked lists are superior to arrays as they allow each node to be of a different type.
My argument: I agree except that this property is rarely exploited.

3. Cheaper insertions at the ends

It is cheaper to insert an element at the head or tail of a list.
My argument: Arrays can be used with head and tail pointers and buffers at both ends for faster insertions.
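As a sketch of that idea (a toy fixed-capacity version; a real implementation would also grow the backing store), a circular buffer gives O(1) insertion at both ends of an array:

```c
#include <stddef.h>

/* Fixed-capacity ring buffer: O(1) pushes at both ends (a sketch). */
#define RING_CAP 8
struct ring {
    int buf[RING_CAP];
    size_t head;   /* index of the first element */
    size_t len;
};

int ring_push_back(struct ring *r, int x)
{
    if (r->len == RING_CAP) return -1;          /* full */
    r->buf[(r->head + r->len) % RING_CAP] = x;
    r->len++;
    return 0;
}

int ring_push_front(struct ring *r, int x)
{
    if (r->len == RING_CAP) return -1;          /* full */
    r->head = (r->head + RING_CAP - 1) % RING_CAP;  /* wrap backwards */
    r->buf[r->head] = x;
    r->len++;
    return 0;
}

int ring_get(const struct ring *r, size_t i)
{
    return r->buf[(r->head + i) % RING_CAP];
}
```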

4. Cheaper insertions in the middle

In ordered lists, inserts are often needed in the middle of the data structure. A linked-list insert looks cheaper: on average it requires traversing only half of the list, finding the right spot, and inserting the node. Inserting into an array seems expensive because it requires traversing half of the array to find the insertion point and then shifting the other half of the array to make space for the incoming element.
My argument: This analysis is naive; arrays are actually faster. The improved memory-level parallelism, locality, prefetching, and the ability to use SIMD far offset the fact that arrays require 2x the operations.
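Note that the "shift half the array" step is a single sequential memmove, which is exactly the access pattern prefetchers and wide load/store units are built for. A minimal sketch (the caller is assumed to have spare capacity for one more element):

```c
#include <string.h>
#include <stddef.h>

/* Insert x at position pos in a[0..len). The shift is one sequential
 * memmove over contiguous memory, which the hardware handles well. */
void array_insert(int *a, size_t len, size_t pos, int x)
{
    memmove(a + pos + 1, a + pos, (len - pos) * sizeof *a);
    a[pos] = x;
}
```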

5. Pointers in the middle

We can have jump pointers to make list accesses faster.
My argument: This is called a hash table, not a list.

Do you have any other arguments that support the linked lists?

 


  24 Responses to “Quick Post: Should you ever use Linked-Lists?”

  1. I agree with the general tone of this post — programmers need to understand hardware better, and in a broad algorithmic sense, not just a narrow implementation one.

    Still, I disagree with some of your arguments. Overall, you seem to have narrowly restricted what constitutes a ‘linked list’ while allowing ‘arrays’ to be rather broadly interpreted.

    Furthermore, you are performing your analysis with what seems to be a fairly narrow usage model; perhaps the data structure access/traversal cost is insignificant compared to the work actually done at each list entry. Hardware considerations are important, but the rule of optimizing bottlenecks is still essential!

    Comments on your individual points:

    1. The out-of-order execution argument is somewhat lacking; first of all, not all modern architectures use out-of-order execution: in particular, some of the lower-end ARM processors, Intel’s Atom processor, and the individual cores in the forthcoming Intel Many Integrated Core architecture are all in-order.

    2. Where hardware prefetching fails, there are software prefetching instructions (and intrinsics) that can be used. This can even be necessary with an array (say, of pointers).

    3. The TLB/DRAM locality argument assumes that you have no indirection in your array (no arrays of pointers) and that your linked list cannot take advantage of pooled allocations. It depends greatly on the usage model, but the cost of a TLB miss for a single element being added to a list vs. hundreds of misses as you copy a gigantic array to insert into the middle is worth considering.

    4. Again, this depends on the usage. Most people probably don’t use SIMD for their vector traversal anyway. Notwithstanding, it should be possible to traverse multiple linked lists at once in SIMD given gather support (announced in the AVX2 instructions to appear in the upcoming Haswell microarchitecture).

    Your argument against ‘cheaper insertions in the middle’ is missing a lot of details. Clearly, inserting into the middle of a linked list (given the position) is O(1), while inserting into the middle of a vector is O(n). Searching is a different matter; at least separate the two notions in your analysis. Where does the ’2x the operations’ argument come from?

    The argument about ‘Pointers in the middle’ is nonsensical to me. What you’ve described _isn’t_ a hash table, it’s more like Pugh’s ‘Skip lists’.

    You’ve also omitted all of the interesting things you can do with lists, like having them be circular, or having elements be members of multiple lists. I agree that these are unusual methods of use, but they are still interesting and potentially useful.

    I think a more interesting angle is ‘how do you implement linked lists (or linked-list behavior) in a way that is architecture-friendly?’. Things like B-Trees can be viewed as sort of hybrid arrays/lists.

    Anyway, thanks for the interesting post; just thought I’d weigh in.

    Jason

    • Hi Jason,

      Thanks for an excellent comment. This is exactly what I was hoping for: that someone would add some sense to my arguments, as I was sure I was missing some details. I agree with most of what you have said. Individual responses follow:

      I do not think I have restricted the list as such. I just added amortized doubling and head/tail pointers to the array, both of which are already present in the linked list (it does mallocs and has head/tail pointers). It just makes the comparison a bit fairer to arrays. What do you think?

      I agree with your comment about the traversal being a minor cost in some cases. It may not matter at all in many cases. However, if an optimization comes for free, then I would argue that it should be done even if it does not always benefit, as long as it does not increase the code’s complexity.

      Now the points:

      1. I agree that out-of-order is not in every core but the list does restrict opportunities to increase memory-level parallelism (MLP). Some in-order cores like Sun Rock employ techniques to get MLP and lists are still bad for those.

      2. In my experience, software prefetching a linked list is actually a lot harder than an array. Letting software prefetches get ahead of the actual stream requires a lot more ninja skills, and the null pointer at the end makes prefetching a bit tricky as well.
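      As a rough illustration of the difference (a sketch using the GCC/Clang `__builtin_prefetch` builtin; the prefetch distance of 64 elements is an arbitrary example, and good distances must be tuned):

```c
#include <stddef.h>

struct node { int val; struct node *next; };

/* Array: the prefetch can run arbitrarily far ahead of the loop,
 * because future addresses are known without loading anything. */
long sum_array_pf(const int *a, size_t len)
{
    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        if (i + 64 < len)
            __builtin_prefetch(&a[i + 64]);
        sum += a[i];
    }
    return sum;
}

/* List: the only address we know is n->next, so the prefetch can get
 * at most one node ahead, and must guard against the NULL tail. */
long sum_list_pf(const struct node *n)
{
    long sum = 0;
    for (; n != NULL; n = n->next) {
        if (n->next != NULL)
            __builtin_prefetch(n->next);
        sum += n->val;
    }
    return sum;
}
```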

      By the way, I think we should not compare an array of pointers to a linked list unless the linked list is also a list of pointers, in which case they will be equally bad.

      3. Totally agreed with the pooled-allocation argument. However, as I said, arrays of pointers should be kept out of the picture (as it is unfair to compare them with a list of data values). See my response below on insertion being O(1).

      4. I agree SIMD is not used commonly, but that trend is changing. Intel C compiler 10 is pretty good at auto-vectorizing, which makes it pretty automatic. I think gcc is getting better as well. ARM’s compiler is already good at it. Thus, I think arrays do win this argument. While it is possible to traverse multiple lists with SIMD, it requires more programmer effort. Also, I want to clarify that a gather is still pretty expensive compared to a vector load. Nvidia has the best implementation of gather out there, and yet they discourage its use.

      Inserting in the middle of a list is O(1) in theory, but only if you already know where to insert. Most likely I am missing something, but I am thinking that it is almost always preceded by a traversal of some sort. I think I need a correction here, so please let me know what I am missing.

      Lastly, skip lists vs. hash tables: Nah! It is more like a hash table and not a skip list. Perhaps we disagree on the definition of jump pointers. To me, it is a _direct_ shortcut pointer to the middle of a data structure in order to reduce traversal. Although shorter, a skip list still requires traversal to get to the part of the list where you think your data is. I see your point about a skip list, and again, an array can also have jump pointers and skip-list-like optimizations.

      I have indeed omitted the whacky lists. I was just thinking about a traditional list for the most part.

      I agree that we should think more about how a list should be built rather than what’s wrong with it. Based on our discussion, I am thinking that a list of small array chunks can be pretty good. What say?
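      (A sketch of that hybrid, sometimes called an unrolled linked list; the chunk size and names here are made up: each node holds a small array, so traversal is sequential within a chunk while insertion still only splices chunks.)

```c
#include <stddef.h>

/* An "unrolled" list: each node carries a small array of elements,
 * giving cache-line locality within a chunk while keeping the cheap
 * splicing of a list between chunks. (A sketch of the idea.) */
#define CHUNK 16
struct chunk {
    int vals[CHUNK];
    size_t n;              /* number of slots in use */
    struct chunk *next;
};

long chunks_sum(const struct chunk *c)
{
    long sum = 0;
    for (; c != NULL; c = c->next)
        for (size_t i = 0; i < c->n; i++)
            sum += c->vals[i];   /* sequential within a chunk */
    return sum;
}
```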

      Thanks for your excellent insights and for correcting me. I enjoyed reading your comment and have now enjoyed writing this response. I really hope to hear from you again on this topic, as I am sure I am again biased and need to be corrected :-).

      Aater

      • Don’t forget that allocating arrays with extra space (eg, doubling the size when you run out of space) can sometimes give you pooled memory for free – insertions into the array already have their memory allocated.

        The only time I’d normally (abnormal situations would be where the data is distributed across machines, for example, or where fine-grained low level performance isn’t as important as the time complexity) consider using a linked lists is when I want to avoid false sharing between nodes in multithreaded code and cache-line aligning each array element would be too memory wasteful. Of course, this will only work if there is other data that the linked list nodes can share cache lines with and the memory is carefully managed to allow this… and of course, this only really makes sense if accessing individual nodes is more common than traversing or manipulating the list itself.

        • Dan,

          Excellent point about pool memory and also about distributed machines. I do not write distributed code generally which is why it did not occur to me. Thanks for teaching me something new.

          Aater

          • No problem. This is something I’ve been thinking about recently, so I really enjoyed your post.

            I’ve been doing a bunch of multithreaded programming with Intel’s Threading Building Blocks, and for that, linked-list-based structures are pretty inefficient, because processing linked lists generally requires sequential traversal, while array-based structures can more easily be split into chunks to process in parallel. (Sure, you can walk a linked list once, split it into chunks, and then walk the chunks in parallel as they are processed, but that just creates more complexity.) From a multithreaded point of view (which is my main interest in this discussion), linked lists and other inherently sequential data structures, as you know, are often a bad idea. I guess for the same reasons you gave regarding GPUs.

            But I don’t think that linked lists are useless either – they do have their uses, it’s just that there are fewer of them than a lot of people think (and, sadly, are taught in computer science courses…)

            “But I don’t think that linked lists are useless either – they do have their uses, it’s just that there are fewer of them than a lot of people think (and, sadly, are taught in computer science courses…)”

            I think you hit the nail on the head here. It is less useful than it seems. By the way, if you are talking about the queue structures in TBB, I know that they are slow. I wrote my own for that reason. I can share some code if it will help.

          • “I wrote my own for that reason. I can share some code if it will help.”

            If you feel like making them the topic of a future blog posts, I would be very interested to hear about it! I would be interested in taking a look at code too :)

        • Nice discussion – just a quick reply;

          0) I think Aater mentioned how arrays-of-pointers aren’t ‘fair’ for comparison, but what happens when you have huge records to handle? With arrays that directly store them, you pay a lot to copy those big entries around, while lists can stay ‘where they are’. This is a big deal when inserting into the middle of an array. Furthermore, the persistence of pointers is pretty handy. If I give a pointer to someone, it’s going to remain valid for a linked list. If you have to move data around, as in an array, you’ll have to find some other way to identify that data.

          For that matter, it’s pretty handy to be able to have a pointer – just one – to be able to traverse a list. You can start right in the middle and traverse until you hit the sentinel/tail value. You need a size and a base pointer in addition to an index for arrays.

          1) Much talk has been made of exponential array allocation, which is also taught in CS courses. It’s a very useful technique, but it can also waste memory. There are some situations where you need to be very careful about space, and having 2x the space allocated than you need can be a problem – for example, on huge datasets on workstations, or even modest-sized problems on embedded devices.

          2) Linked lists can be modified (inserts, deletes, etc.) by multiple threads asynchronously (with some pain, locks, etc.), without much contention, while arrays generally cannot. Sometimes quite useful!

          3) Much of what we are taught in CS is based on the theory of computation and asymptotic analysis. This blog, and our discussion, is very much motivated by modern hardware considerations. I think that is a good thing, and that CS classes should do more to inform students about how hardware informs algorithm design/implementation, but I don’t think that should be added to the curriculum to /replace/ the theoretical model. That stuff is useful. Even with hardware considerations, there is a point at which the asymptotic complexity of some linked-list operations lets them outperform arrays.

          I’m really just playing devil’s advocate here — I honestly haven’t had to use linked lists for a while. I have done some stuff with trees, which can sometimes be interpreted as hybrid array/lists. For short to medium sequences of scalar data where performance is key, arrays are absolutely the way to go. I’m just calling you out on the ‘never’ generalization in this post :)

          • Jason,

            I agree with most of what you say. Specific answers below:

            1) True, I have always wondered if there is a better middle ground, something that isn’t exponential. Just for completeness, I want to add two facts: (a) lists also waste space by storing pointers; (b) memory wastage for “unused” memory can actually be zero overhead, because it is in virtual memory and may never be paged in if it is never used. This is even more true with 64-bit addressing.

            2) Hmm.. I could argue that you can also optimize arrays but from a practical standpoint, I totally agree.

            3) I could not agree more. It shouldn’t replace the theory (my blog also has many analytic models :-) ), but it should make theory more aware of reality. In fact, CS theory is actually tied to hardware IMO even today, but it is tied to old hardware. The whole compute-and-memory concept comes from hardware, but when those models were first conceived, hardware used to be a lot simpler.

            Thanks for the great comment. I really enjoy learning new insights.

          • If doubling an array takes too much RAM or address space, you can also scale it by a smaller factor. You will get proportionally more allocations, but big O is unaffected. Factors > 2 work just as well. Doubling is still a good trade-off that works for most purposes, almost certainly better than wasting time bothering. But if it doesn’t work for you, you can still reap the benefits of exponential scaling with a different factor.
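            A sketch of such a policy in C (the 1.5x factor and the small-capacity special case are illustrative choices, not the only reasonable ones):

```c
#include <stddef.h>

/* Next capacity with a ~1.5x growth factor: more allocations than
 * doubling, but at most ~50% slack instead of ~100%. */
size_t grow_cap(size_t cap)
{
    if (cap < 2)
        return cap + 1;   /* avoid getting stuck at 0 or 1 */
    return cap + cap / 2;
}
```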

            And yeah, linked lists have their uses. Although I rarely use them, indeed.

  2. A correction in 5: this is called a “skip list” and they are useful sometimes.

  3. For hardware prefetching, I think a counterexample would be if you have a garbage collector that iterates through used memory. It will probably compact the linked list and put all the nodes sequentially, so the prefetchers would catch it just fine. It’s still an inefficient use of memory though, and if the data is primitives, it will force a traversal for the garbage collector when an array wouldn’t (I guess if you care about garbage-collector traversals, you stopped caring about the performance of the data structure a long time ago).

    Disagree with the different data type argument. You can just have an array of pointers to handle different data types, so linked lists don’t make something possible that would otherwise have been impossible.

    One use I can think of for linked lists is in a text editor. Each node handles a page or so of text, so adding a character only requires touching the current page; you don’t have to touch the next million characters. Using a skip list or some other data structure would be helpful, but in general it’s a form of linked list. Another use I can think of is that it keeps you from needing huge contiguous chunks of memory. That may or may not be acceptable, but from what I understand, most file systems use linked structures because it’s hard to provide contiguous storage at all times. A linked list may also be considered for having better real-time properties than amortized doubling.

    Another use I was thinking of is something like a game, in which you have to iterate over a list of missiles in flight anyway and update their state, and some missiles in the middle may explode and be removed from the list. But for all we know, it may be faster to mark them for removal in the first iteration and copy the remaining missiles in the next, or to garbage-collect when the ratio of active to tracked missiles drops below a certain threshold.

    Agree on the general point behind the post, for the most part just use an array and thank you to you, and both Dan and Jason for the valuable insights.

    I’d guess there would be more use cases in programs taking user input, because a list is only really useful over an array when you know, out of order, the position where you want to insert or remove while preserving the order (if order does not matter, then use an array and swap with the last element). In HPC, too many things are just a stream that you iterate from front to back, and definitely in those cases a linked list will be expensive.

  4. I think you’re coming at it from one very specific angle, e.g., thinking about GPUs. Those of us doing symbolic computation use linked lists all the time. I’ve got entire programs that basically do nothing but manipulate heterogeneous linked lists — not a rare case at all. This really is the killer feature: exploratory symbolic programming. I don’t care about CPU performance 99.9% of the time.

    A natural extension of heterogeneity is that I can also make it out of more lists — thus promoting an algorithm that works on a list to one that works on a tree, with almost no work. Promoting an array algorithm to work on trees, either by storing the tree in the array, or by writing a new class, seems like a lot more work to me.

    Insertion in the middle of an array isn’t 2x more operations. It’s O(N) times more operations. I don’t know where you got 2x.

    Another advantage: they have a simpler mutability model. I can declare that my lists are immutable (or use a language which enforces this), and when I want to prepend something and share the tail, I can do it safely, even with threads. I’m sure this is possible with arrays and there’s got to be some libraries out there to help with this, but I’m not familiar with any.

    • I agree with you regarding this article, and it very much reflects my usage of lists, at least when I’m programming in Scheme/Erlang/Lisp/Haskell/Prolog/etc… Heck, even when I program in python I do a ton of symbolic computation.

      I almost always use homogeneous lists though, and use vectors or tuples for heterogeneous values.

    • I’d also add that the performance of lists is not that bad if you know how to use them properly. For the stuff that I do, that I assume you do as well, the performance is actually better, for the reasons that you noted as well as reasons that I noted.

  5. Anyway, your article was interesting and thought-provoking, although I don’t really agree with your point, overall.

    I use lists as my bread and butter data structure, along with tuples. (immutable arrays) I assume most of the programmers commenting are system level programmers, because they seem to not use them very much at all.

    Lists are an inductive data structure, trivial to make immutable implementations, with O(1) cons/insert at the front, O(1) first and O(1) “rest”/”tail”. Since they enable tail-sharing, it is trivial to fold over a list and produce a new one without throwing the original away, while sharing many contents with the original. Lists are not intended for random access, so I don’t think that is a fair criticism at all since no advocates of lists insist on them being used for such.

    Cons/append to any part of an array is extremely expensive, since you have to copy the entire array, except to the end of a vector or array”list”, (the worst terminology ever) and folding or recursing over an array persistently requires a lot of extra copying. Besides that, folding over an array is going to require a non O(n) append to front operation, which makes it prohibitively expensive.

    In contrast to what you say, the cost of resizing a linked list is much cheaper than an array, unless you have some special array. I think you’re comparing special array structures, which still have drawbacks relative to lists, while comparing them against literally regular mutable linked lists.

    Even appending to the end of an array isn’t necessarily free, since you might have to pay the amortized cost, which would make resizeable arrays completely unsuitable for real-time applications, which I assume many of the people on this blog would care about.

    Operations over lists are not very hard to parallelize. (and trees work even better for this purpose) For example, in the programming language Erlang, I wrote an email client, and to compute the summary view, I process a list of several thousand email data. Since I use higher order functions fold/map/filter/etc, I simply used a parallel list library, (eg. using plists:map vs lists:map) and it was able to compute much of the mail data using all of the cores on my machine, without making me change my algorithm at all.

    Finally, for random access. Arrays are obviously king here, but there are “Random Access Lists” libraries available (based on a data structure designed by Chris Okasaki) for Erlang, Scheme and Emacs Lisp, which essentially treat the lists internally as a tree-like structure, which has O(1) for cons, first/rest, and O(log(n)) for append, search, etc… So basically I have no idea why this library does not replace linked lists in every language, (except that every language it is used in would use pattern matching, and so the pattern matching syntax would have to be changed to pretend that this tree structure is a list) because unlike arrays, it has O(1) for the operations that lists are heralded for.

    From a high level perspective, it is extremely rare that I actually say “gee, I wish I could have the 4th element in that set, not the 5th”. If I want that, I probably really want a tree, hash table, or a dictionary. In the case of a collection, unless I’m searching for something, I want to apply a function or predicate to the elements of a collection, since they would most likely be the same type, and a linked list is just much better than an array for that. It really depends on what programming language you are programming in, though. In C, I use arrays for almost everything. Same thing for C++.

    In a java game project I’ve worked on, although I find the java linked list library absolutely atrocious, (because it has no way that I know of to get the rest of the list without destroying the head, iterators notwithstanding) the programmers I worked with used them a lot, and used them where they were appropriate.

    So, I’m not saying that lists are an end all data structure, but lists are definitely not a replacement for them, or at least they are not as good at the things lists do well. The “random access lists” actually provides a superset of the advantages of lists, however. I’m also a big fan of trees. (I wrote a red-black tree library earlier this year)

  6. Oops! Typo… When I said

    “but lists are definitely not a replacement for them, or at least they are not as good at the things lists do well.”

    I meant to say “but arrays are definitely not a replacement for them, or at least they are not as good at the things lists do well.”

  7. Also when I said “a non O(n) append to front operation”, I meant to say “an O(n) append to front operation”.

  8. One point I want to highlight here: Linux, one of the most portable operating systems, uses linked lists extensively in its internal data structures.

  9. Hi

    For real-time systems, a linked list may be useful when you need to ensure that an insert will take a bounded length of time. If the time constraints on an insert are such that the time to create/copy the elements in the existing array fits within your time bounds, then an array would be acceptable.
