At least, the two sequential containers I’m going to talk about.
Vector or List?1
The classic use case for a linked list is frequently inserting (or removing2) elements at arbitrary locations. Inserting into a vector anywhere but the back requires moving existing elements out of the way. But a list only needs to update the links held by the new element’s neighbors. This makes insertion an O(n) operation for a vector and an O(1) operation for a list. The following plot shows how the latency of random insertions scales with size for std::vector and std::list.

std::vector is faster than std::list at all sizes and exhibits linear growth4 starting around 8 kiB. std::list is heavily encumbered by filling cache levels5, becoming nearly 200 times slower than std::vector at 16 MiB.“Uh oh. That’s not O(1) performance. And why is std::list slower than std::vector?”
When people say, “insertion into a list is an O(1) operation,” the fine print is that you need to know where you’re inserting first. Here, we can’t know the insertion point in advance because it’s random. Before we insert each element, we need to traverse the list until we arrive at the right place. Traversing a list is an O(n) operation, and an especially slow one because of the unpredictable memory accesses. The point of using the list was to sacrifice organized data for a simpler insertion algorithm, but we got neither.
Larger Elements
So std::list is terrible for inserting 4-byte integers, but what if we need to insert larger elements? As elements get bigger, so does the cost of moving them. It’s also less helpful to keep elements next to each other if they’re not going to fit in the same cache line anyway. These effects weaken std::vector, but it maintains its lead for a while.

std::vector slows down when it fills caches, but not as badly as std::list.
std::vector performs similarly to std::list.
std::list beats std::vector at most sizes.You might be thinking the takeaway is to use std::list when performing frequent insertions and the elements are large. However, it’s often better to use a std::vector that holds std::unique_ptrs to the data.
Vector of Pointers
“Doesn’t that defeat the purpose of a vector? How is a vector of pointers different from a list?”
The difference is that a vector of pointers allows random-access, which means the processor isn’t forced to read elements in order. While traversing the vector, branch prediction lets the processor start reading the next element before it knows to continue past the current one. You still get cache misses, but now they overlap in time.


std::list after the L2 cache is filled.You can also use a hybrid approach, with “hot” data stored directly and “cold” data stored via std::unique_ptr.
Sorting
So far, we’ve been looking at benchmarks of “repeatedly inserting random elements in sorted order,” better known as, “insertion sort.” Since running insertion sort on four million elements is an absurd thing to do6, let’s also look at the standard sorting algorithms.

std::vector, vector of pointers, and std::list, with 4-byte elements7. The vector of pointers and list are both slower than the ordinary vector and visibly impacted by filling caches, with the list sharply worsening as it fills the L3 cache.
std::vector implementations perform almost identically from 32 elements8 to where they fill the L2 cache. The list is slower than both vectors and, again, sharply worsens as it fills the L3 cache.
std::list::sort() is faster than std::sort() on the ordinary vector, but slower than std::sort() on the vector of pointers.Sorting a list is never the fastest option, and is slower than both vector implementations until the elements are 1,024 bytes. Part of the reason is that std::sort() and std::list::sort() use different algorithms. The former uses introsort while the latter uses merge sort. Maybe you think it’s cheating to use a slower algorithm for std::list, but the point is that we don’t use the same algorithm because we can’t — std::sort() only works on random-access containers. std::list being a bidirectional container limits what we can do with it.
Conclusion
Linked lists don’t make much sense on modern processors. Their strength is that they represent abstract sequences, unconcerned with where the elements are. But modern processors rely on data being organized and predictable to avoid twiddling their thumbs waiting for memory to respond. Vectors represent the simple mappings from sequences to locations that processors need to retrieve data quickly. Maintaining a contiguous array often means more work upfront, but less time waiting9.
You can find a use case for anything if you look hard enough10, but you’re much more likely to regret using a linked list than a vector. Optimizing for “insertion” without considering other operations is often a mistake. If maintaining a vector is too cumbersome, there are ways of lowering the costs. There’s no way to make a list not cause cache misses — it’s what they do. Before you reach for a list, it’s usually a better idea to ask yourself how you could make a vector work, and then do that instead.
- In this post, I use C++ terminology.
std::vectoris the C++ standard library’s dynamic array andstd::listis a doubly-linked list. Other languages might use different names. For example, a Pythonlistis a vector of pointers. ↩︎ - For simplicity, I only talk about insertion, but removal has similar performance considerations. ↩︎
- Each container started empty and was filled by inserting elements from a pre-computed random sequence in sorted order. All benchmarks in this post were performed on an Intel i9-9900K CPU. ↩︎
- On a log-log plot, “linear growth” is indicated by the slope of 1, not merely the appearance of a straight line. Quadratic growth would appear as a straight line with a slope of 2. ↩︎
std::listalso fills the cache levels early due to the extra memory for the links. With 4-byte elements, the total memory consumption is six times the size of the data (4 B data + 8 B predecessor link + 8 B successor link + 4 B padding). ↩︎- The single
std::listinsertion benchmark with four million 4-byte elements took three days to complete. ↩︎ - The sorting benchmarks use a larger maximum size than the insertion benchmarks because they don’t take several days to finish. ↩︎
std::sortchanges behavior between 16 and 32 elements because this is where introsort switches from insertion sort to quicksort. ↩︎- As I pointed out in my previous post, “doing extra work is beneficial if that work can be done in an amount of time that would otherwise have been wasted.” ↩︎
- A non-performance reason to use
std::listmight be to prevent iterator invalidation. ↩︎