Look, I know it’s the fastest way to copy data from one location to another[1]. But hear me out:
- It breaks your code.
- If it doesn’t, you were probably going to get the speed anyway.
The problem is that memcpy copies raw bytes, regardless of what those bytes represent. This is called a “shallow copy.” Almost always, what you really want is a “deep copy.” When memcpy works, it’s usually because a shallow copy happens to be the same as a deep copy, not because memcpy is an honest representation of what you meant to do.
Take the following struct:
```cpp
struct Person {
    static constexpr int MAX_NAME_LENGTH = 32;
    int age;
    char name[MAX_NAME_LENGTH];
};
```
This defines Person as a trivial data type. If we want to copy an array of Person objects, either memcpy or std::ranges::copy will work.
```cpp
auto people = std::array<Person, NUM_PEOPLE>{};
// ...
auto people_copy = std::array<Person, NUM_PEOPLE>{};

std::memcpy(people_copy.data(), people.data(), sizeof(people));
// OR
std::ranges::copy(people, std::begin(people_copy));
```
But when we eventually decide that storing Person::name in a fixed-size array was a terrible idea, we’ll want to replace it with a string.
```cpp
struct Person {
    int age;
    std::string name;
};
```
std::ranges::copy still does the right thing, but now memcpy produces a “copy” whose name member points to the original’s underlying array. At minimum, this causes a double free when the objects go out of scope[2].
So why not take memcpy’s speed for trivial data, and update the code for deep copies when needed? Firstly, that update is easier said than done in a large codebase: you might have a giant mess to clean up. Secondly, you don’t need to explicitly write memcpy to get a call to memcpy. Optimizing compilers are smart enough to do that for you. The latest GCC and Clang can replace std::ranges::copy with memcpy at -O1[3][4].
Even when pointer aliasing forces the compiler to use memmove instead, you won’t see a difference. The distinction between memcpy and memmove is that calling memcpy on overlapping arrays is undefined behavior. “Undefined behavior” means anything can happen, which includes just calling memmove in the first place. Every platform I checked treats memcpy as an alias for memmove[5].
memcpy falls into the category of “failed micro-optimizations that only make life harder.” It won’t make your program faster, but it will be another thing you need to update while reorganizing your data structures. It’s especially counterproductive when a trivial layout pollutes the cache with seldom-used data. Moving that data out-of-line might speed things up a lot, but you can’t do that if every function requires a trivial type. It’s tempting to try to maximize performance for a known data layout, but that may just lock you into using an inefficient data structure with no tangible benefit. Write what you actually intend to do first, and don’t micro-optimize unless you know you need to. The more you couple your algorithms to your data structures, the harder it is to fix them later.
So please… stop using memcpy.
1. On my machine, memcpy is up to 90 times faster than an unoptimized for-loop.
2. Assuming the Small String Optimization (SSO) doesn’t apply. If it does, it will hide the bug.
3. https://godbolt.org/z/sqhYoP4KG.
4. I’ve been told, “We can’t turn on optimization because of undefined behavior.” If that’s the case, then you don’t have a C++ program. Period.
5. On my Linux (x86_64) machines, memcpy gets linked to __memmove_avx512_unaligned_erms or __memmove_avx_unaligned_erms, depending on the CPU (you might see __memcpy..., but these are aliases of the __memmove... versions, which you can confirm by comparing their addresses). On Windows (x86_64) and macOS (ARM) machines, I got ntdll!memmove and _platform_memmove, respectively. Note that this does not excuse using memcpy in an undefined manner — it just means that forcing a call to memcpy instead of memmove won’t improve performance.