This morning I was reading the umpteenth std-proposals thread proposing some variety of unstable_remove, and it occurred to me that one odd thing about a swap-and-pop-based unstable_remove is that, given a large contiguous swath of removals, it tends to fill the gap with the kept elements in reverse order. For example (Godbolt):
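The code behind the Godbolt link is not reproduced here. As a minimal sketch of the effect, assume a hypothetical unstable_remove_if written in the common two-pointer swap-and-pop style. It is not a standard algorithm, and the name, signature, and reversal_demo helper are illustrative only:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical swap-and-pop-style unstable_remove_if (not in the standard library).
// Each to-be-removed element found from the front is overwritten by the last
// kept element found from the back.
template<class It, class Pred>
It unstable_remove_if(It first, It last, Pred pred) {
    while (true) {
        // Advance past leading elements that are kept.
        while (first != last && !pred(*first)) ++first;
        if (first == last) return first;
        // Retreat past trailing elements that are removed.
        do {
            --last;
            if (first == last) return first;
        } while (pred(*last));
        // Fill the hole at the front with the last kept element.
        *first = std::move(*last);
        ++first;
    }
}

// Removes a contiguous run of four negatives from the front of the vector.
std::vector<int> reversal_demo() {
    std::vector<int> v = {-1, -2, -3, -4, 1, 2, 3, 4};
    auto new_end = unstable_remove_if(v.begin(), v.end(),
                                      [](int x) { return x < 0; });
    v.erase(new_end, v.end());
    assert(v.size() == 4);
    return v;
}
```

With this sketch, reversal_demo() returns {4, 3, 2, 1}: the four kept elements fill the hole left by the four removed ones, in reverse order.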

Authored by Arthur O'Dwyer, Quuxplusone offers explorations of C++ programming language features, standard library design, and modern C++ development practices. Readers can delve into advanced topics such as template metaprogramming, design patterns, and effective use of language features. Additionally, they can learn about best practices for writing efficient and maintainable C++ code, optimizing performance, and leveraging the latest language standards.

Arthur O'Dwyer

An investigation into whether replacing the element-by-element assignments in std::remove_if (the "smooth" loop) with bulk memmove operations over each contiguous run of kept elements (the "chunky" approach) yields a performance improvement. Benchmarks on a million-element int array across various removal densities show the chunky variant is consistently slower than the traditional smooth loop, by 11–33%. The author attributes this to branch predictor behavior: when few elements are removed, the predicate branch becomes highly predictable, which dominates any potential memmove speedup. The conclusion is that chunking moves into memmoves is a pessimization, not an optimization, and that cache and branch-prediction effects outweigh raw move-assignment count.
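To make the comparison concrete, here is a rough sketch of the two variants. It is restricted to int, since memmove is only valid for trivially copyable types. This is not the author's benchmark code; the names smooth_remove_if, chunky_remove_if, and variants_agree are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "Smooth": the traditional loop, one assignment per kept element.
template<class Pred>
int* smooth_remove_if(int* first, int* last, Pred pred) {
    int* out = first;
    for (; first != last; ++first) {
        if (!pred(*first)) *out++ = *first;
    }
    return out;
}

// "Chunky": find each contiguous run of kept elements, then shift the
// whole run down with a single memmove.
template<class Pred>
int* chunky_remove_if(int* first, int* last, Pred pred) {
    int* out = first;
    while (first != last) {
        // Skip a run of elements to be removed.
        while (first != last && pred(*first)) ++first;
        // Measure the following run of kept elements.
        int* run = first;
        while (first != last && !pred(*first)) ++first;
        std::size_t n = static_cast<std::size_t>(first - run);
        // memmove tolerates overlap between source and destination.
        if (n != 0 && out != run) std::memmove(out, run, n * sizeof(int));
        out += n;
    }
    return out;
}

// Runs both variants on the same input and checks that they agree.
bool variants_agree() {
    int a[] = {1, -1, -2, 2, 3, -3, 4};
    int b[] = {1, -1, -2, 2, 3, -3, 4};
    auto is_negative = [](int x) { return x < 0; };
    int* end_a = smooth_remove_if(a, a + 7, is_negative);
    int* end_b = chunky_remove_if(b, b + 7, is_negative);
    assert(end_a - a == end_b - b);
    for (std::ptrdiff_t i = 0; i < end_a - a; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}
```

Both functions evaluate the predicate once per element, so the chunky version saves no predicate calls or branches. It only changes how the kept elements are copied, which fits the observation that branch prediction, not assignment count, dominates.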

Does bulk memmove speed up std::remove_if? (No.)