An investigation into whether replacing the element-by-element move-assignments in std::remove_if with bulk memmove operations (a "chunky" approach) improves performance. Benchmarks on a million-element int array across various removal densities show the chunky variant is consistently slower, by 11–33%, than the traditional "smooth" loop. The author attributes this to branch-predictor behavior: when few elements are removed, the predicate branch becomes highly predictable, which outweighs any potential memmove speedup. The conclusion is that chunking moves into memmoves is a pessimization, not an optimization, and that cache and branch-prediction effects matter more than the raw count of move-assignments.
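To make the two strategies concrete, here is a minimal sketch of what a "smooth" and a "chunky" remove_if might look like. It is not the author's benchmark code: the names `smooth_remove_if` and `chunky_remove_if` are hypothetical, both work on raw pointer ranges for simplicity, and the chunky version assumes a trivially copyable element type so that memmove is legal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>

// "Smooth" variant (hypothetical name): the classic loop, with one
// move-assignment per kept element after the first removed one.
template <class T, class Pred>
T* smooth_remove_if(T* first, T* last, Pred pred) {
    // Skip the leading run that needs no movement at all.
    T* out = std::find_if(first, last, pred);
    if (out == last) return last;
    for (T* it = out + 1; it != last; ++it) {
        if (!pred(*it)) {
            *out = std::move(*it);
            ++out;
        }
    }
    return out;
}

// "Chunky" variant (hypothetical name): track each maximal run of kept
// elements and shift the whole run down with a single memmove.
template <class T, class Pred>
T* chunky_remove_if(T* first, T* last, Pred pred) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memmove is only valid for trivially copyable types");
    assert(first <= last);

    T* out = first;  // next free slot in the compacted prefix
    T* run = first;  // start of the current run of kept elements

    // Shift the kept run [run, end) down to `out`, if it is non-empty
    // and not already in place.
    auto flush = [&](T* end) {
        const std::size_t n = static_cast<std::size_t>(end - run);
        if (n != 0 && out != run) {
            std::memmove(out, run, n * sizeof(T));
        }
        out += n;
    };

    for (T* it = first; it != last; ++it) {
        if (pred(*it)) {   // a removed element ends the current run
            flush(it);
            run = it + 1;  // the next run starts after the removed element
        }
    }
    flush(last);           // final run, possibly empty
    return out;
}
```

Both functions evaluate the predicate exactly once per element and return the new logical end, as std::remove_if does. The chunky version still takes the same data-dependent branch on the predicate for every element, so it only changes how the kept elements get copied, not how many branches are executed.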