A deep dive into SmallVector::push_back
23 points | 1 day ago | maskray.me
dzaima
2 hours ago
The crappiness of shrink-wrapping in GCC and Clang (but especially Clang) annoys me a lot. It feels like there should be quite a decent amount of general performance to be gained from properly pushing more of the prologue/epilogue work (callee-saved register saves and restores) into slow paths, or more generally into whichever paths have high register pressure or non-inlined function calls, never mind calling conventions in general.
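One manual workaround when the compiler won't shrink-wrap well is to outline the expensive path yourself, so the hot path never needs a stack frame or callee-saved registers. This is a sketch, assuming GCC or Clang for the `[[gnu::noinline, gnu::cold]]` attributes; `Memo`, `lookup` and `compute_slow` are hypothetical names, not from the article.

```cpp
#include <cassert>

// Hypothetical one-entry memo table; not taken from the article.
struct Memo {
  int key = -1;
  long value = 0;
};

// Slow path: kept out of line and marked cold (GCC/Clang attributes) so its
// register pressure and loop do not force a prologue/epilogue onto lookup().
[[gnu::noinline, gnu::cold]] long compute_slow(Memo& m, int key) {
  long acc = 0;
  for (int i = 1; i <= key; ++i) acc += static_cast<long>(i) * i;  // stand-in for expensive work
  m.key = key;
  m.value = acc;
  return acc;
}

// Fast path: a compare, a load and a return on a hit. The miss is handed to
// compute_slow, which compilers can often emit as a tail call.
inline long lookup(Memo& m, int key) {
  if (m.key == key) return m.value;
  return compute_slow(m, key);
}
```

The point of the split is that the register-hungry work lives in a separate function, so whatever that work needs to save and restore is paid only on a miss.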

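On the push_back implementation in the article: for non-x86 targets (and perhaps even on x86 for performance, though not for code size or instruction count), it'd be better to let the size increment reuse the size value already loaded for the capacity check. x86 can bump the field in memory with a single read-modify-write instruction, but load/store architectures need a separate reload. With C++'s lack of suitable aliasing information, the interleaved memcpy/store (which, as far as the compiler can tell, might overwrite the size field) prevents the compiler from deciding this itself.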
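The difference can be sketched with a hypothetical, simplified SmallVector-like container. `MiniVec`, its members and `grow_slow` are illustrative names, not LLVM's code or the article's. Restricting it to trivially copyable `T` means each element is stored with one `memcpy`, mirroring the interleaved memcpy/store described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical, simplified SmallVector-like container (not LLVM's code).
// Handles trivially copyable T only, so elements can be stored with memcpy.
template <typename T, std::size_t N>
class MiniVec {
  static_assert(std::is_trivially_copyable<T>::value, "sketch supports trivial T only");
  static_assert(N > 0, "needs inline storage");

  alignas(T) unsigned char inline_[N * sizeof(T)];
  T* begin_ = reinterpret_cast<T*>(inline_);
  std::size_t size_ = 0;
  std::size_t capacity_ = N;

  bool is_inline() const { return begin_ == reinterpret_cast<const T*>(inline_); }

  // Slow path: double the capacity and move the elements to the heap.
  // It never changes size_.
  void grow_slow() {
    std::size_t new_cap = capacity_ * 2;
    T* p = static_cast<T*>(std::malloc(new_cap * sizeof(T)));
    if (p == nullptr) throw std::bad_alloc();
    std::memcpy(p, begin_, size_ * sizeof(T));
    if (!is_inline()) std::free(begin_);
    begin_ = p;
    capacity_ = new_cap;
  }

 public:
  MiniVec() = default;
  MiniVec(const MiniVec&) = delete;  // begin_ may point into inline_
  MiniVec& operator=(const MiniVec&) = delete;
  ~MiniVec() {
    if (!is_inline()) std::free(begin_);
  }

  std::size_t size() const { return size_; }
  const T& operator[](std::size_t i) const { return begin_[i]; }

  // Naive form: as far as the compiler can prove, memcpy may write any byte,
  // including size_, so `++size_` typically reloads size_ from memory.
  // v is taken by value so pushing an element of this vector survives growth.
  void push_back_reload(T v) {
    if (size_ >= capacity_) grow_slow();
    std::memcpy(begin_ + size_, &v, sizeof(T));
    ++size_;
  }

  // Suggested form: read size_ once and reuse it for the capacity check, the
  // store address and the increment; n stays valid across grow_slow().
  void push_back(T v) {
    std::size_t n = size_;
    if (n >= capacity_) grow_slow();
    std::memcpy(begin_ + n, &v, sizeof(T));
    size_ = n + 1;
  }
};
```

Both variants behave identically; the second only makes explicit what the compiler is not allowed to assume, namely that the element store leaves `size_` untouched.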

RossBencina
2 hours ago
> prevents the compiler from deciding this itself.

Interesting. I understand why it does that, but it makes me realise that I usually think "the compiler will reuse the loaded value/perform CSE" without considering the cases where it won't. Are there tools that detect this situation and warn about it? E.g. "warning: could not reuse previously loaded value of 'foo' due to aliasing hazard 'memcpy' at line 234."

dzaima
1 hour ago
Not that I know of, and such a tool would necessarily have false positives (or rather, consist entirely of potential false positives), because you may actually want the re-read.
im3w1l
1 hour ago
So one thing I thought of when reading this is that very often it is known ahead of time (at runtime, or even at compile time) how many push_backs will be done. The programmer could make a reserve call but doesn't bother, since the efficiency gain is minimal.

The gain is minimal for doing this optimization at one location. But doing it everywhere, that could matter. Pushing back in a loop could maybe be optimized to a single allocation and a memcpy.
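When the count is known, the single-allocation version can be written by hand today. A sketch with `std::vector` (the function names are illustrative): reserve once and push, or append the whole range with one `insert` call. For trivially copyable element types, mainstream standard libraries often lower the range copy to a `memmove`, but that is an implementation detail rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: may reallocate several times as the vector grows geometrically.
std::vector<int> append_loop(const int* src, std::size_t n) {
  std::vector<int> v;
  for (std::size_t i = 0; i < n; ++i) v.push_back(src[i]);
  return v;
}

// Count known up front: one allocation, after which push_back never
// reallocates because size() stays <= capacity().
std::vector<int> append_reserved(const int* src, std::size_t n) {
  std::vector<int> v;
  v.reserve(n);
  for (std::size_t i = 0; i < n; ++i) v.push_back(src[i]);
  return v;
}

// Range insert: pointers are random-access iterators, so the library knows
// the length up front and can allocate once and bulk-copy.
std::vector<int> append_range(const int* src, std::size_t n) {
  std::vector<int> v;
  v.insert(v.end(), src, src + n);
  return v;
}
```

Whether a compiler could turn the first loop into the third on its own is exactly the open question; the sketch only shows what the hand-written equivalents look like.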

tialaramex
48 minutes ago
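In C++, programmers are often taught not to use the reservation API for this purpose, because it's designed in such a way that if you don't have perfect foresight you can destroy amortization and thus get much worse performance: reserve(n) typically grows the buffer to exactly n rather than geometrically, so reserving "just enough" before each small batch reallocates and copies the whole vector every time.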
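A sketch of that failure mode: appending in small batches and calling `reserve(size() + k)` before each one. The standard only requires `capacity() >= n` after `reserve(n)`, but common implementations such as libstdc++ and libc++ allocate exactly `n` when growing, so every batch reallocates. `grow_capacity` and `Strategy` are hypothetical names for one possible fix: keep the growth geometric yourself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a standard API): reserve at least `needed`, but
// never less than twice the current capacity, so repeated calls keep the
// amortized O(1) growth that plain push_back gives.
template <typename T>
void grow_capacity(std::vector<T>& v, std::size_t needed) {
  if (needed > v.capacity()) v.reserve(std::max(needed, 2 * v.capacity()));
}

enum class Strategy { PushOnly, ExactReserve, GeometricReserve };

// Appends `batches` batches of `k` ints and counts reallocations, detected as
// changes in capacity() (these operations never shrink it).
std::size_t count_reallocs(std::size_t batches, std::size_t k, Strategy s) {
  std::vector<int> v;
  std::size_t reallocs = 0;
  std::size_t cap = v.capacity();
  auto note = [&] {
    if (v.capacity() != cap) {
      ++reallocs;
      cap = v.capacity();
    }
  };
  for (std::size_t b = 0; b < batches; ++b) {
    if (s == Strategy::ExactReserve) v.reserve(v.size() + k);  // "just enough" for this batch
    if (s == Strategy::GeometricReserve) grow_capacity(v, v.size() + k);
    note();
    for (std::size_t i = 0; i < k; ++i) {
      v.push_back(static_cast<int>(i));
      note();
    }
  }
  return reallocs;
}
```

With exact-fit `reserve`, 1000 batches of 2 cost a reallocation (and a full copy) per batch, while the geometric variant and plain `push_back` need only a logarithmic number.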

For example, Bjarne Stroustrup suggests you should use reserve() for "avoiding invalidation of iterators" instead.
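That use relies on a standard guarantee: after `reserve(n)`, insertions do not reallocate until the size would exceed `capacity()`, so pointers and iterators to existing elements stay valid. A minimal sketch (`first_pointer_stays_valid` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends values while holding a pointer to the first element.
// Without reserve(), a push_back that reallocates would leave `first`
// dangling; with it, no reallocation happens while size() <= n.
bool first_pointer_stays_valid(std::size_t n) {
  std::vector<int> v;
  v.reserve(n);
  v.push_back(42);
  const int* first = &v[0];
  for (std::size_t i = 1; i < n; ++i) v.push_back(static_cast<int>(i));
  return first == v.data() && *first == 42;
}
```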
