Once you’ve found a small enough chunk of logic that’s worth optimizing, I’d recommend relying more on benchmarks and disassembly.
The practical implication is that trying to optimize "calls to f32::clone" because they're in a profile is a wild goose chase -- unless you have debug anti-optimization turned on, there shouldn't be any calls to f32::clone.
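For what it's worth, here is a minimal sketch of the "benchmark a small chunk" approach using only the standard library. The kernel `sum_shifts` and the helper `time_it` are hypothetical names made up for illustration, not code from the article. `std::hint::black_box` is there so the optimiser is less likely to fold the whole loop away.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical kernel standing in for the small chunk of logic under test.
fn sum_shifts(n: u32) -> u64 {
    let mut acc = 0u64;
    for i in 0..n {
        let shift: u64 = if (i / 32) % 2 == 0 { 32 } else { 0 };
        acc += shift;
    }
    acc
}

// Runs `f(n)` `iters` times and returns the total elapsed time together
// with the last result, so the work cannot be discarded as unused.
fn time_it<F: Fn(u32) -> u64>(f: F, n: u32, iters: u32) -> (Duration, u64) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        last = black_box(f(black_box(n)));
    }
    (start.elapsed(), last)
}

fn main() {
    let (elapsed, result) = time_it(sum_shifts, 1 << 20, 100);
    println!("result = {result}, elapsed = {elapsed:?}");
}
```

Build it with optimizations on (for example `cargo run --release`), otherwise you are measuring the debug build. This is only a sketch: there is no warm-up and no statistics, so treat the numbers as a rough indication and check the disassembly as well.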
I was looking at one of the diffs, and thinking that a sufficiently advanced compiler should be able to generate the same efficient code for both -- and indeed it does, if you turn the optimiser on: https://godbolt.org/z/hjP5qjabz
- let shift = if (i / 32) % 2 == 0 { 32 } else { 0 };
+ let shift = ((i >> 5) & 1) << 5;
let shift = ((~(i >> 5) & 1) << 5);
EDIT:
The compiler uses "vpandn" with the conditional version and "vpand" with the bitwise version. The difference is that vpandn applies a bitwise logical NOT to the first source operand before the AND. It looks like the compiler and I are correct: the author's bitwise version is inverted, and the incorrect code was merged in the author's commit. Also, I think this could be reduced to just (~i & 32).
From my days as a junior member of a team developing a compiler and run-time libraries, I really like the approach we took there: if the compiler generated sub-optimal code for a straightforward implementation, we'd aim to fix the compiler instead of tweaking the code. That's more difficult if you're not already maintaining your own compiler, of course. And algorithmic improvements are still valuable.
In this case, the optimiser already generated efficient code. Makes me wonder if any observed speed-up might have been because the incorrect code needed to do less work?
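The inversion is easy to check by brute force. A small sketch, assuming `i` is a plain `u32` (the original code may well operate on SIMD lanes) and using Rust's `!` for the bitwise NOT written as `~` above; the function names are made up for illustration:

```rust
// Conditional form from the "-" line of the diff.
fn shift_conditional(i: u32) -> u32 {
    if (i / 32) % 2 == 0 { 32 } else { 0 }
}

// Bitwise form from the "+" line of the diff, as merged.
fn shift_bitwise_merged(i: u32) -> u32 {
    ((i >> 5) & 1) << 5
}

// Bitwise form with the NOT added (Rust spells bitwise NOT as `!`).
fn shift_bitwise_fixed(i: u32) -> u32 {
    (!(i >> 5) & 1) << 5
}

// The shorter form suggested in the EDIT.
fn shift_short(i: u32) -> u32 {
    !i & 32
}

fn main() {
    for i in 0..1024u32 {
        let expected = shift_conditional(i);
        // Both NOT-based forms agree with the conditional.
        assert_eq!(shift_bitwise_fixed(i), expected);
        assert_eq!(shift_short(i), expected);
        // The merged form always yields the opposite of the two values.
        assert_eq!(shift_bitwise_merged(i), 32 - expected);
    }
    println!("all checks passed for i in 0..1024");
}
```

Only bit 5 of `i` matters to any of these forms, so covering 0..1024 exercises both cases many times over.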
This goes through seven iterations of optimizing an algorithm in Rust, comparing it to the equivalent C++ at each stage.
I can't quite imagine it's exactly the same in all metrics, though. Some graphs of measurement statistics would be even better.
(I've also always had a sneaking suspicion I did something wrong in my example, so if anyone knows let me know)
Profiling native code with optimizations on is very very tricky.
How much? And did the parts in safe Rust make up/protect the unsafe parts?
I’d be concerned that the reason there are fewer errors is because of the experience the team already had with the existing system. Porting or rewriting it would allow them to avoid many of the errors that were already fixed in the C implementation and errors they knew about ahead of time… assuming there’s more unsafe/FFI than safe.
What I would like to know personally is how complex each code base is.
"Complex" is nebulous, so some metrics like line counts or other code quality metrics would help.
- let shift = if (i / 32) % 2 == 0 { 32 } else { 0 };
+ let shift = ((i >> 5) & 1) << 5;
Uh, that’s just `let shift = i & 32;`, right? Much easier to read, too, in my opinion. (Edit: fixed from i % 32 to i & 32.)