5. Remove unrequired behaviour.
6. Negotiate the required behaviour with stakeholders.
7. UX changes. E.g. turn a synchronous flow into a background job and notification. Return the quick parts of the operation sooner (like a progressive JPEG).
8. Architecture changes. E.g. monolithification, microservification. Lambdas vs. VM vs. Fargate etc.
And some more technical:
9. Caches?
10. Scalability, more VMs
11. Move compute local (on client, on edge compute, nearby region)
11a. Same as 11 for data residency.
12. Data store optimisation e.g. indices, query plans, foreign keys, consistent hashing (arguably a repeat of data structures)
13. Use a data centre for more bang/buck than cloud.
14. Compute type: use GPU instead of CPU etc. I'll bundle L1 cache etc. in here too.
15. Look at sources of latency: proxies, sidecars, network hops (and their location) etc.
16. GC pauses, event loops, threading, other processes etc.
The usual culprit is "premature modularization", where code that is used in one place and is never going to be extended is nonetheless full of indirections.
In principle, I don't think people would lump it in.
Something that I'd like to add is that it's helpful to understand the optimization capabilities of our compiler. Ideally, we would like to write programs that don't have what are called 'optimization-blockers': constructs that make it hard for the compiler to generate optimized executables.
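To make 'optimization-blocker' concrete, here is a minimal sketch of one well-known kind, possible pointer aliasing. The function names are made up for illustration. The two functions look equivalent, but the compiler cannot rewrite the first into the second, because they give different results when x and y point at the same object.

```c
/* Hypothetical example: adds *y to *x twice.
   Two reads of *y and two writes to *x. */
void add_twice_v1(long *x, const long *y) {
    *x += *y;
    *x += *y;
}

/* Looks equivalent and does less memory traffic,
   but only matches v1 when x and y do not alias. */
void add_twice_v2(long *x, const long *y) {
    *x += 2 * *y;
}
```

With a == 1, add_twice_v1(&a, &a) leaves 4 in a, while add_twice_v2(&a, &a) leaves 3. Since the compiler has to assume the pointers might alias, it must keep the slower form of v1. Writing v2 yourself, or using restrict when you know the pointers are distinct, removes the blocker.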
I like the pointer to the blog on accidentally quadratic implementations. I find that the following pattern is often a landmine:
for (int i = 0; i < strlen(s); i++) // code in loop
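strlen(s) gets computed on every iteration at O(n) each, so the loop as a whole is O(n^2) unless the compiler can prove s is unchanged and hoist the call itself.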
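A sketch of the pattern and the usual fix, using a hypothetical in-place lower-casing routine. Because the loop body writes to s, the compiler generally cannot assume the length stays the same, so it is unlikely to hoist the strlen call for you.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Accidentally quadratic: strlen(s) runs on every loop test. */
void lower_slow(char *s) {
    for (size_t i = 0; i < strlen(s); i++) {
        s[i] = (char)tolower((unsigned char)s[i]);
    }
}

/* Linear: the length is computed once, before the loop. */
void lower_fast(char *s) {
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        s[i] = (char)tolower((unsigned char)s[i]);
    }
}
```

Both produce the same string. The difference only shows up as the input grows, which is exactly why this one tends to survive testing.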
Finally, being aware that I/O latencies are a major source of bottlenecks leads to nice optimizations. One advantage of multiple threads is that they can sometimes hide the I/O latency. In general, writing programs with good memory locality is one of the better levers for optimization.
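On the memory locality point, a small sketch of what that can mean in practice. The function names and the flat row-major layout are assumptions for illustration. Both functions compute the same sum, but the first walks memory sequentially while the second jumps cols elements on every step, which tends to be much harder on the cache for large matrices.

```c
#include <stddef.h>

/* Sums a rows x cols matrix stored row-major in a flat array.
   Inner loop touches adjacent elements: good spatial locality. */
long sum_row_order(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r * cols + c];
        }
    }
    return total;
}

/* Same result, but the inner loop strides by cols elements,
   so consecutive accesses land far apart in memory. */
long sum_col_order(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

How big the gap is depends on matrix size, cache sizes and the compiler, so it is worth measuring rather than assuming.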
Recent discussion on the follow-up, "The Fifth Kind of Optimisation": https://news.ycombinator.com/item?id=43555311
This deserves to be a headline in most optimisation discussions. Fast enough or small enough is often all that matters; start there.
This can include more advanced VMs or other advanced problem-oriented systems like databases or problem solvers.