[1]: https://cacm.acm.org/research/always-measure-one-level-deepe...
https://news.ycombinator.com/item?id=27686818 for more in this vein.
When they find a bug, they don't just fix the bug, they fix the engineering process that allowed the bug to occur in the first place.
It's true in software, it's true in physical infrastructure (read about the sorry state of most dams).
Until we root-cause that process, I don't see much progress coming from this direction. On the plus side, CS principles are making their way into compilers; we're a long way from C.
I think it's part of the reason stocks almost always dip after positive earnings reports. No matter how positive the report is, it's always less than what was idealised.
You might think there's a trick where you can sell maintenance as a new thing, but you've just invented the unnecessary rewrite.
To answer your question more directly, once something has been achieved it's safe to assume someone else can achieve it also, so the focus turns to the new thing. Why else would we develop hydrogen or neutron bombs when we already had perfectly good fission ones (they got commoditised).
And security work is rewarded even less!
While I do recognize that this is a pervasive problem, it seems counter-intuitive to me based on the tendency of the human brain to be risk averse.
It raises an interesting question of "why doesn't the risk of security breaches trigger the emotions associated with risk in those making the decision of how much to invest in security?".
Downstream of that is likely "Can we communicate the security risk story in a way that more appropriately triggers the associated risk emotions?"
What's the risk? The stock price will be back up by next week.
Look at the CrowdStrike failure as a recent example, but there's plenty more in the past.
Code review is one example of addressing the engineering process, but I also find it very helpful to consider business and political processes as well. Granted, NASA's concerns are very different from those of most companies, but as engineers and consultants, we have leeway to choose where and how to address bugs, beyond just the technical and immediate dev habits.
Soft skills matter hard.
I'm hoping that example holds up; I'm not well versed in that area, so it may be a terrible counter-example, but my overarching point is this: overly engineered code often produces less value than quickly executed code. We're not in the business of making computers do things artfully just for the beauty of the rigor and correctness of our systems. We're doing it to make computers do useful things for humanity.
You may think that spending an extra year perfecting a pacemaker might end up saving lives, but what if more people die in the year before you go to market than would've died had you launched with something almost perfect, but with potential defects?
Time is expensive in so many more ways than just capital spent.
My argument (and I'm just thought-experimenting here) is that without NASA's rigor, their programmes would have failed. Public support, and thus the market for space projects, would have dried up before SpaceX was able to "do it faster".
(Feel free to shoot this down: I wasn't there and I haven't read any deep histories of the companies. I'm just brainstorming to explore the problem space.)
Doing things right takes less time in my experience. You spend a little more time up front to figure out the right way to do something, and a lot of the time that investment pays dividends. The alternative is to just choose the quickest fix every time until eventually your code is so riddled with quick fixes that nobody knows how it works and it's impossible to get anything done.
I've never found a good way of screening for the ability, and even more, for knowing when not to go deep, because everyone will come up with some example if you ask, and it's not the sort of thing that I can see highlighting in a coding test (and _certainly_ not in a leet-code test!). If anyone has any suggestions on how to uncover it during the hiring process, I'd be ecstatic!
Where have you been all my life? It seems most of the teams I've been on value speed over future-proofing bugs. The systems-thinking approach is rare.
If you want to test for this, you can create a PR for a fake project. Make sure the project runs but has errors, code smells, etc. Include a few things like they talk about in the article, such as a message about being out of disk space but missing the critical message/logging infrastructure to cover other scenarios. The best part is, you can use the same PR for all levels that you're hiring for, by expecting seniors to get X% of the bugs, mids to get X/2% and noobs to get X/4%.
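To make that concrete, here is a minimal sketch (in C, with made-up names, not taken from the article) of the kind of seeded flaw such a PR could contain: a write path that reports "out of disk space" for every failure and logs nothing about any other scenario.

    /* Hypothetical snippet seeded into the fake-project PR: every write
     * failure is reported as "out of disk space", errno is never inspected,
     * and no other logging exists, so other failure modes are misreported. */
    #include <stdio.h>

    int save_record(FILE *out, const char *line)
    {
        if (fprintf(out, "%s\n", line) < 0) {
            /* Seeded smell: assumes the only possible failure is a full disk. */
            fprintf(stderr, "error: out of disk space\n");
            return -1;
        }
        /* Seeded smell: buffered data is never flushed, so later write errors
         * (including a genuinely full disk) can be missed entirely. */
        return 0;
    }

A senior candidate would be expected to spot most of these; a junior might only catch the obvious ones.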
So, obviously, if one team is future-proofing bugs, and the other team just blasts out localized short-term fixes as quickly as possible, there will come a point where the first team overtakes the second, because the second team's velocity will by necessity have to slow down more than the first as the code base grows.
If the crossover point is ten years hence, then it only makes sense to be the second team.
However, what I find a bit horrifying as a developer is that my estimate of the crossover point keeps coming in. When I'm working by myself on greenfield code, I'd put it at about three weeks; yes, I'll go somewhat faster today if I just blast out code and skip the unit tests, but it's only weeks before I'm getting bitten by that. Bigger teams may have a somewhat farther crossover point, but it's still likely to be small single-digit months.
There is of course overdoing it and being too perfectionist, and that does get some people, but the people, teams, managers, and companies who always vote for the short term code blasting simply have no idea how much performance they are leaving on the table almost immediately.
Established code bases are slower to turn, naturally. But even so, I still think the constant short-term focus is vastly more expensive than those who choose it understand. And I don't even mean obvious stuff like "oh, you'll have more bugs" or "oh, it's so much harder to onboard", even if that's true... no, I mean, even by the only metric you seem to care about, the team that takes the time to fix fundamental issues and invests in better logging and metrics and all those things you think just slow you down can also smoke you on dev speed after a couple of months... and they'll have the solid code base, too!
"Make sure the project runs but has error, code smells, etc."
It is a hard problem to construct a test for this, but it would be interesting to provide the candidate some code that compiles with warnings and just watch how they react to the warnings. You may not learn everything you need, but it'll certainly teach you something.
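For instance (a made-up hand-out, nothing from the article), a few lines of C that build and run fine but trip gcc -Wall -Wextra give you something concrete to watch the candidate react to:

    /* Hypothetical interview hand-out: compiles and runs, but
     * gcc -Wall -Wextra flags a couple of things worth reacting to. */
    #include <stdio.h>
    #include <string.h>

    static int count_spaces(const char *s)
    {
        int total = 0;
        for (int i = 0; i < strlen(s); i++)  /* -Wsign-compare: int vs size_t */
            if (s[i] == ' ')
                total++;
        return total;
    }

    int main(void)
    {
        int scratch;                         /* -Wunused-variable: never used */
        printf("%d\n", count_spaces("a b c"));
        return 0;
    }

Whether they fix the warnings, explain why they are harmless here, or ignore them entirely tells you something about how deep they habitually look.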
If a quick fix works, it is most likely a proper fix; if it doesn't work, then you dig deeper. It is also a question of whether the feature being fixed is even worth spending so much time on.
There is a bunch of stuff that could be "fixed better" or "properly" if someone took a closer look, but a lot of the time it is just good enough and is not somehow magically impeding a proper fix.
It is a false dichotomy in the sense that, if "X -> Y" is read in the strict Aristotelian sense that absolutely, positively every X must with 100% probability lead to Y, then "This is a quick fix -> This is not the best fix" is indeed false. Sometimes the quick fix is correct. A quick example: I'm doing some math of some sort and literally typed minus instead of plus. The quick fix to change minus to plus is reasonable.
(If you're wondering about testing, well, let's say I wrote unit tests to assert the wrong code. I've written plenty of unit tests that turn out to be asserting the wrong thing. So the quick fix may involve fixing those too.)
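As a minimal sketch of that scenario (illustrative names, not a real codebase): the operator was mistyped and the test had been asserting the buggy result, so the quick fix touches both:

    /* The bug: '+' was typed as '-'. The quick fix flips the operator and
     * corrects the unit test that had been asserting the wrong value. */
    #include <assert.h>

    static int total_items(int in_stock, int on_order)
    {
        return in_stock + on_order;   /* was: in_stock - on_order */
    }

    int main(void)
    {
        assert(total_items(10, 3) == 13);   /* was: == 7, asserting the typo */
        return 0;
    }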
It is true in the sense that if you plot the quickness of the fix versus the correctness of the fix, you're not going to get a perfectly uniform random two-dimensional scatter that would indicate they are uncorrelated. Some sort of Pareto-optimal[1] front will develop, becoming more pronounced as the problem and the minimum-size fix become larger (and they can get pretty large in programming). It'll be a bit loose; you'll get occasional outliers where you have otherwise fantastic code that just happened to have this tiny screw loose that caused a lot of problems everywhere, and one quick fix can fix a lot of issues at once. I think a lot of us will see those once or twice a decade or so. But for the most part, once you eliminate all the fixes that are neither terribly fast nor terribly good for the long term, a definite trend will develop: a fairly normal "looks like 1/x" curve of tradeoffs between speed and long-term value.
This is a very common pattern across many combinations of X and Y that don't literally, 100% oppose each other, but in the real world, with many complicated interrelated factors interacting with each other and many different distributions of effort and value interacting, do contradict each other... but only if you are actually on the Pareto frontier! For practical purposes in this case I think we usually are, at least relative to the local developers fixing the bug; nobody deliberately sets out to make a fix that is visibly obviously harder than it needs to be and less long-term valuable than it needs to be.
My favorite "false dichotomy" that arises is the supposed contradiction between security and usability. It's true they oppose each other... but only if your program is already roughly optimally usable and secure on the Pareto frontier and now you really can't improve one without diminishing the other. Most programs aren't actually there, and thus there are both usability and security improvements that can be made without affecting the other.
I'm posting this because this is one of those things that sounds really academic and abstruse and irrelevant, but if you learn to see it, becomes very practical and powerful for your own engineering.
Sometimes I get a "hey, you're making requests too quickly" message while posting.
A proper fix would be a better check of whether I am really a bot or just cast a vote and wrote a quick comment - but no one is going to care enough.
Whatever the long-dead dude was saying.
Weirdly, teams seem to adapt better to bad code. But that adaptation occurs through meetings. And meetings just destroy a team's productivity.
The greenfield team usually adapts well to its own buggy code. They know the system so well inside-out that if a bug pops up they have a general idea why.
This is bad, because with natural fluctuation in team members this institutional knowledge is lost. New members don't have the benefit of knowing about the whole evolution with all its quirks, and don't have the unit tests from the previous team to prevent regressions.
This then slows velocity to near zero as the team gets replaced, and leads to the inevitable rewrite.
But that implies a division of work that is not aligned with any communication-reducing objective.
They would be vocal about it and then spend weeks delivering nothing while "tweaking db indexes", whereas I could see immediately that the code was crap and needed slight changes - but I also don't have time to fight all the fights in the company.
I used to do it a lot too and I kind of had a "shit, I'm getting old" moment the other day when I was telling him something along the lines of "yeah, we could probably fix that deeper but it's going to take 6 weeks of meetings and 3 departments to approve this. Is that really what you want to spend your time on?"
Like you said, it's definitely a balancing act and the older I get, the less I care about "doing things the right way" when no one actually cares or will know.
I get paid to knock out tickets, so that's what I'm going to do. I'll let the juniors spin their wheels and burn mental CPU on the deep dives and I'm around to lend a hand when they need it.
I've been on a team where I had 2 weeks left and they didn't want me working on anything high priority during that time, so it wouldn't be half finished when I left. I had a couple of small stories I was assigned. Then I decided to cherry-pick the backlog to see how much tech debt I could close for the team before I left. I cleared something like 11 stories out of 100. I was then chewed out by the product owner because she "would have assigned [me] other higher priority stories". But the whole point was that I wasn't supposed to be on high priority tasks because I'm leaving...
The product owner can decide how much time would be worth it given a probable timeline, risks and benefits, but the experienced developer is needed to provide that input information. The developer has to present the case to the product owner, who can then make the decision about if, when, and how to proceed. Or, if the developer has sufficient slack and leeway, they can make the decision themselves within the latitude they’ve been given.
Are these deeply technical product owners? Which ones would be best placed to make this decision, and which less so?
But their manager likely believes that deeper fixes aren't possible or useful for some shortsighted bean-counter reason. Not that bean counting isn't important, but the beans are often counted early and wrong.
This is where a developer goes from junior to senior.
Casey Muratori was interviewing him at HandmadeCon back in 2016. Here is the snippet: https://youtu.be/qWJpI2adCcs?si=ezSKud42PC3Ub-UO&t=3112
Most people can go into a deep dive if you force them to do it, but how they conduct themselves while doing it can show you if this is a thing they would do on their own.
- How did this bug make it to production? Where’s the missing unit test? Code review?
- Could the error have been handled automatically? Or more gracefully?
That is an absolutely stellar quote!
It's also more broadly applicable to life / problem solving / goal setting (if we replace the word 'bug' with 'problem' in the above quote):
"There’s a problem! And it is sort-of obvious how to fix it. But if you don’t laser-focus on that, and try to perceive the surrounding context, it turns out that the problem is valuable, and it is pointing in the direction of a bigger related problem."
In other words, in life / problem solving / goal setting -- smaller problems can be really valuable, because they can be pointers/signs/omens/subcases/indicators of/to larger surrounding problems in larger surrounding contexts...
(Just like bugs can be, in Software Engineering!)
Now if only our political classes (on both sides!) could see the problems that they typically see as problems -- as effects, not causes (because that's what they all are, effects), of as-yet-unseen larger problems, to which those smaller problems are pointers, "hints", subcases, "indicators" (use whatever terminology you prefer...)
Phrased another way, in life/legislation/problem solving/Software Engineering -- you always have to nail down first causes -- otherwise you're always in "Effectsville"... :-)
You don't want to live in "Effectsville" -- because anything you change will be changed back to what it was previously in the shortest time possible, because everything is an effect in Effectsville! :-)
Legislating against something that is seen, but that is the effect of another, greater, as-of-yet unseen problem -- will not fix the seen problem!
Finally, all problems are always valuable -- but if and only if their surrounding context is properly perceived...
So, an excellent observation by the author, in the context of Software Engineering!
1. Diagnosing the "real causes" one level deeper
2. Implementing a "real fix" one level deeper
Sometimes they have huge overlap, but the first is much more consistently-desirable.
For example, it might be that the most practical fix is to add some "if this happens just retry" logic, but it would be beneficial to know--and leave a comment--that it occurs because of a race condition.
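A hedged sketch of that pattern in C (the flaky call and its cause are hypothetical): the pragmatic change is a bounded retry, and the comment records the one-level-deeper diagnosis so the real fix isn't lost:

    #include <stdbool.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Stand-in for the flaky call: in the real system it intermittently
     * fails when it races with a concurrent index rebuild, which is the
     * one-level-deeper cause; the proper fix is locking in that component. */
    static bool lookup_record(int id)
    {
        (void)id;
        return rand() % 4 != 0;   /* simulate the intermittent failure */
    }

    /* Pragmatic local guard: bounded retry. The race itself is documented
     * above so the owning team can still apply the real fix later. */
    static bool lookup_record_with_retry(int id)
    {
        for (int attempt = 0; attempt < 3; attempt++) {
            if (lookup_record(id))
                return true;
            usleep(10 * 1000);    /* brief back-off before retrying */
        }
        return false;
    }

    int main(void)
    {
        return lookup_record_with_retry(42) ? 0 : 1;
    }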
In large and complex codebases, it's often more pragmatic to build a guard in your local area against that bug than to follow the bug all the way down the stack.
It's not optimal, and it doesn't make the system better as a whole, but it's the only way to get things done.
That doesn't mean you should be silent, though; you do need to contact the team that looks after that part of the system.
Fun article, good mantra!