Meanwhile, my perfectly purring department was struggling to keep the lights on.
It's a serious problem in this industry due to the disconnect between non-technical management (who understands how to double click) and engineering (who holds the company standing).
<insert IBM story about IT department cost cuts>
I'm not sure how we solve this, other than having management come from engineering.
Building the right incentives around that can be tricky, those incentives need to ensure the highest levels of management aren't themselves disincentivising their directs & their departments from surfacing pain & problems - but it's also pretty common for people to mask those signals purely out of a well-intentioned desire to help. It's important to coach people on the idea that in large group sizes, it's more efficient to let certain kinds of problems play out and not be so reactive to them.
Too many companies ground their performance incentives & processes around oversimplified ideas that don't match the reality of human behaviour
Often, 'leaders' make mistakes and people below suffer the consequences. It is important to let these leaders deal with the pain caused by their decisions from their cluelessness about how things work.
At some point in one's early single-digit they learn that touching hot stuff hurts. They start to avoid stuff that they know is hot, but still come in contact with hot stuff accidentally. Later they learn techniques minimizing probability of touching hot stuff even by accident. By the time one reaches twenty or so, the only times a person burns themselves is really by being way too reckless.
> Like it or not, sometimes the best thing for an organization isn't to just fix every problem and prevent it from bubbling up; it needs to be treated like a learning opportunity for org leadership, which means sending the pain signals upward before just repairing it.
Should we accept that management as a whole is in general more clueless than your average teenager? The "learning opportunity" should, ideally, happen exactly once, realistically once in a very rare while.
> It's important to coach people on the idea that in large group sizes, it's more efficient to let certain kinds of problems play out and not be so reactive to them.
You are conflating two things here, I guess. Yes, some "problems" are not worth to be fixed proactively or at all, but that has very little to do with group sizes, it's a "simple" cost-benefit tradeoff. As groups grow the left hands tend to become increasingly unaware of what the right is doing and that is the primary reason why we have management class in the first place.
The problem OP raises is attention span of the metaphorical gold fish in the management layers. Even if a department does everything in their power to communicate impending problems, do risk weighed cost-benefit analyses, get proactive treatments pre-approved by higher management, the same higher management forgets the risks and costs savings once they have been mitigated, effectively incentivizing firefighting. Some teams gradually fall into eternal firefighting and burn out, others start manufacturing fires to get rewarded. The biggest problem is that it is nearly impossible to tell the two apart.
This is a pain signal. Some IT dude saying things are crap in every meeting is not.
More often than not it is some IT dude observing network crap-out once a month, performing analysis, noticing an upward trend and then saying in every meeting that things are crap and there will be issues twice a week in some time.
> If you're an underfunded IT department and your network has an issue twice a week, you will get that funding.
More often than not, if the IT department is already neglected they will not get that funding. Things will be delayed until the crap outs eventually actually happen twice a week and then some external heroic consultants will be hired to fix the issue underfunded IT department "could not".
I don't think that's it. Emergent problems require attention and action from leadership, who in turn can make the problem visible to higher ups. This creates signal, and positive feedback when the problem is fixed or mitigated.
If the problem doesn't exist to begin with, there is no signal. Managers don't get to show their fast-acting skills, and there are no heroics to speak of.
So ultimately poorly maintained and managed projects who deliver fixes for problems of their own doing create a perverse incentive, whereas no one is lauded or promoted for doing normal day-to-day things.
The difference is how it was communicated. Most non-Tech/non-infrastructure-people got no clue about these things. If they know you're battling the demons of plumbing on their behalf they will thank you, if you're the weird guy that has smeared dirt in the face and is seen once a week while the plumbing fails ever so often, guess what.
That means even if the problems and their fixes remain the same, the communication around them really matters. Tech people can be extremely bad with this. And if we're talking IT it is really the plumbing that holds the company together.
I’ll ask for something preventative or that otherwise hardens our systems. They ask “is it a need?” and I’ll say something like “we can function without, but that means we have a 5-10% chance in the next 6mo of having a major failure and embarrassing ourselves in front of a live audience in the thousands as well as our client.” They then decide how much that risk is worth to them, and whatever they decide is kind of out of my hands at that point. If the thing I warned them of comes because they didn’t pay for it, I can point to the receipts (though I’ve never had to, we’re small enough people remember those conversations).
60% of the time they just get what I need maybe? But ultimately it’s about CYA. Tell them what’s up, tell them what the solution is, tell them what the consequences are if they don’t do the solution, and make them decide.
Again this obviously depends on company culture and structure, but I can’t imagine on the only person who can do this!
My colleague in the IT department had one idea, replace our commercial certificates with Let's Encrypt and drop the EV requirement. In total he'd stand to get a bonus of a little over €2000. He never got the money, because things like that was part of his job apparently.
When things come up with other teams, you’ll have a catalog of tasks that were done to show why you didn’t have the same issue. The work was done, just at a better time to avoid downtime.
Speaking from experience, this does nothing. If you're at a company that is okay with average performers, then absolutely, 100%, fix all the bugs in advance, make the system rock solid and stable, prevent downtime, be a good engineer.
If on the other hand if you're at a company where 10% of people must get stack ranked and PIP, or at a company where "meets expectations" actually means you're going to get the stick, and you're supposed to be "redefining" expectations every year ... then yeah, don't do anything preventative. The optics are better when you take the 3am on-call and fix the issue (that you secretly knew in the first place would happen some time in the future in your coworker's code, and already knew how to fix -- but don't actually fix it until it surfaces). Be the savior that the VPs praise in the next meeting, that's your insurance against the PIP.
They set the rules of the game, you just play the game. The rules were their choice. They could have chosen different rules.
Personally, I only rehire people from projects that went smoothly, not ones where I had to make the urgent phone call.
Teams that "just work" are highly valued. They clear up my attention for other things.
Which means that everyone is playing the game to not be cut.
But I'm not sure the author of this thread works in such a place. In that case the game is different.
In the case where the "urgent midnight fix" is important, it's necessary to promote the visibility of your (just working) team. If visibility is the game, then be visible.
You know how test-driven-dev was always "write the test first"? In that environment a test is always written before any code.
Well in the "ticket closing" scenario it's important to open a ticket, regardless of how trivial, for every code action taken. For every meeting attended. For every scenario dodged. If tickets are the way to score then write tickets.
If "being a hero" is the valuable thing, then be a hero. Be prepared to champion your team every chance you get. Every time you interact with management stress the emergency you just fixed (before it became an emergency.) Tomorrow do it again with the next thing.
Management needs visibility. Be visible. I know, this seems stupid and beneath you. But that's why they call it a job, not playtime.
If someone is constantly playing the hero, I see that as incompetence. If the boss can’t see that, they are also incompetent. I have no respect for “leaders” who don’t know how to get out of the firefight.
I’ve made some high profile appearances, working 18 hour days on 4 day long outages, from vendor issues I was no part in causing. I figure that gives me some good will on playing hero without willingly creating problems for myself. I’m too old to manufacture stress for the optics.
For what it’s worth, with the right boss, I have had proper reporting work. Everything ran smooth and work was relaxed. My boss would regularly tell me I should take 3 months off because we were so far ahead of everyone. He would occasionally get bored and lob a grenade into the works to cause some chaos, but since everything else was running so smooth we were able to sort them out and keep going. People who couldn’t explain what they were doing were always getting yelled at and assumed to be doing nothing.
Yeah, but then I wouldn't have been able to pay for my healthcare. A certain toxic company's health insurance paid for my care, though. Prior to joining said toxic company I'd be racking up $6000+ in healthcare bills a year with shitty startup-sponsored insurance.
After 2 years, it was decided I didn't play the hero well enough though, and ended up having to leave. I work for a less toxic company now, but the next time I need a heart-related surgery (likely in ~5-10 years) I'll join a toxic company in the months leading up to pay for it.
The rules of the US, I guess.
> I’m too old to manufacture stress
My point was less about manufacturing artificial stress. I don't do that. But many times I see issues in coworkers' code. If the company will value and praise me for catching and fixing them early, then by all means I'll do that. But if fixing issues in the codebase early for prevention only gets me criticism of "you haven't met expectations, we expect you to exceed expectations every performance cycle" then hell, I don't feel like fixing anything proactively. In that world I'd rather be the hero that fixes it when it surfaces, that's more likely to nail the rating.
I will say my motivation for helping other people avoid issues has dropped. If they want to make problems for themselves, they can. Me helping them hasn’t worked so far, so maybe some sleepless nights will be a better teacher.
I had a former boss call me Brent after reading the Phoenix Project. That made me step back and stop helping so much. Everything seems worse, but whatever… if that’s what they want.
It's also why US car companies are a wreck.
Nothing ever breaks. - "What are we paying you for?"
Management can choose their burden.
It’s not a problem in this industry, it’s a problem everywhere.
> I'm not sure how we solve this, other than having management come from engineering.
You mean the engineers who are causing the chaos you’re complaining about?
Engineers aren’t some magic group of people who know better than others - we’re just as fallible as other people.
We've become too comfortable, since actual toil is no longer seen in the company: Manufacturing is overseas, customer support is overseas, logistics is an afterthought with established guarantees. Thus we want the mild weather and smooth meetings. If your engineering team is too smooth, maybe you should already branch out to help other related but "struggling" teams to get your hands dirty and noticed.
s/in this industry//
Alas, for many parts of society there is a large amount of people that would rather be reactive than proactive. It means it is easier today but harder long term.
You can't fix this. Out of sight, out of mind. It is hard-wired into us. It's all about the optics, and will always be.
This is the same thing. We need to reward things never going wrong as a society since this is pervasive.
Respectfully, the solution is don't smoke, exercise, eat well, sleep, avoid stressors... These aren't easy problems but their solution isn't at the individual patient level and is a simple question of capital and political will.
The 'hope' envisages a product to temporize the solution while extracting large payments.
> ...avoid stressors...
Most stress is caused by a conflict between our expectations/motivations and the reality (everyone else's).
I finally moved on to be an IC. Same story, same pressure :) You need to present to directors not because they need to know, but because your managers have a quota of N presentations per quarter, and if you back out, someone else needs to step up.
Needless to say my productivity reduces by half and sometimes to almost zero during the week or fortnight of presentations every quarter.
The business defines it as "meetings, presentations, support, coding, whatever".
Your productivity remains at 100% when you are doing what they want.
I get that you thought you were hired as a coder, and thus measure your productivity by that. That's what I thought too. I ended up doing a lot of support (which is good, but that's another thread). Until I recalibrated my definition of productivity that frustrated me. When I realized that support was productivity I got much less frustrated.
I have been on the industry for 35 years. I have seen my share of technology evolutions and o have seen the work from a dozen different dimensions. If after all that time, I find the process painful, just trust me -- they can't change me, and I can't change them. You take the warts with the wins and move on. 2-3 bad weeks, 10 good weeks. Life moves on to next quarter. Complete CEO mindset :)
Given the whole point of management is to work to ensure their own survival and growth, it would in their interest to kill genuine competition when its coming up.
Who wants to raise their new competition and lose to them, no one!
E.g. that time a central media controls power supply broke down which would have made using one of the most prestigious rooms impossible. I fixed it myself by swapping in a spare power supply from a used unit, then went on to remind them twice a year that we are now living on borrowed time and I take no responsibilities if a fault I predict to happen and get no funds to fix will in fact happen. 4 years later I got the funds.
Having stuff costs money. Everybody wants to invest funds once, but nobody wants to keep paying for maintenance.
Heroes are lauded even if they solve problems they themselves are the cause of - which is conveniently either forgotten or denied - or they are solving non-issues that are deemed important by the ignorami-class. Politics, for example, is utterly dominated by this dynamic.
It's the first instinct: let the expert run the show. However, one of the (many) ways to let a complex project fall apart completely is to hand over full control to engineers. I'm one myself, but I know what I'm good at and what not. Dunning-Kruger is often mentioned in these discussions, but don't forget it also applies to engineers that often lack any management or leadership experience of any appreciable kind. They vastly overestimate their ability to handle management and organization-wide issues and tend to not only miss the forest for the trees but actually miss the trees for the leaves.
"Unix: A History and a Memoir" by Brian Kernighan actually mentions how proper management was crucial to their success. It's a detail that's frequently conveniently forgotten by the engineers who think themselves better than the "suits". For the record, I don't claim engineers are the primary problem, but it's not just management's either. Quotes like "who holds the company standing" and "who understands how to double click" are enormous smells and IMO make quite clear what's happening here.
I don't have ready-made solutions unfortunately, but I do wish we would look further than "it's the suits". It's a systemic, human problem that I believe is a natural result of operating under informational constraints and, very human, cognitive biases on all sides.
My favorite is how elegant solutions often look simple in retrospect. So if you noodle on a problem for a while and then come up with a clever solution: once you explain it to someone they'll be like, "yeah, of course."
Meanwhile the guy next to you that overcomplicates the problem ends up getting kudos for building something so difficult :D
("I have made this longer than usual, only because I have not had the time to make it shorter.")
Blaise Pascal
— Antoine de Saint-Exupéry
― Dorothy Sayers
But when someone comes up with something simple but effective, it always looks so obvious in retrospect.
The director then clearly advised that they should use the complicated way because that's how you get published: not because you're clever, but because your solutions sound complicated.
It resonates perfectly with your comment and it's an unfortunate reality that most people don't bother for beautiful solutions and praise complicated processes. That's how we neded up with bureaucracy, probably :D
H: So, Watson.
W: Hmm.
H: You do not propose to invest in South African securities?
W: How on earth do you know that?
H: Now, confess, you are utterly taken aback.
W: I am!
H: I should make you sign a paper to that effect.
W: Why?
H: Because in a few minutes you will say it is all so absurdly simple.
W: I should say nothing of the kind!
H: You see, my dear Watson, it is not really difficult to construct a series of inferences, each dependent upon its predecessor and each simple in itself. If, after doing so, one simply knocks out the central inferences and presents one's audience with the starting point and the conclusion, one may produce a startling, though possibly a meretricious, effect.
H: I can tell by an inspection of the groove between your left forefinger and thumb, that you have decided not to invest your small capital in the gold fields.
W: I can see no connection.
H: Very likely not; but I can quickly give you a close connection.
H: Here are the missing links in the very simple chain: You had chalk between your forefinger and thumb when you returned from the club last night. You put chalk there when you play billiards, to ease the cue. You never play billiards except with Thurston. Now, Thurston, you told me, four weeks ago, had an option on some South African security which expired in a month, and which he desired you to share with him. Your checkbook is locked in my drawer, and you have not asked for the key. So, you do not propose to invest your money in that manner.
W: How absurdly simple!
H: Quite so. Every problem is absurdly simple when it is explained to you.H: "How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?"
In the long run, IME, you'll be recognized either by management or your peers if you keep doing that over and over again.
> elegant solutions
My favorite is how people will yell at you about how elegance doesn't matter, that they "just care that it works", and "keep it simple". I'm certain all the sayings repeated in industry are metastasized variants of actually good practices repeated by those who can't be bothered to understand what they mean.And of course that's true. We push for speed, absent of direction, while praising velocity. To be honest, at this point I'm disappointed the engineers gave up and just started becoming business people.
I couldn't even get my own dad to pay for network support for his company since he would never pay my rate for anyone no matter what. After 2 other people failed to solve his problem I fixed it in 15 minutes and then he "really" didn't want to pay because it only took 15 minutes.
I was very good at what I did but got no appreciation for keeping things from breaking, only for fixing things after they broke. Marketing paid better, and I could point at real world numbers daily and justify my pay. I don't like it anywhere near as much, but at least it gets more respect than any other IT work I did.
"This is what my regular customers pay me. If I hired one of my friends or relatives I see it as my duty to pay them at least what they are worth, this is the way you raised me."
I believe this to be true btw. If someone is really your friend, you want them to do well and that means you pay what they usually get or you don't bother them and get someone else.
Every place I've worked rewards the firefighter over the person who made sure nothing ever caught fire. And the worst part is the math is obvious to everyone except the people who set the incentives.
Note that there is also the flip side of the coin, people who spend all their time worrying about things that never happen, so it's not like you can just reward a defensive attitude things are more complicated than that.
And people ask why I hate humans.
That's what the moneys for ...
If online comments are anything to go by, I'm not alone.
If you're in the Bay Area and you can get a Sonic fiber connection, I would highly recommend them over AT&T/Comcast/etc.
If you only do middle of the pack(for one reason or another-cost, talent, etc) you become incentivized to cause problems then fix it.
Thus a net negative to society
*Also recommend sonic-their pricing and service is top tier
Also, telsa self-driving. yes, we know about the greatly publicized accidents, or the tweets of the founder, but the avoided incidents not so much.
Originally it was engineers from the top down, but over the last 15-20 years those leaders with engineering backgrounds have retired and been replaced by non-engineer MBA's. And the more I look around, the more I see that as a common trope is the US.
It read "This fixes a bug that hasn't happened yet".
It seemed really smart at first, but later I learned that the developer that added that code also had a pattern of appending spaces to the start and end of user input and comparing the length to 2 to determine whether the value was empty or not...
So I'm fairly sure "that hasn't happened yet" was probably more a case of "that I personally haven't introduced unnecessarily yet" :)
No one notices when you cut 20% of some expensive process but cause no regressions.
Sterman, Repenning and other collaborators wrote several papers after this one. All fascinating and almost entirely depressing.
Especially since MIT's Sloan school, where system dynamics first became a discipline, is just around the bend from Harvard Business school, where system dynamics first became ignored.
E.g. if the problems are quantifiable and there's a record, like dropping homicides from 100 per year to 20 per year in a city. Those extra homicides "didn't happen", but the improvement is understood.
For an one-off problem, it depends on how clear the path to the problem is. An electrician doing an inspection and noticing and fixing big electrical issues in the installation, would be appreciated, even if the accidents didn't happen.
Alas, that doesn't always fly with the populace.
Plus, even if you did overreact, that can still be the better side to have erred on, in moderation.
https://news.ycombinator.com/item?id=8940820 - 24 Jan 2015, 50 comments
https://news.ycombinator.com/item?id=39472693 - 22 Feb 2024, 434 comments
On other hand. for software engineering some of the signals that can be used to measure such a management itself can be
1. On call requirement, outages and team burnout - A well written software should not require on-calls from the dev team
2. Ask them about the "concrete" roadmap for next 6 months to a year - Absence of concrete items is a bad sign
Human groups work on shallow signalling and distributed confusion.
But in recent years I have seen people (elsewhere, not on HN) claim that Y2K was a big nothingburger, and all the money spent on fixing the bug was wasted. No, that's not true either. All the money spent on fixing the bug was why it turned into a big nothingburger. Sure, some of that money was wasted, by executives who wanted an "official" Y2K-certified certificate, issued by a consulting firm that had nothing "official" about it except their own say-so. And so they spent $2 million learning what their own employees could have told them for $2,000. THAT money was wasted. But a lot of banks were running old COBOL code that used 2-digit years, and needed to be fixed. The fact that in January 2000, everyone's bank interest was still calculated correctly, and not calculated as if it was January 1900? THAT was entirely due to the vast amounts of money spent paying old COBOL coders to come out of retirement and fix the 2-digit years.
The lesson I learned from that is that it's possible for a problem to be overhyped, even massively overhyped, and yet still be a serious problem. The other lesson I should have learned is that people rarely get credit (I won't go so far as the article authors and say "nobody ever gets credit") for fixing problems that never happened.
So when someone predicts something will happen with a 90% probability, and then the 10% chances happens and the predicted event does not happen, people will talk about what a bad prediction that was and how they were clearly wrong.
It's the same logic that causes people to say vaccines don't work because they don't stop a disease with 100% effectiveness, or that there is no point to wear a seatbelt because people still die while wearing one.
A couple of possible confounding factors I can think of:
1. Plenty of countries use software developed elsewhere.
2. I suspect that the more recently you computerised your economy, the less likely it would be to have code vulnerable to Y2K.
The assertion may have been unfounded, but I think it's just as unreasonable to assert the opposite. Bugs have cascading effects and in a sufficiently complex piece of software they can create chaos with unpredictable outcomes.
https://news.ycombinator.com/item?id=4224707 has some discussion of the events, including the fact that the control design (where each pilot has an independent stick) was part of the problem. On a design like Boeing uses where both sets of controls move together, the experienced pilot would have noticed the less-experienced pilot pulling up on the stick because his own stick would be moving, and he would have said "No, nose down." And if they had nosed down to recover speed while still high enough in the air, they almost certainly could have regained control of the plane and saved 228 lives (including their own).
So in retrospect, I think my first sentence was wrong. The software did not glitch, it did exactly what it was supposed to do. It was pilot error that caused the initial stall, and multiple pilot errors that caused the failure to recover from the stall.
There may be examples of software error that has caused planes to fall out of the sky, but I don't know of any. The only plane crashes whose cause I know were due to hardware failure or pilot error, usually a combination of the two.
I earned my first house deposit helping the team fixing the water and gas company in Wales, UK. Their entire system was running off a set of COBOL programs on a mainframe, none of which had been properly documented over the years, and the whole thing used 2-digit dates. It would have caused actual deaths if not fixed; everything would have shut down, and no water and no heating in a British winter is potentially lethal. And then it would have sent everyone in Wales a bill for 100 years of water and gas.
They were bribing retired software devs to come out of retirement with huge stacks of money, because that was cheaper than training new COBOL devs and getting them familiar with the spaghetti system.
It worked, no-one died, life went on. So obviously it was all fake rolls eyes
In other words, it’s not just a tool problem, any more than it’s a human resources problem or a leadership problem. Instead it is a systemic problem [...]
Shades of an LLMism, a bit padded, a quarter of a century ago. These days someone could easily give it a stink-eye. I'm sure that training has ingested this along with countless similar examples.
Then we soon see non-technical people start leading the same, pushing for some people to be recognized for this every sprint. Meaningless recognition starts coming in. The process fades out in just a couple weeks.
The problem is there are these people in the mix, often leading, who do not understand.
Did you change from a quiet diligent one to manipulating and playing the game (now that you know the game)? Did you go from quiet and diligent to quiet and not diligent (why do good work when meh work does the trick)? Another path?
When everyone is technical to some degree, I find that credit for technical rescue is forthcoming.
You could spin up a team of 6 engineers and have them go away and try some greenfield project. They could come up back in 6 months having shipped nothing. Which of these descriptions fits the facts?
1. The team learned a lot and ultimately decided there was no product-market fit and decided it was best to reallocate resources elsewhere. The learnings from that project will help a whole bunch of other projects across the division; and
2. They failed to ship and get subpar performance ratings for having no impact.
The answer is... both. Or either. How you are treated will depend on how you are viewed by your management chain and that's a social function. We've all encountered people who never shut up about how hard their job is. Often they end up solving problems that they created, often by not listening to anyone that those problems would occur. And they get credit for it.
You could say to people who anticipate problems to stop because it gets you nowhere. Let people fail. If only it worked that way. Instead you'll get blamed for not seeing a problem someone else created because you're viewed as competent but you aren't liked through no fault of your own.
Google seems to be the posterchild for a company that briefly solved this problem and then forgot what made them successful. I am referring to Project aristotle [1], which ultimately determined that psychological safety was the key ingredient in a team's success.
Now amplify all of this with constant rounds of layoffs where the environment isn't just for pay bumps and opportunities but where the cost of failing is losing your income. What you've created is an environment where office politics is everything.
- Arnold bought a fleet of mobile hospitals that would have been perfect for covid response, but the next governor didn’t want to pay 1% the fleet cost per year to maintain it, so he scrapped it.
- Under Obama, SARS v1 was stopped by US health workers that Trump fired because it was a “bad deal”. In the absence of that team, we got SARS v2, which was renamed to COVID 19.
There’s also the related category of “never blamed for fixing problems poorly, creating even bigger problems”.
Thanks to 9/11, plane cockpits can now be locked from the inside. Now, we have examples of commercial passenger airline pilots locking the doors and committing mass-murder-suicide by plane crash.
For some reason, these stories don’t make the news.
If you frame it this way in a meeting, you will get the attention you want. Don't say I didn't warn you because that comes with a lot of scrutiny you might not want.
erhm, if this figure is close to true i can see what market ai companies is after.
Which loop it belongs to in the model is left as an exercise for the reader.
Nobody ever gets credit for fixing problems that never happened (2001) [pdf] - https://news.ycombinator.com/item?id=39472693 - Feb 2024 (424 comments)
Nobody Ever Gets Credit for Fixing Problems That Never Happened (2001) [pdf] - https://news.ycombinator.com/item?id=8940820 - Jan 2015 (50 comments)