If you think 10x devs are unicorns, consider how much harder it is to get someone 10x at the intersection of both domains. (Personally, I have never met one.) You are far better off with people who can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.
The goal of ops is to have a strong infra that has the fewest changes possible.
They are opposed, and usually there are more devs than ops, but the first responders to an issue are ops.
You can only have devops if both roles are intertwined in the same team AND the organization understands the implications.
Everywhere I've been, devops was just an excuse to transfer ops responsibilities to dev because devs were cheaper. Devs became first responders without having the knowledge of the infrastructure.
So devs insisted on having Docker so that they would be the ones managing the infra.
But everyone failed to see that whichever expensive tools you buy, the biggest issue was the lack of personal investment to solve a problem.
If you are a 1.5x dev in a 0.9x team, you get all the incidents, and are still expected to build new stuff.
And building new stuff is fun.
Spending 2 days analyzing a performance issue because a 0.3x dev found it easier to do a .sort() in LINQ instead of SQL is fun only once.
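To make the complaint concrete, here is a minimal sketch of the two approaches, using Python and an in-memory SQLite database instead of LINQ and a production SQL server. The `orders` table, its columns, and both function names are hypothetical, made up purely for illustration.

```python
import sqlite3

# Hypothetical table, populated with enough rows to make the point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)",
    [(float(i % 97),) for i in range(10_000)],
)


def top_orders_in_app(conn, n):
    """Pull every row into the application, then sort and slice there."""
    rows = conn.execute("SELECT id, total FROM orders").fetchall()
    rows.sort(key=lambda r: (-r[1], r[0]))  # total descending, id ascending
    return rows[:n]


def top_orders_in_db(conn, n):
    """Let the database sort and return only the n rows that are needed."""
    return conn.execute(
        "SELECT id, total FROM orders ORDER BY total DESC, id ASC LIMIT ?",
        (n,),
    ).fetchall()


# Same answer either way; the difference is how much data crosses the wire.
same = top_orders_in_app(conn, 5) == top_orders_in_db(conn, 5)
```

Both functions return the same rows. The first one transfers the whole table to get five of them, which is the sort of thing that looks fine on a dev machine and turns into a two-day investigation in production.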
Expecting Devs or Ops to do both types of work, is usually asking for trouble, unless the organization is geared up from the ground up for such seamless work. It is more of a corporate problem, rather than a team working style or work expectations & behavior problem.
The same goes for Agile vs Waterfall. Agile works well if the organization is inherently (or overhauled to be) agile, otherwise it doesn't.
Could you expand on this? How would an organization be geared up for this?
Wasn't that the original goal of DevOps? Getting dev and ops out of their silos and collaborating? The "make devs do ops" definition seemed to come along later.
The way I have seen it in my career is to have operational and development capabilities within the same team. And the idea of a "DevOps guy" is a guy "developing operations integrations".
As opposed to completely siloing ops and dev.
Anyone who thinks they can hire a devop or declare that they do devops is as deluded as 97% of the folks who claim that they are doing Agile. (If you are firmly on the other side of each of the four values of the Agile Manifesto, you may or may not be doing great software development, but it's not Agile.)
The problem with the typical DevOps team is that there's no operations expertise.
Are you claiming it's fundamentally impossible for people to get along, or just that positive interpersonal relationships can't be reliably forced at scale?
If a programmer doesn't care about the system, I already know he's shit at his job.
It would be like asking Amazon delivery drivers to care about oil changes and tire rotations. It's much easier to have a team of mechanics whose primary responsibility is enabling drivers to just drive and focus on delivering packages.
The reality is that most devs do not consider a holistic picture that includes the infrastructure they will be deploying to. In many cases, it's certainly a skill issue; good devs are hard to find. And to flip the coin, it's hard to find good ops people too.
The reason DevOps continues to linger, however vague a discipline it is, is that it allows the business to differentiate between revenue-generating roles and cost-center roles. You want your dev resources to prioritize feature work, at the beck and call of PMs or upper management, and let your "DevOps" resources be responsible for actually getting the product deployed.
In essence, it's a ploy to further commoditize engineering roles, because finding unicorns that understand the picture top-to-bottom is difficult (finding /top/ talent is difficult!). In this way, DevOps is alive and well, as a Romero zombie.
Remove the handcuffs from your ops team and your reliability will SOAR.
Kubernetes deployment configurations and Ansible playbooks are code. PromQL is code. Dockerfiles and cloud-init scripts are code. Terraform HCL is code.
It’s all code I personally hate writing, but that doesn’t make it less valid “software development” than (say) writing React code.
There is also a subset that is very allergic to coding at this point. I've interviewed enough to see people who only know HCL/yaml. There is enough need and work (waste?) in the space that roles like this can exist
I assume the first time this happens at any given company will be the moment they fully realize that autonomous code changes made on production systems by agents are a terrible idea, and that every change needs a human to take responsibility for and ownership of it, even if the change was written by an LLM.
Understanding code you didn't personally write is part of the job.
> What happens if the person who wrote the code went on vacation?
They get yelled at, because shipping code at 5 pm on Friday and then leaving for vacation is typically considered a "dick move".
> What happens if the code is many years old and no current team member has touched the code?
Then the issue probably isn't caused by a recent deployment?
Actually it could be the opposite: they hold the LLM responsible. When the code change breaks production they'll just ask the LLM to fix it. If it can't? "Not my fault, the LLM wrote it not me! We just need to improve our prompting next time!" Never underestimate humans' capacity to avoid doing work.
Teams will figure out how to mitigate such situations in future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g., by investing more in a production-like env for test coverage).
Software engineering purists have to get out of some of these religious beliefs
To me, the Claude superfans like yourself are the religious ones, like how you run around proffering unsubstantiated claims like this and believe in / anthropomorphize way too much. Is it because Anthropic is an abbreviation of Anthropomorphic?
The caveat is that we have to be fairly good at steering them in the right direction, as things stand today. It is exhausting to do it the right way.
I disagree that they are really, really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. This is not what a really, really capable engineer looks like. I don't see this fundamentally changing, even with all the improvements we are seeing. It's lower level and more core than something that adding more layers on top can resolve; that only addresses it as best it can.
Do you have fail hards to share along with your wins? Are we going to only share our wins like stonk hussies?
These incidents have become less and less frequent over the last year; switching to Opus reduced the failure frequency. Same thing for code reviews. Most of it is fluff, but it does give useful feedback if the instructions are good. For example, I asked for a blind code review of a PR ("Review this PR"), and it gave some generic commentary. I made the prompt more specific ("Follow the API changes across modules and see impact") and it found a serious bug.
The number of times I had to give up in frustration has been going down over the last year. So I tend to believe a swarm of agents could do a decent job of autonomous development/maintenance over the next few years.
How to find problems through testing before they happen is a decades-long unsolved problem, sadly.
The teams are going to figure out how to mitigate bad deploys by using even more AI & giving it even better information gathering.
Think bigger, it's not something you are using today. The next config language should have schemas built in and support for modules/imports so we can do sharing/caring. It should look and feel like config languages and interoperate with all of those that we currently use. It will be a single configuration fabric across the SDLC.
This exists today for you to try, with CUE
I've been cooking up something the last few weeks for those interested, CUE + Dagger
https://github.com/hofstadter-io/hof/tree/_next/examples/env
Python is better than bash in ops, been using more Go in this space
Config is another beast and separate languages
For comments, I use a _comment field for my custom JSON reading apps
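A minimal sketch of what such a `_comment` convention might look like on the reading side, in Python. The `strip_comments` helper and the sample config keys are hypothetical; the only assumption taken from the comment above is that keys named `_comment` are for humans and should be ignored by the app.

```python
import json


def strip_comments(value):
    """Recursively drop any "_comment" keys from parsed JSON data."""
    if isinstance(value, dict):
        return {
            key: strip_comments(item)
            for key, item in value.items()
            if key != "_comment"
        }
    if isinstance(value, list):
        return [strip_comments(item) for item in value]
    return value


# Hypothetical config file contents.
raw = """
{
  "_comment": "Top-level note for humans; ignored by the app.",
  "retries": 3,
  "upstream": {
    "_comment": "Points at staging for now.",
    "host": "example.internal",
    "port": 8080
  }
}
"""

config = strip_comments(json.loads(raw))
```

After loading, `config` contains only `retries` and `upstream` (with `host` and `port`). The trade-off is that every consumer of the file has to know about the convention, since plain JSON has no comment syntax of its own.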
How are people not embarrassed by this complete lack of quality in their work?
The current popular config choices cause a lot of extra work, bugs, and effort. Is improving the status quo not a worthy goal anymore? Are we at a point in history where throwing our hands up and saying "meh, I deal with this" is basically where people are today? (I'm somewhat a believer of this based on anecdata and vibes.)
1. already installed everywhere,
2. easy to parse in every language,
3. supported by editors/linters/CI tools,
4. stable enough that vendors bet on them.
However, you don't want config being Turing complete; that creates a host of other problems at a layer where you don't want them.
If your config is Turing complete and consumed as-is, then without a lot of discipline you can dig yourself into a hole, sure.
If you're producing YAML that is not Turing complete, that constraint means you have to code in a way that produces deterministic output. It's actually very safe, and YAML maps 1:1 to types in something like Python.
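A rough sketch of that "code produces static config" pattern in Python. To stay within the standard library it emits JSON rather than YAML; with a YAML library installed, the same dict could be dumped as YAML instead. The `service_config` and `render` functions and all field names are hypothetical, not any real tool's schema.

```python
import json


def service_config(name, replicas, env):
    """Build a plain dict describing a (hypothetical) service."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return {
        "name": name,
        "replicas": replicas,
        # Sort env vars so input ordering never changes the output.
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }


def render(config):
    """Serialize with sorted keys so identical input gives identical text."""
    return json.dumps(config, indent=2, sort_keys=True)


# Logic and validation live in the generator; the output is inert data.
a = render(service_config("api", 2, {"LOG_LEVEL": "info", "REGION": "eu"}))
b = render(service_config("api", 2, {"REGION": "eu", "LOG_LEVEL": "info"}))
```

Here `a` and `b` are byte-for-byte identical even though the inputs were ordered differently, so the generated file diffs cleanly in version control, and whatever consumes it only ever sees static data.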
My favourite go-to example is for AWS CloudFormation:
DevOps isn't a tool, but there are lots of tools that make it easier to implement.
DevOps isn't how management can eliminate half the org and have one person do two roles, specialization is still valuable.
DevOps isn't an organization structure, though the wrong org structure can make it fail.
DevOps is collaboration. It's getting two distinct roles to better interoperate. The dev team that wants to push features fast. And the ops team that wants stability and uptime.
From the management side, if you aren't focused on building teams that work well together, eliminating conflicts, rewarding the team collectively for features and uptime, and giving them the resources to deliver, that's not a DevOps failure, that's a management failure.
If you can't account for someone spending x% of their time working with a team but for budgetary purposes belonging to a different team then sack your accountants.
DevOps, like agile, when done correctly should help to create teams that understand complete systems or areas of a business and work more efficiently than stand-alone teams. The other part of the puzzle is to include the QA team too, to ensure that the impact of full system, performance and integration tests is understood by all and that everyone understands how their changes impact everything else.
Having the dev team build code that makes the test and ops teams' lives easier benefits everyone. Having the ops team provide solutions that support test and dev helps everyone. Having test teams build systems that work best with the dev and ops teams helps everyone.
Agile development should enable teams to work at a higher level of performance by granting them the agency to make the right decisions at the right time to deliver a better product by building what is needed in the correct timeline.
DevOps and agile fail where companies try to follow waterfall models whilst claiming agile processes. The goal with all these business and operating models is to improve efficiency. When that isn't happening then either you aren't applying the model correctly or you need to change the model.
I saw first hand, at AWS devDays, an AI giving SIGWINCH as the "root cause" of an Apache error in a containerized process in EKS, for a backend FCGI process connection error. It has been extremely hard since that demo to trust any AI for system-level debugging.
(2) AWS is not a leader, if even a contender, in the AI space. I would not evaluate the potential based on a demo they produced.
It seems to have become: "we turned ops into coding too, so now the ops team needs to be good at software engineering"
My personal experience says that the Ops team should not be repurposed as developers; rather, put the experienced developers into Production Support (incident management, which is intense ops: working in shifts, weekends, etc.) and rotate them whenever needed. Over time, you'll invariably see fewer defects and issues percolating down from the devs. Then, once both sides are stable and working well together, with less friction and fewer open tickets, some of the more tech-savvy Ops members can be rotated into development teams as rookie devs to help reduce costs a bit. There will invariably be some natural attrition among both devs and ops, so this gives an alternative career path to the Ops team (who are usually paid less and more stressed) and pushes the devs not to become complacent. Such an approach is doable and productive.
Most places I've worked it was the even worse "we've laid off the ops team, now developers are responsible for both" followed by "no we can't hire any more developers, we have enough already".
Like everything, the original intentions must have been noble. But as we can see, looking back, it got popular, and popular enough to reach the enterprise types.
Nothing really survives that.
PS: I have witnessed a sysadmin team being renamed DevOps and then SRE with not much other meaningful changes. I couldn't believe it at the time.
It was, ca. 2012-15. Sysadmins making automation tools so they could offload the horseshit, often batshit bash/perl scripting work, of manually provisioning dev environments (on VMs, or even basic configuration of new bare metal) to devs, who were already more comfortable with writing their own automation. Devs can unblock themselves, and devs hate relying on anyone else and everyone worships and fears the devs, so fine, give them the sysadmins' rope and rafters.
Moving to a "cattle not pets" mentality for servers well before the proliferation of containers and microservices, much less the mainstreaming of serverless workflows and cloud compute. CI/CD, to make software release processes scriptable, or even better declarative, tasks that could be tested and verified in version-controlled source _before_ being deployed, just like the software itself.
Better automation and better testing meant devs could ship safer and faster; devs owning pipelines meant devs could fix dev-related problems faster.
A lot of early devops tools were written by sysadmins who were tired of being buried by rapidly growing requests to unblock developers, who were outnumbering them by the hundreds or thousands to one at FANG companies (pre-FAANG, much less the big six).
Puppet attacked config management by turning it into declarative code, Ansible made that easier to deploy; Luke Kanies and Michael DeHaan came from sysadmin. HashiCorp made VM provisioning scalable; Armon Dadgar and Mitchell Hashimoto were compsci students who hated doing ops work with rudimentary early cloud services. Most of their early sales inroads into companies came from IT departments using their open-source products; most of their early evangelists were IT executives.
Google splintering devops into the SRE role they coined mostly reflected how they (thought they) had made the "devs unblocking themselves on provisioning" problem that had inspired a lot of foundation tools simply part of the dev culture, especially through GCS and k8s. They didn't think about "devops" anymore much like people don't think about breathing, and narrowed their focus onto uptime.
That was really the failure IMO, that the idea was mostly a cultural one: people working on a problem should also have a stake in, or ownership of, the things they need to unblock their work. A dev being "blocked" from dev work by IT because only IT can provision a piece of hardware or stand up a VM is a cultural problem; the largely open-source tools made by sysadmins and junior/student devs were a response to an entrenched enterprise culture that showed no interest in doing the work necessary to solve that problem.
The tools forced the culture change, but then the tools created their own culture, and the world that defined the culture also changed beneath them. But the companies built around those tools didn't want to die, so they turned devops into whatever might keep them alive.
The problem isn't that "devops" failed to do the job it set out to do (make sysadmins' lives easier), it's that the entire problem area changed so much, and so quickly, that its goal was no longer relevant. There were no "sysadmins" left to help; there are still systems, and there are still administrators, but their responsibilities have been diced up and tossed into the organizational winds.
Not quite as easy of a narrative for the founder of an ops company selling an ops product to frame in a company blog post, though. Not that things in the post are necessarily wrong, but IMO the problem isn't "devops failed", it's why the fuck are we still talking about devops? The word means nothing anymore, its massive overloading pollutes any discussion about who's having problems, what those problems are, and what the solutions to those problems might be.
Or, IMO the problem is that few to no people are asking the modern equivalent of "how do we make sysadmins' lives better?" They're instead chasing a ghost of a concept that peaked a decade ago, because that's easier than looking at an organization's failures from both a sufficiently high and low level to see the cracks that run all the way through them.
It is a bad idea for a company to give shoddy after-sales support to customers, because they would then lose the customer's trust and relationship in the long run. No customer wants to see their production systems have frequent incidents causing hours or days of outages.
Vendor companies that neglect investment in Production Support for their products/services do so at their own peril.
In fact, canny companies have realised the real money is not in upfront cost but in volume billing (billing/invoicing based on monthly transaction counts, number of users/licenses, and a tiered rate card), so they need to have adequate Production Support teams.
This is why companies are trying their level best to move existing customers to subscription services (e.g., Office 365 by Micro$oft).
The problem in your case is not the dev vs ops split, it's a company culture thing which I'm sure you see play out in more places than this current focus
DevOps is a methodology. DevOps as a role or team name is a fantasy from people who do not understand the methodology.
If you want DevOps to work, your Ops must be members of the development team, take part in the sprints, etc. But many companies do not want to do that because they want to separate ops and dev budget/accounting and do not want to hire enough people with ops skills.
I'm so sick of this nonsense. "Devops" isn't failing, isn't an issue, you can rename it whatever you want, but throughout my career the devops engineers (the ones you don't skimp on) are the best, highest paid professionals at the company.
I don't know why I keep reading these completely crazy think-pieces hemming and hawing about a system (having a few engineers who master performance/backups/deployments/oncall/retros) that seems to be wildly successful. It would be nice if more engineers understood under-the-hood, but most companies choose not to exclusively hire at that caliber.
I have been foolish enough to accept a few project proposals with DevOps role, which in the end meant ops work dealing with VMs, networking and the like.
Hence my vitriol: https://news.ycombinator.com/item?id=46662287.
Also: could he please avoid illustrating his nonsense with graphs that are both childish and nonsensical?
> What the devs care about is the ability to understand the product experience from the perspective of each customer. In practice, this can mean any combination or permutation of agent, user, mobile device type, laptop, desktop, point of sale device, and so on
Really? Any permutation?
Most (arse hole) devs:
- Import the world, as it works on their latest 1TB machine or Mac Studio
- Are always on the latest iPhone or Pixel
- Add 100 trackers that work on their own machine
- POS device? They should ask some of the devs to go and work in their canteens that have POS
DevOps, shift left, full stack dev, all reminds me of the Futurama episode where Hermes Conrad successfully reorgs the slave camp he's sent to, so that all physical labour is done by a single Australian man
Speaking darker, there is a kind of - well, perhaps not misanthropy, but certainly a not-so-well-meaning dismissiveness, to the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple, I don't know why you're making all these siloes, man" - assuming that ops specialists, sysadmins, programmers, DBAs, frontend devs, mobile devs, data engineers and testers have just invented the breadth and depth and subtleties of their entire fields, only as a way of keeping everybody else out
But modern systems are complex, they are only getting more so, and the further you buy into the shift-left everyone-is-everything computer-jobs-are-all-the-same philosophy the harder and harder it will get to find employees who can straddle the exhausting range of knowledge to master
I don’t think this is the right take. “Silos” is an ill-defined term, but let’s look at a couple of the negative aspects: “lack of communication” and “lack of shared understanding” (or different models of the world). I’m going to use a different industry example, as I think it helps think about the problem more abstractly.
In the world of biomedical engineering, the types of products you are making require the expertise of two very different groups of people: engineers and doctors. Each of these groups has an in-group language, and there is an inherent power differential between them. Doctors are more “important” than engineers. But to get anything made, you need the expertise of both.
One way to handle this is to keep the engineers and doctors separate and to communicate primarily via documents. The doctor will attempt to detail exactly how a certain component should work. The engineer will attempt to detail the constraints and request clarifications.
The problem with this approach is that the engineer cannot speak “doctorese” nor can the doctor speak “engineerese”; and the consequence is a model in each person’s head that differs significantly from the other. There is no shared model; and the real world product suffers as a result.
The alternative is to attempt to “break the silos”; force the engineers and doctors to sit with each other, learn each other’s language, and build a shared mental model of what is being created. This creates a far better product; one that is much closer to the “physical reality” it must inhabit.
The same is true across all kinds of business groups. If different groups of people are required to collaborate, in order to do something, those people are well served by learning each other’s languages and building a shared mental model. That’s what breaking silos is about. It is not “everyone is the same”, it’s “breaking down the communication barriers”.
I don't think anyone thinks siloes are themselves a good thing, but they might be a necessary consequence of having specialists. Shift-left is mostly designed to reduce conversations between groups, by having individuals straddle across tasks. It's actually kind of anti-collaboration, or at least pessimistic that collaboration can happen
I am arguing that all such people, whether developers or ops or UX designers or product managers, need to engage in this learning as they collaborate. This doesn’t mean that we want the DevPM as a resultant title, just that siloing these different groups will lead to perverse outcomes.
Dev and ops have been traditionally siloed. DevOps was a silly attempt to address it.
To the comments saying dev and ops are different: they are! I think the magic is massive platform team support too. I am not troubleshooting why Splunk indexes aren't indexing, for example.
DevOps is a mess of our own making - embracing K8s created complexity for little gain for nearly all companies.
I apologize if my words were sharp because many DevOps engineers were not mean to me. Perhaps I just had bad luck to deal with ignorant gatekeepers to production. You already know if my opinion doesn't apply to you.
And I don't want to trivialize the reality of enterprise platforms where bespoke connectors rule. I have dealt with migrations of platforms that are business critical, where managing version compatibility and ensuring none of the integrations regressed was par for the course. I am not even saying that that makes me qualified to replicate Honeycomb.io. But I do think someone with a deep technical background in building observability platforms, armed with Claude Code or Codex, the right set of MCPs and all the necessary tooling, should be able to build a clone of Honeycomb.io.
Maybe it won't be a fast turnaround like a typical vibe-coded project, but even if it is a month-long project to get to 60% feature parity, these vendors will have to sit up and pay attention.
as you immediately trivialize something it seems you know very little about
MCPs are outdated btw; it's bad to attach a bunch of MCPs to your messages, as it pollutes the context. If you don't do this, you can build agents that are better than copilot/codex on gemini-3-flash. Claude Code is probably the leader here, but still definitely not capable of what you think it is.
Eventually a bureaucrat becomes the manager of the team, and seeks to expand the set of things under DevOps' control. This makes the team a single point of failure for more and more things, while driving more and more developer processes towards mediocrity. Velocity slows, while the DevOps bottlenecks are used as a reason to hire.
It's an organizational problem, not a talent or knowledge problem. Allowing a group to hire and grow within an organization, which is not directly accountable for the success of the other parts of the organization that it was intended to support, is creating a cancer, definitionally.