Hard disagree. If you can't decompose to avoid "wonky parameters", then keep them separate. The big smell is boolean flags (avoid them altogether when you can) and more than one enum parameter.
IME "heavy" function signatures always make things harder to maintain.
So when deciding whether to merge two similar functions, the question to ask yourself is: "are future changes to one of these functions almost certain to affect the other one as well?" If not, just leave the functions separate, no matter how similar they are.
“any time you have to copy paste, look for an opportunity to abstract” assumes that having an abstraction is always better, but I don't think that is the case.
In my opinion, the reasoning behind "code duplication is a code smell" is that if you have to copy and paste code around, you are probably missing a useful abstraction for your code. And I think "useful" is the most important thing to keep in mind.
Sure, every time I copy and paste code I know there exists an abstraction I could create to eliminate the duplication. Generally that's pretty easy. The hard part is understanding when this new abstraction will help you deliver the features the business needs.
Much better than having some advanced mega-function whose workings I don't understand anyway.
You really don't want to have a function that branches a lot inside. It's very difficult to test.
When you think of adding a flag, compute 2^n in your head; that's the minimum number of tests needed. Do you really want to write all of them?
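As a quick sketch of how the matrix grows (all names invented): two flags already mean 2^2 = 4 combinations to cover.

function formatReport(rows: string[], compact: boolean, includeTotals: boolean): string {
  const body = compact ? rows.join(", ") : rows.join("\n");
  return includeTotals ? `${body}\ntotal rows: ${rows.length}` : body;
}

// The minimum test matrix enumerates every flag combination:
const flagMatrix: Array<[boolean, boolean]> = [
  [false, false], [false, true], [true, false], [true, true],
];
for (const [compact, includeTotals] of flagMatrix) {
  console.log(formatReport(["a", "b"], compact, includeTotals));
}

Add a third flag and you're at 8 cases; a fourth, 16.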
Hard disagree. Your type of misconception is the root cause of most broken and unmaintainable projects, and the root of most technical debt and accidental complexity.
People who follow that simplistic logic of "code can accidentally evolve separately" are completely oblivious to the fact that there is seemingly duplicate code which is only incidentally duplicate, but at its core should clearly be and remain completely decoupled.
More to the point, refactoring two member functions that are mostly the same is far simpler than refactoring N classes and interfaces registered in dependency injection systems required to DRY up code.
I've lost count of how many times I had to stop shortsighted junior developers who had completely lost track of what they were doing and, with a straight face, were citing DRY to justify adding three classes and an interface to implement a strategy pattern, because that way they would avoid adding a duplicate method. Absurd.
People would do far better if, instead of mindlessly parroting DRY, they looked at what they are doing and understood that premature abstractions cause far more problems than the ones they solve (if any).
Newbie, inexperienced developers write complex code. Experienced, seasoned developers write simple code. Knowing the importance of having duplicate code is a key factor.
> You absolutely do not want multiple (even just two) copies of what's meant to be exactly the same functionality, since now they can accidentally evolve separately. But coupling together things that only happen to be mostly similar even at the expense of complicating their implementation and interface just makes things harder to reason about and work with.
So, if things are fundamentally the same, do not duplicate, but if they are fundamentally different, do not unify. This is absolutely correct.
To which you replied:
> People who follow that simplistic logic of "code can accidentally evolve separately" are completely oblivious to the fact that there is seemingly duplicate code which is only incidentally duplicate, but at its core should clearly be and remain completely decoupled.
Despite the fact that this is exactly what the comment you replied to says.
Then you go on a clearly very deeply felt rant about overcomplication via dependency injection and architecture astronautics and so on. Preach it! But this is also nothing to do with what thfuran wrote.
> Newbie, inexperienced developers write complex code. Experienced, seasoned developers write simple code.
Sounds like the kind of overgeneralisation that overconfident mid-career developers make to me.
Consider the case of making API calls to a third party. You, today, are writing a function that calls the remote API with some credentials, reauthenticates on auth failure, handles backoff when rate limited, and generates structured logs for outgoing calls.
You need to add a second API call. You're not sure whether to copy the existing code or create an abstraction. What do you do?
Well, in this case, you have a crystal ball! This is a common abstraction that can be identified in other code as well as your own. You don't know the future with 100% confidence, but it's your job to be able to make a pretty good guess using partial information.
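Concretely, the shared concern might look something like this rough sketch (every name here is illustrative, not from any particular library):

// Shared wrapper: auth, reauth on 401, backoff on 429, structured logging.
async function callApi<T>(url: string, getToken: () => Promise<string>): Promise<T> {
  for (let attempt = 0; attempt < 5; attempt++) {
    const res = await fetch(url, { headers: { Authorization: `Bearer ${await getToken()}` } });
    // structured log for every outgoing call
    console.log(JSON.stringify({ event: "api_call", url, status: res.status, attempt }));
    if (res.status === 401) continue; // reauthenticate: getToken() is asked for a fresh credential
    if (res.status === 429) {         // rate limited: exponential backoff
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
      continue;
    }
    return (await res.json()) as T;
  }
  throw new Error(`gave up calling ${url}`);
}

The second API call then reuses the wrapper instead of re-pasting the retry/backoff/logging machinery.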
Where I'm saying you absolutely shouldn't copy paste is where there's a business or technical requirement for something to be calculated/processed/displayed exactly a certain way in several contexts. You don't want to let those drift apart accidentally, though you certainly might decouple them later if that requirement changes.
Writing good abstractions is hard and takes practice. Unfortunately the current zeitgeist has (IMO) swung too hard the wrong way with guiding mantras like “explicitness” which is misinterpreted to mean inline all the logic and expose all the details everywhere all the time and “worse is better” which is misinterpreted to justify straight up bad designs / implementations in the name of not overthinking things, instead of good-but-imperfect ones.
The knee-jerk response against abstraction has led the majority of even seasoned, experienced developers to write overly complex code, because they've spent a career failing to learn how to abstract. I'd rather we as an industry figure out what makes a quality abstraction and give guidance to junior developers so they learn how to do so responsibly, instead of throwing up our hands and acting like it's impossible. This despite literally all of computing having been built upon a tower of countless abstractions that let us conveniently forget the fact that we're actually juggling electrons around on rocks.
This is a really inaccurate generalization. Maybe you could say something about excess complexity, but all problems have some level of irreducible complexity that code fundamentally has to reflect.
Obviously, code will reflect the complexity of the problem.
But incidentally, most problems we solve with code are not that hard, yet most code is extremely complex — a lot more complex than the complexity inherent to the problem. And that's how you can tell an experienced, seasoned (and smart) developer, whose code is only complex where it needs to be, from an inexperienced one, whose code is complex so that it appears "smart".
From what I've been seeing, inexperienced developers write complex code because they are trained with a bias towards accidentally complex code (i.e., how else would you show off design patterns), they have no experience in dealing with the tradeoffs of writing accidentally complex code, and they do not understand the problems they create for themselves and others by adding complexity where they do not need it.
I'd put accidental complexity in the same class as dead code: inexperienced developers might be oblivious to the risk presented by code that serves no purpose, but experienced developers know very well its ticking-time-bomb nature.
The last couple of days have been annoying, but I got it to work; just not as easily as I wanted. The platform, itself, has limitations, and I needed to find these, by banging into them, and coding around them, which is ugly.
> adding three classes and a interface to implement a strategy pattern
Sounds like the language used is the problem here, not the intent. Hasn't Java (et al) made this easier yet?
fwiw i agree that copy paste is fine
IME if something should be an independent function or module, I rarely get to the point of considering copy/pasting it in the first place. If I want to copy/paste it’s usually because the two places currently only incidentally need the same code now, and my gut usually tells me that it will no longer be the case if I have to make any sort of change.
When you say "DRY" here, would you say you had familiarity with the original definition, or merely what you (quite understandably) inferred from the acronym? Because I think the formulation in The Pragmatic Programmer is pretty spot on in speaking about not repeating "pieces of information", whereas I find in practice most people are reacting to superficial similarity (which may or may not reflect a deeper connection).
To avoid the confusion, it seems like DRY would be better named something like "Single source of truth". Because I do agree with that.
The "misunderstanding" is at least as prevalent as the original, yes. I wasn't trying to say the original is "correct" - language is determined by usage - just wondering which you were discussing.
> To avoid the confusion, it seems like DRY would be better named something like "Single source of truth".
It could probably do with a better name, but "single source of truth" is usually about the information operated on by the program, rather than information embodied in the program.
If so, then that's also news to me. I'd have thought that, e.g., something like input validation code that can be reused in both the backend and the client would go under single source of truth. Which I would always prefer not to be repeated, but that is frequently hard to do unless you have the same language on backend and frontend, or codegen.
And that happens.
The beginner's problem lies in the reasons why that happens. Very often the reason is that someone didn't really think about their argument and return data types, how functions access needed context data, how to return when functions can error in multiple ways, etc. If you find yourself reimplementing the same thing twice because of that, then sure, you shouldn't; what you should do is go back and think harder about how data is supposed to flow.
But if you have a data flow you are very confident in, and you need to do two things that differ only slightly, just copy and paste into two distinct functions, as that's what you want to have in some cases.
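On the "can error in multiple ways" point: a tagged result type is one way (among many) to settle that question up front, so the same plumbing doesn't get reinvented twice. Names invented for illustration:

type ParseResult =
  | { ok: true; value: number }
  | { ok: false; error: "empty" | "not_a_number" };

function parsePrice(input: string): ParseResult {
  if (input.trim() === "") return { ok: false, error: "empty" };
  const n = Number(input);
  return Number.isNaN(n) ? { ok: false, error: "not_a_number" } : { ok: true, value: n };
}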
Dogmatism gets you only so far in programming.
I've tried to counter-meme with the joke that collapsing superficially similar code isn't improving it, but compressing it, and that we should refer to such activity as "Huffman coding".
It's also worth noting that the focus on syntax can also miss cases where DRY would recommend a change; if you are saying "there is a button here" in HTML and also in CSS and also in JS, your code isn't DRY even if those three look nothing alike (though whether the steps necessary to collapse those will very much depend on context).
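A minimal sketch of collapsing that kind of repetition, assuming a setup where all three artifacts can be derived from one definition (names invented):

const SUBMIT_BUTTON_ID = "submit-order"; // the single statement of "there is a button here"

const html = `<button id="${SUBMIT_BUTTON_ID}">Order</button>`;
const css = `#${SUBMIT_BUTTON_ID} { font-weight: bold; }`;
function wireUp(doc: Document): void {
  doc.getElementById(SUBMIT_BUTTON_ID)?.addEventListener("click", () => console.log("ordered"));
}
console.log(html, css);

The three outputs look nothing alike, but the fact they all encode now lives in exactly one place.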
Personally, I don't think the ambiguity is actually much of a problem; often it's not ambiguous, and when it is it's usually the case that multiple ways of organizing things are reasonably appropriate and other concerns should dominate (they may need to anyway).
As in most vague problems, two extreme solutions (join vs dup) are a wrong way to think about it. I have some ideas on how to turn this into a spectrum in a nearby comment.
I think it is important because the DRY-flavored problem is basically the thing you meet most often in code. At least that is my experience, as a guy who hates typing out and rediscovering knowledge from slightly different code blocks or tangled multi-path procedures, and hates refactoring these, either hoping that nothing breaks in multiple places, or that you won't forget to update that one semi-copy.
I've been programming for a very long time, and seemingly no one has ever even tried to address this in any sensible way. Imagine if a pasted block stayed tagged as a copy of its original; you could then:
1) see how it differs from the original immediately next time
2) other devs would see that it’s not just code, but a part of a common block, and follow ideas from it
3) changes to the original block would be merge-compatible downwards (and actually pending)
4) can eject code from this hierarchy in case it completely diverges and cannot be maintained as a part of it anymore
Instead we generate this thread over and over again, but no one can define "good {structure,design,circumstances}", etc. It's all at the "feeling" level, and doing it one way or another in the clueless beginning makes it hard to change later.
I like what you are saying, I think, but am stuck on this internal coupling.
If you want to patch the origin without cluttering other locations, just move it away from there and put another copy into where it was, and edit.
The key idea is to still have the same copied blocks of code. The code will be physically repeated at each location; you can erase the "block <name> {" parts from the code and nothing will change.
But instead of being lost in the trees, these blocks get tagged, so you can track their state, analyze them, and make decisions in a convenient, systematic way. It's an analysis tool, not a footgun. No change propagates automatically, so the coupling problem is no bigger than what you'd already have with the duplicated-code approach.
You can even gradually block-ize existing code. See a common snippet again? Wrap it into “block <myname> {…}” and start devtime-tracking it together with similar snippets. Don’t change anything, just take it into real account.
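Since no such tooling exists today, here's a hypothetical approximation of the proposed "block <name> { ... }" tagging, using plain comments a devtime tool could scan; nothing propagates automatically:

// block retry-loop {   (hypothetical tag a tool could track across files)
for (let attempt = 0; attempt < 3; attempt++) {
  // ...site-specific body, free to diverge from its sibling copies...
}
// } block retry-loop

The tool would only report "this copy has drifted from its siblings"; merging or ejecting stays a human decision.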
That's why DRY is a smell (indicates that something might be wrong) and not a rule.
I've seen, and currently work on, stuff that has beautiful but hard-to-grok abstractions all over the place (the typical result of unsupervised brilliant juniors: gigatons of technical debt down the line, but it's almost always other people's problem). The thing is, that code has seen 10 major projects, absorbed other stuff, the meaning and structure of the data changed a few times, other systems kept evolving, etc.
Now all those abstractions are proper hell to navigate and perform any meaningful change. Of course another typical brilliant 5-second-attention-span junior result is complete lack of documentation. So you see stuff happening, but no idea why or why not, what does it mean down the line in other systems, why such choices were made and so on.
These days I've had enough of the any-design-patterns-at-all-costs kool-aid and over-engineered cathedrals for rather trivial stuff (I think it's mostly down to the anxious-ego issue, but that's for another discussion). I am more than happy to copy & paste stuff even 20x, if it makes sense at that place. And it does, surprisingly often. Yes, it's very uncool and I won't brag about it in my next job interview, but it keeps things refreshingly and boringly stable, and surprisingly also easier to change and to test the consequences of changes, and somehow that's priority #1 for most companies.
In research it is absolutely OK to copy paste a number x of times, because you don't know a priori what will work the way you want.
Usually, I write an algorithm to solve my problem, then I copy paste the function and change it a bit with another idea, and set a switch to choose between them. Then I copy paste another time as the ideas are flowing, and add one more switch, etc.
At some point, when I feel that there is too much duplicated code, I abstract the parts of the functions that are similar and never change, so that I can focus only on the changes of ideas, and no more on the mechanic of the methods.
As the code converges toward something I like, I PRUNE the code and remove all unused functions.
But this process can take weeks, and I can go work on another issue in the meantime. This is because I don't know in advance what the right thing to do is, so I end up with code with several duplicated parts, and when I come back to them I can choose which version I want to use; if something starts to feel smelly, I prune it, etc. Iteratively.
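A sketch of that workflow (all names invented): each idea is a pasted-and-tweaked variant behind a switch, until one wins and the rest get pruned.

type Variant = "baseline" | "ideaA" | "ideaB";

function smooth(xs: number[], variant: Variant): number[] {
  switch (variant) {
    case "baseline": return xs;
    case "ideaA": return xs.map((x, i) => (i > 0 ? (x + xs[i - 1]) / 2 : x)); // pasted copy, tweaked
    case "ideaB": return xs.map((x) => Math.round(x)); // another pasted copy, different idea
  }
}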
What I wanted to say, is that duplication of code is really dependent on the kind of code I'm doing.
If I'm doing an app, it's way easier to determine which code to keep, which code to remove, and which code to duplicate. But not all fields are the same.
At some period of my life I always wrote clean code for research; you lose too many ideas, and, hidden behind the abstractions, you are no longer able to work with your code. When you get a new idea, it requires going through all the abstractions, which is insane in very rapidly evolving code.
Connecting that idea back to the discussion:
1. IME, usually when code looks similar there exists a nice abstraction (a nice "name" future people will understand) for the similar bits. Allowing duplication to grow when you could have properly named things will eventually slow down development.
2. Functions with many parameters are rarely that kind of nice abstraction. The commonality is something much more contained, and functions with large parameter counts should usually be relegated to "entrypoints" or other locations where you're actually merging a thousand different concerns.
3. Bad abstractions are much more expensive than duplication. I have zero problems with committing duplicated code when there aren't any obvious solutions and letting a better plan materialize later.
Instead of

function doStuff(flag: boolean) {
  // do some stuff
  if (flag) {
    // do stuff a
  } else {
    // do stuff b
  }
  // more stuff
}
you may want to do two implementations that are something like
function doStuffA() {
  doSomething();
  doStuffSpecificForA();
  doSomethingElse();
}
and
function doStuffB() {
  doSomething();
  doStuffSpecificForB();
  doSomethingElse();
}
And that can be simulated in code you own by splitting the meat of a set of requirements into one or two bodies, and then doing setup, teardown, or a step in the middle differently in different contexts. So now you have a set of similar tasks whose subtasks intersect, or are supersets of one another.
I have seen so many terrible projects with methods with endless arguments/parameters and nested object parameters; the signatures are fucking insane.
The biggest stench to me in any project is when I see a majority of methods all have > 6 arguments
To quote Shoresy: so dumb
Probably one of those ‘truth is in the middle’ kind of situations.
People are all talking about the format of the code, while what defines if it's a good architecture or not is the semantics. Just evaluating that heuristic (yours or the article's) will lead you into writing worse code.
This "it's more important to wrap your code at 80 columns than to understand how the cache hierarchy works" stuff is becoming worryingly endemic. Teamscale has built an entire business around fooling nontechnical managers into believing this shit is not only worthwhile, but should be enforced by tooling, and middle managers at FAANGs, who should know better, are starting to buy in.
I mean, where you wrap is not important, and is best left to tooling (brain cycles and meeting time can be used for more important things)
Modifying those boolean flags within the context of your tests is practically free. Trying to merge 4 files into one is… not.
Tabs vs spaces - people disagree but usually can adapt to the team if needed.
Use java1.4 for green-field web app - hard disagreement for many, looking for new job is more attractive option.
But I'll also point out that, just like reading about exercise, merely reading the book doesn't help unless one is willing to practice and, much more difficult, get buy-in from the team. Because software engineering is usually a team sport, and if one person is reading these kinds of books and trying to put them into practice while the other members of the team are happy choosing chaos, it's going to be the outlier who gets voted off the island.
Or is it just getting from point A to point B that happens to be the same in two places right this instant?
Similarity can be fleeting.
The more I do this software engineering thing the more I feel like this “advice” bites me in the butt. Understanding when you should duplicate code versus when you should consolidate (or if you should just write a TODO saying “determine if this should be split up by [some set in stone timeline]”) is simply just a HARD problem (sometimes at least), and we should treat it as such.
DRY/ WET or whatever shouldn’t be a maxim (let alone a habit! lol), it should at best be a hand-wavey 2-bit dismissal you give an annoyingly persistent junior software dev who you don’t want to actually help!
This heavily depends on how likely it is for the reasons of change to also apply to the other copies. If the reasons for why the code is the way it is are likely to evolve differently for the different copies, then it’s better to just leave them as copies.
Just being the same code initially is not a sufficient reason to create an abstraction. Don’t focus on the fact that the code is currently the same, instead focus on whether a change in one copy would necessarily prompt the same change in the other copy.
This also applies to pieces of code that are different from the beginning, but are likely to have to change in conjunction, because they rely on shared or mutual assumptions. If possible place those pieces of code next to each other, and maybe add a source comment about the relevant mutual assumptions.
In other words, avoiding code duplication is a non-goal. Keeping code together that needs to evolve together is a goal. Instead of DRY or WET (don’t repeat yourself, write everything twice), think SPOT (single point of truth).
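A minimal sketch of SPOT as described above (example is mine): the two functions share exactly one assumption, stated once, and sit next to each other because they must change together.

const FIELD_SEPARATOR = "|"; // the single point of truth for the shared wire format

function encodeRecord(fields: string[]): string {
  return fields.join(FIELD_SEPARATOR);
}
function decodeRecord(line: string): string[] {
  return line.split(FIELD_SEPARATOR);
}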
Rules can change enough from year to year that parameters aren't enough. You will end up with code specific to each year.
You don't want to introduce any chance of changing results for old years when changing common code.
So best to have no common calc code. Each year is fully set in stone.
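A sketch of "each year fully set in stone" (rates and names are made up):

// Each year's rules live in their own frozen function; no shared calc code.
function taxDue2023(income: number): number { return income * 0.21; }
function taxDue2024(income: number): number { return income * 0.22 + 100; } // rules changed shape, not just numbers

const taxByYear: Record<number, (income: number) => number> = {
  2023: taxDue2023,
  2024: taxDue2024,
};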
The rules for how to calculate taxes for a past year don't change, but you probably didn't implement the previous year's rules perfectly.
If you discover a mistake in how you calculated taxes for a previous year, you should recalculate them so that you can file an amendment.
Copy pasting code multiple times is never really "fine". I'd argue that for most things you'd probably be better off writing a duplication script rather than abstracting it into some overcomplicated nonsense. It's much easier to change, and delete, things later this way. That's obviously not what we teach in CS, but we really should.
Hard disagree on that. Frameworks change over time. How certain are you that they won't make a seemingly tiny design decision in the future that breaks your software?
One of the most valuable things tests can do for you is to confirm that it is safe to upgrade your dependencies.
If all your test does is duplicate tests from the dependency, that might be a waste of time... provided it's a stable, documented feature and not something that just happens to work but isn't necessarily expected to be stable behavior.
But you shouldn't skip testing something because you're confident that the dependency has already covered that.
The tests should prove your software still works.
Integration tests for complex flows inadvertently test your dependencies, which as you say is awesome for when you have to upgrade.
The only part that's relevant to you is how it interfaces with your own code. If their behavior changes but your code still does exactly what you want it to, the test shouldn't fail.
1. They may not accept the PR
2. Even if they do accept that PR, there's no guarantee that in two years time some maintainer will decide to change that behaviour (and update or discard the test you contributed) anyway.
From https://go-proverbs.github.io/: A little copying is better than a little dependency.
Curious to see how the community is divided on this, I think I'm more leaning towards the single implementation side.
To the author's point - a wonky param to control code flow is a clear and glaring sign that you consolidated something that wasn't actually the same.
The similarity was a lie. A mistake you made because young features often have superficial code paths that look similar, but turn out to be critically distinct as your product ages.
Especially with modern type systems - go ahead and copy, copy twice, three times, sometimes more. It's so much easier to consolidate later than it is to untangle code that shouldn't have ever been intertwined in the first place. Lean on a set of shared types, instead of a shared implementation.
My future self is always happier with past me when I made a new required changeset tedious but simple. Complexity is where the demons live, and shared code is pure complexity. I have to manage every downstream consumer, get it right for all of them, and keep it all in my head at the same time. That starts off real easy at shared consumer number 2, and is a miserable, miserable experience by consumer number 10, with 6 wonky params thrown in, and complex mature features.
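A small sketch of "lean on shared types, not a shared implementation" (types and consumers invented for illustration):

interface Invoice { id: string; total: number; }

// Two consumers that merely look similar today; each is free to diverge.
function invoiceEmailSubject(inv: Invoice): string {
  return `Invoice ${inv.id}: $${inv.total.toFixed(2)}`;
}
function invoiceAuditLine(inv: Invoice): string {
  return `${inv.id},${inv.total}`;
}

If the email format grows pluralization and localization, the audit line doesn't care; only the Invoice type binds them.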
---
So for me - his rule of thumb is egregiously too strict. Consolidate late and rarely. Assume the similarity is a lie.
I've been bitten by both decisions in the past: prematurely abstracting, and "what are 4 copies gonna do, that's totally manageable" until it cost quite some time to fix bugs (multiple times, and, because of the diverged code paths, with multiple different solutions).
That's actually a really clever way to do things and I think I'll adopt it.
Generally, my anecdotal experience is that Go libraries have far fewer average dependencies than the equivalent Rust or JavaScript libraries, and it may be due in part to this (the comprehensive standard library also definitely helps).
I definitely tend to copy small snippets between my projects and rely sparingly on dependencies unless they're a core part of the application (database adapter, heavy or security-sensitive specifications like OIDC, etc)
> You just never know when you have to revert a particular change and there's a sense of bliss knowing where you introduced a bug six days ago and only reverting that commit without going through the savagery of merge conflicts.
This is key for me: a good shape to aim for with a commit is one that can be easily reverted.
It's usually too hard, regardless of what your commits look like individually, to revert "just one buggy small bit" without breaking the rest of the new feature that was supported by that change, or re-introducing an old bug, or having other inconsistent resulting behavior. And "turn off the whole feature" is rarely desirable unless the bug is producing truly catastrophic behavior.
A roll-forward "just fix that bug" is the ideal case. A more complex "roll forward and make a kinda involved fix" is common too. But neither of those regress things from a user or consumer POV.
Make it bisectable and life will be easier down the line.
I mean have them in the CI as well, but for sure have them as pre-commit hooks.
For those still in bzr land, there used to be a wonderful "bzr-pipelines" plugin to enable seamlessly working on a set of interdependent changes.
You can and should split your features into a series of product/codebase improvements that end up delivering the full "feature" with the last of your commits. If done smartly, along the way, you'll be delivering parts of the feature so your users would start benefiting sooner.
Contrast:
git checkout -b feat-1
echo 'awesome change' > README.md
git commit -am'fix'
git checkout main
git checkout -b feat-2
echo 'awesome change' > README.md
git commit -am'moar awesome fix'
git checkout main
git merge feat-1
git merge feat-2
with its cherry-pick friend.

If one is curious why in the world multiple branches would need the exact same commit, I'm sure there are hundreds of answers, but the most immediate one is that CI manifests are per-branch, so if one needs a change to CI, I would a thousand times rather $(for b in $affected_branches; do git checkout $b; git cherry-pick $my_awesome_ci_fix; done), which will survive those branches re-joining main
There are a few things people think git tracks that it actually doesn't; instead it compares diffs and presents the user with extra information that looks like tracking. The go-to example is renaming files: there is a "git mv", but it doesn't actually track the rename. Git reconstructs the rename when looking at history, based on whether a file was removed and a file added in the same commit that are some percentage the same.
In this case, if that last line was "git cherry-pick feat-2", it does the same (or at least similar) comparisons as "git merge feat-2", but errors because the user would expect cherry-pick to create a new commit and in this case it won't, instead presenting a message asking the user how to continue.
>If the component is big, then you introduce more complexity[...] If a particular function doesn't fit anywhere, create a new module (or class or component)
This smells like the agile/uncle Bob "every function should be four lines" school of thought which is really bad.
Paraphrasing Ousterhout's book, it's the other way around: when components are big and contain significant implementation, you're hiding information and reducing complexity, which is the purpose of good program design. When your component/object/module is just surface, you've basically done no work for whoever uses your code. I see way too often that people write components that are just thin wrappers around some library function, in which case you haven't created an abstraction, you've just added a level of indirection.
If a function does not fit anywhere that's a strong indication that it shouldn't be a separate function, it's likely an implementation detail.
I am looking for rebuttals of this naïve Uncle Bob style and while I like the content of Casey Muratori, he doesn’t resonate with more corporate people.
How many times have you had to roll back a minor version upgrade because the library maintainers *absolutely don’t* know what they are doing? Spring, Netty, and Java ecosystem, I'm looking at you...
But good practice here is continual refactoring, which is almost inimical to that stability; plus, imagine the final sign-off comes from the business side, who don't understand why you rewrote a codebase they signed off two months ago and now have to re-confirm.
1. Performance
2. Reliability
3. Readability
4. Correctness
5. Maintainability
6. Extendability
7. Consistency
8. Adequacy
9. Simplicity
10. Predictability
I worked at a place that did this with their frontend app. Devs rarely knew where anything should go and so for any given Component/Module, there was usually some accompanying `MyComponent.fns.ts` file. Homes were NEVER found for it later. Code duplication through the nose and lots of spaghetti coupling.
Edit: i'm definitely blowing off some steam. That said, I think there is good virtue in this "habit" so long as there is good reason that it "doesn't fit anywhere" ... and when another module starts referencing the temporary home module, it is a smell that the time is now to give it a proper home.
Very uncomfortable truth (imo) for many developers who prefer to find abstractions and general all encompassing advice. I have found that the correct placement of functions in files/classes is a "sense" that is improved solely with experience and is never truly complete. It is after all about communicating intent to other human beings for which there are no hard rules.
1. The alternative to small commits (as motivated by the difficulty of reverting large commits) is to not revert the commit, but just add a new commit that fixes the bug. The merits of this are of course debatable, but it does constitute a gap in the reasoning.
2. "Big refactorings are a bad idea", why though?
5. "It's better to create a new independent construct than to jam it into an existing module where you know deep down it doesn't make sense", why though?
6. As a counter point to designing an API via unit tests, you can also just have a design session. Think about the problem for a moment, write some design documents down. When dealing with APIs and interfaces, database schemas, this type of up-front design tends to deal by far the best results.
7. There's no clear argument why having more than two instances of a function is bad. Yeah, implementations may diverge, but is that necessarily a bad thing? Even if they started out the same, why do they need to keep staying the same?
10. "Testability is correlated with good design" is not really motivated at all. I know many designs that are good but not easily testable, and many designs that are extremely testable, but also hideously convoluted (e.g. "uncle bob's syndrome").
2. For the same reason that 'lets rewrite everything from scratch' generally is a bad idea.
5. Because deep down you know it doesn't make sense? Nobody will import your 'awesomeUtilityFunction' from the 'WaarghComponent' file, but they might if it's in a file/module called awesomeUtilities, or just plain awesomeUtilityFunction.
6. Designing an API via unit tests is the equivalent of a design session with a different whiteboard. I like how you complain about things not being well justified and then just claim that your own suggestion leads to better results without any motivation.
7. I think it should be fairly obvious that you only care about this if you _want_ to keep the implementations the same.
10. No good design is 'not easily testable'. Easily testable is a requirement for good design. In my experience, when someone makes this point they try to imply that when you bend yourself into corners to make your test work (as given in the example), you should stop doing that and instead look at better ways to abstract your dependencies (dependency injection, mockable utility functions, lambdas etc.).
This seems like backwards logic. Even if reverting the commit implies you know (or think you know) exactly what the issue is, doesn't adding a new commit fixing the issue also imply this?
> 2. For the same reason that 'lets rewrite everything from scratch' generally is a bad idea.
I'd vehemently object to the two being equivalent. Big refactorings are more laborious for sure, and all else being equal, smaller ones are arguably preferable to larger, but there are worthwhile changes you simply can't implement in small steps. Big refactoring tasks are mostly a problem if you have too many people working on a codebase, as they require some degree of freezing part of the codebase for changes to avoid merge issues.
> 7. I think it should be fairly obvious that you only care about this if you _want_ to keep the implementations the same.
The scenario as being discussed actually goes into the case where their requirements do in fact diverge, and suggests adding parameters to coax the divergent implementations into still being the same code.
> Easily testable is a requirement for good design.
I'd ask in what sense you mean the design is good? The test suite surely serves the code, and not the other way around. After all, we've sent people to the moon with code that never saw modern testing practices. There are other ways of ensuring code does what it should than unit tests.
I agree there are some types of code that benefits from extensive testing, but it's far from universal, and the tools needed to provide testability are anything but free, both in terms of performance and driving software complexity.
In that case, an alternative to extensive testability is to design the code in such a simple way that there isn't many places for bugs to hide.
I like this as an ideal. But I struggle to see how code can be both so simple that it is hard to make a mistake and also difficult to unit test.
Most of what I have seen forcing tests to be overly complex and brittle has been coupling code that have very different responsibilities (for example, testing business logic requires testing UI components that perform it). Separating those out would have been better design and more testable.
Your tests should test the API of the code/module/system you are responsible for. Nothing else.
And the tests should really push your API to the limit and beyond. For example, if your API is a server (with a HTTP API) then have N clients try to use it at the same time, as fast as possible, and see what happens.
And of course measure memory usage, disk usage etc. while running these tests continuously for days.
This will automatically test everything you depend on. And you will know instantly if any of the dependencies you rely on have changed in a way that impacts your code.
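A rough sketch of the "N clients, as fast as possible" idea (endpoint and numbers made up):

async function hammer(url: string, clients: number, requestsEach: number): Promise<void> {
  const failures = await Promise.all(
    Array.from({ length: clients }, async () => {
      let failed = 0;
      for (let i = 0; i < requestsEach; i++) {
        const res = await fetch(url).catch(() => null);
        if (!res || !res.ok) failed++;
      }
      return failed;
    }),
  );
  console.log(`failures per client: ${failures.join(", ")}`);
}

void hammer("http://localhost:8080/health", 50, 1000);

Run something like this continuously, alongside memory and disk monitoring, and dependency regressions tend to show up on their own.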
I have had zero (yes zero) bugs in production for years. Only because of tests that really push the servers I am responsible for hard. Way harder than any customers would.
While the tests often reveal that I am very capable of adding bugs to the code :)
The systems I typically work on are large C++ applications used by large international companies you most likely have heard about.
I have struggled a bit with this at times. There are certain things that can go from "this implementation fits on a postcard" to "this implementation fits on 3-4 pages" if you want to provide the introspection required to provide useful tests (less true in languages like Haskell that provide nice monadic tricks, granted). I like having tests just to prove the point, but I will feel quite bad ripping up _tiny_ implementations to get tests working.
But test code is also code that should be introspected in a certain way (though the objectives are different). Maybe I'm just doing some things the wrong way.
I feel like this is the end game of scrum and most agile methodologies: endless refactoring on a treadmill with no off button.
I like to be introspective, and I am human so my code is far from perfect. But if I was refactoring half of my time I would go more than a little crazy.
The good systems I have worked on have converged on designs that work for that space. Both developers and users see and value the stability.
The bad ones have had the kind of churn the article mentions. Developers are constantly rewriting, functionality is subtly changing all the time; stability doesn’t exist.
Innocuous and fine I guess but it points to (and then ignores) a deeper and interesting issue around how codebases grow, split, and merge over time. When the same thing happens at several levels of abstraction/zoom, take note. Refactoring to extract a method is similar to splitting a package is similar to splitting a monolith into microservices (and the reverse operations). The creation of a new package/module/whatever is an early signal of a "fault line" around which a future refactoring will occur (or, more often than not, a signal that the dev may not be familiar with where things go - but even in this case I tend to agree with the OP to just put it in a new place and let the code review fix it.)
This only works if you know what is and is not a potential future blocker. A perfect example is the data model: IME, most devs do not understand RDBMS very well, and so don’t understand how their decisions will affect future changes or growth. Or worse, they recognize that they don’t know, but choose to dump everything into a JSON column to avoid migrations.
What I oppose is mocking every single dependency of every single injection in the component. It ends up being 50x the code of the system under test and requires throwing it all away when the implementation changes
Interesting question. Have you got any specific examples of something hard to test without mocks?
I agree there's nuance, but I find "don't use mocks" a great starting point, and the sweet spot for web services to normally be only mocking/faking/stubbing/simulating/doubling 3rd-party APIs. I'm sure the spot moves dependent on context, e.g. writing hardware firmware might warrant a different approach.
Maybe a clearer expression would be "consider mocks a code smell".
Another common one is introducing network stalls to ensure timeout code behaves sanely. I'm aware of Comcast and the various nf trickery but I mean something a normal developer could run as part of normal tests, not involving sudo anything
Even as I write this, I'm aware that "there's more than one way to do it" and I'm sure everyone has their own favorite. But my experience has been that only the most pristine decomposed software components have very clean boundaries for testing just this one aspect. So for the rest of us stuck using the AWS sdk and similar, one can choose to shim the interactions with the SDK just to be able to swap it out for testing (which I violently oppose), or feed the software you do control a pseudo-implementation that will explode in very specific ways
What did you use for this? I've achieved this previously by abusing minio, combined with very large uploads & downloads. Maybe that qualifies as some kind of verified mock though(?)
I'd be interested to use a cleaner approach which is also realistic.
If one needs to exercise the AWS SDK itself, as part of some repro steps for a support issue, it's similarly glucose-cheap to patch moto to 500 in the necessary circumstances. I've had good luck using their ExecutionInterceptor ServiceLoader mechanism[3] to patch the Client's endpoint URI to point to moto or localstack without having to monkey with every single Client instantiation, which can be especially no-fun for STS AssumeRole or AssumeRoleWithWebIdentity setups (since one doesn't want it to use real STS for anything). That way the actual SDK pathway is still exercised all the way into the caller's code for a more honest-to-goodness bad outcome, but without the hope-and-pray of contacting real S3
1: e.g. https://sdk.amazonaws.com/java/api/2.29.16/software/amazon/a...
2: https://docs.oracle.com/en/java/javase/11/docs/api/java.base...
3: https://github.com/aws/aws-sdk-java-v2/blob/2.29.17/core/sdk...
But the alternative to "mocking" is to use verified fakes (same test passes for both the real implementation and the fake) that actually do something "real" (even if it's simply persisting data in memory).
I am 100% with you on the verified fakes and love moto (and its friend localstack) for that reason. If I had lottery money, I'd even go so far as to create a moto-esque implementation backed by lxc or such and have it actually provision/mutate some running infra that I can snapshot and restore
Even hardware: they likely did develop it using software simulations; they just need to ship those with their SDK. Another thing hardware has going for it is that it does not change as much.
Note that a verified fake could still have observability points that allow you to monitor what's going on.
In general, the biggest hurdle engineers need to overcome is to believe it is possible and then simply start thinking in terms of delivering value with every single branch (hopefully user value, but a refactoring counts too), and what are the small steps that get us there?
The benefits are amazing:
* Changes are likely to be limited to only one "thing", thus making them both lower-risk and easier to review and QA
* With every step shipped to production, you learn if it is providing the benefit you are looking for or if you need to pivot
* You are not developing a feature branch while "main" moves at the same time, and wasting time on keeping up with it
* If the project gets stopped 3 months in, you have still delivered some value, including those in-between refactorings
* Your customers love you since they are seeing improvements regularly
* There is never any high-risk, big "release" where you need to sit around as 24/7 support and wait for bugs to rear their heads
I am happy to give some guidance myself: what is the "major feature" you think can only be done with a single, large change all at once? (I've done huge DB model changes affecting 100Ms of rows with no downtime, merged two "subapps" into one, migrated monoliths to microservices etc, but also built new full-stack complex features with branches with diff size being less than 400 lines for each)
One thing you can do to address them is to stash the large commit to the side, then piece by piece pull it into a new branch as a series of smaller commits. This also gives a good opportunity to refactor before delivery, now that you know what the code is going to do and how.
The other way to do this (or if writing tests isn't helping) is to start with writing examples in the README (or wherever it is you keep docs). If your examples look tortured then your API is torturous. If your examples are understandable then your API is probably laid out reasonably.
Sometimes software is hard and 10x engineers just need to rewrite the whole thing or replace large systems
Subscribing to some world where we have to do that in "small changes" limits us.
We shouldn't tailor process to the weakest engineers.
If you don't actually understand the full set of changes that will be required in order to get to your desired new end state, how can you evaluate whether "just write the whole thing" is a one month, six month, or longer project? There are going to be nasty edge cases and forgotten requirements buried in that old code, and if you discover them for the first time halfway into your big rewrite... you might suddenly find you're only 10% into your big rewrite.
(Especially if you're a "10x engineer" you should understand what makes big rewrites hard and often fail or go way over schedule/budget. You should've seen it all before.)
Re-writes take forever, because a lot of the edge cases and bug fixes are lost [1]. You might think they go away, and some do, but new ones are introduced. QA process is critical. Management becomes critical of excuses, and the longer the project is drawn out, the more they get involved. The final shift to a new system is never one-and-done. Management is paying for two systems, canary deploy.
Smaller re-writes are the ideal practice, and your code base is set up this way already, right?
Maintenance code is cheapest.
[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...
As for the "weakest" engineers, even the "strongest" engineers are weak sometimes (bad day, something personal, health issues, sleep deprivation...).
You hit on something super important that I don't see discussed often enough: Different phases in the software lifecycle require different approaches. Trying to apply "maintenance mode" to a greenfield project (or vice-versa) can be a disaster for the reason you mentioned - sometimes you just can't break the job into small changes until you have something concrete to change! There is time for principled slow change, and there is a time for rapid prototyping. But most teams use a single process for both.
(and, there is some room for taste/interpretation/etc. i think the thing about copy-paste and "the third time it's in the code, encapsulate it, and deal with flag params later" is maybe true and maybe not true and may be context- or team-dependent. i know i have done this a few times, and if i am trying to cover that func with tests, the complexity of the test goes up fast with the number of flags. and then sometimes i wonder if it is even worth writing these tests when the logic is so dead simple.)
> Copy-paste is OK once. The second time you're introducing duplication (i.e., three copies), don't. You should have enough data points to create a good enough abstraction.
There's already a principle that synthesizes this: Write Everything Twice (WET).
It's a play on words to counter the infamous Don't Repeat Yourself (DRY) principle, which clueless but opinionated developers everywhere have used time and again to justify introducing all kinds of problems involving some combination of tightly coupling unrelated code, abstraction hell, adding three classes and an interface to avoid writing two classes, etc. This nonsense is avoided by tolerating duplicate but uncoupled code until the real abstraction and coupling needs emerge.
I still cringe at a PR that a former clueless junior developer posted, where, in the name of DRY, they added an OnFailure handler which, instead of doing any error-handling and recovery logic, simply invoked OnSuccess, because "it's mostly duplicate code and this keeps the code DRY". Utter nonsense.
> Know when you're testing the framework's capability. If you are, don't do it
Except that many frameworks are full of confusing behavior that is easy to misuse. It's funny that the post mentions `useEffect()` because `useEffect()` is so easy to misuse. Writing integration tests that make sure your app does what it is supposed to is totally fine.
> If you don't know what an API should look like, write the tests first as it'll force you to think of the "customer" which in this case is you
This is pointless. It doesn't give you any information, you're just guessing at what the API should look like. You won't actually know until it's integrated into a working application. The idea that you can design in a vacuum like this is wishful thinking.
> Copy-paste is OK once. The second time you're introducing duplication (i.e., three copies), don't. You should have enough data points to create a good enough abstraction.
No you won't, and it will often be with code that is similar in some ways but differs in others. Since the kind of people who write this kind of vague bullshit advice disapprove of things like boolean function parameters and use shitty languages that don't have metaprogramming support, this leads to "abstractions" that create awkward, tight coupling where changing one little thing breaks a million stupid fucking unit tests.
> Testability is correlated with good design. Something not being easily testable hints that the design needs to be changed.
Testability is neither necessary nor sufficient for any particular quality attribute. Depending on the application being written, it can be counterproductive to write out full unit tests for everything.
As always with these stupid "software engineering" posts, there is zero data, zero evidence, zero definitions of terms up front, and zero of anything that is actually real. It's just personal preference, making it dogma.
(FWIW, while naming is probably as important, I am not accepting bad naming as that is too easy)
FWIW, I don't see any tests for this, nor does it look simple to test, so I don't consider this "testable" code — it looks like it was made to make other code testable, yet it fails to be testable itself.
Also, naming is horrible as well (also noted in the article).
No. You haven't seen real tech debt until you've stared into the abyss and the abyss has stared back.
This isn't _incorrect,_ but I'd say it's insufficient, or at least it lacks a sufficient treatment of what technical debt is and what is important about it.
Technical debt is known technical problems that are affecting or will affect your velocity or the business as a whole. When considering technical debt, you need to know:
- the estimated amount of time required to correct the problem
- the ongoing penalty you're paying by not correcting it, if any
- the hard cutoff by when the problem must be corrected, if any
- the consequences of not correcting the problem by the hard deadline
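One might record those factors concretely as something like this (field names are mine, not OP's):

interface DebtItem {
  description: string;
  fixCostDays: number;               // estimated time to correct the problem
  ongoingPenalty?: string;           // e.g. "changes touching User take 5x as long"
  hardDeadline?: Date;               // cutoff by when it must be corrected, if any
  missedDeadlineConsequence?: string;
}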
Three examples to demonstrate:
1) You have a User god-model that is thousands of lines of code long. It is incredibly hard to work with, and any change that interacts with it takes, on average, 5x as long as a change that doesn't. It would take appx. four weeks to refactor sufficient methods out of this model to make it as easy to work with as the rest of the code, but there is no hard cutoff by when this problem must be solved.
2) You're only able to clear your job queues on the weekend, and the job queue time has been growing steadily for the past few months. By mid-week the average queue time is appx. 10 minutes, and by end-of-week it's nearly 30. If this problem is not solved in one month's time, the end-of-week queue time is likely to be over an hour, and in two months' time, the mid-week queue time is, too. We can add extra capacity to our job runner pool in an hour or so, at a cost of $x/month.
3) The new account creation script is a mess of spaghetti code, a real eyesore. Changing it requires about 10-20x as much effort as any other part of the system. It would take appx. 2 weeks to untangle. However, there is no hard cutoff by when this problem must be solved, and in fact, this code is rarely ever touched anyway, only twice in the last year and only small changes were required.
These three cases fall roughly into the three categories suggested by OP (1 -> preventing from doing stuff now, 2 -> preventing from doing stuff later, 3 -> might prevent you from doing stuff later), but you have sufficient information to make informed, better decisions that the simpler model would miss. For example, under the simple model, the job queue problem would be classified as "try to focus on", but the User god-model takes priority (minimize the "stuff now" problems). Yet 2 seems much simpler to fix (provided you can afford it), and the consequences of deprioritizing it in favour of the god-model fix could be catastrophic to user confidence.
And in both systems we're most likely going to ignore problem #3. But if we know that a larger change to new account creation is coming up, one that you would expect to take 2+ days in any other part of the system, you can now expect that it would instead take 20-40 days in the spaghetti code, whereas refactoring first would cost appx. 10 + 2 = 12 days (two working weeks to untangle, plus the change itself), a net win.