I never measure coverage percentage as a goal, and I don't even bother turning it on in CI, but I do use it locally as part of my regular debugging and hardening workflow. I strongly recommend doing this if you haven't before.
I'm spoiled in that the golang+vscode integration works really well and can highlight executed code in my editor in a fast cycle; if you're using different tools, it might be harder to try out and benefit from it.
Sometimes very well covered code is dead code. If it has higher coverage than the rest of the project, then deleting it (say, 1000 lines at 99% coverage) can actually reduce the overall percentage, by 0.1% in this example.
And even if it wasn't at 99% when you started, rewriting a module often involves first adding pinning tests, so replacing 1000 lines with 200 new ones could raise the coverage percentage at first and then drop it again at the end.
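A quick back-of-the-envelope sketch of that effect, with made-up numbers (a 100k-line project at 90% overall coverage, containing a dead 1000-line module at 99%):

    total, covered = 100_000, 90_000        # project: 90.0% coverage
    dead, dead_covered = 1_000, 990         # dead module: 99.0% coverage

    before = covered / total                           # 0.900
    after = (covered - dead_covered) / (total - dead)  # ~0.899
    print(f"{before:.1%} -> {after:.1%}")              # 90.0% -> 89.9%

Deleting dead code made the project strictly better, yet a coverage-threshold gate would report it as a 0.1% regression.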
There are some things in CI/CD that should be charts, not failures, and this is one of them.
This reminds me of the recent discussion of gettiers[1]. That article focused on Gettier bugs, but this passage discusses what you might call Gettier features.
Something that's gotten me before is Python's willingness to interpret a comma as a tuple. So instead of:
    my_event.set()

I wrote:

    my_event,set()

which was syntactically correct, equivalent to:

    _ = (my_event, set())
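A minimal runnable demo of the failure mode (using threading.Event here so it runs standalone; the asyncio version behaves the same way):

    import threading

    my_event = threading.Event()

    my_event,set()             # oops: builds the tuple (my_event, set()) and discards it
    print(my_event.is_set())   # False -- the event was never set

    my_event.set()             # what was intended
    print(my_event.is_set())   # True

No exception, no warning; anything waiting on the event just hangs.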
The auto formatter does insert a space though, which helps. Maybe it could be made to transform it the way I did above; that would make it screamingly obvious.

My comment on that Gettier post:
Puttiers: When a junior engineer fixes something, but a different error is returned, so they cannot tell if progress was made or not.
https://docs.python.org/3/library/asyncio-sync.html#asyncio....
Could've been called fire() or activate(), perhaps. This is also the kind of problem lints are really good for. I wouldn't be surprised if there were a lint for this already (I haven't checked).
The real issue here is simply that Python didn't camel-case its built-in class names; then this would only be a problem if you violated case conventions and named your method Set() (or mistyped the case as well).
You absolutely don't have to follow every piece of advice you read, but I'll argue for this one once more anyway. My guiding principle is that I make mistakes from time to time, and those mistakes are incredibly expensive when they hit prod.

If two things need to change in tandem, I'll derive one from the other in code, guaranteeing they actually change together (or, at an absolute bare minimum, put comments in both locations warning that they need to change together, though with modern language features that's rarely necessary anymore).

If I don't need the result of a function call, I'll still give it a name and explicitly discard it. That adds redundant bits of information for future readers: it signals that I ignored the result intentionally, and it reduces future git blame archaeology if the result ever becomes valuable or the API changes.

When a chunk of data has constraints beyond being an arbitrary primitive, I create a wrapper type (free at runtime in any systems language) to let the compiler double-check my work.

Those are all small things, but with a short library of such patterns the code I write is almost always correct if it compiles, and none of those habits are particularly hard to develop.
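A minimal sketch of the last two habits in Python (names are hypothetical, and here it's mypy/pyright doing the double-checking rather than a compiler):

    from typing import NewType

    # Wrapper type: free at runtime, but a type checker now refuses to
    # pass a plain int (or some other kind of id) where a UserId belongs.
    UserId = NewType("UserId", int)

    def send_notification(user_id: UserId) -> bool:
        return True                      # hypothetical stand-in

    send_notification(UserId(42))        # ok
    # send_notification(42)              # flagged by mypy/pyright

    # Named-and-discarded result: tells future readers (and git blame)
    # that ignoring the return value was deliberate.
    _unused_ack = send_notification(UserId(42))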
Not using potentially reserved words similarly reduces the chance of certain errors. It's kind of like the Swiss cheese model in aviation: you add one extra (fallible) layer of redundancy against bugs like foo,set(). Since it's a cheap mitigation, I follow it unless I have an amazing reason to do otherwise.
Yes, Python could have named things differently, but that ship has sailed. If I'm using Python (and it winds up being a good choice a few times a week currently) then I have to work around those design issues.
accidentally working app : correct app :: Gettier "not knowledge" JTB : proper knowledge JTB (where JTB is a justified true belief)
Is it possible to backport the program analogy back into the realm of philosophy? I'm dreaming of a philosophy paper along the lines of "Knowledge is JTB with proper testing".
I'll have to look into hyperlegible monospace fonts. Or maybe I'll just use Atkinson Hyperlegible and deal with the variable spacing.
I am doing an accessibility audit for an application at work right now, and I was tasked with getting things up to the WCAG AA level. But I found I had huge amounts of unused white space, even viewing it at 1024x768 on a desktop, so I jacked the font sizes up: not to the "large type" levels of AAA, but substantially larger, which I think is easier for everyone. Given the size of the UI controls, information density isn't going down a whole lot.
One mechanism to verify that is by running a mutation testing [0] tool. They are available for many languages; mutmut [1] is a great example for Python.
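For anyone who hasn't seen it, the idea in miniature (hypothetical function and test; a real tool generates the mutants automatically):

    # A mutation tester makes small edits ("mutants") to your code and
    # reruns your tests; a mutant that survives means no test noticed.

    def is_adult(age: int) -> bool:
        return age >= 18        # typical mutant: ">=" becomes ">"

    def test_is_adult():
        assert is_adult(30)     # passes for original AND mutant: too weak
        assert is_adult(18)     # boundary case: this one kills the mutant

If your suite stays green while mutants like that survive, your coverage number was lying to you.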
I see this a lot with performance measurement, for example. A team will run small-scale benchmarks, and then try to estimate how a system will scale by linearly extrapolating those results. I don't think I've ever seen it work out well in practice. Nothing scales linearly forever, and there's no reliable way to know when and how it will break down unless you actually push it to the point of breaking down.
It also reflects the domain. For mission critical code there better be 10 different layers of red lines between development and shipping. For web code, care for stuff like performance and even correctness can fall by the wayside.
Basically just look at the 80s and early 90s. Video games, C compilers, NAS software, operating systems and hardware sales were all almost entirely marketing driven. Before any serious Open Source revolution, you paid for almost any code that was perceived to have value. Functionality built-in was not something people took for granted.
Open Source won not because you can't market it (in fact, you can; it's just that nobody is paid to do it), but because it's free. The ultimate victory Linux had over its contemporaries was that you could host a web server without paying out the ass to do it. It turned out to be so competitive that it pretty much decimated the market for commercial OSes with word-of-mouth alone. It's less about their neglect of marketing tactics and more a reflection of the resentment for the paid solutions at the time.
If someone else wrote the code, your model of why it works being wrong doesn't mean anything is wrong other than your understanding.
Sometimes even if you wrote something that works and your own model is wrong, you don't necessarily have to fix anything: just learn the real reason the code works, go "oh", and leave it. :) (Revise some documentation and write some tests based on the new understanding.)
This isn't a complaint; it's too marginal and weird a test case to complain about, and the separate OS process is always there as a fallback solution.
But I delete it when I'm done.
It's a little difficult to productionize an always_fail test, since you do actually want the test suite to succeed. You could affirmatively test that you have a non-zero number of passing tests, which I think is what we did. If you keep an always_fail test, you can check that it's your only failure, but you have to be super careful that your test suite doesn't stop at the first failure.
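A sketch of the "non-zero passing tests" check as a pytest conftest hook (assuming pytest, which already exits with code 5 when zero tests are collected; this mainly guards against a collected-but-all-skipped suite):

    # conftest.py
    import pytest

    def pytest_sessionfinish(session, exitstatus):
        reporter = session.config.pluginmanager.get_plugin("terminalreporter")
        passed = len(reporter.stats.get("passed", []))
        if exitstatus == pytest.ExitCode.OK and passed == 0:
            # "Success" with nothing actually run: treat it as a failure.
            session.exitstatus = pytest.ExitCode.TESTS_FAILED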
The following statement is true.
The preceding statement is false.
I most closely associate it with Gödel and his work on incompleteness.
That shouldn't be an easy mistake to make.
Your test code should be clearly marked, and better yet, kept slightly separate from the rest of the code. Also, there should be some feedback about the number of tests that ran.
And yeah, I know Python doesn't help you make those things.
Me: Hmmm.
Managers, a week later: We’re starting everyone on a 50% on-call rotation because there’s so many bugs that the business is on fire.
Anyway, now I get upset and ask them to define “works”, which… they haven’t been able to do yet.
I also don't understand how it's even done. Do you just guess until you get the result you kind of want, then make up a story for yourself explaining it?
Ooh, I know this one! "I asked shatGPT to write some code that does X..."
Then they ask that same bullshit-generator to explain the code, or write a test, etc.
Test Driven Development had a fix for this, which I used back in the day when I was evangelical about the one true way to write software. You wrote a test that failed, then added or changed code only to make that test pass. Never add any code except to make a failing test pass.
It didn't guarantee 100% correct software, of course, but it kept you from gaslighting yourself into believing your code worked because you were too awesome.
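The loop in miniature (hypothetical example): the failing test exists before any implementation does.

    # Step 1 (red): write a test for code that doesn't exist yet.
    def test_slugify_replaces_spaces():
        assert slugify("Hello World") == "hello-world"

    # Step 2 (green): write only enough code to make that test pass.
    def slugify(text: str) -> str:
        return text.lower().replace(" ", "-")

    # Step 3: repeat; the next failing test drives the next change.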
I prefer separating the steps: writing some code, making it functionally work on screen, and then writing tests. I usually cover the cases in step 2, but when you add something new later, it's nice to have step 3.
Is that actually desirable? This article articulates my exact gut feeling.
To be precise, it’s one of the big reasons, but it’s far from the only reason to write the test first.
This means that the time you write your first test is already too late to decide on testability. It's part of the core business logic architecture – the whiteboard stage.
If you can make it testable, TDD isn’t just good practice – it’s what you want to do because it’s so natural. Similar to how unit tests are already natural when you write hermetic code (like say a string formatter).
If, OTOH, your business logic is inseparable from prod databases, files, networking, current time and time zone, etc., then TDD and tests in general are both cumbersome to write and simultaneously deliver much less value (as in errors found) per test case. Controversially, I think that for a spaghetti-code application, tests are quite useless and largely ritualistic.
The only way I know how to design such testable systems (or subsystems) is through the “functional core - imperative shell” pattern. Not necessarily religious adherence to “no side effects”, but isolation is a must.
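A minimal sketch of the pattern (hypothetical names): the core is pure and trivially unit-testable, and the shell owns every side effect.

    from datetime import datetime, timezone

    # Functional core: pure -- no clock, no I/O. Tests need zero mocks.
    def is_expired(expires_at: datetime, now: datetime) -> bool:
        return now >= expires_at

    # Imperative shell: the messy side effects live out here.
    def purge_expired(sessions: dict[str, datetime]) -> None:
        # (expiry timestamps assumed timezone-aware)
        now = datetime.now(timezone.utc)
        for key, expires_at in list(sessions.items()):
            if is_expired(expires_at, now):   # core decides, shell acts
                del sessions[key]
                print(f"purged {key}")        # stand-in for real I/O

Only the thin shell ever touches the real clock, so the business rule can be tested exhaustively as a plain function.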
And I'm big on reusability because I'm lazy. If requirements change, I'd rather tweak than rebuild.
I don't disagree with this, and I have found it to be quite true. Though IMO it still has to be said that you can mock or isolate a lot of stuff, system time included (see the sketch after this comment). I'm guessing you already accounted for that when you said tests can become cumbersome to write, and I agree. But we should still try, because there are projects where you can't ever get a truly isolated system to test. For example, I recently finished a contract where I had to write a server for dispatching SMS jobs to the right per-tenant and per-data-center instances of the actual connected-to-the-telco-network SMS servers; the dev environment was practically useless because the servers there did not emit half the events my application needed to function properly, so I had to record the responses from the production servers and use them as mocks in my dev-env tests.
Did the tests succeed? Sure they did, but ultimately they gave me almost no confidence. :/
But yeah, anyway, I agree with your premise. I just think we should still go the extra mile to reduce entropy and chaos as much as we can, because nobody likes being woken up to fight a fire in production.
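On the system-time point above, a minimal stdlib-only sketch of isolating the clock in a test (hypothetical function under test):

    import time
    from unittest import mock

    def seconds_since(start: float) -> float:
        return time.time() - start

    def test_seconds_since():
        # Freeze the clock so the assertion is exact and repeatable.
        with mock.patch("time.time", return_value=1_000.0):
            assert seconds_since(990.0) == 10.0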
Type systems and various forms of static analysis are going to increasingly shape the future of software development, I think. Large software systems especially become practically impossible to work with, and impossible to verify and test, without types.