original commit: https://github.com/RsyncProject/rsync/commit/d046525de39315d...
```
-	if (!ptr)
-		ptr = malloc(num * size);
-	else if (ptr == do_calloc)
+	if (!ptr || ptr == do_calloc)
 		ptr = calloc(num, size);
```
Written with Claude. This is a good example of what slips through LLM attention. It forces all allocations to be calloc as if it is a strict upgrade. For large and recursive allocations, this becomes a significant cost.
reverted in https://github.com/RsyncProject/rsync/commit/7db73ad9a1b8721...
If you read the description of the revert even half carefully, it's easy to tell that even that was written by an LLM.
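To make the cost argument concrete, here is a minimal sketch of a wrapper that leaves the choice to the caller instead of zeroing everything. The names `alloc_array`, `alloc_array_demo` and the `zeroed` flag are made up for illustration; this is not rsync's allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper (not rsync's code): the caller decides whether the
 * block must be zeroed, rather than the wrapper always calling calloc. */
void *alloc_array(size_t num, size_t size, int zeroed)
{
    if (zeroed)
        return calloc(num, size);   /* calloc rejects num * size overflow itself */
    if (size != 0 && num > SIZE_MAX / size)
        return NULL;                /* malloc needs the overflow check by hand */
    return malloc(num * size);
}

/* Usage: only the table that is later scanned for empty slots pays for zeroing.
 * Returns 1 if both allocations succeeded, 0 otherwise. */
int alloc_array_demo(void)
{
    int *scratch = alloc_array(1024, sizeof *scratch, 0); /* contents undefined */
    int *table = alloc_array(1024, sizeof *table, 1);     /* all bytes zero */
    int ok = scratch != NULL && table != NULL;

    if (ok)
        assert(table[0] == 0 && table[1023] == 0);
    free(scratch);
    free(table);
    return ok;
}
```

Whether the zeroing actually costs anything depends on the allocator: large calloc requests are often served from fresh, already-zeroed pages, while recycled heap memory has to be cleared explicitly.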
I can understand the sentiment of whoever posted the original thread.
That's exactly what I'd expect when someone is excited about AI usage and becomes... well, sloppy.
"Like many developers of open source packages I’ve been hit by a flood of security reports lately in my role as the rsync maintainer. Many of those reports are AI generated (not all though, there are some notable ones with very careful and high quality manual analysis).
As this flood started to get more intense I realised I needed to raise the defences on rsync a lot — we needed much more thorough test suites, code coverage analysis, CI testing on a lot more platforms, deliberate and thorough scanning for possible security issues (so I find at least some of them before other people!) and the addition of a whole lot of defence-in-depth hardening techniques. This is all a huge amount of work. "
I wonder if the data looks worse or better when computed per commit instead of per 10 commits.
Start with unsafe then gradually convert into idiomatic Rust.
(My own view: 10.8 GB is nothing these days. Your sprintf buffers are probably larger than that. (And if they aren't: they should be. That, or you should start using snprintf...))
I wouldn't assume Claude made that decision; it's not as if that was some incidental thing that it snuck into a large commit. The commit message starts with "zero all new memory from allocations", and that's exactly what the commit does. What do you imagine the prompt was?
It seems totally plausible to me that a human initially thought this was an improvement, then rethought after discovering the RSS regression. And it's not a law of nature anyway that this change has to increase RSS; calloc could special-case the case in which memory was freshly returned from the OS, knowing fresh memory mappings are zeroed anyway.
I blame AI for these regressions mostly in the sense that it caused a flurry of vulnerability reports. Those led to a flurry of quick fixes. Sometimes quick fixes cause other problems.
> The change to zero memory was my idea and my change. It was a reaction to a security report I got which caused use of an element past the end of an array. By zeroing the allocation I could ensure that misuse of that memory if a similar bug came up in the future could only cause a null ptr deref, which is better than the chance of a valid pointer. It got a claude co-authored tag on it as I got it to do some tidy ups of a series of commits, and that is just what it does when it makes any modification. It doesn't mean the change was written by claude. It was written by me.
https://github.com/RsyncProject/rsync/issues/959#issuecommen...
How does that prevent reading past the end of the buffer? Or change how bytes outside the buffer are used? Are these arrays of pointers so that the “null ptr deref” comment makes sense?
Or am I the bozo and don’t know what’s happening here?
I hope this doesn't come across as unkind towards the dev, who gives their time and energy to the project. I'm grateful for that.
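One way to read the maintainer's comment, assuming the array in question holds pointers (the quote implies it, but I haven't checked the code): zeroing does nothing for reads past the end of the allocation, but it does help when a bad index lands on a slot that was allocated and never filled. A minimal sketch with a made-up `str_table` type, not rsync code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table of string pointers with spare capacity. */
struct str_table {
    const char **items;  /* 'cap' slots, only the first 'used' are filled */
    size_t used;
    size_t cap;
};

/* Allocate 'cap' zeroed slots. Returns 0 on success, -1 on failure. */
int str_table_init(struct str_table *t, size_t cap)
{
    t->items = calloc(cap, sizeof *t->items);
    t->used = 0;
    t->cap = t->items ? cap : 0;
    return t->items ? 0 : -1;
}

/* Buggy on purpose: checks the index against 'cap' instead of 'used', so an
 * off-by-N index can reach an unfilled slot. With calloc that slot reads as
 * NULL on mainstream platforms (all-zero bytes are a null pointer there),
 * so the caller crashes or can test for NULL instead of following whatever
 * stale heap contents malloc would have handed back. */
const char *str_table_get_unchecked(const struct str_table *t, size_t i)
{
    assert(i < t->cap);  /* still inside the allocation */
    return t->items[i];
}
```

An index at or beyond `cap` is still an out-of-bounds read either way; the zeroing only changes what a stray read inside the block finds.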
https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0
(Disclosure: while I haven't talked with him in years, Tridge was my colleague and mentor for many years. I feel it is worth considering his view before joining a crusade)
I don't entirely understand what this is saying. People wouldn't have been outraged if only the tests had been updated and/or he pushed solely on master - but he pushed breaking changes onto the release branch(es) too. Breaking workflows that have worked for years is a prime way to get people irate, and then seeing "Claude" in the commits just pours gasoline onto the fire.
I think it's pretty sad that he even had to write it. Quite a lot of judgement from people who aren't paying his bills.
- The release with the highest number of attributed bugs is the release _right before_ the first release with Claude-coauthored commits, released in January; is there a chance that unattributed LLM-authored commits made it into this release?
- The release attribution methodology is not great, since it will tend to attribute bugs introduced in a minor version update to the longest-lived patch release of that minor version. I doubt that 3.4.1 actually introduced a lot of bugs, but since it was released a day after 3.4.0, bugs that were introduced in that release get attributed to 3.4.1.
- Relatedly, more recent releases have had less time to have bugs filed against them, so there may be a bit of a bias toward evaluating recent releases as less buggy.
> Here's my favorite part, though. Digging into the data, one of the first things that jumped out at me with blinding clarity was that the worst release, by far, in rsync history was entirely prior to the introduction of Claude ... And yet nobody noticed.
Language really does suggest the article's author does have a dog in this fight and is cloaking opinion in fancy statistics jargon. "Blinding clarity"? All you have to do is draw a plot. And anyway, v3.4.1 was 2025-01-16, technically well within the AI assisted coding era and before attribution was becoming standard practice.
> "Claude clearly made things worse" - the main claim
This article was clearly generated by AI, yet I found no mention/attribution of that by author.
How likely is it that someone who vibe codes articles would also vibe code the underlying analysis and be eager to accept an outcome that is highly validating of that person’s workflow? I’d say very.
I've seen plenty of code that was LLM generated but the commit message itself did not have the co-author attached to it. This only seems to happen when someone's interface to the codebase is completely through Claude/Codex/..., and those are usually the most verbose commits, and yet they say the least, because they just summarize the code changes, not the why.
On the other hand I've seen developers using Claude as a tool. They have VSCode open and a terminal window with Claude and go back and forth, ensuring they write correct code, and leave the plumbing to Claude.
So maybe the author of the code started off small and it grew over time?
I have been experimenting with both aforementioned styles with interesting results.
It's amusing. It's not terrible, but tests aren't going to save you from a malicious tester.
Which brings me to my overall response, which is that there is absolutely no evidence, and nothing even intimating this hypothesis, that LLM commits were secretly being added to earlier releases before they were attributed, and that's why the rate of bugs is higher. There's no reason to think that; it's an unreasonable thing to think, and there's no evidence for it whatsoever unless you beg the question and assume that higher bug counts must automatically indicate AI involvement, which is just circular reasoning. You're essentially just making up a hypothesis out of thin air to preserve your point.
Regarding your third point, that one's fair, but I've done the analysis and I can put it up if you want, as to how long it usually takes to find bugs and how far through the release cycle we are for each version.
Regarding unlabeled LLM-authored commits, I don't think it's unreasonable in general to think that an open-source project might have had unlabeled LLM-authored commits at some point before 2026. Looking more closely at rsync's recent commit history, I think it's less likely in this case. There's just a low number of commits in general, _until_ large batches of Claude-authored commits start showing up early this year. But this then raises some questions about the bugs-per-commit metric; it does correct for something like "size of release", but also obscures a significant shift in commit velocity that may be downstream of adding LLM development tools to the workflow.
Like I said, I don't have a dog in this fight, and I try not to approach these sorts of questions from a position of explicit advocacy. I do think it's an interesting question, though, and we should try to understand what the data is actually telling us.
All code is technical debt.
If rsync releases used to have 500 lines changed and 5 bugs in and AI-powered rsync releases have 50000 lines and 500 bugs, it's the same bugs/line but much worse experience for the user?
I've not looked into the details of this case and I do use AI assistance coding at work but in my experience, the problem is that it's too easy to write lots of code and therefore hard to review the huge volumes of code and this analysis will ignore that?
edit: actually your table shows there weren't unusually large numbers of commits in this release, so perhaps my initial skepticism shows a bias I have?
I really think this is a much better standard of evidence, limited though it is, than outrage-fueled cherry-picked anecdotes, which is what has been driving this whole thing. If you disagree, and think the outrage should go on when I've shown there's an absence of evidence entirely for it (although of course, that's not evidence of absence; maybe I'll have to eat my words 5 releases down the line, but appealing to that now feels like a Russell's Teapot), would you care to explain why?
This analysis showed that there is indeed an absence of evidence, but it concludes there is evidence of absence.
Traditional p-hacking is done by oversampling and overtesting. If you do 20 analyses, on average one will show p < 0.05 by random chance. This analysis is doing the inverse of that: under-sampling, and concluding with p > 0.05.
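The "1 in 20" arithmetic is easy to check. A minimal sketch, assuming the tests are independent and every null hypothesis is actually true (the function names are mine):

```c
#include <assert.h>

/* Probability that at least one of n independent tests yields p < alpha
 * when every null hypothesis is true: 1 - (1 - alpha)^n. */
double prob_any_false_positive(int n, double alpha)
{
    assert(n >= 0 && alpha >= 0.0 && alpha <= 1.0);
    double none = 1.0;  /* probability that no test fires so far */
    for (int i = 0; i < n; i++)
        none *= 1.0 - alpha;
    return 1.0 - none;
}

/* Expected number of false positives among n such tests: n * alpha. */
double expected_false_positives(int n, double alpha)
{
    assert(n >= 0 && alpha >= 0.0 && alpha <= 1.0);
    return n * alpha;
}
```

With n = 20 and alpha = 0.05 the expected number of false positives is 1, and the chance of seeing at least one is roughly 64%.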
I tried pretty hard to avoid saying that, can you point me at how to rephrase? The point I'm trying to make is just that there is absolutely no evidence at all for what people are saying with such absolutism and claimed objectivity (that Claude made rsync worse), and thus it doesn't justify the outrage.
> Under-sampling, and concluding with p > 0.05
How would I avoid under-sampling here? And if you're going to say it's because I only have 2 data points, well, the side making the positive claim — that Claude made rsync worse — only had two as well, and unremarkable ones at that, as I've tried very hard to show.
> With a p-value of 74%, the answer is a decisive no. The odds ratio is 1.06 — essentially 1:1. Claude releases are no more likely to be above the median than any other releases.
are problematic in this context, as the correct conclusion here is that you just don’t have enough data to conclude whether or not you are more likely to encounter a bug after a Claude commit.
> How would I avoid under-sampling here?
You don’t. You admit that you don’t have enough data and move on. What you are trying to do here is prove a negative, which is extremely hard to do. In your discussion you claim that the users complaining had no right to, however nothing in your analysis showed they were wrong. We simply don’t have enough data (yet) to say either way. When we have enough data they may be proven right or wrong, but until then, we cannot conclude either way.
If you still insist, I recommend looking into Bayesian analysis. Theoretically at least, the posterior distribution from a Bayesian analysis can be interpreted directly and analysed on its own merits. However I suspect your posterior will have way too much uncertainty to reach any conclusions.
The ELI5 version is that there are two mistakes you can make when looking at a P value:
Type I error, where your P value is falsely low. In the experiment being discussed here, it would lead one to conclude that AI code is worse. Otherwise known as a false positive.
Type II error, where your P value is falsely high, leading you to conclude that AI code is no different. Otherwise known as a false negative.
https://en.wikipedia.org/wiki/Power_(statistics)
One can calculate statistical power for a given experimental protocol.
My hunch is that if you did this, you would find this experiment is grossly under-powered.
This means you can't make the "absence of evidence" claim.
People need to be responsible for code they commit and push anyways. This has never changed. Whether the code is written by hand, by their cat walking over the keyboard, or by AI, is not my concern.
A project's code quality can decline for all kinds of reasons. I don't think it's productive to laser-focus on whether it's produced by AI or not. That's a distraction. If a person just wants to find an excuse to criticize AI, and another person wants to fight back and defend AI, sure, go for it. But that's not how you would want to assess a project's code quality.
So - why bother forking or going upstream? Maybe it's selfish. I think publishing the patches is cool but I feel less of a need to force other people into doing what I want or even writing every possible configuration or solution. I just hack it for me.
Well the GPL (which rsync is licensed under) says: "This program comes with ABSOLUTELY NO WARRANTY" so actually nobody is responsible for anything.
People should be doing this regardless of drama. No reason to provide free advertising for trillion dollar corporations. Generated-by trailers are only relevant when contributing to third party projects, in that case disclosure is polite.
I don't care about the advertising angle. We all know Claude by now. I want some indicator that AI was used.
I don't see a need for an attribution line in this case.
As I said, disclosure is polite when contributing code to third party projects which will undergo human review.
No need for such things in one's own projects.
This can be largely assumed to be true for any open source code. It's kinda the point of open source.
If there's one thing I learned not to do in open source, it's to assume nonsense like that.
Even with coding agents gaining popularity, many humans still look at the code at some point.
why do so many people want to hide who the real author is?
we should be very wary of anyone claiming they’re the author of something when they’re absolutely not. if jon wrote a book and i take credit, that’s shady as hell.
The tag is helpful because AI authorship is different than the human authorship. When you work with a project or team for long enough you start to trust certain people and their intuition, but when they start submitting AI-produced code you have to reset and review it like AI code.
I use these tools a lot, too. But I want to know where the code came from so I can review it accordingly. The source matters.
> Ostracize us?
I don't know why you're so defensive. If AI wrote the code just be honest about it.
If you outsourced the code writing to some guy named Bob on Fiverr, I'd want to know that too.
Check it out:
https://lobste.rs/s/29pm2f/llm_generated_submissions_should_...
https://lobste.rs/s/ytim7h/collection_small_low_stakes_low_e...
- Sent from my iPhone
— Sent from my iPhone
I use Linux, btw.
Disabling attribution of LLM-generated code is fraud, because you’re saying you wrote the code.
Of course that fits right in with the use of an LLM to generate code in the first place, since what it’s actually doing is regurgitating its inputs stripped of any license and copyright notice.
In academia this is misattribution; outside of academia this does not exist.
This is clearly not copyright infringement either, as LLMs do not claim copyright, nor could they. Just like the photograph taken by the monkey, or pictures drawn by crows. LLM output is not a creative work either.
Whether this is unethical or immoral is a totally different question. I really don't think so, and I don't think you argue that position well.
It also is copyright infringement, because what the LLM “generates” are actually portions of its training set, which were covered by copyright. Just passing through an LLM does not remove that copyright from that work.
Should there be attribution for Google or Stack Overflow copy/paste? Who should we bully about this?
They are in fact committing fraud if they do not attribute the code in their commit properly, because by committing it they’re claiming to have rights by virtue of authorship that they do not have. (Namely, the right to contribute that code to the project.) They may also be committing copyright infringement, depending on the copyright and license status of some code they found via Google or Stack Overflow.
It’s always fascinating to me to see how many people on Hacker News have such extremely poor understanding of how intellectual property actually works, and how misrepresenting themselves or their work can actually have consequences.
Obviously, and I'm a bit taken aback that anyone thinks otherwise.
Their name being attached to the commit is itself irrelevant, as there is no way to submit a patch otherwise. You could use a fake name, but you're just moving this fraud problem around.
You're going to have a hard time convincing anyone that using a tool constitutes fraud. Frankly, it's silly, if not genuinely stupid.
Film photographers in the early 2000s routinely called digital "not real photography" and Photoshop "cheating" because you could delete bad shots and fix everything later. Traditional musicians and critics dismissed drum machines, synthesizers, and autotune as soulless tools.
Often this is also spelled out in a project’s contribution guidelines, and some projects have even had more explicit copyright assignment policies they required contributors to agree to, but the lack of such guidelines or assignment policies does not mean the custom as normally observed in the field is irrelevant.
And I guess maybe there's no such thing as bad press, but at least in this case it doesn't seem like effective marketing for Anthropic.
Setting aside the whole AI = bad argument, let's do a metaphor. Tax evasion is bad and unethical and you should call it out where you see it. But wait, that creates an incentive for people to hide it! So I'd better not call it out, it's best to just keep my mouth shut.
This idea that the community can try to pressure open source maintainers about the tools they use based off of kneejerk political reactions is so offensive.
Let's go the opposite way: "sorry I'm closing this pr because it didn't use an llm."
If I contributed code to an Open Source project behind my old employer’s back, that would have been bad, because that code was owned by them and not me, even if I wrote it on my own time using my own equipment, because of the contract I signed with them.
If I copied code out of an AGPLv3-licensed codebase and contributed it to a BSD-licensed codebase without telling anyone, that would have been bad, because I did not have the right to change the license on that code to BSD (or change the license on the codebase to which I was contributing to AGPLv3).
If you use an LLM to produce code, you may well be doing the latter since an LLM is actually just regurgitating portions of its inputs. This is not a hypothetical scenario; I’ve personally encountered a case of someone using an LLM to try to contribute code I recognized from a specific Open Source project under one license to another project under a different license, while claiming they “wrote it themselves.”
Any project that accepts contributions needs to take liability seriously and manage their risk appropriately.
Unfortunately, a large number of people are being told—and here, you can see many who believe it—that the output of an LLM either carries no copyright or is copyright by the one prompting it. In other words, even right here on Hacker News it’s widely believed that LLMs “launder” copyright.
But for better or worse I can assure you (for which you have no reason to believe me, just look at the headlines): nearly all tech companies are setting internal goals to have x% of code generated by llms by y date. And speaking as an insider, that x number is very large and that y date is very soon.
And before everyone continues to downvote me because I'm saying things that you don't want to hear, you have to realize that this is the world we live in now.
So, either you're right and the legal entities attached to some of the most powerful tech corporations have just decided to flout the law. Or you are missing something, or the game has changed.
Open source projects that want to hide behind provenance as a gate keeper to introduce llm generated code into their code base are going to get smoked.
There's nothing stopping a company like anthropic from funding an open source division that starts forking projects and accelerating the development. Expect 1000x more Buns.
There's nothing stopping a wealthy individual who wants to do that.
When the dust settles, no one is going to be worried about what you've typed here.
And if somehow the ip lawyers and capitalists won, then China will become the tech hub of the world.
Whether it's right or wrong, that is the reality.
You might consider that there is a very large incentive by the large and public players in this market to promote the idea that this is not true, that they consider themselves large and powerful enough to actually flout the law, and that they plan to use the argument that enforcement will be too damaging to the economy to make their view the “new normal.”
This playbook has been run before, by Uber and Lyft, by AirBnB, by Tesla with “FSD,” and so on. It’s very clearly the approach being taken.
I just looked at the list and I have friends that work at most of them, with the exception of United, McKesson, Berkshire and Cencora, so either you were at one of those or you're misinformed about your ex-employer.
The entire industry for the most part is all in here.
We clearly disagree at an ideological level, for which I will not try to convince you my side is correct.
Instead, I would probably be willing to bet overall maybe 10k USD that your stance is generally not representative of where we end up in 5 years.
Let's make a Polymarket and compete with dollars instead of words (slightly in jest)
Have fun with 1000x more Buns that literally no one is using or maintaining. An entire software industry built on top of a burning garbage pile of crappy, dead code.
No, you might be experiencing online psychosis. No longer able to distinguish between generated text and things you don't agree with.
Consider collecting related thoughts into paragraphs.
And lo and behold, people are losing their collective minds, brigading my posts, flagging me and demanding credentials.
Do you have any popular open source projects? Or are you just an Internet gremlin?
I was an AI skeptic some months ago but truly Claude and Codex have changed my development style and velocity in a way I never imagined would ever be possible. With that, yes, I produce more code and am finding more bugs.
So looking over the comments on HN articles, the amount of polarising hate towards anything produced with AI is quite surprising. Just because some AI helped with, or even entirely produced, a project doesn't suddenly make it 'vibe coded', as if that's meant to be some insult levelled at users of LLMs.
It reminds me a lot of when offshore outsourcers started getting more software development work from the mid-90s, with all the derogatory remarks made towards 'Indian developers'. Now we're in the mid 2020s and similar remarks are made towards AI.
I don't get it. I really don't. What I do know for sure is more and more code will be AI generated with or without the detractors.
With programming, I've always been in the latter: it's a tool that allows me to do what I actually love, which is problem solving, system level thinking, and providing some nice solution to that problem, that happens to be through software.
So, I have an absolute blast with AI, because it helps do the more boring bits. And, seeing my non-programming colleagues get excited to see their vibe coded ideas become reality has been so much fun.
I'm genuinely curious to hear the perspective of someone anti-AI, who works in software. Perhaps the impending doom/skill shift of our profession?
Reviewing vibe-coded PRs and features has been utterly exhausting over the past few months.
I work on critical, mature software - a small change in behaviour can mean data loss or non-compliance with regulations for our customers. The biggest problem with AI PRs is the sheer amount of churn, extra code and lack of intent with the PRs it generates.
The only way I can describe the latter is that an AI-only PR feels to me like a painting where everything is high detail - and you have to comb over each part before you understand why it's there because so much is superfluous. A well written human PR on the other hand, is painted such that your eye naturally follows the thought process of the author so you can just nod along during the review, as if the solution was obvious.
Also when I'm _using_ the agent, at least 50 percent of my time is spent telling it to stop with its approach so it doesn't go down a useless rabbit hole and waste tokens.
I don't have a good analogy but the immediate one that comes to mind is treating AI like a junior developer that you're mentoring. If you know what you're doing you can iterate quickly; if you don't then it's a whole other story.
Claude built me a Markdown editor - I designed it, set coding standards, etc. It coded it to my spec. The output is in my opinion not bad and is very usable (for me - I use it daily now). Probably would have cost me north of $50k to get a team of seasoned devs to build it to the current level of polish. https://github.com/emrul/md
For context search, I find LLM quite useful... still wrong 20% of the time... but it has some utility.
Here is a thought experiment: If "AI" will eventually generate your work, then what actual value do you bring to the table? =3
However, you also get the lowest common salient answer guaranteed, uncopyrightable work (differs from public domain), and potential legal peril from copyright bleed-through.
We are in the golden Napster age of isomorphic plagiarism. =3
People report the same “took a shortcut” issue with AI vibe coding, and I can confirm that I’ve had to rewrite practically everything the AI generated for me, despite using a frontier model dialed up to 11 thinking levels.
Having said that, AI is very useful for other activities like PR review, security vulnerability analysis, typo hunting, reverse engineering, etc.
I’m probably going to have to increase my subscription to the next tier but at the same time I still can’t use any of the code it generates.
If even one person can simultaneously experience "very useful, need to pay more for it" and "useless output code quality" then of course you'd expect a variety of opinions amongst the general user base.
If by fairest you mean to say that this analysis and response is sufficient, then I'm sorry but I have to disagree. We really need to understand if the nature of the bugs is worse from a user's perspective. Even if the rate stayed unchanged, if the result is that the perceived quality of the software declined, then I would personally consider that worse, especially if I were a project maintainer.
That's not meant to be wholly dismissive either. But in general, I don't think quantitative analysis alone is enough to fully answer this type of question.
* Why was v3.4.1 the most buggy, right before the Claude commits? Why did "nobody notice"? It's way too strange to just say welp, it must be human error.
* Why does v3.4.2 have 0 bugs, or 0 bug score. And why was such an outlier (no other commit seemingly has this??) allowed to mix into aggregate statistics and bring all the "is Claude buggy?" scores down. Tbh idk how that _wasn't_ a red flag in the author's analysis...
This article feels like half of an analysis presented as a highly complex finished product due to all the advanced stats they're running.
Why wouldn't it be, except for question-begging priors assuming it couldn't be?
> Why does v3.4.2 have 0 bugs, or 0 bug score. And why was such an outlier (no other commit seemingly has this??) allowed to mix into aggregate statistics and bring all the "is Claude buggy?" scores down.
My original metrics which didn't filter out feature requests and questions had it at four bugs and prior to that it was even higher and it didn't make much of a difference to the overall analysis (fell well within the IQR, the lower end of it too). Also, removing one outlier just because it looks kind of funny to you, especially when we only have two Claude releases at all, would be worse in my opinion and more arbitrary.
This would be even harder to measure.
But the reality is that if you were already set enough to call rsync slop because of a single post, you aren't going to be more down now. Even in these responses I see everyone nitpicking and moving goalposts as if one more commit being actually claude-aided will tip the scales from stable project to "vibe coded slop".
Software has always been fuzzy, we have never come up with an objective way to handle software quality, and this Uber hatred of llm contributions lets the humans who make egregious bugs and mistakes off the hook.
Taking a step back, we need to have more empathy and thoughtfulness for one another in this space. It's new and people are experimenting, and there will be nothing good coming from personal insults and DDoSing a good project just because someone got ragebaited on threads, x, mastodon or whatever else.
How do we determine bugs and increase quality? It's almost like we have been grappling with this question for decades and I still hear people fight on the best way forward. Simple design, test driven development, user surveys, all of the above have been used as a proxy for software quality and they all failed to capture everything. Back in the day we used that ambiguity to give each other grace, now we use that ambiguity to tear down other creators. Whatever, if open source software really is dying it's because of this toxic shit just as much as the llms.
There is no fixed number. Sample size depends mostly on the size of the set you're sampling. Also takes into account desired margin of error and confidence interval.
If your total set has a million items, you need ~16400 samples to draw conclusions with 99% ±1% certainty.
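For anyone who wants to reproduce a figure like that, here is a sketch using Cochran's formula with a finite population correction. I'm assuming that is the kind of calculation behind the number above; the function name and parameter choices are mine. Note that the population size only enters through the correction term, which matters little once the population is much larger than the sample.

```c
#include <assert.h>

/* Sample size for estimating a proportion (Cochran's formula with finite
 * population correction).
 *   population: size of the set being sampled
 *   z:          z-score for the confidence level (about 2.576 for 99%)
 *   margin:     margin of error as a fraction (0.01 for +/- 1%)
 *   p:          assumed proportion; 0.5 is the worst case
 */
double sample_size(double population, double z, double margin, double p)
{
    assert(population > 0.0 && margin > 0.0 && p >= 0.0 && p <= 1.0);
    /* Size needed for an effectively infinite population. */
    double n0 = z * z * p * (1.0 - p) / (margin * margin);
    /* Shrink it for a finite population. */
    return n0 / (1.0 + (n0 - 1.0) / population);
}
```

`sample_size(1e6, 2.576, 0.01, 0.5)` comes out a little above 16,300, the same ballpark as the ~16400 quoted; the exact value shifts with how the z-score is rounded.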
And again, that's kind of the point. There's exactly zero actual evidence, however you slice it, that "Claude broke rsync" except cherry-picked anecdata, and the whole point of my analysis is to demonstrate the total lack of any such trend/evidence at all, and just how in-distribution/normal these releases are, to show that if people hadn't known Claude was involved in them, they wouldn't have remarked on them.
I run a smallish project with ~1k stars and I've stopped maintaining it last year because people feel like they're absolutely owed features or bug-fixes or whatever. It's tiring and a complete shame that the author has to make such an insane deep dive into a random accusation that just caught on social media. I want to emphasize that this has nothing to do with AI, it's just tech tourists, consumers (as opposed to creators), and engagement farmers that have taken over. AI slop probably doesn't help, but the underlying issue has been brewing for at least a decade.
Also, the "making soup for the homeless & pissing in it" is not only an off-base analogy (software is pretty low on Maslow’s Hierarchy of Needs), but also somehow looks down on both people in need and the volunteers that help them. Just absolutely gross.
Agreed, and similarly, as a hobbyist programmer who loves Rust and Go, I've always felt that the people who command others to "rewrite it in xyz" are not themselves developers, they're "ideas people." There's a mass of these people whose main interactions with the world are through the dramatic forcing of their correct opinions.
> I run a smallish project with ~1k stars and I've stopped maintaining it last year because people feel like they're absolutely owed features or bug-fixes or whatever.
That's a bummer and it's something I'm fearful of. I post some code on my website, not on a github type site, and don't interact with people about it. It's nice and plenty of people do it. Is that something you'd consider?
Bugs per commit as a metric papers over severity, both in terms of security severity as well as the effect on the user. A mislabeled button has the same weight as the entire app crashing in this framework.
It is the exact metric you'd choose if you wanted to make the current situation of rsync look like not a big deal.
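To make the objection concrete, here is a rough sketch of how a plain bugs-per-commit rate and a severity-weighted rate can disagree. The severity labels, the weights, and both releases are invented for illustration; nothing here is taken from the rsync data or from the article's scripts.

```python
from dataclasses import dataclass

# Hypothetical severity weights; any real scheme would have to be argued for.
SEVERITY_WEIGHTS = {"cosmetic": 1, "minor": 3, "major": 10, "critical": 30}


@dataclass
class Bug:
    release: str
    severity: str


def rates(bugs, commits):
    """Return (plain bugs per commit, severity-weighted bugs per commit)."""
    plain = len(bugs) / commits
    weighted = sum(SEVERITY_WEIGHTS[b.severity] for b in bugs) / commits
    return plain, weighted


# Two made-up releases with the same bug count and the same commit count.
release_a = [Bug("A", "cosmetic"), Bug("A", "cosmetic")]
release_b = [Bug("B", "cosmetic"), Bug("B", "critical")]

plain_a, weighted_a = rates(release_a, commits=100)
plain_b, weighted_b = rates(release_b, commits=100)

print(plain_a, weighted_a)
print(plain_b, weighted_b)
```

Both releases get the same plain rate, while the weighted rate separates the one with the mislabeled buttons from the one with the crash. The catch is that the weights are a judgment call, which is its own source of argument.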
[0] https://github.com/RsyncProject/rsync/graphs/commit-activity
Not sure if this is mentioned somewhere else, but looks like the maintainer has a blog post that addresses this: https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0
Of interest is this post here: https://github.com/RsyncProject/rsync/issues/929#issuecommen... which echoes the same concern that was raised upthread; however, I failed to find the maintainers’ response.
EDIT: Found it! it is in the (untitled) discussion section (after the results).
https://lobste.rs/s/k1b0za/rsync_outrage#c_2iowov
EDIT 2 (and advice on design): The page design changes backgrounds after the results sections, which kind of conveys to the user that they have reached the end of what is important and can just skim over the rest (usually pages have a radical change in typography like this when you’ve reached the comment section), however this is what is analogous to a discussion in a typical paper, and is arguably the most important part. I had simply assumed that you just left it at the result and skipped the discussion as a stylistic choice.
I also paraphrase Tridge himself explicitly saying that this is why commits/releases have increased:
> Essentially, this isn't a "Claude" problem, it's a "more security work" problem, something that Tridge himself confirmed in his response, describing how a flood of AI-generated CVE reports forced rapid, extensive changes to rsync's attack surface.
> The page design changes backgrounds after the results sections, which kind of conveys to the user that they have reached the end of what is important and can just skim over the rest (usually pages have a radical change in typography like this when you’ve reached the comment section), however this is what is analogous to a discussion in a typical paper, and is arguably the most important part. I had simply assumed that you just left it at the result and skipped the discussion as a stylistic choice.
Good point, I assumed everyone would read till the end, that's on me. I'll give it a heading.
Why is it that some unfounded claim is made and the onus is suddenly on the project maintainer to prove it beyond all doubt?
It should be on the person making the claim to prove it
So my systems recently updated to rsync 3.4.3, and as soon as that happened my backup system - which does incremental backups using multiple --compare-dest= arguments - started to fail on anything but a full backup.
Incremental backups are perhaps the primary use of rsync, and they were broken for this person. That's pretty severe.
The second reply is similar:
i wondered why my 3d printers were running like sh*t and at 100% cpu; turns out log2ram uses rsync.
This one I took with a grain of salt, since it read more like a dogpile than an actual bug report. However, if it's genuine, it's also reasonably severe.
Later in the comments, someone attempted to provide a list of issues that had been added: https://github.com/RsyncProject/rsync/issues/929#issuecommen.... The list included several failures to build or run rsync that appear to have resulted from broken backward compatibility. That seems reasonably severe. If intentional, I would have expected mention in the release notes about the removal of backwards compatibility, but none was made.
The issue comments had already degraded into a lot of unnecessary vitriol even before the above-mentioned comment and only get worse from there, so I stopped. But the fact remains that the whole issue started with a severe bug.
I applaud the attempt at dispassionately analyzing whether the recent LLM releases of rsync were normal or outliers as far as bugs are concerned, but I don't think you can do so properly without analyzing severity.
"A lot of claims in the wider discussion have treated every recent bug report as if it had the same cause. That is not accurate. Some reports were regressions from recent security hardening, some were missing historical test coverage, some were older bugs found because rsync suddenly had more eyes on it (especially by AI that can find issues quickly) and some were packaging or environment-specific failures. A Co-authored-by line is not enough by itself to establish root cause." - https://github.com/RsyncProject/rsync/issues/929#issuecommen...
I think it will be up to some group in academia to make a real full blown study across several repositories.
There must be tons to learn on how LLMs have changed software development and perhaps the cleanest separation will simply be going by what repositories declare e.g. "No LLM involved" vs those that proudly do the opposite or are neutral.
Bugs are not the only variable of interest here. I am guessing someone is already doing this as we discuss it here...
Your verbosity and sentence structure are not a problem. I hope that publishing this gives you a bit more confidence in your writing, because it's legitimately good.
So the criticism was bad, and that somehow makes it ok to use a bad metric?
I come to hn because I get very nuanced, informed information and glorious puns.
Hey, 'logicprog, your writing is fine!
Use LLMs to critique your writing, check its structure, vet your choice of topic sentences, check flow from graf to graf and section to section, look for passive voice and overused words. LLMs are fantastic for that. But don't use a single word an LLM suggests in your actual writing. If it suggests something really fucking good, too bad, those words are disqualified. It's an easy red line to adhere to, easier than it sounds, and it'll keep your writing human.
(You ended up somewhere around here anyways, but that was after you posted something with LLM-written language because you weren't confident enough in your own writing. The things you do "worse" than an LLM are what make you you; be protective of them!)
It's open source, no one is forcing you to use it.
If you don't trust the newer versions; use the old versions.
If you no longer like the maintainer because of reasons, fork it/start your own.
It's not that hard.
Storm in a teacup.
Is this a configuration that's not common and thus not tested?
If people think they can do better, I want to see their forks and them keeping up with it.
https://github.com/RsyncProject/rsync/graphs/contributors?fr...
Can someone explain why one would ever use rsync (pre vibecode version) instead of cp and dd?
Can't we just 'apt remove rsync' and save ourselves the time even spent on evaluating this dependency?
Thanks
Instead we have a shitstorm over a presumably legit issue, for which the only source is some Mastodon post.
One command that used to work in 3.4.1 and stopped working in 3.4.3. Just one! We could have already bisected the living shit out of this and gone home, but no.
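For readers who haven't done it, bisecting is just a binary search over the history between a known-good and a known-bad revision, which is the idea `git bisect` automates. A minimal sketch, with made-up revision names and a made-up `is_bad` check standing in for "build this revision and run the failing command":

```python
def first_bad(revisions, is_bad):
    """Binary-search an ordered list of revisions for the first bad one.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that once the bug appears it stays in every later revision.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return revisions[hi]


# Hypothetical history of 200 revisions where the regression landed in "c137".
revisions = [f"c{i}" for i in range(200)]
checked = []


def is_bad(rev):
    checked.append(rev)  # record how many revisions we actually had to test
    return int(rev[1:]) >= 137


culprit = first_bad(revisions, is_bad)
print(culprit, len(checked))
```

The number of revisions tested grows with the logarithm of the history length, so even a few hundred commits between two releases need fewer than ten builds, provided the failing command can be reproduced reliably.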
> v3.4.3 has been out long enough that its rate (5.00) is already comparable to historical releases. The "wait and see" argument is an appeal to an unknowable future that shifts the burden of proof away from the critics. If more bugs surface, they will enter the distribution like every other release. There is no reason to expect a regime break.
I mean, as someone who uses LLMs, it might be a good idea to consider how one might limit the number of bugs that will appear in the future at least a little bit: parallel iterative code review loops would probably be the easiest and most applicable to LLMs, though I guess test coverage and other code analysis tools help too.
So what? You've saved a significant amount of time for a decent number of humans, and if those humans are working on other projects, the overall net output for the world is net positive compared to without LLMs.
You have to broaden your perspective. It's not just about how rsync was affected.
> ok, so I was wrong and badly, but I will double down and say I was right anyway
Also, if you write a paper where you draw statistical conclusions from a whole 2 datapoints, you'd be laughed out of the room.
Especially since if the earlier commits were so clearly AI authored yet without the Claude marker, surely you or anyone would be able to spot them. You could say, X commit does not have the Claude commit marker yet was AI written. But for all the speculation on this thread, I haven’t seen anyone actually doing that. What may be possible is that the rsync maintainers used AI to assist yet reviewed and edited themselves, as many devs do, and if so then the stats in this article are still notable: there are no poor quality outliers that can reliably be attributed to AI and if one specific release (3.4.0) was, the subsequent releases which presumably also had as much AI as this speculative hidden AI release only show improvement and thus act as a pro-AI argument.
The blog has many more datapoints than two. It compares many releases. You’re looking at 2-vs, not 2.
I'm using methods appropriate to that low amount of data, first of all. Second of all, since I'm only trying to show there's no evidence for the anti-AI hypothesis (not disprove it, or prove the null hypothesis), that's sufficient in itself. Also, I wonder why nobody said things like you're saying ("there's too little data to tell") in response to all the absolutist claims that AI caused rsync to get worse?
> The fact last few commits were attributed to claude doesn't mean previous ones didn't use it.
At this point, you're just positing Russell's Teapot: you'll keep assuming more and more of the code was "secretly" Claude when there's no evidence for it and no reason to think so, just because you've started with the assumption that Claude makes things worse and you want to find a way to prove it.
You can write for an audience or you can write for yourself. Which is fine either way but you shouldn't pass the blame for bad results on to your audience.
> and recieving almost no substantive input, discussion, or response on the actual content of the article
Well did you write it for that purpose?
> "Just wait, more bugs will surface" -- v3.4.3 has been out long enough
Wait for _more releases_. As your own data shows the bug rate is not consistent between releases. So this is probably not a worthwhile metric. Perhaps systems touched, new features included, or attempted fixes would be a better way to contextualize releases and the goals of the author.
What followed was extraordinary: 329 comments and counting, ranging from thoughtful concern to outright harassment.
The thread did not stop at words. One user posted My Little Pony drawings of themselves strangling the "project janitor that pushed vibecoded commits":
It spread to Hacker News and Lobsters, generating hundreds more comments.
This is false, it did not appear on Lobsters. Here is the function in the codebase that prohibits this kind of brigading: https://github.com/lobsters/lobsters/blob/main/app/models/st...
Please correct your article.
> On Lobste.rs, in response to the Medium essay Tridge himself posted in response, finally some users like boramalper begin to actually ask for evidence one way or another:
"My honest assessment is that this is a competent calculation performed on a badly confounded measurement, followed by conclusions substantially stronger than the calculation warrants. It is useful as a rebuttal to “the Claude releases are obviously unprecedented disasters,” but not as evidence that Claude was harmless."
[see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place]
- I used GLM 5.1 to help with the coding and math for this.
- However, I explicitly dictated where the data should be pulled from (GitHub, Bugzilla, mailing list), how it should be tagged and grouped, and what data to look at (e.g. bugs instead of regressions)
- Additionally, I consulted with my wife, who has a master's degree in statistics from Penn State University, about what sort of statistical methodology would be justified for this very limited data set, while still giving as much information as possible.
- I know the website looks like we stereotypically consider vibe-coded websites to look, but I actually explicitly asked for that. The original HTML design looked like a website from 1995, and I just prefer how this looks. It's pretty!
> A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement.
Heck, I use LLM assistance for coding and I’ve even coded up whole features with the clankers, but giving it the right to speak for me is too much.
I should also add that I read and understand every line of clanker output that I publish for others, so I’m not a vibe coder either, just adhd.
So your statement betrays a significant misunderstanding - there is no neat clean divide between style and content.
Also, LLMs often generate text that is plausible, but wrong, in ways big and small.
> Also, LLMs often generate text that is plausible, but wrong, in ways big and small.
So do humans. Always have, always will.
Poor prose does not just make writing ugly — it creates friction, obscures nuance, and introduces ambiguity.
You can eat a gourmet meal out of a dirty paper bowl. You still get the calories, but the delivery mechanism definitely impacts the experience and the perceived value of the food. Same food, different response.
See? I can write slop too, I don't even need to burn down a forest to do it. If you are OK with every fucking thing being written exactly like this, good for you. I am not.
I waited a minute to make sure you weren't going to delete this post because frankly, if I had written it, I would have. Guess not, so... Here goes.
No. It is not the fault of my "attitude" that the Internet is going to suck. That is a complete reversal of the reality. The fact that even people without bad intent are already spreading slop everywhere should be enough evidence to essentially prove that there was never any hope. If this is what good actors are doing, what exactly do you expect from bad actors?
Also, to stress it yet again, I don't care if people use LLMs in general. I'll even say that I don't particularly care very much if people use them without disclosing it in most cases. If you're using it like a normal tool and not merely just dumping the output verbatim there is not any particular need to disclose it any more than you'd disclose other tools, though I think people would prefer if you did just for transparency.
My chief complaint is just how bad LLM slop writing is. It simply is not good at all. It would literally be much better for the Internet if they weren't so turboshit at writing. There is almost no writing style I don't prefer over garbage LLM writing. I'm dead serious. Early LLMs were worse at almost everything else, but they were a lot better at writing for sure. Something went wrong somewhere.
But I do also believe that it is inherently bad to dump prose as-if you are communicating as a human, but said prose isn't actually written by a human. If someone shows me a cool drawing that they made, that means that they sat there and went through the process of sketching, possibly multiple drafts, inking, coloring/shading/painting/etc. to create an expression. This involves many human skills that take years to hone, and every detail carries someone's explicit intention. I think that this is cool, and shows a great degree of skill and effort.
When you, of course, generate some crap from an image generator, it may very well look similar. It may emulate some actual defects that make it look like someone really drew it. But someone didn't. A model went directly from a text prompt and dumped out pixels on screen. No sketching. No layers. No thought processes about how to frame things or what details to include. That doesn't mean zero effort went in: I'm sure in many cases someone sat around and fudged with LoRas and inpainting for a couple hours and pulled the slot machine lever to get good seeds and etc. That doesn't mean that an AI model does not have some model for how to structure an appealing image: it does, that's obviously why the results can look decent to begin with. But when you dump out an image from an image generator and you wink wink nudge nudge present it as your own and people evaluate it as if you drew it, this is basically fraud. Everyone looking at it who doesn't know it is AI generated actually believes you went through the normal effort of drawing that image and all of the years of practicing skills and acquiring knowledge that takes. That's bullshit, and it takes away from the actual accomplishments of people who put in the work like cheating in sports does.
Like yeah, a lot of people are cheating at chess, by passing off engine play as their own, but does that really make it okay? When the entire point is using your brain and not just the raw outputs themselves, doesn't that hit you as a problem?
For generative AI, I personally draw this line at what I feel are expressions of creativity. If you use AI for drawing references, whatever. If you use AI to generate globs of repetitive code, whatever. Code can be creative but I do not view it as an expression of creativity and almost any tool is fair game. If you are using ML models for motion capture or some other data processing thing where humans had to do repetitive work before, whatever. Maybe these tools sometimes do devalue the work, but the LLMs are not doing the interesting part here, they're doing the boring part. (This is, in part, an admission that actually writing code is often pretty boring in and of itself, something that I realize programmers have been inconsistent with in an attempt to justify their value. But, I still believe it to be true.)
So okay fine. People are reluctant to disclose that they used AI to generate text because they fear the backlash that it will get them. This is understandable. What upsets me about this is that well-meaning people are apparently falling back to the idea that because LLM backlash is strong, what would be better than either trying to just simply write your own damn posts or be honest about your usage of LLMs... Is to just try to wink wink nudge nudge pass off more or less verbatim LLM writing as if it's a post that you wrote.
I am not ruining the Internet. There is literally nothing I or any group of angry mobs could do that would even remotely slow down the decay of the Internet even if we desperately wanted to.
So in fact, I'm not even trying to not ruin the Internet. I don't particularly care if my attitude is not helping or hurting. I'm not having an attitude as part of some grand strategy to save or destroy the internet. I'm having an attitude, because I am pissed off.
And I am pissed off because I am tired of reading posts the author probably only skimmed themselves.
At the time, I found this a bit irritating, but with a few weeks' time I see the merit. The informational content tends to fall into “derivative” territory when LLMs write stuff. And people are here for novelty and some socialization.
Also LLM prose seems optimized for engagement rather than concise communication. Takes longer to sift through linguistic boilerplate to get to the point. (The quoted bit being a case in point)
And while the comments are always flooded with people like me, the upvotes seem to tell a different story; clearly LLM writing really does appeal to some people. Or idk, maybe a lot of people who vote on stories and don't comment don't actually read them. Hard to say for sure.
(I need a better model to translate from llmese.)
The author provides evidence to the contrary and the HNers won't even engage with it instead just talking about the writing of the article in classic HN bikeshedding fashion.
How about after that we talk about the formatting of the website and the colors?
This site is really going downhill
Where is the accountability for your own opinions?
Are you guys only upvoting things that confirm your existing gripes?
It would be preferable if someone would seed a better discussion by engaging with the article's claims/observations.
Is that the kind of low effort posts we want around here? Just a link to a github comment of a screenshot?
You're complicit here in fueling the harassment of an open source project
Even if you're right, though, you shouldn't be posting comments that break the site guidelines.
People opening issues just to rant against an open source project is acceptable content for HN?
How is that even allowed in the first place without getting flagged/removed?
And every time that happens the project gets brigaded from HN users
> After posting this on Hacker News and recieving almost no substantive input, discussion, or response on the actual content of the article, I decided to rewrite all of the prose in my own voice.
I've therefore turned off the flags and hopefully people can actually now discuss the claims/findings being reported.
Soo... it didn't just sound like genai but was genai?
___
Huh. From the article:
> If anyone complains about my verbosity or sentence structure — as they usually do, which is the reason I originally let the AI write the prose, among other reasons obsoleted by templating — they can go fuck themselves.
This is kinda sad, honestly. But also should show the author that doing what people try to bully you into doing will not stop them from bullying you.
Just stick with your unique voice man. If people don't want to read that that's fine. They do not have to. You're fine
.. what are those em-dashes doing there though?
I agree that it will be interesting to see how this develops going forward. One can imagine wildly varying scenarios.
Why should I care? If it's a good thought, chances are it appears without slop around it. If it doesn't re-appear, life will still go on regardless.
No need to sift through noise just to avoid FOMO.
You're literally doing exactly the bullying I was trying to avoid, even while denouncing it. I like em-dashes. I have AuDHD, and they help me represent how I think.
Uhm, no. Really just no. And, frankly, I find it shameful that you'd throw such an accusation at me.
But I guess we can stop here.
Idk man. The internet can be a bit too much sometimes. I truly get that, but this was too much from your side.
Wish you all the best.
If someone gives them shit about their writing, that's on the critic for being shitty. If they use AI to write, that's on them for being fake. But, to write online at all requires being ready to have people be shitty to you and ideally not reacting in a way that makes the situation worse. Sounds like they need work on that part.
Anyway, it is basically always possible for someone to find something legitimately bad about anything a person does. The question is, how much of an issue is that? Not much actually. So you have flaws. Fine, just be flawed. It has no effect on your life beyond your reaction to the attack. And putting aside that reaction is a prerequisite for learning anything useful (or discerning that there is nothing to learn) from the experience.
Good people will trust good intentions through the flaws, while shitty people will write off your work and your intentions because of the flaws (and try to make sure you feel bad about it in the process). But it's usually because they're too weak to express disagreement maturely, or sometimes because they're bitter and threatened by your good intentions directly. Either way, it's their flaw, not yours.
"No these are fine, now look over there!! <lotsoftext>"
Pay no attention to the man behind the curtain?
"Claude, rewrite all of the prose in my own voice."
The funny part is that it probably works.
If you want me to read your analysis, you are going to have to make it not read like Claude wrote it. What does "placement" even mean here?
The use of "regime shift" is what gave it away for me. I've never seen a human write that, but Claude does from time to time.
At least they removed occurrences of "load-bearing".
If you don't want to read the LLM prose, you can just go to the GitHub of my project, grab the scripts, and run the full pipeline. It will gather the data, build the database, and run the analysis from scratch for you, and you can look at the numbers directly. It's all repeatable.
LLM output has conditioned in me a near reflex response to just close a tab as soon as I smell LLM-authored text. Like, I'm not mad or anything, I just frequently find most default LLM-voiced text very unpleasant to read so I just don't continue reading.
Also, it wasn't written by Claude FWIW, GLM 5.1.
Of course this is a bigger problem, as it's now harder to distinguish "AI slop" from "content co-authored with AI that is carefully reviewed" at a quick glance, and the "AI smell" is quite off-putting. My initial reaction was also negative, but after skimming through it and reading the summaries, I found it a decent summary, which also... speaks of this thread, of the content of the blog post and everything about the discussion and the strong feelings people have developed around the use of LLMs.
Anyhow, it would be good to disclose the repo with the code for the statistics and the use of an LLM in the writing right up front: which model, why it was used to do the writing, etc. It's enough to say "I think it writes better than I do" or "I was in a hurry, sorry" or whatever, but it really should be disclosed. It reads as more honest.
ps. really... that sideways scroll? plz fix it.
The problem I see is that this is indistinguishable to a reader at a glance.
Distancing the writing from the "AI smell" not only improves the quality by dropping the unnecessary ocean of rhetorical devices, it forces the human to have real weight and agency on what's being said.
I think that act of distancing from raw LLM output through refinement is a huge quality leap. Even if you're only doing the refinement with an LLM, it forces the writing to have more voice and ideas from the author.
I can see the work that went into the analysis here but again, as a casual reader, it's impossible to tell that there were any original ideas here expressed by the author.
If OP had said "here's an AI summary of the data" and generated a concise summary, I think I would be fine with it. But default AI writing is really verbose -- the opposite of a compression algorithm, spewing out cliched phrases that don't add information. It's exhausting to read, and it lacks the interesting noise of a human response.
Please, why can't people write stuff by hand themselves any more? It's a good analysis but how can I trust it without reviewing everything myself?!
At this point we're all used to skimming through thousands of AI-generated sentences every working day and constantly thinking "this is likely to be 20% bullshit", it's hard to turn that off even if I try.
This is low-quality--every single day I witness Codex and Claude misunderstand, mislead, and hallucinate responses based on "assumptions" and I have to fact-check them.
If I wanted a statistical analysis and to be the human in the loop, I would ask the LLM myself, and I would definitely NOT read an article that just dumps the LLM output as-is.
(Also, I suggest clearly acknowledging where AI was/wasn’t used. I like CuriosityC’s suggestion: https://news.ycombinator.com/item?id=48411968)
You didn't care enough to make a good writeup, why should we believe that you cared enough to make a good analysis?
I am pretty insensitive to AI writing. I have never commented before about something sounding like AI, because mostly I don't notice. But this was so over the top that I spent the whole article trying to decide whether it was an intentional parody of AI writing style.
This article's language is not en-US. It's not en-BR. It's en-SLOP.
Yes, that was my clumsy attempt at AI parody. Here's another: this article doesn't just have AI tells. It is AI tells.
Every sentence is saturated with AI style. Perhaps the author is so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.
As for the substance of the analysis, it seems pretty good to me but I see some flaws that weaken it a bit.
The presence of "The Outlier Nobody Noticed" proves nothing and deserves no more than a passing mention. A random release introduced way more bugs than the Claude-containing releases. That provides evidence that Claude doesn't introduce more bugs only if your hypothesis is a very naive "AI is the only thing that can ever increase bug introduction rates."
The whole analysis has very limited data. It's necessarily based on a single pair of releases at the very end of the chronological timeline. You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis. (By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.)
"The critics' claim is a simple comparison: did the rate go up?" That's reductive. For one, these releases are known to be in reaction to a flood of (AI-discovered!) security reports, which is a novel situation and in fact is a huge confound to anyone arguing about what those two releases mean -- they're both heavily AI-written, but in response to an unusual situation. When the samples are only drawn from a distinct scenario, statistical analysis can only speak to the quality of code in that scenario.
Also, another reasonable hypothesis could be: AI-written code has bugs of a different flavor that bother users more. It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it. (If true, it still doesn't support the claim that depending on AI code is a catastrophe, fwiw.)
I'm not arguing the conclusion is wrong. I'm saying the analysis proves far less than it claims to. As for whether it's a debacle for rsync to become dependent on AI code generation, I think that's a reasonable debate to have but it's not going to be resolved this reductively.
It does not statistically prove anything, but as I thought I made extremely clear in the card where I discuss it, the point of bringing it up is different: to prove the hypocrisy of the anti-AI crowd.
> By the same token, it would be incorrect for critics to claim that it proves their point. Did anyone claim this, though? The heated complaints seemed more based on priors about AI code.
The entire outrage is because people noticed what they thought was an unusual number of bugs and/or regressions in the release, saw it had Claude in it, and assumed a causal link, not just "priors about AI code."
> You would never be able to reject a null hypothesis based only on that, so it's even less sound to present it as proving the null hypothesis.
The point I'm trying to make is that there is no evidence, based on these two releases, to think Claude made anything worse, whatsoever, and so the outrage is unfounded. This doesn't require me to prove Claude didn't cause any problems. If I ever made the latter claim, I should clean that up.
> It's optimized for passing tests and convincing people and AIs that security holes are closed, which means other considerations like preserving functionality can more easily be regressed as compared to if humans were doing it.
Tridge actually explicitly says he made that tradeoff on purpose, not the AI.
> Every sentence is saturated with AI style. Perhaps the author so AI-indoctrinated that they can't see this? It doesn't read as even vaguely plausible human writing. Which is mightily ironic given the thesis of "AI generated stuff is just fine, m'kay?" The writing style does more to defeat its conclusion than the analysis itself.
I've since rewritten nearly 100% of the prose in the analysis with my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me.
> I've since rewritten nearly 100% of the prose in the analysis with my own, more inflammatory and verbose style. I also intentionally left in my natural mispellings and typos, to prove it was me.
Thank you thank you thank you. I would love to be able to describe how hard it was for me to think about the actual evidence you're presenting when reading about it through the AI writing, but I suspect it's one of those things where it bothers you or it doesn't. If you'd like to empathize, maybe I'll give it one try: imagine an otherwise solid PhD thesis written in crayon. The facts and evidence and reasoning are unaffected, but it's just so hard to take it seriously.
Anyway, with the rewrite I don't have to battle my kneejerk reactivity nearly as much.
I'm no expert like she is, but based on what I know, I agree with your wife on the statistics. That style of analysis is going to be the best you can do with the data available. It's an accepted way to stretch data without being too dependent on an assumed distribution. It's a good analysis. I still don't come away with the conclusion that concerns about AI code maintenance are necessarily overblown, but that's fine. I think your analysis project is a very solid contribution, and it's a hell of a lot more evidence-based than the rants people were posting.
Yes, it did. Here is some math showing that you shouldn’t care about that.
If I’m hiring and I see this kind of slop, I ain’t hiring you.
So far it reintroduced several security issues and replaced the README.md.
It's not a fork, but it's 8 years old, and is already shipped by default in OpenBSD and macOS.
> As to all the people saying “I’m going to package openrsync for platform XXX and we’ll use that!”. I find that rather amusing. If you do decide to go down that path I’d suggest you try the new rsync test suite on openrsync if you can stomach something that an AI has helped write. I tried it today and openrsync currently fails 85 of 98 tests, so I’m sure it won’t take you long to get it up to speed. You run it like this “./runtests.py --rsync-bin=../openrsync/openrsync --use-tcp”. Admittedly a lot of the failures are just features openrsync doesn’t have, but still, it’s not a great result.
"Cars are just a tool. The drivers who piloted the vehicles and weren't careful enough [are responsible for the deaths.]"
The unsolicited security reports are the issue.
$ apt-cache policy rsync | grep Installed
Installed: 3.4.1+ds1-7ubuntu0.2
$ sudo apt-mark hold rsync
rsync set on hold.
As usual, Ubuntu backported fixes and didn't upgrade to a new version. Whether or not they also backported regressions in edge cases that afflict the latest rsync, I don't know. Pinning the Ubuntu package may prevent further regressions, but it also prevents you from getting any future backported security fixes.
This is a terrible argument; I didn't need to have had secrets exfiltrated before applying row-hammer mitigations. If rsync is the cornerstone of my backup strategy, and has been for years, I need to trust its correctness and trust it not to lose my data. If I wait until I "face any actual bugs or regressions" - that will be far too late.
Stability is another issue not discussed. If the error rate holds steady, but the number of significant PRs merged per release goes up from 5 to 200, that would be a huge net negative for my use case.
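To put rough numbers on that, here is a minimal sketch assuming each significant PR independently carries the same small chance of introducing a regression. The 2% figure is purely hypothetical (nothing in the thread or the analysis supplies it), and the helper names are made up for illustration.

```python
def expected_regressions(prs_merged, rate_per_pr):
    """Expected number of regressions if every PR has the same risk."""
    return prs_merged * rate_per_pr


def chance_of_any_regression(prs_merged, rate_per_pr):
    """Probability that at least one merged PR introduces a regression."""
    return 1 - (1 - rate_per_pr) ** prs_merged


RATE = 0.02  # hypothetical per-PR regression rate, held constant

for prs in (5, 200):
    print(
        f"{prs:>3} PRs: expected regressions = "
        f"{expected_regressions(prs, RATE):.2f}, "
        f"chance of at least one = {chance_of_any_regression(prs, RATE):.1%}"
    )
```

With the per-PR rate unchanged, the expected number of regressions per release scales with the number of PRs, so a 40-fold increase in merged PRs means 40 times as many expected regressions landing in a single upgrade.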
I didn't have the time to actually think about any "arguments" at all tbh, it's just a knee-jerk reaction as I get ready to log off for the weekend. Not actually looking to argue for or against your post at all lol.
- All analysis is contingent.
- How do you know the conclusion was premotivated, and does it matter if the analysis, which is attempting to be as objective and extremely reproducible as possible, holds up?
- The whole point is that there's no actual evidence for what you are claiming, so why does it being highly-contingent cause a problem for me, when that just further shows there's no evidence for what the anti-AI crowd is saying?
- Why do the anti-AI crowd get to state wide, absolute, objective claims with cherry-picked anecdotes as their only evidence, but the pro-AI crowd is not allowed to respond the same way, and when we then go out of our way to respond in a far more thorough, rigorous, and objective way than you ever did, that's just more evidence for our guilt? It's a Kafka trap. You can't win.
Good for you. I really mean that. I think people are winding you up in this thread, but keep your cool, and I admire publicly crediting and being proud of your wife. That’s a healthy relationship. Good for you.
Do you genuinely believe an article written by AI defending itself is going to convince anyone who wasn't already on your side? All you're doing is giving more fuel to the "anti-AI crowd" you hate so much.
> Your analysis was so thorough, rigorous, and objective, that you couldn't be bothered to write it yourself. Do you genuinely believe an article written by AI defending itself is going to convince anyone who wasn't already on your side?
Except that I did. I spent days comparing and manually deciding on metrics and methodology (I did not use the AI to decide what I would do or how I would do it, so it is not "the AI defending itself"), then refining things, adding more angles to analyze, and, as I literally say in the opening section, I rewrote all the prose in the entire document just to satisfy critics like you. That sounds like "could be bothered" to me. But people like you will never be satisfied.
Also, even if I hadn't done all that work, that wouldn't make it not rigorous (it clearly is) or objective (it is as objective as it can be with so little data). You're bikeshedding to avoid the point.
This statement is honestly so ridiculous that I felt it didn't warrant a direct response, but here's one anyway: AI enthusiasts have been proudly proclaiming for literal years that AI makes them 10x as productive based on cherry-picked anecdotes with zero empirical evidence to back it up. It's way, way too late to claim hypocrisy here. As I stated under the original submission about this topic, irrational anti-AI behavior is usually just an equal and opposite reaction to irrational pro-AI behavior.
> I rewrote all the prose in the entire document just to satisfy critics like you.
And that doesn't help. If anything, editing the AI output to make it read less like blatant slop just comes off as deceptive, like you're trying to hide the fact that the analysis was AI generated. Looking at the commits, you were adding more AI generated text less than 2 hours ago[0] before quickly editing out one of the most blatantly sloppy sentences I've ever read[1].
Regardless, the final contents of the article are not the main issue. Even if we ignore the bias clearly on display there, the premise alone is enough to dismiss the entire thing as heavily biased and chasing a pre-determined conclusion - of course someone who is so dependent and trustful of AI that they decide such an analysis on the bugginess of AI code should itself be written by AI is going to steer the conclusion towards "actually AI code is good and you luddites are overreacting". The entire concept is so tone-deaf that failing to notice it or predict the criticism before publishing is enough to prove the bias.
[0] https://github.com/alexispurslane/rsync-analysis/commit/e029...
[1] https://github.com/alexispurslane/rsync-analysis/commit/740b...
> This statement is honestly so ridiculous that I felt it didn't warrant a direct response, but here's one anyway: AI enthusiasts have been proudly proclaiming for literal years that AI makes them 10x as productive based on cherry-picked anecdotes with zero empirical evidence to back it up.
Let's go back to remedial classes on this one.
"I have found that [tool] has made me more effective" is what we call lived experience. It is an "I" statement communicating something about the person’s life. It does not require evidence by default, and you are a crazy person if you call bullshit without good reason, because many "I" statements are epistemically justified in ways that can't be empirically demonstrated or require tacit knowledge.
"[tool] has been buggier since [change]" is a falsifiable claim; you need to actually provide evidence for believing it, and what I'm showing is literally that there isn't any.
I'm talking about the double standard on the anti-AI side about what evidence should count, not some vague industry-wide epistemic standard, whatever that means. I'm aware LinkedIn Lunatics and Steve Yegge are also being crazy. And it seems to me that even your response here is engaging in a bit of a double standard, or something akin to it, in that you think the irrational anti-AI behavior should be given a pass — and the conclusions perhaps even taken seriously — just because pro-AI people did it too.
> And that doesn't help. If anything, editing the AI output to make it read less like blatant slop just comes off as deceptive, like you're trying to hide the fact that the analysis was AI generated.
Okay, so, if I don't spend the time to write everything myself, that's bad because it's AI slop. If I do rewrite everything myself, then it's evidence of deceptiveness... despite being asked by multiple people to do that, and being extremely explicit about my methods and process and the commit history being (as you've shown), very public.
Also, the AI-generatedness of the text doesn't mean the analysis is AI generated, in terms of what was actually done. That's a category error.
> Looking at the commits, you were adding more AI generated text less than 2 hours ago[0] before quickly editing out one of the most blatantly sloppy sentences I've ever read[1].
The second commit literally says that that was my prose it was fucking with by adding slop. It's just that me adding my prose, and it adding slop to it, were in the same previous commit. Additionally, my process is often giving it exactly what I want to say, more or less, and having it HTML-format it and insert the templated numbers and UI widgets around that text.
But again, even if I'm spending the time to read through and edit everything it's writing to de-slop it, then I'm clearly also reading it through enough to make sure the analysis makes sense, and is accurate; how is that not enough "effort" for you, if effort is supposed to be a proxy for verification?
> Even if we ignore the bias clearly on display there, the premise alone is enough to dismiss the entire thing as heavily biased and chasing a pre-determined conclusion - of course someone who is so dependent and trustful of AI that they decide such an analysis on the bugginess of AI code should itself be written by AI is going to steer the conclusion towards "actually AI code is good and you luddites are overreacting".
That's not ignoring the bias, that's literally restating that you think the bias is there. But if you really think that my bias meaningfully "steered the results," then show me how that happened. Tell me how you would've proven the Claude releases were meaningfully worse, or unusual, at all, or how the methods I chose biased the data against that result, or literally anything except shifting the goalposts and using accusations of "bias" as a get-out-of-jail-free-card.
> The entire concept is so tone-deaf that failing to notice it or predict the criticism before publishing is enough to prove the bias.
And you're so committed to your preconceived notions that anything made with AI must be bad, wrong, or not worth your time, that you'll spend your entire time begging the question ("it's made with AI, therefore it's wrong") and shifting the goalposts instead of engaging meaningfully.
Also, I certainly predicted the criticism (in general, anyway, of the fact that it was made with AI; not of the prose being AI) but I made it this way anyway, because if someone is so AI-blinded that they can't read and evaluate the actual metrics and methodology and provide meaningful criticism of them, and instead can only see that it was made with AI, then it doesn't matter.
Nothing you have said makes the analysis wrong. At this point, you're essentially just resorting to ad hominem and begging the question.
I don't know who asked you to do it. I wouldn't have done it. Personally, the original intent matters far more to me. You intended to submit an AI-generated article, defending AI, to be read by humans. Anything short of taking the article down and rewriting the entire thing from scratch doesn't meaningfully change that.
> Additionally, my process is often giving it exactly what I want to say, more or less, and having it HTML-format it and insert the templated numbers and UI widgets around that text.
Sorry but you're just further proving my point here. You are so deeply invested in AI that even just manually writing some English text into a static HTML file is something you consider to be below you.
Imagine going back in time 5 years and telling someone: "In the future, nobody uses text editors. On the rare occasion that we actually want to write something to a text file verbatim, we instead recite the text to a complex artificial intelligence algorithm that uses large amounts of computing power to process said text and then recite back a command that writes the text to a file. Sometimes the algorithm decides to be a smartass and change our words or add an extra quip, but that's all part of the fun."
> That's not ignoring the bias, that's literally restating that you think the bias is there.
I was referring to the bias within the actual text of the article vs the inherent bias displayed by the very concept of an AI-generated article defending AI. Passages like these:
> The thread did not stop at words. As is typical for anti-AI users, it eventually escalated to fantasies of violence
Make it fairly obvious that you went into this project with the primary goal of proving such people wrong, possibly backed by a sense of moral superiority relative to a few weirdos on the internet who took things too far (such individuals are present in every online discussion that gets big enough, and their actions do not represent the whole).
> And you're so committed to your preconceived notions that anything made with AI must be bad, wrong, or not worth your time
"Bad" or "wrong" may be subjective, but it's definitely not worth my time, no. If you didn't consider it worth your time to write it, why do you believe it's worth someone else's time to read it? Again, it doesn't matter if you went back to rewrite parts of it after being criticized, as that doesn't change the original intent.
Submitting an AI generated article and expecting meaningful human responses only makes sense if you consider your own time to be worth more than that of others. Do you?
HN is, relatively speaking, a very intellectual part of the internet, yet it's still really common to see very uneducated opinions here. Not that everyone needs to be very educated, but posts with plainly wrong assumptions and biases shouldn't go unchecked so often.