No right to relicense this project
301 points | 5 hours ago | 24 comments | github.com | HN
antirez
2 hours ago
I believe that Pilgrim here does not understand very well how copyright works:

> Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code

This is simply not true. The reason the "clean room" concept exists is precisely that the law recognizes that independent implementations ARE possible. The "clean room" thing is a trick to make litigation simpler; it is NOT required that you were never exposed to the original code. For instance, Linux was implemented even though Linus and other devs were well aware of Unix internals. What the law really asks is this: does the new code copy something that was in the original one? The clean room trick makes it simpler to say that this is not possible, and that if there are similar things it is just by accident. But it is NOT a requirement.

jacquesm
2 hours ago
This is correct. I think any author of a main chunk of code that they claim ownership of (which is probably all of us!) should at least study the basics of copyright law. Getting little details wrong can cost you time, money and eventually your business if you're not careful.

kreco
3 minutes ago
This is not unprecedented: TCC relicensed part of its code with the approval of all authors:

https://repo.or.cz/tinycc.git/blob/3d963aebcd533da278f086a3e...

The interesting part is that the original author is against it, but some people claim it could be a rewrite and not a derivative work.

I don't know the legal basis of everything, but it's definitely not morally correct toward the original author.

dathinab
3 hours ago
The argument that a rewrite is a copyright violation because they are familiar with the code base is not fully sound.

"Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.

Or else an artist who has seen a picture of a sunset over an empty ocean wouldn't be allowed to paint another sunset over an empty ocean, as people could claim copyright violation.

What is a violation, though, is placing the code side by side and trying to circumvent copyright law by just rephrasing the exact same code.

This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar), it will most likely be ruled a copyright violation, as it's pretty much a side-by-side rewriting.

But you very much can rewrite a project under new license even if you have in depth knowledge. IFF you don't have the old project open/look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

Though doing so is not per se illegal, it is legally very attackable, as you will have a hard time defending such a rewrite from copyright claims (except if it's internally so completely different that it stops any claims of "being a copy", e.g. you use completely different algorithms, architecture, etc. to produce the same results in a different way).

In the end while technically "legally hard to defend" != "illegal", for companies it's most times best to treat it the same.

simiones
1 hour ago
> "Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.

On the contrary. Except for discussions about punitive damages and so on, insider knowledge or lack thereof is completely irrelevant to patent law. If company A has a patent on something, they can assert said patent against company B regardless of whether any person in company B had ever seen or heard of company A and their patent. Company B could have a legal trail proving they invented their product that matches the patent from scratch with no outside knowledge, and that they had been doing this before company A had even filed their patent, and it wouldn't matter at all - company A, by virtue of filing and being granted a patent, has a legal monopoly on that invention.

In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work. Even more, you have your own copyright over your own work, and can assert it over anyone that tries to copy your work without permission, despite an identical work existing and being owned by someone else.

Now, purely in principle this would remain true even if you had seen the other work. But in reality, it's impossible to convince any jury that you happened to produce, entirely out of your own creativity, an original work that is identical to a work you had seen before.

> But you very much can rewrite a project under new license even if you have in depth knowledge. IFF you don't have the old project open/look at it while doing so.

No, this is very much false. You will never be able to win a court case on this, as any significant similarity between your work and the original will be considered a copyright violation, per the preponderance of the evidence.

aleph_minus_one
4 minutes ago
> In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work.

This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

> https://www.travelandleisure.com/photography/illegal-to-take...

> https://www.headout.com/blog/eiffel-tower-copyright/

twoodfin
1 hour ago
If I read Mario Puzo’s The Godfather and then proceed to write a structurally identical novel with many of the same story beats and character types, it will not be difficult to convince a jury exposed to these facts that I’ve created a derivative work.

On the other hand, if I can prove to the jury’s satisfaction that I’ve never been exposed to Puzo’s work in any form, it’s independent creation.

helsinkiandrew
1 hour ago
In the case of chardet, though, wouldn't it be more like you were the publisher of The Godfather, withdrawing it from print and releasing a novel with the same name and much of the same plot and characters, but claiming the new version was an independent creation?

helsinkiandrew
2 hours ago
If the new maintainers used Claude as their “fancy code generator” (there’s a Claude.md file in the repository so it seems so) then it was almost certainly trained with the chardet source code.

oneeyedpigeon
2 hours ago
> And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

How different does the new code have to be from the old code and how is that measured?
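
One rough way to put a number on "how different", purely as an illustration and not a legal test: compare the token streams of the two versions, so that comments and whitespace don't count. This is a minimal sketch using only Python's standard library. The names `code_tokens` and `token_similarity` are made up for this example, and nothing in this thread suggests that courts or the chardet maintainers measure similarity this way.

```python
import difflib
import io
import tokenize

# Token types that carry layout or commentary rather than program content.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def code_tokens(source: str) -> list[str]:
    """Return the token strings of Python source, ignoring comments and layout."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in _SKIPPED]


def token_similarity(old: str, new: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical token sequences."""
    matcher = difflib.SequenceMatcher(
        None, code_tokens(old), code_tokens(new), autojunk=False)
    return matcher.ratio()


# Hypothetical snippets, not taken from chardet.
old = "def add(a, b):\n    return a + b  # sum\n"
reformatted = "def add(a,b):\n    # add two numbers\n    return a+b\n"
different = "import operator\nadd = operator.add\n"

print(token_similarity(old, reformatted))  # same tokens, different layout
print(token_similarity(old, different))    # same behaviour, different code
```

A score near 1.0 only says the token sequences line up. A low score does not rule out copying of structure, and a high score on short, functionally constrained code does not prove copying.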

larodi
1 hour ago
Nobody can tell, and this is how we entered these very turbulent modern times of "everything can be retold" without punishment. LLMs are already doing it at large. While the original author is correct in terms of the LGPL, it is nearly impossible to say how different an expression of an idea must be to be considered a separate one. This is a truly fundamental philosophical question that may not have an easy answer.

jmyeet
2 hours ago
This is a bad argument.

Think of a rewrite (by a human or an LLM) as a translation. If you wrote a book in English and somebody translated it into Spanish, it'd still be a copyright issue. Same thing with code rewrites.

That's very different to taking the idea of a body of work. So you can't copyright the idea of a pirate taking a princess hostage and a hero rescuing her. That's too generic. But even here there are limits. There have been lawsuits over artistic works being too similar.

Back to software, you can't copyright the idea of photo-editing software but you can copyright the source code that produces that software. If you can somehow prompt an LLM to produce photo editing software or if a person writes it themselves then you have what's generally referred to as a "cleanroom" implementation and that's copyright-free (although you may have patent issues, which is a whole separate issue).

But even if you prompted an LLM that way, how did the LLM learn what it needed? Was the source code of another project an input in its training? This is a legal grey area, currently. But I suspect it's going to be a problem.

pera
1 hour ago
Suchir Balaji, the OpenAI researcher who was found dead in his flat just before testifying against his employer, published an excellent article somewhat related to this topic:

When does generative AI qualify for fair use?

https://suchir.net/fair_use.html

Balaji's argument is very strong and I feel we will see it tested in court as soon as LLM license-washing starts getting more popular.

bsenftner
1 hour ago
Hate to be "that guy" but in a corrupt legal system, which ours is, none of this matters. Who has the influence and dollars to make the decision theirs is all that matters.

RcouF1uZ4gsC
2 hours ago
I think you could have an LLM produce a written English detailed description of the complete logic of the program and tests.

Then use another LLM to produce code from that spec.

This would be similar to the cleanroom technique.

simiones
1 hour ago
Producing a copy of a copyrighted work through a purely mechanical process is a clear violation of copyright. LLMs are absolutely not different from a copier machine in the eyes of the law.

Original works can only be produced by a human being, by definition in copyright law. Any artifact produced by an animal, a mechanical process, a machine, a natural phenomenon etc is either a derived work if it started from an original copyrighted work, or a public domain artifact not covered by copyright law if it didn't.

For example, an image created on a rock struck by lightning is not a copyright-covered work. Similarly, an image generated by a diffusion model from a randomly generated sentence is not a copyrightable work. However, if you feed a novel as a prompt to an LLM and ask for a summary, the resulting summary is a derived work of said novel, and it falls under the copyright of the novel's owner - you are not allowed to distribute copies of the summary the LLM generated for you.

Whether the output of an LLM, or the LLM weights themselves, might be considered derived works of the training set of that LLM is a completely different discussion, and one that has not yet been settled in court.

robinsonb5
2 hours ago
Perhaps - but an argument might still be made that the result is a derivative work of the original, given that it's produced by feeding the original work through automated tooling.

But either way, deleting the original version from the repo and replacing it with the new version - as opposed to, say, archiving the old version and starting a new repo with the new version - would still be a dick move.

robin_reala
2 hours ago
Assuming the second LLM hadn’t been trained on the existing codebase. Which in this case we can’t know, but can assume that it was.

knollimar
2 hours ago
Does the second LLM have the codebase in its training?

9864247888754
2 hours ago
One could use Comma, which has only been trained on public domain texts:

https://arxiv.org/pdf/2506.05209

markthered
1 hour ago
The copyright argument is a sidetrack both in the PR comment thread and here. The issue as opened claims the new code is based on the old code, is therefore derivative, and therefore must be offered, as a modified version of the source code, under the previous license, the LGPL. The complaint is the maintainers violated the terms of LGPL, that they must prove no derivation from the original code to legally claim this is a legal new version without the LGPL license. The claim is that if they or Claude read the old code (or of course directly used any of it), it is a license violation.

“… in the release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).”

By this reasoning, I am genuinely asking (I’m not a license expert) if a valid clean room rewrite is possible, because at a minimum you would need a spec describing all behavior, which seems to require ample exposure to the original to be sufficiently precise.

Pannoniae
59 minutes ago
That's not what a derivative work means, though. Being exposed to something doesn't mean you can't create original work which is similar to it (otherwise every song or artwork would be a derivative work of everything before it)

People do cleanroom implementations as a precaution against a lawsuit, but it's not a necessary element.

In fact, even if some parts are similar, it's still not a clear-cut case - the defendant can very well argue that the usage was 1. transformative 2. insubstantial to the entirety of work.

"The complaint is the maintainers violated the terms of LGPL, that they must prove no derivation from the original code to legally claim this is a legal new version without the LGPL license."

The burden of proof is on the accuser.

"I am genuinely asking (I’m not a license expert) if a valid clean room rewrite is possible, because at a minimum you would need a spec describing all behavior, which seems to require ample exposure to the original to be sufficiently precise."

Linux would be illegal if so (they had knowledge of Unix before), and many GNU tools are libre API-compatible reimplementations of previous Unix utilities :)

Roritharr
4 hours ago
As part of my consulting, I've stumbled upon this issue in a commercial context. A SaaS company that has open-sourced the mobile apps of its platform approached me with the following concern.

One of their engineers was able to recreate their platform by letting Claude Code reverse engineer their Apps and the Web-Frontend, creating an API-compatible backend that is functionally identical.

Took him a week after work. It's not as stable, the unit-tests need more work, the code has some unnecessary duplication, hosting isn't fully figured out, but the end-to-end test-harness is even more stable than their own.

"How do we protect ourselves against a competitor doing this?"

Noodling on this at the moment.

jillesvangurp
9 minutes ago
> "How do we protect ourselves against a competitor doing this?"

You can try patenting; but not after the fact. Copyright won't help you here. You can't copyright an algorithm or idea, just a specific form or implementation of it. And there is a lot of legal history about what is and isn't a derivative work here. Some companies try to forbid reverse engineering in their licensing. But of course that might be a bit hard to enforce, or prove. And it doesn't work for OSS stuff in any case.

Stuff like this has been common practice in the industry for decades. Most good software ideas get picked apart, copied and re-implemented. IBM's BIOS for the first PC quickly got reverse engineered, and then other companies started making IBM-compatible PCs. IBM never open sourced their BIOS and they probably did not intend for that to happen. But that didn't matter. Likewise there were several PC-compatible DOS variants that each could (mostly) run the same applications. MS never open sourced DOS either. There are countless examples of people figuring out how stuff works and then creating independent implementations. All that is perfectly legal.

3rodents
3 hours ago
You're not describing anything new, you're describing progress. A company invests time and money and expertise into building a product, it becomes established, people copy it in 1/10th of the time, and the quality of products across the industry improves. Long before generative AI, Instagram famously copied Snapchat's stories concept in a weekend, and that is now a multi-multi-multi-billion contributor to Meta's bottom line.

As engineers, we often think only about code, but code has never been what makes a business succeed. If your client thinks that their business's primary value is in the mobile app code they wrote, 1) why is it even open source? 2) the business is doomed.

Realistically, though, this is inconsequential, and any time spent worrying about this is wasted time. You don't protect yourself from your competitor by worrying about them copying your mobile app.

amelius
3 hours ago
> You don't protect yourself from your competitor by worrying about them copying your mobile app.

They did not copy the mobile app. They copied the service.

3rodents
40 minutes ago
Replace “mobile app” with “backend” in my comment.

IanCal
4 hours ago
You might be interested in the dark factory work here https://factory.strongdm.ai/

They do something very similar for some of their work. It’s hard to use external services so they replicate them and the cost of doing so has come down from “don’t be daft, we can’t reimplement slack and google drive this sprint just to make testing faster” to realistic. They run the sdks against the live services and their own implementations until they don’t see behaviour differences. Now they have a fast slack and drive and more (that do everything they need for their testing) accelerating other work. I’m dramatically shifting my concept of what’s expensive and not for development. What you’re describing could have been done by someone before, but the difficulty of building that backend has dropped enormously. Even if the application was closed you could probably either now or soon start to do the same thing starting with building back to core user stories and building the app as well.

You can view some of this as having things like the application as a very precise specification.
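
The "run against both until no behaviour differences remain" loop can be sketched as a small differential test. This is only an illustration of the general technique under stated assumptions: `reference_service` and `replica_service` are hypothetical stand-ins for a live service and its reimplementation, and nothing here is taken from the linked project's actual tooling.

```python
from typing import Any, Callable, Iterable

Handler = Callable[[dict], Any]


def _outcome(handler: Handler, request: dict) -> tuple[str, Any]:
    """Capture either the return value or the exception type of a call."""
    try:
        return ("ok", handler(request))
    except Exception as exc:  # an error is also observable behaviour
        return ("error", type(exc).__name__)


def behaviour_diffs(reference: Handler, replica: Handler,
                    requests: Iterable[dict]) -> list[tuple[dict, Any, Any]]:
    """Return (request, reference outcome, replica outcome) for each mismatch."""
    diffs = []
    for request in requests:
        expected = _outcome(reference, request)
        actual = _outcome(replica, request)
        if expected != actual:
            diffs.append((request, expected, actual))
    return diffs


# Hypothetical stand-ins: the "live" service rejects a missing field,
# while the replica silently accepts it.
def reference_service(request: dict) -> str:
    return request["text"].upper()


def replica_service(request: dict) -> str:
    return request.get("text", "").upper()


sample_requests = [{"text": "hi"}, {}]
for request, expected, actual in behaviour_diffs(
        reference_service, replica_service, sample_requests):
    print(request, expected, actual)
```

In this framing the reference implementation is the specification: the replica is considered done for a given set of requests when the list of differences is empty.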

Really fascinating moment of change.

Garlef
2 hours ago
> It’s hard to use external services

I think it's interesting to add what they use it for and why it's hard.

What they use it for:

- It's about automated testing against third party services.

- It's not about replicating the product for end users

Why using external services is hard/problematic:

- Performance: They want to have super fast feedback cycles in the agentic loop: In-Memory tests. So they let the AI write full in-memory simulations of (for example) the slack api that are behaviorally equivalent for their use cases.

- Feasibility: The sandboxes offered by these services usually have usage limits (= number of requests per month, etc) that would easily be exhausted if attached to a test harness that runs every other minute in an automated BDD loop.
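
To make the "in-memory simulation" idea concrete, here is a minimal sketch of a fake chat backend that a test harness could call instead of a rate-limited sandbox. The class `InMemoryChat` and its methods `post_message` and `history` are invented for this example; they are not Slack's real API, and a real simulation would need to mirror whatever behaviour the tests depend on.

```python
from itertools import count


class InMemoryChat:
    """Hypothetical in-memory stand-in for a chat service, for tests only."""

    def __init__(self) -> None:
        self._channels: dict[str, list[dict]] = {}
        self._ids = count(1)  # monotonically increasing message ids

    def post_message(self, channel: str, text: str) -> dict:
        """Store a message and return it, rejecting empty text."""
        if not text:
            raise ValueError("text must not be empty")
        message = {"id": next(self._ids), "channel": channel, "text": text}
        self._channels.setdefault(channel, []).append(message)
        return message

    def history(self, channel: str, limit: int = 100) -> list[dict]:
        """Return up to `limit` messages for a channel, newest first."""
        if limit <= 0:
            return []
        messages = self._channels.get(channel, [])
        return list(reversed(messages[-limit:]))


chat = InMemoryChat()
chat.post_message("general", "first")
chat.post_message("general", "second")
print([m["text"] for m in chat.history("general")])
```

Because everything lives in process memory, a test run makes no network calls and consumes no sandbox quota, which is the property the two bullet points above are after.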

zozbot234
4 hours ago
> "How do we protect ourselves against a competitor doing this?"

If the platform is so trivial that it can be reverse engineered by an AI agent from a dumb frontend, what's there to protect against? One has to assume that their moat is not that part of the backend but something else entirely about how the service is being provided.

littlecranky67
4 hours ago
Interesting case. IANAL, but it sounds legal and legit. The AI did not have exposure to the backend it re-implemented. The API itself is public and not protectable.

bandrami
4 hours ago
OTOH as of yesterday the output of the LLM isn't copyrightable, which makes licensing it difficult

graemep
3 hours ago
As others have pointed out, this case is really about refusing to allow an LLM to be recognised as the author. The person using the LLM waived any right to be recognised as the author.

It's also US only. Other countries will differ. This means you can only rely on this ruling at all for something you are distributing only in the US. Might be OK for art, definitely not for most software. Very definitely not OK for a software library.

For example UK law specifically says "In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken."

https://www.legislation.gov.uk/ukpga/1988/48/section/9

jacquesm
2 hours ago
> The person using the LLM waived any right to be recognised as the author.

They can't waive their liability from being identified as an infringer though.

bakugo
3 hours ago
> the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.

This seems extremely vague. One could argue that any part of the pipeline counts as an "arrangement necessary for the creation of the work", so who is the author? The prompter, the creator of the model, or the creator of the training data?

graemep
2 hours ago
The courts will have to settle that according to circumstances. I think it is likely to be the prompter, and in some cases the creator of the training data as well. The creator of the model will have copyright on the model, but unlikely to have copyright on its outputs (any more than the writer of a compiler has copyright on its output).

NitpickLawyer
4 hours ago
I wrote this comment on another thread earlier, but it seems relevant here, so I'll just c/p:

I think we haven't even begun to consider all the implications of this, and while people ran with that one case where someone couldn't copyright a generated image, it's not that easy for code. I think there needs to be way more litigation before we can confidently say it's settled.

If "generated" code is not copyrightable, where do we draw the line on what generated means? Do macros count? Does code that generates other code count? Protobuf?

If it's the tool that generates the code, again where do we draw the line? Is it just using 3rd party tools? Would training your own count? Would a "random" code gen where you pick the winners (by whatever means) count? Would brute-forcing the whole space (silly example but hey we're in silly space here) count?

Is it just "AI" adjacent that isn't copyrightable? If so how do you define AI? Does autocomplete count? Intellisense? Smarter intellisense?

Are we gonna have to have a trial where there's at least one lawyer making silly comparisons between LLMs and power plugs? Or maybe counting abacuses (abaci?)... "But your honour, it's just random numbers / matrix multiplications..."

bandrami
2 hours ago
In terms of adoption, "it's not settled" is even worse

amelius
3 hours ago
Maybe we should build an LLM that can be the judge of that :)

senko
4 hours ago
That's a very incorrect reading.

AI can't be the author of the work. Human driving the AI can, unless they zero-shotted the solution with no creative input.

camgunz
2 hours ago
Only the authored parts can be copyrighted, and only humans can author [0].

"For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the 'traditional elements of authorship' are determined and executed by the technology—not the human user."

"In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that 'the resulting work as a whole constitutes an original work of authorship.'"

"Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are 'independent of' and do 'not affect' the copyright status of the AI-generated material itself."

IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

[0]: https://www.federalregister.gov/d/2023-05321/p-40

maxerickson
2 hours ago
So if I want to publish a project under some license and I put a comment in an AI generated file (never mind what I put in the comment), how do you go about proving which portion of that file is not protected under copyright?

If the AI code isn't copyrightable, I don't have any obligations to acknowledge it.

bandrami
1 hour ago
You're looking at this as the infringer rather than the owner. How do you as a copyright owner prove you meaningfully arranged the work when you want to enforce your copyright?

camgunz
1 hour ago
Copyright office says this has to be done case-by-case. My guess is they'd ask to see prompts and evidence of authorship.

skeledrew
3 hours ago
The human is still at best a co-author, as the primary implementation effort isn't theirs. And I think the effort involved is the key contention in these cases. Yesterday ideas were cheap, and it was the execution that mattered. Today execution is probably cheaper than ideas, but things should still hold.

phire
3 hours ago
That's not really what the ruling said. Though, I suspect this type of "vibe rewrite" does fall afoul of the same issue.

But for this type of copyright laundering, it doesn't really matter. The goal isn't really about licensing it, it's about avoiding the existing licence. The idea that the code ends up as public domain isn't really an issue for them.

oblio
4 hours ago
As of yesterday?

phi-go
3 hours ago

rwmj
2 hours ago
No serious enterprise SaaS company differentiates themselves solely on the product (the products are usually terrible). It's the sales channel, the fact that you know how to bill a big company, the human engineer who is sent on site to deploy and integrate the product, the people on the support line 24/7, the regulatory framework that ensures the customer can operate legally and obtain insurance, the fact that there's a deep pool of potential hires who have used and understand the product. Those are the differentiators.

ShowalkKama
4 hours ago
If your backend is trivial enough to be implemented by a large language model, what value are you providing?

I know it's a provocative question, but it answers why a competitor is not a competitor.

Meneth
1 hour ago
"How do we protect ourselves against a competitor doing this?"

That's the neat thing: you don't!

senko
4 hours ago
> "How do we protect ourselves against a competitor doing this?"

DMCA. The EULA likely prohibits reverse engineering. If a competitor does that, hit 'em with lawyers.

Or, if you want to be able to sleep at night, recognize this as an opportunity instead of a threat.

orthoxerox
3 hours ago
What about jurisdictions where reverse engineering is an inalienable right?

nandomrumber
4 hours ago
Maybe a better question is:

How do our competitors protect themselves against us doing this?

amelius
3 hours ago
Makes me wonder when AI will put the mobile phone OS duopoly to an end.

mellosouls
4 hours ago
The famous case Google vs Oracle may need to be re-evaluated in the light of Agents making API implementation trivial.

https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....

fragmede
4 hours ago
Nothing. This is why SaaS stocks took a dump last week.

jmyeet
2 hours ago
I think the genie is out of the bottle on this one and there's really no putting it back.

There is a certain amount of brand loyalty and platform inertia that will keep people. Also, as you point out, just having the source code isn't enough. Running a platform is more than that. But that gap will narrow with time.

The broader issue here is that there are people in tech who don't realize that AI is coming for their jobs (and companies) too. I hope people in this position can maybe understand the overall societal issues for other people seeing their industries "disrupted" (ie destroyed) by AI.

scosman
4 hours ago
Sounds like they didn’t build a proper clean room setup: the agent writing the code could see the original code.

Question: if they had built one using AI teams in both “rooms”, one writing a spec the other implementing, would that be fine? You’d need to verify spec doesn’t include source code, but that’s easy enough.
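
As a rough illustration of the "verify the spec doesn't include source code" step, one could flag spec lines that appear verbatim in the original source after whitespace normalisation. This is a sketch under stated assumptions: `leaked_lines` and the `min_length` threshold are made up for this example, and a check like this only catches literal copying, not paraphrased or restructured code.

```python
def _normalise(line: str) -> str:
    """Collapse runs of whitespace so indentation differences don't matter."""
    return " ".join(line.split())


def leaked_lines(spec: str, source: str, min_length: int = 12) -> list[str]:
    """Return normalised spec lines that also occur verbatim in the source.

    Lines shorter than `min_length` characters are ignored, since trivial
    lines such as `else:` would match almost any code base.
    """
    source_lines = {_normalise(line) for line in source.splitlines()}
    source_lines = {line for line in source_lines if len(line) >= min_length}
    return [normalised for line in spec.splitlines()
            if (normalised := _normalise(line)) in source_lines]


# Hypothetical inputs, not taken from chardet.
source = "def detect(data):\n    return data.decode('utf-8')\n"
spec = (
    "The detector takes bytes.\n"
    "    return data.decode('utf-8')\n"
    "It returns text.\n"
)
print(leaked_lines(spec, source))
```

An empty result is weak evidence at best: a spec can track the structure of the original closely without sharing a single line with it.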

It seems to mostly follow the IBM-era precedent. However, since the model probably had the original code in its training data, maybe not? Maybe valid for closed source projects but not open source ones? Interesting question.

swiftcoder
4 hours ago
> Sounds like they didn’t build a proper clean room setup: the agent writing the code could see the original code.

It doesn't matter how they structure the agents. Since chardet is in the LLM training set, you can't claim any AI implementation thereof is clean room.

scosman
4 hours ago
Yeah I mention that in the question.

Might still be valid for closed source projects (probably is).

I think courts would need to weigh in on the open source side. There’s legal precedent that you can use a derived work to generate a new unique work (the spec derived for the copyrighted code is very much a derived work). There are rulings that LLMs are transformative works, not just copies of training data.

LLMs can’t reproduce their entire training set. But this thinking is also ripe for misuse. I could always train or fine-tune a model on the original work so that it can reproduce the original. We quickly get into statistical arguments here.

It’s a really interesting question.

swiftcoder
1 hour ago
> There’s legal precedent that you can use a derived work to generate a new unique work (the spec derived for the copyrighted code is very much a derived work)

Indeed, but in the clean room scenario, the party who implements the spec has to be a separate entity that has never seen the code. Whether or not the LLM is copyright infringing is a separate question - it definitely has (at least some) familiarity with the code in question, which makes the "clean room" argument an uphill battle

jacquesm
2 hours ago
I just wrote a long comment about that, but yes, you are on to something here.

The key to me is that the LLM itself is a derived work and that by definition it can not produce something original. Which in turn would make profiting off such a derived work created by an automated process from copyrighted works a case of wholesale copyright infringement. If you can get a judge to agree on that I predict the price of RAM will come down again.

bsza
3 hours ago
So by that logic, you're not legally allowed to implement your own character detector and license it as your own if you've ever looked at chardet's source code? I'm confused. I thought copyright laws protect intellectual property as-is, not the impression it leaves on someone.

jacquesm
2 hours ago
Well, you are not making things easier for yourself by looking at that source code if the author of chardet brings a case for copyright infringement against you.

The question is: if you had not looked at chardet's source would you still be able to create your work? If the answer is 'yes' then you probably shouldn't have looked at the source, you just made your defense immeasurably harder. And if the answer is 'no' then you probably should have just used chardet and respected its license.

reply
bsza
1 hour ago
[-]
Sorry, but that sounds like a witch hunt to me, not modern law. Isn't the burden of proof on the accuser? I.e. the accuser has to prove that "this piece of code right here is a direct refactoring of my code, and here are the trivial and mechanical steps to produce one from the other"? And if they present no such evidence, we can all go home?
reply
jacquesm
1 hour ago
[-]
No, the burden of proof is on the defender: if you didn't create it you are not the copyright holder.

Copyright is automatic for a reason, the simple act of creation is technically enough to establish copyright. But that mechanism means that if your claimed creation has an uncanny resemblance to an earlier, published creation or an unpublished earlier creation that you had access to, you are going to be in trouble when the real copyright holder comes calling.

In short: just don't. Write your own stuff if you plan on passing it off as your own.

The accuser just needs to establish precedence.

So if you by your lonesome have never listened to the radio and tomorrow morning wake up and 'Billie Jean' springs from your brain you're going to get sued, even if the MJ estate won't be able to prove how you did it.

reply
bsza
51 minutes ago
[-]
That much I understand, but that question only comes up when the similarity is already an established fact, no? If we take the claim that this is a "complete rewrite" at face value, then there should be no reason for the code to have any uncanny similarities with chardet 6 beyond what is expectable from their functionality (which is not copyrightable) being the same, right?

So my (perhaps naive) understanding is if none can be found, then the author of chardet 1-6 simply doesn't have a case here, and we don't get to the point of asking "have you been exposed to the code?".

reply
jacquesm
5 minutes ago
[-]
No, they're on the record saying that this is a derived work. There is no argument here at all. Not finding proof in a copyright case when the author is on the record about the infringement is a complete non-issue.

You'd have to make that claim absent any proof and then there better not be any gross similarities between the two bodies of code that can not be explained away by coincidence.

And then there is such a thing as discovery. I've been party to a case like this and won because of some silly little details (mostly: identical typos) and another that was just a couple of lines of identical JavaScript (with all of the variable names changed). Copyright cases against large entities are much harder to win because they have deeper pockets but against smaller parties that are clearly infringing it is much easier.

When you're talking about documented protocols or interface specifications then it is a different thing, those have various exceptions and those vary from one jurisdiction to another.

reply
swiftcoder
12 minutes ago
[-]
> when the similarity is already an established fact

The similarity is an established fact - the authors claim that this is chardet, to the extent that they are even using the chardet name!

Had they written a similar tool with a different name, and placed it in its own repo, we might be having a very different discussion.

reply
swiftcoder
2 hours ago
[-]
> if you've ever looked at chardet's source code

If you wish to be able to claim in court that it is a "clean room" implementation, yes.

Clean room implementations are specifically where a company firewalls the implementing team off from any knowledge of the original implementation, in order to be able to swear in court that their implementation does not make any use of the original code (which they are in such a case likely not licensed to use).

reply
zozbot234
4 hours ago
[-]
This seems right to me. If you ask a LLM to derive a spec that has no expressive element of the original code (a clean-room human team can carefully verify this), and then ask another instance of the LLM (with fresh context) to write out code from the spec, how is that different from a "clean room" rewrite? The agent that writes the new code only ever sees the spec, and by assumption (the assumption that's made in all clean room rewrites) the spec is purely factual with all copyrightable expression having been distilled out. But the "deriving the spec (and verifying that it's as clean as possible)" is crucial and cannot be skipped!
reply
sigseg1v
4 hours ago
[-]
How would a team verify this for any current model? They would have to observe and control all training data. In practice, any currently available model that is good enough to perform this task likely fails the clean room criteria due to having a copy of the source code of the project it wants to rewrite. At that point it's basically an expensive lossy copy paste.
reply
zozbot234
4 hours ago
[-]
You can always verify the output. Unless the problem being solved really is exceedingly specific and non-trivial, it's at least unlikely that the AI will rip off recognizable expression from the original work. The work may be part of the training but so are many millions of completely unrelated works, so any "family resemblance" would have to be there for very specific reasons about what's being implemented.
reply
oytis
4 hours ago
[-]
It requires the original project to not be in the training data for the model for it to be a clean room rewrite
reply
zozbot234
4 hours ago
[-]
That only matters if expression of the original project really does end up in the rewrite, doesn't it? This can be checked for (by the team with access to the code) and it's also quite unlikely at least. It's not trivial at all to have an LLM replicate their training verbatim: even when feasible (the Harry Potter case, a work that's going to be massively overweighted in training due to its popularity) it takes very specific prompting and hinting.
reply
oytis
4 hours ago
[-]
> That only matters if expression of the original project really does end up in the rewrite, doesn't it?

No, I don't think so. I hate comparing LLMs with humans, but for a human, being familiar with the original code might disqualify them from writing a differently-licensed version.

Anyway, LLMs are not human, so as many courts confirmed, their output is not copyrightable at all, under any license.

reply
toyg
4 hours ago
[-]
Uh, this is just a curiosity, but do you have a reference for that last argument?

If true, it would mean most commercial code being developed today, since it's increasingly AI-generated, would actually be copyright-free. I don't think most Western courts would uphold that position.

reply
duskdozer
3 hours ago
[-]
reply
pseudalopex
1 hour ago
[-]
The headline was misleading. The courts avoided deciding what Thaler could have copyrighted because he said he was not the author.
reply
vkou
3 hours ago
[-]
> That only matters if expression of the original project really does end up in the rewrite, doesn't it?

If that were the case, nobody would bother with clean-room rewrites.

reply
nneonneo
4 hours ago
[-]
Somewhat annoyingly, there's been research that suggests that models can pass information to each other via (effectively) steganographic techniques - specific but apparently harmless choices of tokens, wordings, and so on; see https://arxiv.org/abs/1712.02950 and https://alignment.anthropic.com/2025/subliminal-learning/ for some simple examples.

While it feels unlikely that a simple "write this spec from this code" + "write this code from this spec" loop would actually trigger this kind of hiding behaviour, an LLM trained to accurately reproduce code from such a loop definitely would be capable of hiding code details within the spec - and you can't reasonably prove that the frontier LLMs have not been trained to do so.

reply
duskdozer
3 hours ago
[-]
Not if the codebase was included in training the implementer.
reply
fergie
4 hours ago
[-]
Answer: probably not, as API-topography is also a part of copyright

Edit: this is wrong

reply
Tiberium
4 hours ago
[-]
Didn't the Google - Oracle case about Java APIs in Android https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_.... directly disprove this?
reply
looperhacks
3 hours ago
[-]
In the end, the Supreme Court decided that the re-implementation fell under fair use; it did not answer the copyrightability question.
reply
scosman
4 hours ago
[-]
The courts decided that wasn’t true for IBM, Java and many other cases. API topography describes functionality, which isn’t copyrightable (IANAL).
reply
Keyframe
4 hours ago
[-]
Wasn't Oracle vs Google about all of that?
reply
actionfromafar
4 hours ago
[-]
Yeah, I think the Compaq / IBM precedent can only superficially apply. It would be like having two teams only meet in a room full of documentation - but both teams crammed the source code the day before. (That is, the source code you are "reverse engineering" is in the training data.) It doesn't make sense.

Also, it's weird that it's okay apparently to use pirated materials to teach an LLM, but maybe not to disseminate what the LLM then tells you.

reply
p0w3n3d
4 hours ago
[-]
Wow that's hot. I was not aware that you need to be "untainted" by the original LGPL code. This could mean that...

All AI generated code is tainted with GPL/LGPL because the LLMs might have been taught with it

reply
wongarsu
4 hours ago
[-]
Being completely untainted is the standard many reimplementations set for themselves to completely rule out legal trouble. For example ReactOS won't let you contribute if you have ever seen Windows code. Because if you have never seen it, there can be no allegation that you copied it.

That is however stricter than what's actually legally necessary. It's just that the actual legal standard would require a court ruling to determine if you passed it, and everyone wants to avoid that. As a consequence there also aren't a lot of court cases to draw similarities to

reply
p_l
3 hours ago
[-]
"Taint" requires that the code is demonstrably derivative from the *GPL licensed work.

This is actually a harder standard than some people think.

The absolute clean room approaches in USA are there because they help short circuit a long lawsuit where a bigger corp can drag forever until you're broken.

reply
allreduce
3 hours ago
[-]
Not a lawyer, but that always seemed naively correct to me.

However, the copyright system has always been a sham to protect US capital interests. So I would be very surprised if this is actually ruled/enforced. And in any case American legislators can just change the law.

reply
actionfromafar
4 hours ago
[-]
Yes, that's what some lonely people have been shouting in the desert since the LLM craze started.
reply
greggoB
4 hours ago
[-]
Does "lonely" in this case encompass people who've formed relationships with said LLMs?
reply
orwin
4 hours ago
[-]
I'm not lonely! And I stopped shouting that since 24, because you know :/
reply
hu3
5 hours ago
[-]
I'm torn on where the line should be drawn.

If the code is different but API compatible, Google Java vs Oracle Java case shows that if the implementation is different enough, it can be considered a new implementation. Clean room or not.

reply
spoiler
4 hours ago
[-]
That whole clean room argument makes no sense. Project changed governance and was significantly refactored or reimplemented... I think the maintainers deserve to call it their own. The original pre-MIT release can stay LGPL.

I don't think this is a precedent either, plenty of projects changed licenses lol.

I keep kind of mixing them up, but the GPL licenses keep popping up in occasional horror stories. Maybe the license is just poorly written for today's standards?

reply
shaan7
4 hours ago
[-]
> plenty of projects changed licenses lol.

They usually did that with approval from existing license holders (except when they didn't, those were the bad cases for sure).

reply
DarkmSparks
4 hours ago
[-]
No. Because they couldn't have done any of that refactoring without a licence to do so, and that licence forbids them from relicencing it.
reply
spoiler
4 hours ago
[-]
Ok since this is not really answered... Hypothetically, If I'm a maintainer of this project. I decided I hate the implementation, it's naive, horrible performance, weird edge cases. I'm wiser today than 3 years ago.

I rewrite it, my head full of my own, original, new ideas. The results turn out great. There's a few if and while loops that look the same, and some public interfaces stayed the same. But all the guts are brand new, shiny, my own.

Do I have no rights to this code?

reply
DarkmSparks
4 hours ago
[-]
You have all rights to the code that you wrote that is not "colored" by previous code. Aka "an original work"

But code that is any kind of derivative of code before it contains a complex mix of other people's rights. It can be relicensed, but only if all authors large and small agree to the terms.

reply
IanCal
4 hours ago
[-]
Hmm are we in a ship of Theseus/speciation area? Each individual step of refactoring would not cross the threshold but would a rewrite? Even if the end result was the same?
reply
spoiler
4 hours ago
[-]
Let us also remember that certain architectural changes need to happen over a period of planned refactors. Nobody wants to read a 5000 line shotgun-blast looking diff
reply
spoiler
4 hours ago
[-]
So effectively, LGPL means you freely give all copyright for your work to the license holder? Even if the license holder has moved on from the project?

What if I decide to make a JS or Rust implementation of this project and use it as inspiration? Does that mean I'm no longer doing a "clean room" implementation and my project is contaminated by LGPL too?

reply
justinclift
4 hours ago
[-]
The standard way of "relicensing" a project is to contact all of the prior code contributors about it and get their ok.

Generally relicensing is done in good faith for a good reason, so pretty much everyone ok's it.

Trickiness can turn up when code contributors aren't contactable (ie dead, missing, etc), and I'm unsure of the legally sound approach to that.

reply
Meneth
1 hour ago
[-]
If a copyright holder does not give you permission, you can't legally relicense. Even if they're dead.

If they're dead and their estate doesn't care, you might pirate it without getting sued, but any recipient of the new work would be just as liable as you are, and they'd know that, so I probably wouldn't risk it.

reply
toyg
4 hours ago
[-]
The legally-sound approach is to keep track of your actions, so you can later prove you've made "reasonable" efforts to contact them.
reply
user34283
4 hours ago
[-]
Afaik you can do whatever you like to GPL licensed code, you do not need a license to refactor it.

I understand you need to publish the source code of your modifications, if you distribute them outside of your company.

reply
skeledrew
2 hours ago
[-]
You can do anything except change the license, which ensures that right to do anything passes on to others in perpetuity. That's how it's designed.
reply
duskdozer
3 hours ago
[-]
You also can't relicense it to be less restrictive
reply
scosman
4 hours ago
[-]
Governance change or refactoring don’t give you a right to relicense someone else’s work. It needs to be a whole new work, which you own the copyright to.
reply
spoiler
4 hours ago
[-]
Which is what happened here? The maintainers did a rewrite, apparently, but it's not enough!
reply
duskdozer
3 hours ago
[-]
No, that defeats the entire purpose of GPL licenses
reply
QuadmasterXLII
1 hour ago
[-]
“Mr Teacher, how many words do I have to change after copy pasting wikipedia so its not plagiarism?” has grown up and entered the workforce.

Pin your dependency versions, people! With hashes at this point, can't trust anybody out here.
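
For anyone wondering what pinning with hashes buys you: the idea is to record a digest of the exact artifact you reviewed and refuse anything that doesn't match. pip has a hash-checking mode for this (`--require-hashes`, with `--hash=sha256:...` entries in the requirements file). The snippet below is only a rough sketch of the underlying check, not pip's implementation; the `artifact` bytes and the helper names are made up for illustration.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pinned_hex: str) -> bool:
    """True only if the bytes hash to exactly the pinned digest."""
    return hmac.compare_digest(sha256_hex(data), pinned_hex.lower())


# Hypothetical stand-in for a release archive you reviewed and pinned.
artifact = b"example wheel contents"
pin = sha256_hex(artifact)

print(matches_pin(artifact, pin))         # same bytes as when pinned
print(matches_pin(artifact + b"!", pin))  # upstream replaced the contents
```

A version pin alone only fixes the version number; a hash pin also catches the case where the bytes behind that number change.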

reply
jmyeet
1 hour ago
[-]
There's a subtext in your point that I want to expand on.

Tech people, particularly engineers, tend to make a fundamental error when dealing with the law that almost always causes them to make wrong conclusions. And that error is that they look for technical compliance when so much of the law is subjective and holistic.

An example I like to use is people who do something illegal on the Internet and then use the argument "you can't prove I did it (with absolute certainty)". It could've been someone who hacked your Wifi. You don't know who on the Wifi did it, etc. But the law will look at the totality of the evidence. Did the activity occur when you were at home and stop when you weren't? How likely are alternative explanations? Etc.

All of that will be considered based on some legal standard depending on the venue. In civil court that tends to be "the preponderance of the evidence" (meaning more likely than not) while in criminal court it's "beyond a reasonable doubt" (which is a much higher standard).

So, using your example, an engineer will often fall into a trap of thinking they can substitute enough words to have a new original work, Ship of Theseus-like. And the law simply doesn't work that way.

So, when this gets to a court (which it will, it's not a question of "if"), the court will consider how necessary the source work was to what you did. If you used it for a direct translation (eg from C++ to Go) then you're going to lose. My prediction is that even using it in training data will be cause for a copyright claim.

If you use Moby Dick in your training data and ask an LLM to write a book like Moby Dick (either explicitly or implicitly) then you're going to have an issue. Even if you split responsibilities so one LLM (training on Moby Dick) comes up with a structure/prompt and another LLM (not trained on Moby Dick) writes it, I don't think that'll really help you avoid the issue.

reply
darkwater
1 hour ago
[-]
It's not clear at all why the current maintainers wanted/needed this re-licensing. I guess that their employer, Monarch Money, wants to use derivative work in their application without releasing the changes? It was already LGPL, perfect for a library, not GPL.
reply
duckerude
1 hour ago
[-]
Perhaps notable: years ago the original original chardet was rewritten with a different license: https://github.com/hsivonen/chardetng

AFAIK this was not a clean room reimplementation. But since it was rewritten by hand, into a different language, with not just a different internal design but a different API, I could easily buy that chardetng doesn't infringe while Python chardet 7 does.

reply
binaryturtle
3 hours ago
[-]
Isn't the real issue here that tons of projects that depend on the "chardet" now drag in some crappy still unverified AI slop? AI forgery poisoning, IMHO.

Why did this new project need to replace the original like that in this dishonourable way? The proper way would have been to create a proper new project.

Note: even Python's own pip drags this in as dependency it seems (hopefully they'll stick to a proper version)

reply
robinsonb5
2 hours ago
[-]
This is indeed the real issue (not the AI angle per se, but the wholesale replacement. The licensing issue is real, but less important IMO).

Half a million lines of code have been deleted and replaced over the course of four days, directly to the main branch with no opportunity for community review and testing. (I've no idea whether depending projects use main or the stable branch, but stable is nearly 4 years old at this point, so while I hope it's the version depending projects use, I wouldn't put money on it.)

The whole thing smells a lot like a supply chain attack - and even if it's in good faith, that's one hell of a lot of code to be reviewed in order to make sure.

reply
duskdozer
55 minutes ago
[-]
The test coverage is going to be entirely different, unless of course they copied the tests, which would then preclude them from changing the license. They didn't even bother to make sure the CI passed on merging a major version release https://github.com/chardet/chardet/actions/runs/22563903687/...
reply
earthscienceman
1 hour ago
[-]
Woah. As someone not in this particular community but dependent on these tools this is exactly the terrifying underbelly we've all discussed with the user architecture of tools like pip and npm. It's horrifying that a major component just got torn apart, rebuilt, and deployed to anyone who uses those python ecosystems (... many millions? ... billions of people?)
reply
adrian17
23 minutes ago
[-]
The "drop-in" compatibility claims are also just wrong? I ran it on the old test suite from 6.0 (which is completely absent now), and quickly checking:

- the outputs, even if correctly deduced, are often incompatible: "utf-16be" turns into "utf-16-be", "UTF-16" turns into "utf-16-le" etc. FWIW, the old version appears to have been a bit of a mess (having had "UTF-16", "utf-16be" and "utf-16le" among its outputs) but I still wouldn't call the new version _compatible_,

- similarly, all `ascii` turn into `Windows-1252`

- sometimes it really does appear more accurate,

- but sometimes it appears to flip between wider families of closely related encodings, like one SHIFT_JIS test (confidence 0.99) turns into cp932 (confidence 0.34), or the whole family of tests that were determined as gb18030 (chinese) are now sometimes determined as gb2312 (the older subset of gb18030), and one even as cp1006, which AFAIK is just wrong.
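
One way to separate the spelling-only differences from the real ones in a comparison like this is to run both labels through Python's codec registry, which maps aliases to a canonical name. This is just a sketch of that check, not what either chardet version does internally; `canonical` and `same_codec` are names I made up, and the pairs are the ones mentioned above.

```python
import codecs


def canonical(name: str) -> str:
    """Python's canonical codec name, or the lowercased input if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def same_codec(old: str, new: str) -> bool:
    """True if both labels resolve to the same Python codec."""
    return canonical(old) == canonical(new)


pairs = [
    ("utf-16be", "utf-16-be"),
    ("UTF-16", "utf-16-le"),
    ("ascii", "Windows-1252"),
    ("SHIFT_JIS", "cp932"),
    ("gb18030", "gb2312"),
]
for old, new in pairs:
    print(f"{old!r} -> {new!r}: same codec = {same_codec(old, new)}")
```

Note that this only helps a caller who decodes with the result. Anyone comparing the returned string literally still sees a changed value, which is why I wouldn't call it compatible.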

As for performance claims, they appear not entirely false - analyzing all files took 20s, versus 150s with v6.0. However, looks like the library sometimes takes 2s to lazy initialize something, which means that if one uses `chardetect` CLI instead of Python API, you'll pay this cost each time and get several times slower instead.
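
To illustrate why a one-time lazy initialization hurts the CLI case in particular: every new process pays the cold cost again, while a long-running Python process pays it once. The sketch below uses a stand-in `detect` with an artificial 50 ms setup delay; none of it is chardet code, and the delay is an assumption chosen only to make the effect visible.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def _load_tables() -> dict:
    """Stand-in for expensive one-time setup (hypothetical)."""
    time.sleep(0.05)
    return {"ready": True}


def detect(data: bytes) -> str:
    """Toy detector: triggers the lazy setup, then does a trivial check."""
    _load_tables()
    return "ascii" if data.isascii() else "unknown"


def timed(fn, *args) -> float:
    """Wall-clock seconds taken by one call."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


cold = timed(detect, b"hello")  # first call pays for the setup
warm = timed(detect, b"hello")  # later calls reuse the cached setup
print(f"cold call: {cold:.4f}s, warm call: {warm:.6f}s")
```

Benchmarks that loop over many files inside one process amortize the cold call away, so they can look much better than one invocation per file.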

Oh, and this "Negligible import memory (96 B)" is just silly and obviously wrong.

reply
duskdozer
1 hour ago
[-]
Yeah, there's really low quality code added if you take a look.
reply
geenat
4 hours ago
[-]
FastAPI's underlying library, Starlette, has been going through licensing shenanigans too lately: https://github.com/Kludex/starlette/issues/3042

Be really careful who you give your projects keys to, folks!

reply
Orygin
4 hours ago
[-]
That doesn't seem related at all, this is just adding attribution, not changing the license through LLM-washing
reply
pmarreck
1 hour ago
[-]
I have successfully reproduced a few projects with LLM assistance via strict cleanroom rules and only working off public specifications.
reply
b40d-48b2-979e
1 hour ago
[-]
Once you use a LLM, the room is no longer clean.
reply
bdangubic
1 hour ago
[-]
same is true for using humans
reply
b40d-48b2-979e
18 minutes ago
[-]
No, it isn't. A human wasn't trained on the material they're trying to reproduce.
reply
noosphr
1 hour ago
[-]
If the code is written by an Ai they can't copyright it. It is all public domain.
reply
Ardren
4 hours ago
[-]
Huh, 7e25bf4 was a big commit.

  2,305 files changed
  +0 -546871 lines changed
https://github.com/chardet/chardet/commit/7e25bf40bb4ae68848...
reply
mytailorisrich
5 hours ago
[-]
> Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).

I don't think that the second sentence is a valid claim per se, it depends on what this "rewritten code" actually looks like (IANAL).

Edit: my understanding of "clean room implementation" is that it is a good defence to a copyright infringement claim because there cannot be infringement if you don't know the original work. However it does not mean that NOT "clean room implementation" implies infringement, it's just that it is potentially harder to defend against a claim if the original work was known.

reply
bo1024
4 hours ago
[-]
I agree that (while the ethics of this are a different issue) the copyright question is not obviously clear-cut. Though IANAL.

As the LGPL says:

> A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".)

Is v7.0.0 a [derivative work](https://en.wikipedia.org/wiki/Derivative_work)? It seems to depend on the details of the source code (implementing the same API is not copyright infringement).

reply
jerven
5 hours ago
[-]
I was wondering how the existing case law on translated works, from one language to another, works here. It would at least suggest that this is an infringement of the license, especially because of the lack of creativity. But IANAL and of course have no idea of the applicable case law.
reply
_ache_
4 hours ago
[-]
"Exposure" means here, I think, that they fed the 6.X code version to Claude.
reply
Radle
5 hours ago
[-]
the ai copy pasted the existing project. How can such a procedure not fall under copyright?

Especially now that ai can do this for any kind of intellectual property, like images, books or sourcecode. If judges would allow an ai rewrite to count as an original creation, copyright as we know it completely ends world wide.

Instead whats more likely is that no one is gonna buy that shit

reply
charcircuit
4 hours ago
[-]
>the ai copy pasted the existing project.

The change log says the implementation is completely different, not a copy paste. Is that wrong?

>Internal architecture is completely different (probers replaced by pipeline stages). Only the public API is preserved.

reply
fzeroracer
4 hours ago
[-]
It's up to them to prove that a) the original implementation was not part of whatever data set said AI used and b) that the engineers in question did not use the original as a basis.
reply
charcircuit
4 hours ago
[-]
It's up to the accuser to prove that they copied it and did not actually write it from scratch as they claimed.
reply
fzeroracer
4 hours ago
[-]
No, that's not how copyright laws work. Especially in a world where the starting point is the accused making something and marketing it as someone else's IP with a license change.
reply
Ukv
4 hours ago
[-]
It's still on the claimant to establish copying, which usually involves showing that the two works are substantially similar in protected elements. That the defendants had access to the original helps establish copying, but isn't on its own sufficient.

Only after that would the burden be on the defendants, such as to give a defense that their usage is sufficiently transformative to qualify as fair use.

reply
spacedcowboy
4 hours ago
[-]
I came here to say this. While I agree with Mark that what they’re doing is not nice, I’m not sure it’s wrong. A clean-room implementation is one way the industry worked around licensing in the past (and present, I guess), but it’s not a requirement in law as far as I know.

I’m not sure that “a total rewrite” wouldn’t, in fact, pass muster - depending on how much of a rewrite it was of course. The ‘clean room’ approach was just invented as a plausible-sounding story to head off gratuitous lawsuits. This doesn’t look as defensible against the threat of a lawsuit, but it doesn’t mean it wouldn’t win that lawsuit (I’m not saying it would, I haven’t read or compared the code vs its original). Google copied the entire API of the Java language, and got away with it when Oracle sued. Things in a courtroom can often go in surprising ways…

[edit: negative votes, huh, that’s a first for a while… looks like Reddit/Slashdot-style “downvote if you don’t like what is being said” is alive and well on HN]

reply
duskdozer
3 hours ago
[-]
I spent like two minutes looking at the diff between the original and the supposed "clean room" implementation [1] and already found identical classes, variable names, methods, and parameters. It looks like there was no actual attempt at clean-rooming this, regardless of whether that "counts".

[1]https://github.com/chardet/chardet/compare/6.0.0.post1...7.0...
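
If you want to do this kind of comparison less by eyeball, one rough approach is to collect class, function, parameter, and variable names from both versions and look at the overlap. The sketch below does that with the standard `ast` module. The two snippets are invented placeholders, not chardet code, and shared identifiers are only a hint about where to look, not a legal test of anything (names like `self` or `len` will always overlap).

```python
import ast


def identifiers(source: str) -> set[str]:
    """Collect class, function, parameter, and variable names from source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name):
            names.add(node.id)
    return names


def shared_identifiers(old_src: str, new_src: str) -> set[str]:
    """Names that appear in both sources."""
    return identifiers(old_src) & identifiers(new_src)


# Hypothetical snippets standing in for an old and a new module.
OLD = '''
class Prober:
    def feed(self, byte_str):
        state = len(byte_str)
        return state
'''
NEW = '''
class Stage:
    def feed(self, byte_str):
        total = len(byte_str)
        return total
'''
print(sorted(shared_identifiers(OLD, NEW)))
```

In practice you would filter out builtins and names forced by the public API before reading anything into what remains.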

reply
toyg
4 hours ago
[-]
Lol at the statement that "clean room" would have been invented to scare people from suing. It's the opposite: clean room is a fairly-desperate attempt to pre-empt accusations in court when it is expected that the "derivative" argument will be very strong, in order to then piggyback on the doctrine about interoperability. Sometimes it works, but it's a very high bar to clear.
reply
actionfromafar
4 hours ago
[-]
I thought we were debating if it was legal, not if it's wrong. The law is about creativity. Was this creative or a more mechanical translation?
reply
klustregrif
5 hours ago
[-]
It will hold up in court. The line of argument of “well I went into a dark room with only the first Harry Potter book and a type writer and reproduced the entire work, so now I own the rewrite” doesn’t hold up in court, and it doesn’t when you put AI in the mix either. It doesn’t matter if the result is slightly different; a judge will rule based on the fact that this is literally what the law is intended to prevent. It’s not a case of which incantation or secret sentence you should utter to free the work of its existing license.
reply
mytailorisrich
4 hours ago
[-]
> “well I went into a dark room with only the first Harry Potter book and a type writer and reproduced the entire work, so now I own the rewrite”

This is not a good analogy.

A "rewrite" in context here is not a reproduction of the original work but a different work that is functionally equivalent, or at least that is the claim.

reply
IanCal
4 hours ago
[-]
Possibly important is that it’s largely api compatible but it’s not functionally equivalent in that its performance (as accuracy not just speed) is different.
reply
oytis
4 hours ago
[-]
I wonder if LLMs will push the industry towards protecting their IP with patents like the other branches of engineering rather than copyright. If you patent a general idea of how your software works then no rewrite will be able to lift this protection.
reply
skeledrew
2 hours ago
[-]
General patents aren't allowed.
reply
charcircuit
5 hours ago
[-]
Clean room implementations are not necessary to avoid copyright infringement.
reply
q3k
4 hours ago
[-]
> 12-stage detection pipeline

What is this recent (clanker-fueled?) obsession to give everything fancy computer-y names with high numbers?

It's not a '12 stage pipeline', it's just an algorithm.

reply
IanCal
4 hours ago
[-]
Isn’t it? I mean 12 stage pipeline has a very specific meaning to me in this area, and is not a new way of describing something. The release notes description sounds like a multi stage pipeline.

Do you know this kind of area and are commenting on the code?

reply
mamoon_syed
4 hours ago
[-]
"ok chatgpt, what name do i give to this algorithm, so it sounds fancy and advanced?"
reply
myrmidon
4 hours ago
[-]
I think Mark Pilgrim misrepresents the legal situation somewhat: The AI rewrite does not legally need to be a clean room implementation (whatever exactly that would even mean here).

That is just the easiest way to disambiguate the legal situation (i.e. the most reliable approach to prevent it from being considered a derivative work by a court).

I'm curious how this is gonna go.

reply
soulofmischief
4 hours ago
[-]
The README has clearly been touched by an LLM. Count the idiosyncrasies:

“chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x”

Do people not write anymore?

reply
tclancy
3 hours ago
[-]
I finally had to mute r/isthisai on Reddit because there’s now a subset of people who see the hand of AI in everything. Could that be generated by a clanker? Sure, but it’s also exactly what I would write if I wanted a quick pitch for a library that addresses some immediate concerns. It’s also what I would focus on if we had just finished a rebuild from scratch.

As Freud famously said, sometimes an em dash is just an em dash.

reply
adrian17
2 hours ago
[-]
FWIW, I don't think there's even room for interpretation here, given that the commit that created the README (and almost all commits since the rewrite started 4 days ago) is authored by

> dan-blanchard and claude committed 4 days ago

reply
tclancy
1 hour ago
[-]
Sure, I just could use a break from the needless side tracks.
reply
remix2000
3 hours ago
[-]
For me, some projects I start by writing a readme.txt by hand. That saves me time in cases where I realize I'd be making something pointless. (I don't use chatbots when coding though)
reply
imcritic
4 hours ago
[-]
Licenses are cancer and the enemy of opensource.
reply
kykat
2 hours ago
[-]
There would be no open source without the gpl
reply
actionfromafar
4 hours ago
[-]
Open source as a concept is intertwined with the concept of a license.
reply
spoiler
4 hours ago
[-]
I think it's just the GPL family of licenses that tends to cause most problems. I appreciate their intent, but the outcome often leaves a lot to be desired.
reply
nothrabannosir
4 hours ago
[-]
The GPL exists for the benefit of end users, not developers. It being a chore for developers who want to deny their users the software freedoms is a feature, not a bug.
reply
orphea
4 hours ago
[-]
If you have ill intentions or maybe you're a corporation that wants to use someone else's work for free without contributing anything back, then yes, I can see how GPL licenses "tend to cause problems".
reply
duskdozer
3 hours ago
[-]
If the GPL causes you problems, then it's working as intended.
reply
vova_hn2
4 hours ago
[-]
I like to think about GPL as a kind of an artistic performance and an elaborate critique of the whole concept of copyright.

Like, "we don't like copyright, but since you insist on enforcing it and we can't do anything against it, we will invent a clever way to use your own rules against you".

reply
jonathanstrange
4 hours ago
[-]
That is not really the motivation behind GPL licenses. These licenses have been designed to ensure by legal means that anyone can learn from the source code of software, fix bugs on their own, and modify the software to their needs.
reply
Orygin
4 hours ago
[-]
Wtf are these comments? An LGPL-licensed project, guaranteed to be free and open source, being LLM-washed to a permissive license, and GPL is the problem here?

They are literally stealing from open source, but it's the original license that is the issue?

reply
spoiler
2 hours ago
[-]
They have been maintaining the project for years. It's not like some Joe Random with ChatGPT randomly entered the scene
reply
Orygin
2 hours ago
[-]
And? Doesn't give them any right to re-license the code. Especially not to strip rights for other users.
reply
cap11235
2 hours ago
[-]
And what exactly are some of these problems?
reply
jonathanstrange
4 hours ago
[-]
Why? What's your problem with them? They do exactly what they're supposed to do, to ensure that future derivatives of the source code have to be distributed under the same license and distribution respects fundamental freedoms.
reply
skeledrew
4 hours ago
[-]
I feel like the author is missing a huge point here by fighting this. The entire reason why the GPL and any other copyleft license exist in the first place is to ensure that the rights of a user to modify, etc., a work can never be taken away. Before, relicensing as MIT - or any other fully permissive license - would've meant open doors to apply restrictions going forward, but with AI this is now a non-issue. Code is now very cheap. So the way I see this, anyone who is for copyleft should be embracing AI-created things as not being copyrightable (or a rewrite being relicensable) hard.
reply
Maken
2 hours ago
[-]
The user is the end-user of the product. If the relicensing means that someone down the line receives a closed-down binary application that they cannot modify, that's a violation of the user's rights.
reply
skeledrew
2 hours ago
[-]
But it's a non-issue as said user can just have AI reverse engineer said binary. Or reimplement something with the same specs. That's what it means for code to be cheap.
reply
duskdozer
51 minutes ago
[-]
It may be "cheap" at the moment. Let's revisit when the AI companies decide they need to regain a little bit of the hundreds of billions of dollars in losses they're creating.
reply
skeledrew
42 minutes ago
[-]
China is always waiting for this. And the US won't allow China to get all the users who'd emigrate over increased costs, so the costs will remain low. They'll have to find ways to recoup that don't involve raising the cost of code.
reply
philipwhiuk
2 hours ago
[-]
Code is only cheap with AI because AI ignores the law.
reply
skeledrew
2 hours ago
[-]
Laws change, and it's also law that now says AI-generated works can't be copyrighted, which makes everything even cheaper.
reply