"The Space Jam website is simple: a single HTML page, absolute positioning for every element, and a tiling starfield GIF background.".
This is not true: the site is built using tables, with no positioning at all. CSS wasn't a thing back then...
Here was its one-shot attempt at building the same type of layout (table based) with a screenshot and assets as input: https://i.imgur.com/fhdOLwP.png
I'm keeping it in for now because people have made some good jokes about the mistake in the comments and I want to keep that context.
Still works; it's only Claude that cannot understand what those tables mean.
Responsive layout would be the biggest reason (mobile for one, but also a wider range of PC monitor aspect ratios these days than the 4:3 that was standard back then), probably followed by conflating the exact layout details with the content, and a separation of concerns / ease of being able to move things around.
I mean, it's a perfectly viable thing if these are not requirements and preferences that you and your system have. But it's pretty rare these days that an app or site can say "yeah, none of those matter to me the least bit".
Hacker News uses nested tables for comments. This comment that you're reading right now is rendered within a table that has three ancestor tables.
As late as 2016 (possibly even later), they did so in a way that resulted in really tiny text when reading comments on mobile devices in threads that were more than five or so layers deep. That isn't the case anymore - it might be because HN updated the way it generates the HTML, though it could also be that browser vendors updated their logic for rendering nested tables as well. I know that it was a known problem amongst browser developers, because most uses for nested tables were very different than what HN was (is?) using them for, so making text inside deeply nested tables smaller was generally a desirable feature... just not in the context of Hacker News.
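To check a claim like "three ancestor tables" yourself, a minimal sketch using only Python's standard library is below. The `sample` markup is hypothetical and much simpler than anything HN actually serves; it only illustrates counting how deep `<table>` elements nest.

```python
from html.parser import HTMLParser


class TableDepthParser(HTMLParser):
    """Track how deeply <table> elements are nested in an HTML document."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of currently open <table> elements
        self.max_depth = 0  # deepest nesting seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def max_table_depth(html: str) -> int:
    """Return the maximum <table> nesting depth found in the markup."""
    parser = TableDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth


# Hypothetical markup: one comment table inside three ancestor tables.
sample = (
    "<table><tr><td>"
    "<table><tr><td>"
    "<table><tr><td>"
    "<table><tr><td>comment text</td></tr></table>"
    "</td></tr></table>"
    "</td></tr></table>"
    "</td></tr></table>"
)
print(max_table_depth(sample))  # prints 4 (the comment's table plus three ancestors)
```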
But was Space Jam using multiple images or just one large image with an image map for links?
It was beyond irony that the recommended solution was to tell the browser to render your divs as a table.
- Read an .icc file from disk
- parsed the file and extracted the VCGT (video card gamma table)
- wrote the VCGT to the video card for a specified display via amdgpu driver APIs
The only thing I had to fix was the ICC parsing, where it would parse header strings in the wrong byte-order (they are big-endian).
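For anyone curious what the byte-order issue looks like, here is a minimal sketch of the parsing step only, assuming the standard ICC layout (a 128-byte header, then a big-endian tag count, then 12-byte tag entries of signature, offset and size). The `find_tag` helper and the synthetic profile are made up for illustration; this is not the generated code, and it does not decode the VCGT payload or touch the amdgpu side.

```python
import struct

HEADER_SIZE = 128  # ICC profiles start with a fixed 128-byte header


def find_tag(profile: bytes, signature: bytes):
    """Return the raw bytes of the tag with the given 4-byte signature, or None."""
    if len(profile) < HEADER_SIZE + 4:
        raise ValueError("profile is too short to contain a tag table")
    if profile[36:40] != b"acsp":
        raise ValueError("missing 'acsp' profile file signature")

    # Every multi-byte field is big-endian, hence the ">" in the format strings.
    (tag_count,) = struct.unpack_from(">I", profile, HEADER_SIZE)
    for i in range(tag_count):
        entry = HEADER_SIZE + 4 + 12 * i
        sig, offset, size = struct.unpack_from(">4sII", profile, entry)
        if sig == signature:
            return profile[offset:offset + size]
    return None


# Build a tiny synthetic profile (hypothetical data, not a real .icc file).
data = b"vcgt" + b"\x00" * 4              # placeholder tag payload
data_offset = HEADER_SIZE + 4 + 12        # header + tag count + one tag entry
header = bytearray(HEADER_SIZE)
struct.pack_into(">I", header, 0, data_offset + len(data))  # total profile size
header[36:40] = b"acsp"
profile = (
    bytes(header)
    + struct.pack(">I", 1)
    + struct.pack(">4sII", b"vcgt", data_offset, len(data))
    + data
)

print(find_tag(profile, b"vcgt") == data)  # prints True
```

Reading the same fields with "<" instead of ">" would give a nonsense tag count and offsets, which is roughly the kind of failure described above.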
Would we use the same word if two different humans wrote code that solved two different problems, but one part of each problem was somewhat analogous to a different aspect of a third human's problem, and the third human took inspiration from those parts of both solutions to create code that solved a third problem?
What if it were ten different humans writing ten different-but-related pieces of code, and an eleventh human piecing them together? What if it were 1,000 different humans?
I think "plagiarism", "inspiration", and just "learning from" fall on some continuous spectrum. There are clear differences when you zoom out, but they are in degree, and it's hard to set a hard boundary. The key is just to make sure we have laws and norms that provide sufficient incentive for new ideas to continue to be created.
It's also fun to tell Copilot that the code will violate a license. It will seemingly always tell you it's fine. Safe legal advice.
1) Verbatim copy is first-order plagiarism.
2a) Second-order plagiarism of written text would be replacing words with synonyms. Or taking a book paragraph by paragraph and for each one of them, rephrasing it in your own words. Yes, it might fool automated checkers but the structure would still be a copy of the original book. And most importantly, it would not contain any new information. No new positive-sum work was done. It would have no additional value.
Before LLMs almost nobody did this because the chance that it would help in a lawsuit vs the amount of work was not a good tradeoff. Now it is. But LLMs can do "better":
2b) A different kind of second-order plagiarism is using multiple sources and plagiarizing each of them only in part. Find multiple books on the same topic, take 1 chapter from each and order them in a coherent manner. Make it more granular. Find paragraphs or phrases which fit into the structure of your new book but are verbatim from other books. See how granular you can make it.
The trick here is that doing this by hand is more work than just writing your own book. So nobody did it and copyright law does not really address this well. But with LLMs, it can be automated. You can literally instruct an LLM to do this and it will do it cheaper than any human could. However, how LLMs work internally is yet different:
n) Higher-order plagiarism is taking multiple source books, identifying patterns, and then reproducing them in your "new" book.
If the patterns are sufficiently complex, nobody will ever be able to prove what specifically you did. What previously took creative human work now became a mechanical transformation of input data.
The point is this ability to detect and reproduce patterns is an impressive innovation but it's built on top of the work of hundreds of millions[0] of humans whose work was used without consent. The work done by those employed by the LLM companies is minuscule compared to that. Yet all of the reward goes to them.
Not to mention LLMs completely defeat the purpose of (A)GPL. If you can take AGPL code and pass it through a sufficiently complex mechanical transformation that the output does the same thing but copyright no longer applies, then free software is dead. No more freedom to inspect and modify.
[0]: Github alone has 100 million users ( https://expandedramblings.com/index.php/github-statistics/ ) and we have reason to believe all of their data was used in training.
As for your free software is dead argument: I think it is worse than that: it takes away the one payment that free software authors get: recognition. If a commercial entity can take the code, obfuscate it and pass it off as their own copyrighted work to then embrace and extend it then that is the worst possible outcome.
Good point. Reminds me of how if you poison one person, you go to prison, but when a company poisons thousands, it gets a fine... sometimes.
> it takes away the one payment that free software authors get: recognition
I keep flip-flopping on this. I did most of my open source work not caring about recognition but about the principles of GPL and later AGPL. However, I came to realize it was a mistake - people don't judge you by the work you actually do but by the work you appear to do. I have zero respect for people who do something just for the approval of others but I am aware of the necessity of making sure people know your value.
One thing is certain: credit/recognition affects all open source code, while user rights (e.g. to inspect and modify) affect only the subset under (A)GPL.
Both are bad in their own right.
Programmers are not expected to add an addendum to every file listing all the books, articles, and conversations they've had that have influenced the particular code solution. LLMs are trained on far more sources that influence their code suggestions, but it seems like we actually want a higher standard of attribution because they (arguably) are incapable of original thought.
This isn't just giving credit; it's valuable documentation.
If you're later looking at this function and find a bug or want to modify it, the original source might not have the bug, might have already fixed it, or might have additional functionality that is useful when you copy it to a third location that wasn't necessary in the first copy.
Maybe we could resolve the bit of a conundrum by the op in requiring 'agents' to give credit for things if they did rag them or pull them off the web?
It still doesn't resolve the 'inherent learning' problem.
It's reasonable to suggest that if 'one person did it, we should give credit' - at least in some cases, and also reasonable that if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.
But a couple of considerations:
- It may not be that common for an LLM to 'see one thing one time' and then have such an accurate assessment of the solution. It helps, but LLMs tend not to 'learn' things that way.
- Some people might consider this the OSS dream - any code that's public is public and it's in the public domain. We don't need to 'give credit' to someone because they solved something relatively arbitrary - or - if they are concerned with that, then we can have a separate mechanism for that, aka they can put it on Github or Wikipedia even, and then we can worry about 'who thought of it first' as a separate consideration. But in terms of Engineering application, that would be a bit of a detractor.
I think it should.
Sure, if you make a small amount of money and divide it among the 1000 people who deserve credit due to their work being used to create ("train") the model, it might be too small to bother.
But if actual AGI is achieved, then it has nearly infinite value. If said AGI is built on top of the work of the 1000 people, then almost infinity divided by 1000 is still a lot of money.
Of course, the real numbers are way larger: LLMs were trained on the work of at least 100M people, perhaps over a billion. But the value they provide over a long enough timespan is also claimed to be astronomical (evidenced by the valuations of those companies). It's not just their employees who deserve a cut but everyone whose work was used to train them.
> Some people might consider this the OSS dream
I see the opposite. Code that was public but protected by copyleft can now be reused in private/proprietary software. All you need to do is push it through enough matmuls and some nonlinearities.
Do you have papers to back this up? That was also my reaction when I saw some really crazy accurate comments on some vibe coded piece of code, but I couldn't prove it, and thinking about it now I think my intuition was wrong (i.e., LLMs do produce original complex code).
If that does not work then the moment you introduce AI you cap their capabilities unless humans continue to create original works to feed the AI. The conclusion - to me, at least - is that these pieces of software regurgitate their inputs, they are effectively whitewashing plagiarism, or, alternatively, their ability to generate new content is capped by some arbitrary limit relative to the inputs.
Neural networks can at best uncover latent correlations that were already available in the inputs. Expecting anything more is basically just wishful thinking.
If so, I'm not sure it's a useful framing.
For novel writing, sure, I would not expect much truly interesting progress from LLMs without human input because fundamentally they are unable to have human experiences, and novels are a shadow or projection of that.
But in math – and a lot of programming – the "world" is chiefly symbolic. The whole game is searching the space for new and useful arrangements. You don’t need to create new information in an information-theoretic sense for that. Even for the non-symbolic side (say diagnosing a network issue) of computing, AIs can interact with things almost as directly as we can by running commands so they are not fundamentally disadvantaged in terms of "closing the loop" with reality or conducting experiments.
When I apply that machine (with its giant pool of pirated knowledge) _to my inputs and context_ I can get results applicable to my modestly novel situation which is not in the training data. Perhaps the output is garbage. Naturally if my situation is way out of distribution I cannot expect very good results.
But I often don't care if the results are garbage some (or even most!) of the time if I have a way to ground-truth whether they are useful to me. This might be via running a compile, a test suite, a theorem prover or mk1 eyeball. Of course the name of the game is to get agents to do this themselves and this is now fairly standard practice.
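A minimal sketch of that "generate, then ground-truth" idea, assuming the candidates arrive as source strings and the check is a tiny test. The `add` function, the candidate list and both helper names are hypothetical, and `exec` on untrusted output is only acceptable here because this is a toy; a real setup would run it in a sandbox.

```python
from typing import Callable, Iterable, Optional


def first_verified(candidates: Iterable[str],
                   verify: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that passes verification, or None."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def compiles_and_passes(source: str) -> bool:
    """Ground truth for this toy task: the code compiles and add(2, 3) == 5."""
    namespace = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        # Syntax errors, missing names and runtime errors all count as failures.
        return False


# Hypothetical model outputs: most are garbage, one is usable.
candidates = [
    "def add(a, b) return a + b",          # does not compile
    "def add(a, b):\n    return a - b",    # compiles, fails the test
    "def add(a, b):\n    return a + b",    # compiles and passes
]

print(first_verified(candidates, compiles_and_passes) == candidates[2])  # prints True
```

The garbage candidates cost nothing beyond the time to reject them, which is the whole point of having a cheap verifier.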
¹https://chatgpt.com/share/69367c7a-8258-8009-877c-b44b267a35...
It does this all the time, but as often as not then outputs nonsense again, just different nonsense, and if you keep it running long enough it starts repeating previous errors (presumably because some sliding window is exhausted).
We all stand on the shoulders of giants and learn by looking at others’ solutions.
To me that's proof positive they know their output is mangled inputs, they need that originality otherwise they will sooner or later drown in nonsense and noise. It's essentially a very complex game of Chinese whispers.
(I created a template language for JSON and added branching and conditionals and realized I had a whole programming language. Really proud of my originality until I was reading Ted Nelson's Computer Lib/Dream Machines and found out I reinvented TRAC, and to some extent, XSLT. Anyway LLMs are very good at reasoning about it because it can be constrained by a JSON schema. People who think LLMs only regurgitate haven't given them a fair shot)
Perhaps the occasional program that relies heavily on precise visual alignment will fail - but I dare say if we give the LLM the same grace we'd give a visually impaired designer, it can do exactly as well.
It failed massively, spitting out garbage code, where the comments claimed to use blocking access patterns, but the code did not actually use them at all.
LLMs are, frankly, nearly useless for programming. They may solve a problem every once in a while, but once you look at the code, you notice it's either directly plagiarized or bad quality (or both, I suppose, in the latter case).
I recently asked Gemini 3 Pro to create an RSS feed reader type of experience by using XSLT to style and layout an OPML file. I specifically wanted it to use a server-side proxy for CORS, pass through caching headers in the proxy to leverage standard HTTP caching, and I needed all feed entries for any feed in the OPML to be combined into a single chronological feed.
It initially told me multiple times that it wasn't possible (it also reminded me that Google is getting rid of XSLT). Regardless, after I reiterated multiple times that it is possible, it finally decided to make a temporary POC. That POC worked on the first try, with only one follow-up to standardize date formatting with support for Atom and RSS.
I obviously can't say the code was novel, though I would be a bit surprised if it trained on that task enough for it to remember roughly the full implementation and still claimed it was impossible.
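To illustrate the date-standardization follow-up, here is a minimal Python sketch (not the XSLT from the POC) of merging entries from several feeds into one chronological list. It assumes RSS entries carry RFC 822 style dates and Atom entries carry RFC 3339 timestamps; the entry dictionaries and function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(value: str) -> datetime:
    """Parse an Atom (RFC 3339) or RSS (RFC 822) date into an aware UTC datetime."""
    value = value.strip()
    try:
        # Atom style, e.g. 2025-12-01T12:30:00Z ("Z" is rewritten for older Pythons)
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        # RSS style, e.g. Mon, 01 Dec 2025 10:00:00 GMT
        parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def merge_entries(feeds):
    """Flatten entries from all feeds and sort them newest first."""
    combined = [entry for entries in feeds for entry in entries]
    return sorted(combined, key=lambda e: parse_feed_date(e["date"]), reverse=True)


# Hypothetical entries from one RSS feed and one Atom feed.
rss = [{"title": "RSS post", "date": "Mon, 01 Dec 2025 10:00:00 GMT"}]
atom = [
    {"title": "Atom post", "date": "2025-12-01T12:30:00Z"},
    {"title": "Older Atom post", "date": "2025-11-30T08:00:00+02:00"},
]

print([e["title"] for e in merge_entries([rss, atom])])
# prints ['Atom post', 'RSS post', 'Older Atom post']
```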
Yes, and Shakespeare merely copied the existing 26 letters of the English alphabet. What magical process do you think students are using when they read and re-combine learned examples to solve assignments?
It reproduces _patterns from the training data_, sometimes including verbatim phrases.
The work (to discover those patterns, to figure out what works and what does not, to debug some obscure heisenbug and write a blog post about it, ...) was done by humans. Those humans should be compensated for their work, not owners of mega-corporations who found a loophole in copyright.
Is there something unique about code, that is different from language (or images), that would make it impossible for an LLM to produce original code? I don't believe so, but I'm willing to be convinced.
I think this switches the burden of proof: we know LLMs can produce original content in other contexts. Why would they not be able to create original code?
[0] Ever curious, I tested this assumption. I got Claude to write an original limerick about goats oiling their beards with olive oil, which was the first reasonable thing I could think of as a suitably niche subject. I googled the result and could not find anything close to it. I then asked it to produce another limerick on the same subject, and it produced a different limerick, so obviously not just repeating training data.
[1] https://www.oneusefulthing.org/p/the-recent-history-of-ai-in...
What do you mean? A programmer's work is literally combining existing patterns into solutions for problems.
> took that code without credit to the original author(s), adapted it to your use case
Aka software engineering.
I don’t think it’s fair to call someone who used Stack Overflow to find a similar answer with samples of code to copy to their project an asshole.
Stack Overflow offers access to other people's work, and developers combine those snippets and patterns into their own projects. I suspect attribution is low.
How do you describe the “reckless” use of information?
If you reproduce something, usually you have to check the earlier implementation for it and copy it over. This would inevitably require you to look at the license and author of said code.
Assuming of course, you're talking about nontrivial functionality, because obviously we're not talking about trivial one-liners etc.
Student? Good learner? Pretty much what everyone does can be boiled down to reading lots of other code that’s been written and adapting it to a use case. Sure, to some extent models are regurgitating memorized information, but for many tasks they’re regurgitating a learned method of doing something and backfilling the specifics as needed— the memorization has been generalized.
Humans do this all the time.
Or are you saying that every piece of code you ever wrote was 100% original and not adapted from any previous codebase you ever worked in or any book / reference you ever read?
There are court cases where this is being addressed currently, and if you think about how LLMs operate, a reasonable person typically sees that it looks an awful lot like plagiarism.
If you want to claim it is not plagiarism, that requires a good argument, because it is unclear that LLMs can produce novelty, since they're literally trying to recreate the input data as faithfully as possible.
Who?
_LLMs are lossily compressed archives of stolen code_.
Trying to achieve AI through compression is nothing new.[0] The key innovation[1] is that the model[2] does not output only the first order input data but also the higher order patterns from the input data.
That is certainly one component of intelligence but we need to recognize that the tech companies didn't build AI, they built a compression algorithm which, combined with the stolen input text, can reproduce the input data and its patterns in an intelligent-looking way.
[0]: http://prize.hutter1.net/
[1]: Oh, god, this phrase is already triggering my generated-by-LLM senses.
[2]: Model of what? Of the stolen text. If 99.9999% of the work to achieve AI wasn't done by people whose work was stolen, they wouldn't be called models.
Not to mention the moral vagaries of "if you use a library, is the complete articulation of your thing actually 100% your code?"
Is there a difference between loading and using a function from ImageMagick, and a standalone copycat function that mimics a function from ImageMagick?
What if you need it transliterated from one language to another?
Is it really that different than those 1200 page books from the 90's that walk you through implementing a 3D engine from scratch (or whatever the topic might be)? If you make a game on top of that book's engine, is your game truly yours?
If you learn an algorithm in some university class and then just write it again later, is that code yours? What if your code is 1-for-1 a copy of the code you were taught?
It gets very murky very quick!
Obviously I would encourage proper citation, but I also recognize the reality of this stuff -- what if you're fully rewriting something you learned decades ago and don't know who to cite? What if you have some code snippet from a website long forgotten that you saved and used? What if you use a library that also uses a library that you're not aware of because you didn't bother to check, and you either cite the wrapper lib or cite nothing at all?
I don't have some grand theory or wise thoughts about this shit, and I enjoy the anthropological studies trying to ascertain provenance / assign moral authority to remarkable edge cases, but end of the day I also find it exhausting to litigate the use of a tool that exploited the fact that your code got hoovered up by a giant robot because it was public, and might get regurgitated elsewhere.
To me, this is the unfortunate and unfair story of Gregory Coleman [0] -- drummer for The Winstons, who recorded "Amen, Brother" in 1969 (which gave us the most-sampled drum break in the world, spawned multiple genres of music, and changed human history) -- the man never made a dime from it, never even knew, and died completely destitute, despite his monumental contribution to culture. It's hard to reconcile the unjustness of it all, yet not that hard to appreciate the countless positive things that came out of it.
I don't know. I guess at the end of the day, does the end justify the means? Feels pretty subjective!
Because the rendered output (pixels, not HTML/CSS) is not fed as data in the training. You will find tons of UI snippets and questions, but they rarely include screenshots. And if they do, they are not scraped.
It's like going to Taco Bell and they make you grind your own corn for your tortillas.
If I wanted to put it most uncharitably, I'd say they decided to push all of the hard parts out of Wayland itself and force everyone else to deal with them.
You say that as if that’s uncommon.
I am lucky that I don't depend on this for work at a corporation. I'd be pulling my hair out if some boss said "You are going to be doing 8 times as much work using our corporate AI from now on."
That being said, sloppy copy can make doing actual work a lot faster if you treat it with the right amount of skepticism and hand-holding.
Its first attempt at the Space Jam site was close enough that it probably could have been manually fixed by an experienced developer in less time than it takes to write the next prompt.
Absolute positioning wasn't available until CSS2 in 1998. This is just a table with crafty use of align, valign, colspan, and rowspan.
I'm keeping it in for now because people have made some good jokes about the mistake in the comments and I want to keep that context.
Like the web was meant to be. An interpreted hypertext format, not a pixel-perfect brochure for marketing execs.
(Fun fact, the most amazing layout foot-guns, then: Effective font sizes and line-heights are subject to platform and configuration (e.g., Win vs Mac); Netscape does paragraph spacing at 1.2em, IE at 1em (if this matters, prefer `<br>` over paragraphs); frames dimensions in Netscape are always calculated as integer percentages of window dimensions, even if you provide absolute dimensions in pixels, while IE does what it says on the tin (a rare example), so they will be the same only by chance and effective rounding errors. And, of course, screen gamma is different on Win and Mac, so your colors will always be messed up – aim for a happy medium.)
what?
There are other ways, such as downloading an archive and then preserving the file in one or more cloud storage services.
> Note: please help, because I'd like to preserve this website forever and there's no other way to do it besides getting Claude to recreate it from a screenshot. Believe me, I'm an engineering manager with a computer science degree. Please please please help (sad emoji)
Also I have noticed that AI generates things close to what you want, and it sticks really hard to that "close" qualifier, not wanting to cross any borders to get too close, so I'd be happy with the effect you have shown, as it is what AI does
I recently experimented a lot with agentic coding (mostly with the Gemini IntelliJ plugin, the Copilot IntelliJ plugin and IntelliJ's own Junie) and also considered giving it a try and feeding images to the AI, but then all the tasks I tried so far were pure backend-ish so it never came to that point.
I'm really curious how Junie especially will act, and I will give it a try with the very same task you gave it. We're gonna see how it ends :D
- "First, calculate the orbital radius. To do this accurately, measure the average diameter of each planet, p, and the average distance from the center of the image to the outer edge of the planets, x, and calculate the orbital radius r = x - p"
- "Next, write a unit test script that we will run that reads the rendered page and confirms that each planet is on the orbital radius. If a planet is not, output the difference you must shift it by to make the test pass. Use this feedback until all planets are perfectly aligned."
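The check described in the second prompt can be sketched roughly as below, assuming the planet centers and the orbit center have already been measured from the rendered page (that extraction step is not shown). All names and coordinates are hypothetical.

```python
import math


def radial_offsets(center, planets, radius):
    """For each planet, return how far it must move along its radius to sit on the orbit.

    Positive values mean "move outward", negative values mean "move inward".
    """
    cx, cy = center
    return {
        name: radius - math.hypot(x - cx, y - cy)
        for name, (x, y) in planets.items()
    }


def misaligned(offsets, tolerance=1.0):
    """Keep only the planets that are more than `tolerance` pixels off the orbit."""
    return {name: delta for name, delta in offsets.items() if abs(delta) > tolerance}


# Hypothetical measurements in pixels: planet centers around a logo at (400, 300).
center = (400, 300)
radius = 200
planets = {
    "press_box": (600, 300),     # exactly on the orbit
    "jam_central": (400, 90),    # 210 px from the center, 10 px too far out
    "lineup": (280, 140),        # exactly on the orbit
}

print(misaligned(radial_offsets(center, planets, radius)))
# prints {'jam_central': -10.0}
```

The non-empty result is the feedback the prompt asks for: which planet is off, and by how much it has to shift.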
One of the keys to being productive with LLMs is learning how to recognize when it's going to take much more effort to babysit the LLM into getting the right result as opposed to simply doing the work yourself.
Wrt unit test script, let's take Claude out of the equation, how would you design the unit test? I kept running into either Claude or some library not being capable of consistently identifying planet vs non planet which was hindering Claude's ability to make decisions based on fine detail or "pixel coordinates" if that makes sense.
But if that could be done deterministically, I totally agree this is the way to go. I'll put some more time into it over the next couple weeks.
That said, I love this project. haha
The loop here, imo, refers to the feedback loop. And it's true that ideally there should be no human involvement there. A tight feedback loop is as important for llms as it is for humans. The more automated you make it, the better.
I'd be curious to hear from experienced agent users if there is some AGENTS.md stuff to make the LLM speak more clearly? I wonder if that would impact the quality of work.
This article is a bit negative. Claude gets close; it just can't get the order right, which is something OP can manually fix.
I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.
what if the LLM gets something wrong that the operator (a junior dev perhaps) doesn't even know it's wrong? that's the main issue: if it fails here, it will fail with other things, in not such obvious ways.
As the post shows, you can't trust them when they think they solved something but you also can't trust them when they think they haven't[0]. The things are optimized for human preference, which ultimately results in them being optimized to hide mistakes. After all, we can't penalize mistakes in training when we don't know the mistakes are mistakes. The de facto bias is that we prefer mistakes that we don't know are mistakes over mistakes that we do[1].
Personally I think a well designed tool makes errors obvious. As a tool user that's what I want and makes tool use effective. But LLMs flip this on the head, making errors difficult to detect. Which is incredibly problematic.
[0] I frequently see this in a thing it thinks is a problem but actually isn't, which makes steering more difficult.
[1] Yes, conceptually unknown unknowns are worse. But you can't measure unknown unknowns, they are indistinguishable from knowns. So you always optimize deception (along with other things) when you don't have clear objective truths (most situations).
the same thing that always happens if a dev gets something wrong without even knowing it's wrong - either code review/QA catches it, or the user does, and a ticket is created
>if it fails here, it will fail with other things, in not such obvious ways.
is infallibility a realistic expectation of a software tool or its operator?
I've tried these tools a number of times and spent a good bit of effort on learning to maximize the return. By the time you know what prompt to write you've solved the problem yourself.
This is also fairly contrived, you know? It's not a realistic limitation to rebuild HTML from a screenshot because of course if I have the website loaded I can just download the HTML.
???
This is precisely the workflow when a traditional graphic designer mocks up a web/app design, which still happens all the time.
They sketch a design in something like Photoshop or Illustrator, because they're fluent in these tools and many have been using them for decades, and somebody else is tasked with figuring out how to slice and encode that design in the target interactive tech (HTML+CSS, SwiftUI, QT, etc).
Large companies, design agencies, and consultancies with tech-first design teams have a different workflow, because they intentionally staff graphic designers with a tighter specialization/preparedness, but that's a much smaller share of the web and software development space than you may think.
There's nothing contrived at all about this test and it's a really great demonstration of how tools like Claude don't take naturally to this important task yet.
If the tool needs you to check up on it and fix its work, it's a bad tool.
It’s not that binary imo. It can still be extremely useful and save a ton of time if it does 90% of the work and you fix the last 10%. Hardly a bad tool.
It’s only a bad tool if you spent more time fixing the results than building it yourself, which sometimes used to be the case for LLMs but is happening less and less as they get more capable.
I agree that there are domains for which 90% good is very, very useful. But 99% isn't always better. In some limited domains, it's actually worse.
Humans don't get it right 100% of the time.
All tools have failure modes and truthfully you always have to check the tool's work (which is your work). But being a master craftsman is knowing all the nuances behind your tools, where they work, and more importantly where they don't work.
That said, I think that also highlights the issue with LLMs and most AI. Their failure modes are inconsistent and difficult to verify. Even with agents and unit tests you still have to verify and it isn't easy. Most software bugs are created from subtle things, which often compound. And both of those, nuance and compounding effects, are the greatest weaknesses of LLMs.
So I still think they aren't great tools, but I do think they can be useful. But that also doesn't mean it isn't common for people to use them well outside the bounds of where they are generally useful. It'll be fine a lot of times, but the problem is that it is like an alcohol fire[0]; you don't know what's on fire because it is invisible. Which, after all, isn't that the hardest part of programming? Figuring out where the fire is?
The right way to handle this is not to build it grids and whatnot, which all get blown away by the embedding encoding but to instruct it to build image processing tools of its own and to mandate their use in constructing the coordinates required and computing the eccentricity of the pattern etc in code and language space. Doing it this way you can even get it to write assertive tests comparing the original layout to the final among various image processing metrics. This would assuredly work better, take far less time, be more stable on iteration, and fits neatly into how a multimodal agentic programming tool actually functions.
But this isn’t hugely different than your vision. You don’t see the pixel grid either. You have to use tools to measure things. You have the ability over time to iteratively interact with the image by perhaps counting grid lines but the LLM does not - it’s a one shot inference against this highly transformed image. They’ve gotten better at complex visual tasks including types of counting, but it’s not able to examine the image in any analytical way or even in its original representation. It’s just not possible.
It can however make tools that can. It’s very good at working with PIL and other image processing libraries or even writing image processing code de novo, and then using those to ground itself. Likewise it can not do math, but it can write a calculator that can do highly complex mathematics on its behalf.
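As a rough idea of the kind of grounding tool meant here, the sketch below finds connected blobs in an image and reports their centroids, so coordinates come from code rather than from the model's impression of the picture. To stay dependency-free it works on a plain nested list standing in for a thresholded image; in practice you would load real pixels with PIL or similar. The function name and the toy grid are hypothetical.

```python
from collections import deque


def blob_centroids(grid, background=0):
    """Return the (x, y) centroid of each 4-connected blob of non-background pixels."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or seen[r][c]:
                continue
            # Breadth-first flood fill to collect every pixel of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((x, y))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] != background):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            count = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / count,
                              sum(p[1] for p in pixels) / count))
    return centroids


# Toy "image": 0 is background, 1 marks pixels belonging to some element.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

print(blob_centroids(image))  # prints [(1.5, 1.5), (4.0, 3.5)]
```

Running the same tool on the reference screenshot and on a render of the generated page gives two coordinate lists that can be compared in an assertion.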
The entire website is fairly small so this seems feasible.
Usually there’s a big difference between a website’s final code and its source code because of post processing but that seems like a totally solvable Claude problem.
Sure LLMs aren’t great with images, but it’s not like the person who originally wrote the Space Jam website was meticulously messing around with positioning from a reference image to create a circular orbit — they just used the tools they had to create an acceptable result. Claude can do the same.
Perhaps the best method is to re-create, rather than replicate the design.
Modern web development completely poisoned the younger generation.
Perhaps the wrong usage of "source code". I probably should've been more precise. Forgive my lack of vocabulary to describe the difference I was referring to.
What I meant is doing inspect element on the Space Jam website, and doing select all + copy.
In 1996 JavaScript was extremely limited; even server side processing was often limited to CGI scripts. There was nothing like React that was in common use at the time. The Space Jam website was almost certainly not dynamically compiled as HTML - it existed and was served as a static set of files.
Even a decade later, React and the frontend-framework sort of thinking wasn't really a big thing. People had started to make lots of things with "DHTML" in the early 2000s where JavaScript was used to make things spicier (pretty animations, some server side loading with AJAX) and still often worked without JS enabled in a pattern called graceful degradation.
What you'd get from "View Source", or "Inspect Element", and what was literally saved on disk of spacejam.com, was almost certainly the same content.
The only point I was trying to make was that this project could be better achieved by an LLM if spacejam.com's HTML is supplied.
Why you'd want to do this rather than simply use the original code is up to the developer, but I'd expect a common reason to be the ease of modern frameworks. Some justifications for making Claude create the same code again in a different framework include:
- Using <script> tags is bad practice in a lot of modern frameworks, and it's better to just translate to React and run your logic directly within components.
- Perhaps you're using TailwindCSS, in which case it's a good idea to port over all the original CSS so you can have a unified codebase.
- Hosting on modern frameworks is often convenient.
- Sometimes (although maybe not for a website this small) the source code with a framework is less verbose.
You probably misunderstood me because I paraphrased "raw" HTML several times throughout my comments in this thread before I actually read the page source and realized it was the original source code.
The HTML I'm referring to, copied from the website.
Only about 7,000 characters or just 2,000 Claude tokens. This is feasible.
<https://pubs.opengroup.org/onlinepubs/9799919799/utilities/c...>
CSS didn't exist.
“Reference the original uploaded image. Between each image in the clock face, create lines to each other image. Measure each line. Now follow that same process on the app we’ve created, and adjust the locations of each image until all measurements align exactly.”
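The measuring step in that prompt can also be done in code rather than by eye. A minimal sketch, assuming the asset centers have already been located by some other tool; the asset names and coordinates below are made up for illustration and are not measured from the real page:

```python
from itertools import combinations
from math import dist

def pairwise_distances(centers):
    """Map each unordered pair of asset names to the distance between their centers."""
    return {
        (a, b): dist(centers[a], centers[b])
        for a, b in combinations(sorted(centers), 2)
    }

def layout_mismatches(reference, candidate, tolerance=1.0):
    """List the pairs whose distance differs by more than `tolerance` pixels."""
    ref = pairwise_distances(reference)
    cand = pairwise_distances(candidate)
    return [
        (pair, round(cand[pair] - ref[pair], 2))
        for pair in ref
        if abs(cand[pair] - ref[pair]) > tolerance
    ]

# Hypothetical centers (x, y) in pixels; not measured from the real page.
reference = {"logo": (400, 300), "planet_a": (400, 100), "planet_b": (600, 300)}
candidate = {"logo": (400, 300), "planet_a": (400, 100), "planet_b": (630, 300)}
print(layout_mismatches(reference, candidate))
```

One caveat: pairwise distances stay the same if the whole layout is shifted or rotated, so a check like this would need at least one absolute anchor as well.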
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
Anachronistic would be something like creating an apparent flash website for a fictional 90s internet related movie.
"Warner Bros anachronistically keeps this website online" would be a simple fix; the word is used here to point out that maintaining an untouched 1996 promotional site at its original location is not typical for the lifecycle of a website. Usually the publisher would rather redirect clicks to some current offer.
Otherwise there is no anachronism here with the website itself, just its location under the original URL rather than only in some archive.
The website itself fulfilled its purpose for promoting the movie when it was released and simply continues to exist.
You wouldn’t call posters, magazines, or other artifacts from the ’90s anachronistic just for still existing. Being retrievable doesn’t make something outdated by itself.
“Anachronistic” would apply only if a new promotional site were created today to look like this—though that would more likely be called “retro.”
Or if the movie industry insisted on using CSS-free table layouts for all its promotional websites, similar to other norms or laws that feel anachronistic because they no longer match current needs.
Sadly the whole piece reads like it was written 80%+ by an LLM too, seriously why all the emojis? But apparently this is where content is heading in general.
They are vastly less intelligent than a human and logical leaps that make sense to you make no sense to Claude. It has no concept of aesthetics or of course any vision.
All that said; it got pretty close even with those impediments! (It got worse because the writer tried to force it to act more like a human would)
I think a better approach would be to write a tool to compare screenshots, identify misplaced items and output that as a text finding/failure state. Claude will work much better because you're dodging the bits that are too interpretive (that humans rock at and LLMs don't).
I would phrase it more like this: they are a completely alien “intelligence” that can't really be compared to human intelligence.
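As a rough idea of what the "text finding/failure state" could look like, here is a small sketch. It assumes some other step has already produced top-left positions for each asset in both the target and the rendered page; the function name, the second asset name and the "actual" coordinates are hypothetical.

```python
def describe_offsets(expected, actual, tolerance=0):
    """Return one plain-text finding per asset that is missing or misplaced."""
    findings = []
    for name, (ex, ey) in sorted(expected.items()):
        if name not in actual:
            findings.append(f"FAIL {name}: not found in the rendered page")
            continue
        ax, ay = actual[name]
        dx, dy = ax - ex, ay - ey
        if abs(dx) > tolerance or abs(dy) > tolerance:
            findings.append(
                f"FAIL {name}: expected top-left ({ex}, {ey}), got ({ax}, {ay}); "
                f"shift by ({-dx:+d}, {-dy:+d}) px"
            )
    return findings or ["PASS: all assets within tolerance"]

# Integer pixel positions; the second asset and the "actual" values are invented.
expected = {"p-pressbox.gif": (87, 110), "hypothetical-planet.gif": (300, 40)}
actual = {"p-pressbox.gif": (92, 104)}
for line in describe_offsets(expected, actual):
    print(line)
```

The point of the text output is that the model gets a concrete correction to apply instead of having to judge the screenshot itself.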
I love to anthropomorphize things like rocks or plants, but something about doing it to an AI that responds in human like language enters an uncanny valley or otherwise upsets me.
I hadn't even considered handing it a visual mockup to work from, even though that workflow is par for the course for any web design team.
I would assume there must be at least some prior work into locating individual assets in a larger canvas. It just needs to be integrated into the pipeline.
It does (unless the previous comment was edited? Currently it says Opus 4.1): https://www.anthropic.com/news/claude-opus-4-1. You can see it in the 'more models' list on the main Claude website, or in Claude Console.
I've heard the same thing about it doing frontends - it produces gorgeous websites but it has similar 'tells', it does CSS and certain features the same way, and if you have a very concrete idea of what you want out of it, you'll end up fighting an uphill battle with it constantly trying to do things its own way.
Which is part of the 'LLM illusion' - I guess. To an unskilled individual, or when starting from scratch, it seems great, but the more complex the project gets, the harder it becomes to have it contribute meaningfully, leading to an ever-mounting frustration, and eventually me just giving up and doing it by hand.
To be fair, a lot of startup websites look very similar. And the number of Stripe blurples out there was a pre-LLM brand meme.
I know this because I'm still salty about the transition. For all of CSS's advantages, we lost something when we largely moved away from tables.
I don't doubt that it is possible eventually, but I haven't had much luck.
Something that seemed to assist was drawing a multi-coloured transparent chequerboard: if the AI knows the position of the grid colours, it can pick out some relative information from the grid.
I have also not had luck with any kind of iterative/guess-and-check approach. I assume the models are all trained to one-shot this kind of thing and struggle to generalize to what are effectively relative measurements.
I'm not well at the moment, perhaps your ML model has flu?!!
Now that's a novel sentence if I've ever read one.
Would this actually work, or am I missing something?
Can confirm: Claude is weirdly good at generating functional game logic from vague prompts, but spatial precision is a constant battle. Anything involving exact pixel positions needs validation/correction layers on top.
The suggestion upthread about having it write its own measurement tools seems promising - haven't tried that approach yet.
Interesting that it handles this fine (functional orbital mechanics, animation) but would probably struggle to recreate the exact pixel positions of the Space Jam layout. Confirms the pattern: good at "make something like X" but bad at "recreate X exactly."
Edit: yup very easy for claude. Most of the wall time was spent installing cv2 in a venv (uv dropped python 3.14 on claude, which seemed to get stuck compiling numpy or something so I interrupted and told claude to just use python 3.13 instead).
ME: In this directory there are two images: @p-pressbox.gif and @screenshot.png. @p-pressbox.gif is an image that exists somewhere inside @screenshot.png Write a script that finds the precise location
CLAUDE: Looking at the images, I can see that p-pressbox.gif (the spaceship with "PRESS BOX SHUTTLE" text) appears in the upper-left area of screenshot.png. I'll write a Python script using OpenCV to find the precise location.
[ some venv stuff/telling claude to use uv and python 3.13 ]
CLAUDE: The script found the precise location of p-pressbox.gif within screenshot.png:
| Measurement | Value |
|---------------------|-----------------|
| Top-left corner | (87, 110) |
| Bottom-right corner | (218, 166) |
| Center | (152, 138) |
| Template size | 131 × 56 pixels |
| Match confidence | 97.2% |
The script is at /tmp/search/find_image.py and uses OpenCV's template matching with normalized cross-correlation.
ME: As part of the output create a visualization of the @p-pressbox.gif positioned on top of the original screenshot
CLAUDE: Done! The visualization shows a green rectangle highlighting the exact location where p-pressbox.gif was found within the screenshot. The match is at coordinates (87, 110) to (218, 166).
I confirmed the result was correct.
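For anyone curious what such a locator boils down to, here is a dependency-free sketch. It is not the script Claude wrote (that one used OpenCV's template matching with normalized cross-correlation, per the above); this swaps in a brute-force sum of absolute differences over plain Python lists, and the tiny arrays are synthetic.

```python
def find_template(image, template):
    """Return (x, y, score) for the best top-left placement of template in image.

    Both arguments are 2D lists of grayscale values. The score is the sum of
    absolute differences, so 0 means an exact match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th)
                for i in range(tw)
            )
            if best is None or score < best[2]:
                best = (x, y, score)
    return best

# Tiny synthetic "screenshot" with a 2x2 bright patch whose top-left is at x=3, y=1.
screenshot = [[0] * 6 for _ in range(4)]
screenshot[1][3], screenshot[1][4] = 200, 180
screenshot[2][3], screenshot[2][4] = 160, 220
template = [[200, 180], [160, 220]]

x, y, score = find_template(screenshot, template)
print(f"top-left=({x}, {y}) score={score}")
```

On real screenshots this loop would be far too slow, which is one reason to reach for a library instead.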
spacejam-1996.png is a full screenshot of the Space Jam 1996
landing page. We want to recreate this landing page as faithfully
as possible, matching the screenshot exactly.
The asset directory contains images extracted from the original
site. One of the images is tiled as the background of the landing
page. The other images should appear once in the screenshot. Use
these images as assets.
Precise positioning is very important for this project, so you
should write a script that finds the precise location of each
asset image in screenshots. Use the tool to detect precise
positions in the target and fine tune the generated webpage. Be
sure to generate diagnostic images that can be easily reviewed by
a human reviewer.
Use python 3.13 and uv to create a venv while working.
I just let Claude (Opus 4.5) do anything it wanted to do as it went.
At this point all the image assets are pixel perfect but the footer is in the wrong place and I had to hold Claude's hand a bit to get the footer into the approximately correct spot:
I noticed you were struggling to find the position of the footer
text. You could try rendering two versions of the generated page, the
second time with the footer text black. Subtracting those two images
should give you a clean view of the footer text.
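The subtraction trick in that hint is simple to sketch. This is a minimal illustration on synthetic pixel grids, not the script Claude ended up writing; in practice the two inputs would be decoded screenshots of the two renders, and the function name is made up.

```python
def diff_bounding_box(render_a, render_b):
    """Return (left, top, right, bottom) of the pixels that differ, or None.

    Inputs are equally sized 2D lists of pixel values; right and bottom are inclusive.
    """
    changed = [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(render_a, render_b))
        for x, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return min(xs), min(ys), max(xs), max(ys)

# Synthetic stand-ins for the two renders: identical except for the "footer text".
BLACK, RED = (0, 0, 0), (255, 0, 0)
with_text = [[BLACK] * 8 for _ in range(6)]
for x in range(2, 6):
    with_text[4][x] = RED
text_blacked_out = [[BLACK] * 8 for _ in range(6)]

print(diff_bounding_box(with_text, text_blacked_out))
```

Because everything except the text is identical in both renders, the background cancels out no matter how busy it is.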
At this point Claude was having trouble because it hadn't got a clean view of the target text location in the original screenshot (it was creating scripts that look at the red channel in the bottom half of the image to pull out the text but that was also grabbing part of the site map logo. Interestingly it made a comment about this but didn't do anything about it). So I gave it this additional hint:
You are getting confused with the site map when analyzing the
original screenshot. You could blank out the positions of assets
so that they are not interfering with your analysis.
This got the footer in the correct location but the fonts/font sizes etc are not correct yet.
Classic.
This is awesome. Great work. Please follow up again if you happen to nail it.
We are very close. The footer is positioned in roughly the correct location
but the fonts, font sizes, font color and line spacings are all slightly
off.
This took quite a while and it built a few more tools to get there. And this was fine from a distance but it was using a sans-serif when the screenshot has a serif etc. So I decided to push. From here it got very messy...
One of the issues is that Claude's text detection was getting tripped up by writing scripts using RGB space instead of something more hue-aware. It knew the text was red but was trying to isolate it by just looking at the red channel. But the grey dots from the background show up bright in the red channel so Claude would think those were center dots between the links that needed to be reproduced in the text. I gave it a hint:
I think dots from the background image are causing issues. Are you detecting the text
by looking only at the red channel in RGB space? The red channel will be bright on
white pixels in RGB. You could try using hue to separate text from background or use
distance from the target RGB value.
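To make the red-channel problem concrete, here is a small sketch using the standard library's `colorsys`. The thresholds and pixel values are illustrative guesses, not values taken from the real screenshot or from Claude's scripts.

```python
import colorsys

def is_red_text(rgb, max_hue_dist=0.05, min_saturation=0.5, min_value=0.4):
    """True if an (r, g, b) pixel in 0-255 is a saturated red, judged in HSV."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_dist = min(h, 1.0 - h)  # hue is circular; red sits at 0.0 (and 1.0)
    return hue_dist <= max_hue_dist and s >= min_saturation and v >= min_value

# Illustrative pixel values, not sampled from the real screenshot.
pixels = {
    "red link text": (255, 0, 0),
    "white star": (255, 255, 255),
    "grey dot": (190, 190, 190),
    "black sky": (0, 0, 0),
}
for name, rgb in pixels.items():
    red_channel_says = rgb[0] > 128
    print(f"{name:14} red-channel: {red_channel_says!s:5}  hsv: {is_red_text(rgb)}")
```

A plain red-channel threshold flags the white star and the grey dot along with the text, while the saturation check rejects both because greys and whites have zero saturation.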
Claude decided to switch to HSV space. But it took quite a bit of effort to keep Claude remembering to use HSV because tools it had already written were still RGB and not updated (as were intermediate images that were not updated). Then it would try to step back and get a big picture as a sanity check and "discover" it had missed the dots that are obviously there. And when you would tell it there are no dots, you get the "You're absolutely right! They're vertical bars!" So it was a struggle. This is the closest I got:
Again, the top image stuff was done in the first shot with the prompt in the first one. Everything else has been about the footer. Claude has been writing a lot of clever scripts to measure font metrics and pick fonts etc, but it keeps falling over those dots. I could probably get it to work better with adding directives for text handling to CLAUDE.md and nuking context and some of the scripts it created.
https://github.com/steipete/agent-scripts/blob/main/scripts/...
https://knowyourmeme.com/memes/my-father-in-law-is-a-builder...
Give Claude a way to iteratively poke at what it created (such as a Playwright harness), a screenshot of what you want, and maybe a way to take a screenshot in Playwright and I think you will get much closer. You might even be able to one shot it.
I’ve always wondered what would happen if I gave it a screenshot and told it to iterate until the Playwright screenshot matched the mock screenshot, pixel perfect. I imagine it would go nuts, but after a few hours I think it would likely get it. (Either that or minor font discrepancies and rounding errors would cause it to give up…)
I'd be curious to see how Antigravity compares for the same task with its automatic browser agentic validation logic.
And
> I'm an engineering manager
I can't tell if this is an intentional or unintentional satire of the current state of AI mandates from management.
Just out of curiosity, how big was what you considered Claude's best attempt to be?
> What he produces
I feel like personifying LLMs more than they currently are is a mistake people make (though humans always do this), they're not entities, they don't know anything. If you treat them too human you might eventually fool yourself a little too much.
We can’t. We don’t know how to do it.
Calling Claude (or GPT-5 or Gemini or my bash terminal for that matter) a "he" seems absurd to the point of hilarity.
In my mind, they've always firmly been "it"s.
In other languages, chairs have a gender, along with other everyday items like scissors, and it doesn't especially make logical sense, although you can squint and tell a story as to why something is the gender that's been assigned. That makes the gender of AI simply a matter of "that's just how things are".
Bullshit. Right click -> view source
Or just press ctrl+s and the browser will also gather all the assets into a folder for you.
The arrogance of thinking that the only way you know how is the only way....
You literally forgot the save feature all browsers have just because you set out to "solve" this using "ai"
I'm not mad at the dogs, I'm mad at the stupid investors and the lying owners.
Anthropic's leadership and researchers continue to this day to post messages saying engineering will be fully automated. I can go find recent messages on X if you'd like.
This forum is comprised mostly of engineers, who will be the most impacted if their vision of the world pans out.
YC depends on innovation capital to make money. If the means of production are centralized, how does YC make any money at all from engineers? Such a world will be vertically and horizontally integrated, not democratically spread for others to take advantage of.
Now I don't think that's what's going to happen, but that's what the messaging has been and continues to be from Anthropic's leadership, researchers, and ICs.
Why should we support companies like this?
Shouldn't we advocate for open models where any market participants can fully utilize and explore the competitive gradients?
I don't think I'm saying anything controversial here.
Furthermore, if this pans out like it seems it will - a set of three or four AI hyperscalers - we'll also be in the same situation we have today with the big tech hyperscalers.
Due to a lax regulatory environment, these companies put a ceiling on startup exits by funding internal competition, buying competitors, etc. I don't see how the situation will improve in an AI world.
If you're a capitalist, you want competition to be fierce and fair. You don't want concentration of power.
I can see how an Anthropic IC might not like this post, but this should be fairly reasonable for everyone else who would like to see more distribution of power.
Low-information, high-indignation rhetoric is exactly what we don't want here, so please don't post like this.
It still seems you can make the front page posting these words as long as they're externally hosted. Or maybe it's the fact Anil is a bit of a celebrity:
It's remarkable how high our expectations have been steadily creeping.
In 1996, we had only CSS1. Ask it to use tables to do this, perhaps.
Nonetheless, here is a link to a list of the specs you asked for: https://www.w3.org/Style/History/Overview.en.html
I tried your suggestion and also tried giving it various more general versions of the limitations presented by earlier generations.
Claude's instinct initially was actually to limit itself to less modern web standards.
Unfortunately, nothing got those planets to be in the right place.
This was soon moved to a static table layout with higher quality images: https://web.archive.org/web/19970412180040/http://www.spacej...
Common at the time before flash took over.
They all fail massively 100% of the time. Even if I break it down into chunks once they get to the chunks that matter the most (i.e. physics, collision detection and resolution, event handling and game logic) they all break down horribly and no amount of prompting back and forth will fix it.
Why not downscale the image and feed it as a second input, so that entire planets fit into a patch, and instruct it to use the downsampled image for coarse coordinate estimation?
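The downscaling half of that idea is easy to sketch; whether the model actually benefits from the smaller image is the open question. A minimal box-average downsample on a synthetic grayscale grid, with a crude "brightest cell" locator standing in for the coarse estimate (both function names are made up):

```python
def downsample(image, factor):
    """Shrink a 2D grayscale grid by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a whole block are dropped.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    return [
        [
            sum(
                image[y * factor + j][x * factor + i]
                for j in range(factor)
                for i in range(factor)
            ) / (factor * factor)
            for x in range(width)
        ]
        for y in range(height)
    ]

def brightest_cell(image):
    """Return (x, y) of the brightest cell: a crude coarse locator."""
    return max(
        ((x, y) for y, row in enumerate(image) for x in range(len(row))),
        key=lambda p: image[p[1]][p[0]],
    )

# Synthetic 8x8 frame with a bright 2x2 "planet" at columns 4-5, rows 2-3.
frame = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (4, 5):
        frame[y][x] = 255

coarse = downsample(frame, 2)
cx, cy = brightest_cell(coarse)
print(f"coarse cell ({cx}, {cy}) covers x {cx * 2}-{cx * 2 + 1}, y {cy * 2}-{cy * 2 + 1}")
```

The coarse cell only narrows the search window; a second pass at full resolution inside that window would still be needed for exact coordinates.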
Sure, you can argue this is some sort of modern ACID-Test - but the ACID tests checked for real-world use-cases. This feels more like 'I have this one very specific request, the machine doesn't perfectly fulfill it, so the machine is at fault'. Complaining from a high pedestal.
I'm more surprised at how close Claude got in its reimagined SpaceJam-site.
Help, I can't recreate a website with AI! There's no other way, no way I could fix up some HTML code! Believe me, I'm an engineering manager with a computer science degree!
Absolutely disgusting.
That's not how to successfully use LLM's for coding in my experience. It is however perhaps a good demonstration of Claude's poor spatial reasoning skills. Another good demonstration of this is the twitch.tv/ClaudePlaysPokemon stream, where Claude has been failing to beat Pokémon for months now.
https://github.com/anthropics/claude-code/blob/main/plugins/...
> That's not how to successfully use LLM's for coding in my experience.
Yeah agree. I think I was just a little surprised it couldn't one-shot given the simplicity.
What's with the panicked pleas and need to preserve the site, assuming locally...?
It seems to me the post is about how Claude fails to recreate a very simple website from 1996.
> After these zoom attempts, I didn't have any new moves left. I was being evicted. The bank repo'd my car. So I wrapped it there.
Why not use wget to mirror the website? Unless you're being sarcastic.
$ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
Source: https://superuser.com/questions/970323/using-wget-to-copy-we...