FilterHN

We put Claude Code in Rollercoaster Tycoon

228 points by iamwil 5 days ago | 25 comments | labs.ramp.com

rashidae 56 minutes ago

> As a mirror to real-world agent design: the limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces. For this reason, we prefer to think of agents as automating diligence, rather than intelligence, for operational challenges.

Jaysobel 23 minutes ago

Author here - some bonus links!

Session transcript using Simon Willison's claude-code-transcripts

https://htmlpreview.github.io/?https://gist.githubuserconten...

Reddit post

https://www.reddit.com/r/ClaudeAI/comments/1q9fen5/claude_co...

OpenRCT2!!

https://github.com/jaysobel/OpenRCT2

Project repo

https://github.com/jaysobel/OpenRCT2

theptip 3 minutes ago

Did you eval using screenshots or some sort of rendered visualization instead of the CLI? I wonder if Claude has better visual intelligence when viewing images (lots of these in its training set) rather than ascii schematics (probably very few of these in the corpus).

hk__2 3 hours ago

> The only other notable setback was an accidental use of the word "revert" which Codex took literally, and ran git revert on a file where 1-2 hours of progress had been accumulating.

alt227 1 hour ago

I wonder how they accidentally used a word like that.

gbear605 1 hour ago

“Please revert that last change you did”, referring to a smaller change that had just been made

esafak 2 hours ago

Does Codex not let you set command permissions?

_flux 2 hours ago

Amazing that these tools don't maintain a replayable log of everything they've done.

That said, git revert is not a destructive operation, so it's surprising that it caused any loss of data. Maybe they meant git reset --hard or something like that. It would be wild if Codex ran that.

arcanemachiner 35 minutes ago

I was looking at the insanity known as Gas Town [0] the other day, and it does use Git to store historical work state in something it calls "beads":

https://github.com/steveyegge/gastown?tab=readme-ov-file

calebkaiser 14 minutes ago

If anyone is curious, Beads is an agent memory project from the same developer: https://github.com/steveyegge/beads

theptip 8 minutes ago

Claude Code has had this feature for a few months now.

rabf 1 hour ago

I have had Codex recover things for me from its history after Claude had done a git reset --hard. In my experience, Codex is one of the more reliable models/harnesses when it comes to performing undo and redo operations.

MattGaiser 2 hours ago

Claude Code has /rewind. Not sure if it is foolproof, but this has been tried.

Filligree 3 hours ago

Yet another reason to use Jujutsu. And put a `jj status` wrapper in your PS1. ;-)

glemion43 30 minutes ago

It's not going to happen...

Stop spamming

NewsaHackO 19 minutes ago

This is funny. I tried it once and didn't see what the benefit was. Then, when I tried to reset it back to normal git, I realized that the devs had not (at the time) made any clean way to revert it back, just a one-way conversion to jj. I haven't tried it since.

steveklabnik 9 minutes ago

What were you trying to “revert back”? You should have been able to just stop using jj, there’s nothing to revert back to.

diath 2 hours ago

> Yet another reason to use Jujutsu

And what would that reason be? You can git revert a git revert.

jsnell 2 hours ago

You're correct for an actual git revert, but it seems pretty clear that the original authors have mangled the story and it was actually either a "git checkout" or "git reset". The "file where 1-2 hours of progress had been accumulating" phrasing only makes sense if those were uncommitted changes.

And the reason jj helps in that case is that for jj there is no such thing as an uncommitted change.

MarkMarine 44 minutes ago

Also, `jj undo` is there and easy to tell the model to use; I have it in my Claude.md.

block_dagger 1 hour ago

Having no such thing as an uncommitted change seems like it would be a nightmare, but perhaps I'm just too git-oriented.

steveklabnik 7 minutes ago

Things like the index become a workflow pattern, rather than a feature, if that makes any sense.

mbb70 2 hours ago

Probably it actually ran git checkout or reset. As you say, git revert only operates on committed snapshots, so everything will still be in the reflog.

ewoodrich 53 minutes ago

Yes, this exact scenario has happened to me a couple times with both Claude and Codex, and it's usually git checkout, more rarely git reset. They immediately realize they fucked up and spend a few minutes trying to undo by throwing random git commands at it until eventually giving up.

westurner 2 hours ago

Start with env args like AGENT_ID for indicating which Merkle hash of which model(s) generated which code with which agent(s) and add those attributes to signed (-S) commit messages. For traceability; to find other faulty code generated by the same model and determine whether an agent or a human introduced the fault.
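A minimal sketch of the commit-message half of this idea, using git's trailer convention (`Key: value` lines in a final paragraph, the layout `git interpret-trailers` works with). The `AGENT_ID` and `AGENT_MODEL` environment variables and the `Agent-Id`/`Model` trailer names are hypothetical; the resulting message could then be committed and signed with `git commit -S -F <file>`. Note that this only records a claim: nothing here prevents forgery.

```python
import os
from typing import Mapping, Optional

def with_agent_trailers(message: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Append hypothetical Agent-Id / Model trailers taken from environment variables.

    Trailers go in their own final paragraph, separated from the body by a blank line.
    """
    env = os.environ if env is None else env
    trailers = []
    if env.get("AGENT_ID"):  # hypothetical variable naming the agent
        trailers.append(f"Agent-Id: {env['AGENT_ID']}")
    if env.get("AGENT_MODEL"):  # hypothetical variable naming the model
        trailers.append(f"Model: {env['AGENT_MODEL']}")
    if not trailers:
        return message  # no agent info available: leave the message untouched
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"

if __name__ == "__main__":
    msg = with_agent_trailers("Fix disconnected footpath\n", {"AGENT_ID": "agent-7"})
    print(msg, end="")
    # Fix disconnected footpath
    #
    # Agent-Id: agent-7
```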

Then, `git notes` is better for signature metadata because it doesn't change the commit hash to add signatures for the commit.

And then, you'd need to run a local Rekor log to use Sigstore attestations on every commit.

Sigstore.dev is SLSA.dev compliant.

Sigstore grants short-lived release attestation signing keys for CI builds on a build farm to sign artifacts with.

So, when jujutsu autocommits agent-generated code, what causes there to be an {{AGENT_ID}} in the commit message or git notes? And what stops a user from forging such attestations?

westurner 2 hours ago

- "Diffwatch – Watch AI agents touch the FS and see diffs live" (2025) https://news.ycombinator.com/item?id=45786382 :

> you can manually stage against @-: [with jujutsu]

pocketarc 3 hours ago

I love the interview at the end of the video. The kubectl-inspired CLI, and the feedback for improvements from Claude, as well as the alerts/segmentation feedback.

You could take those, make the tools better, and repeat the experience, and I'd love to see how much better the run would go.

I keep thinking about that when it comes to things like this - the Pokemon thing as well. The quality of the tooling around the AI is only going to become more and more impactful as time goes on. The more you can deterministically figure out on behalf of the AI to provide it with accurate ways of seeing and doing things, the better.

Ditto for humans, of course, that's the great thing about optimizing for AI. It's really just "if a human was using this, what would they need"? Think about it: The whole thing with the paths not being properly connected, a human would have to sit down and really think about it, draw/sketch the layout to visualize and understand what coordinates to do things in. And if you couldn't do that, you too would probably struggle for a while. But if the tool provided you with enough context to understand that a path wasn't connected properly and why, you'd be fine.
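A minimal sketch of the kind of feedback described here, assuming a hypothetical ASCII map where `E` marks the park entrance, `#` a footpath tile and `.` empty ground (this is not OpenRCT2's actual map format). A breadth-first search from the entrance finds every path tile that no route reaches, so a tool can report *which* tiles are disconnected instead of leaving the agent to infer it from coordinates:

```python
from collections import deque

def disconnected_paths(grid: list[str]) -> list[tuple[int, int]]:
    """Return (row, col) of every path tile '#' that no route from the entrance 'E' reaches.

    Hypothetical ASCII layout: 'E' entrance, '#' footpath, anything else impassable.
    Movement is 4-directional: tiles touching only at a corner do not connect.
    """
    start = next(
        ((r, c) for r, line in enumerate(grid) for c, ch in enumerate(line) if ch == "E"),
        None,
    )
    if start is None:
        raise ValueError("grid has no entrance tile 'E'")
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first flood fill over footpath tiles
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if in_bounds and grid[nr][nc] == "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return [
        (r, c)
        for r, line in enumerate(grid)
        for c, ch in enumerate(line)
        if ch == "#" and (r, c) not in seen
    ]

park = [
    "E##..",
    "..#..",
    "...#.",  # (2, 3) touches the path at (1, 2) only diagonally
]
print(disconnected_paths(park))  # [(2, 3)]
```

A tool built this way could print "path tile at row 2, column 3 is not connected to the entrance" next to the map, which is exactly the "why" a human or an agent would otherwise have to work out by sketching the layout.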

wonnage 45 minutes ago

I see this sentiment of using AI to improve itself a lot but it never seems to work well in practice. At best you end up with a very verbose context that covers all the random edge cases encountered during tasks.

For this to work the way people expect you’d need to somehow feed this info back into fine tuning rather than just appending to context. Otherwise the model never actually “learns”, you’re just applying heavy handed fudge factors to existing weights through context.

lukebechtel 3 hours ago

> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks. The core pieces of the build are…

what a world!

yoyohello13 3 hours ago

Everyone should read that section. It was really interesting reading about their experiences/challenges getting it all working.

AndrewKemendo 3 hours ago

I would’ve walked for days to a CompUSA and spent my life savings if there was anything remotely equivalent to this when I was learning C on my Macintosh 4400 in 1997

People don’t appreciate what they have

imiric 2 hours ago

Did you actually learn C? Be thankful nothing like this existed in 1997.

A machine generating code you don't understand is not the way to learn a programming language. It's a way to create software without programming.

These tools can be used as learning assistants, but the vast majority of people don't use them as such. This will lead to a collective degradation of knowledge and skills, and the proliferation of shoddily built software with more issues than anyone relying on these tools will know how to fix. At least people who can actually program will be in demand to fix this mess for years to come.

metaltyphoon 1 hour ago

I don't understand how OP thinks that being oblivious to how anything works underneath is a good thing. There is a level of abstraction down to which you must know how things work in order to fix them effectively when they break.

jedberg 1 hour ago

You can be a super productive Python coder without any clue how assembly works. Vibe coding is just one more level of abstraction.

Just like how we still need assembly and C programmers for the most critical use cases, we'll still need Python and Golang programmers for things that need to be more efficient than what was vibe coded.

But do you really need your $whatever to be super efficient, or is it good enough if it just works?

kshri24 53 minutes ago

One is deterministic the other is not. I leave it to you to determine which is which in this scenario.

neilwilson 1 hour ago

That’s what a C compiler does when generating a binary.

There was a time when you had to know ‘as’, ‘ld’ and maybe even ‘ar’ to get an executable.

In the early days of g++, there was no guarantee the object code worked as intended. But it was fun working that out and filing the bug reports.

This new tool is just a different sort of transpiler and optimiser.

Treat it as such.

wizzwizz4 1 hour ago

> There was a time when you had to know ‘as’, ‘ld’ and maybe even ‘ar’ to get an executable.

No, there wasn't: you could just run the shell script, or (a bit later) the makefile. But there were benefits to knowing as, ld and ar, and there still are today.

jstummbillig 19 minutes ago

> But there were benefits to knowing as, ld and ar, and there still are today.

This is trivially true. The constraint on anything you do in life is the time it takes to learn it.

So the far more interesting question is: at what level do you want to solve problems, and is it likely that you need knowledge of as, ld and ar more than anything else you could learn instead?

imiric 1 hour ago

If you don't see a difference between a compiler and a probabilistic token generator, I don't know what to tell you.

And, yes, I'm aware that most compilers are not entirely deterministic either, but LLMs are inherently nondeterministic. And I'm also aware that you can tweak LLMs to be more deterministic, but in practice they're never deployed like that.

Besides, creating software via natural language is an entirely different exercise than using a structured language purposely built for that.

We're talking about two entirely different ways of creating software, and any comparison between them is completely absurd.

anthk 36 minutes ago

People voting down your comment are just "engineers" doomed to fail sooner or later.

Meanwhile, 9front users have read at least the Plan 9 intro and know about nm, 1-9c, 1-9l and the like. Vibe coders will be put in their place sooner or later. It's just a matter of time.

anthk 38 minutes ago

Competent C programmers know about nm, as, ld and the various binary sections in order to understand issues and debug properly.

Everyone else is deluding themselves. Even the 9front intro requires you to know at least the basics of nm and friends.

AndrewKemendo 1 hour ago

It would’ve been nice to have a system I could just ask questions to teach me how it works, instead of having to pore through the few books on C that were actually accessible to a teenager learning on their own.

Going to arcane websites and forums full of neckbeards who expect you to already understand everything isn’t exactly a great way to learn.

The early Internet was unbelievably hostile to people trying to learn genuinely

rabf 1 hour ago

I had the books (from the library) but never managed to get a compiler for many years! Was quite confusing trying to understand all the unix references when my only experience with a computer was the Atari ST.

Workaccount2 1 hour ago

It's just another layer.

Assembly programmers from years gone by would likely be equally dismissive of the self-aggrandizing code block stitchers of today.

(on topic, RCT was coded entirely in assembly, quite the achievement)

lifetimerubyist 3 hours ago

It’s worse. They’re proud they don’t know.

doug_durham 23 minutes ago

"They" are? I didn't see that in the article. It sounds like you are projecting your prejudices on to a non-defined out group.

risyachka 3 hours ago

It's like ordering a project from Upwork: someone did it for you, you have no idea what is going on, but it kinda works.

kmijyiyxfbklao 2 hours ago

Since there are no humans involved, it's more like growing a tree. Sure it's good to know how trees grow, but not knowing about cells didn't stop thousands of years of agriculture.

kshri24 50 minutes ago

I wouldn't say it's like a tree, since trees are at least deterministic: the input parameters (seed, environment, sunlight) define the output.

LLM outputs are akin to a mutant tree that can decide to randomly sprout a giant mushroom instead of a branch. And you won't have any idea why despite your input parameters being deterministic.

dpc050505 12 minutes ago

You haven't done a lot of gardening if you don't know plants get 'randomly' (there's a biological explanation, but with the massive amounts of variables it feels random) attacked by parasites all the time. Go look at pot growing subreddits, they spend an enormous chunk of their time fighting mites.

doug_durham 22 minutes ago

In what world are trees deterministic? There are a set of parameters that you can control that give you a higher probability of success, but uncontrollable variables can wipe you out.

Jaysobel 1 hour ago

The Gas Town piece reminded me of this as well. The author there leaned into role playing, social and culture analogies, and it made a lot more sense than an architecture diagram in which one node is “black box intelligence” with a single line leading out of it…

risyachka 1 hour ago

It's not like a tree at all, because a tree is one and done.

Code is a project that has to be updated, fixed, etc.

So when something breaks, you have to ask the contractor again. It may not find the issue, or it may mess things up when it tries to fix it, making the project useless, etc.

It's more like a car. Every time something goes wrong you will pay for it: sometimes it will come back in even worse shape (no refunds, though), and sometimes it will cost you 100x because there is nothing you can do; you need it and you can't manage it on your own.

ambicapter 2 hours ago

Very interesting analogy

amlib 1 hour ago

Except that the tree is so malformed and its core structure so unsound that it can't grow much past germination, and it dies of malnourishment: since you have zero understanding of biology, forestry and related fields, you have no knowledge with which to save it or help it grow healthy.

Also out of nowhere an invasive species of spiders that was inside the seed starts replicating geometrically and within seconds wraps the whole forest with webs and asks for a ransom in order to produce the secret enzyme that can dissolve it. Trying to torch it will set the whole forest on fire, brute force is futile. Unfortunately, you assumed the process would only plagiarize the good bits, but seems like it also sometimes plagiarizes the bad bits too, oops.

datsci_est_2015 2 hours ago

Great analogy. “I don’t know any C++ but I hired some people on Upwork and they delivered this software demo.”

whateveracct 1 hour ago

Con fuckign gratys, u can buy compute

kinduff 15 minutes ago

I've now seen ASCII used as the first approach to these kinds of problems several times. I think it's because it's counter-intuitive: for us humans ASCII is text, and we tend to forget about spatial awareness.

I find this aspect of us humans interacting with AIs very interesting.

nipponese 3 hours ago

> kept the context above the ~60% remaining level where coding models perform at their absolute best

Maybe this is obvious to Claude users, but how do you know your remaining context level? Is there a UI for this?

adithyareddy 3 hours ago

You can also show context in the statusline within claude code: https://code.claude.com/docs/en/statusline#context-window-us...
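The ~60%-remaining rule of thumb quoted above is simple arithmetic once you know how many tokens are in use, which `/context` reports. A minimal sketch; the 200,000-token window is an assumption for illustration, and the threshold is the article's heuristic rather than a documented limit:

```python
def remaining_fraction(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window still free, clamped to the range 0.0 to 1.0."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return min(1.0, max(0.0, 1.0 - used_tokens / window_tokens))

def should_compact(used_tokens: int, window_tokens: int, min_remaining: float = 0.60) -> bool:
    """True once free space falls below the ~60% level the article tries to stay above."""
    return remaining_fraction(used_tokens, window_tokens) < min_remaining

WINDOW = 200_000  # assumed window size for illustration; use your model's actual limit
print(f"{remaining_fraction(70_000, WINDOW):.0%} free")  # 65% free
print(should_compact(70_000, WINDOW))  # False
print(should_compact(90_000, WINDOW))  # True (55% free)
```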

nipponese 2 hours ago

Follow up Q: what are you supposed to do when the context becomes too large? Start a new conversation/context window and let Claude start from scratch?

kcoddington 2 hours ago

Either have Claude /compact, or have it write things to a file it can read in the next session. That file would be a summary of progress on a spec or something similar. It's also good to prime it again with the README or any other higher-level context.

AlexMoffat 1 hour ago

I ask it to write a markdown file describing how it should go about performing the task. Then have it read the file next time. Works well for things like creating tests for controller methods where there is a procedure it should follow that was probably developed over a session with several prompts and feedback on its output.
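The handoff-file approach described in these replies can be as simple as a short markdown file that the next session is told to read first. A sketch; the `HANDOFF.md` file name and the section headings are hypothetical choices, not a Claude Code convention:

```python
from pathlib import Path

def render_handoff(task: str, done: list[str], next_steps: list[str]) -> str:
    """Build a short markdown summary that a fresh session can read first."""
    lines = [f"# Handoff: {task}", "", "## Done so far"]
    lines += [f"- {item}" for item in done]
    lines += ["", "## Next steps"]
    # Number the remaining steps so the next session can work through them in order.
    lines += [f"{i}. {step}" for i, step in enumerate(next_steps, start=1)]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    note = render_handoff(
        "controller tests",
        done=["Wrote tests for the index and show actions"],
        next_steps=["Cover the update action", "Run the full suite"],
    )
    Path("HANDOFF.md").write_text(note, encoding="utf-8")  # hypothetical file name
```

At the start of the next session, a prompt along the lines of "read HANDOFF.md before doing anything else" points the agent at it.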

pbhjpbhj 2 hours ago

It feels like one could produce a digest of the context that works very similarly but fits in the available context window - not just by getting the LLM to use succinct language, but also mathematically; like reducing a sparse matrix.

There might be an input that would produce that sort of effect: perhaps it looks like nonsense (like reading zipped data), but when the LLM attempts to work with it, the outcome is close to consuming the full context?

neilfrndes 3 hours ago

Claude code has a /context command.

MattGaiser 2 hours ago

/context

fnordpiglet 3 hours ago

Interesting article but it doesn’t actually discuss how well it performs at playing the game. There is in fact a 1.5 hour YouTube video but it woulda been nice for a bit of an outcome postmortem. It’s like “here’s the methods and set up section of a research paper but for the conclusion you need to watch this movie and make your own judgements!”

Sharlin 3 hours ago

It does discuss that? Basically it has good grasp of finances and often knows what "should" be done, but it struggles with actually building anything beyond placing toilets and hotdog stalls. To be fair, its map interface is not exactly optimal, and a multimodal model might fare quite a bit better at understanding the 2D map (verticality would likely still be a problem).

cyanydeez 3 hours ago

I was told the important part of AI is the generation part, not the verification or quality.

TaupeRanger 1 hour ago

I can corroborate that spatial reasoning is still a challenge. In this case it's the complexity of the game world, but anyone who has used Codex/Claude with complex UIs in CSS or a native UI library will recognize the shortcomings fairly quickly.

phreeza 2 hours ago

Claude Code in Dwarf Fortress would be wild

rsanek 1 hour ago

https://www.youtube.com/watch?v=FLmPN03ZQbM

__turbobrew__ 28 minutes ago

Given that Dwarf Fortress has an ASCII interface, it may actually be a lot easier to set up Claude to work with it. Also, a lot of the challenge of Dwarf Fortress is just knowing all the different mechanics and how they work, which is something Claude should be good at.

haunter 3 hours ago

This is what I want but for PoE/PoE2 builds. I always get a headache just looking at the passive tree https://poe.ninja/poe2/passive-skill-tree

equinumerous 3 hours ago

This is a cool idea. I wanted to do something like this by adding a Lua API to OpenRCT2 that allows you to manipulate and inspect the game world. Then, you could either provide an LLM agent the ability to write and run scripts in the game, or program a more classic AI using the Lua API. This AI would probably perform much better than an LLM - but an interesting experiment nonetheless to see how a language model can fare in a task it was not trained to do.

equinumerous 3 hours ago

As far as a scripting API, it looks like the devs beat me to it with a JS/TS plugin system: https://github.com/OpenRCT2/OpenRCT2/blob/develop/distributi...

Kapura 2 hours ago

"i vibe coded a thing to play video games for me"

i enjoy playing video games my own self. separately, i enjoy writing code for video games. i don't need ai for either of these things.

markbao 1 minute ago

That’s not the point of this. This was an exercise to measure the strengths and weaknesses of current LLMs in operating a company and managing operations, and the video game was just the simulation engine.

gordonhart 2 hours ago

Yeah, but can you use your enjoyment of video games as marketing material to justify a $32B valuation?

Jaysobel 1 hour ago

actually it was all to drive traffic to my 'rollercoaster coasters' Etsy store

https://bansostudio.etsy.com

TaupeRanger 1 hour ago

^ this guy funds

SV_BubbleTime 1 hour ago

Not so sure. He said justify.

rangestransform 1 hour ago

I actually think it would be pretty fun to code something to play video games for me, it has a lot of overlap with robotics. Separately, I learned about assembly from cheat engine when I was a kid.

bigyabai 2 hours ago

That's fine. Tool-assisted speedruns long predate LLMs and they're boring as hell: https://youtu.be/W-MrhVPEqRo

It's still a neat perspective on how to optimize for super-specific constraints.

echelon 1 hour ago

You do you. I find this exceedingly cool and I think it's a fun new thing to do.

It's kind of like how people started watching Let's Plays and that turned into Twitch.

One of the coolest things recently is VTubers in mocap suits using AI performers to do single person improv performances with. It's wild and cool as hell. A single performer creating a vast fantasy world full of characters.

LLMs and agents playing Pokemon and StarCraft? Also a ton of fun.

jsbisviewtiful 2 hours ago

AI for the sake of AI. Feels like a lot of the internet right now

js4ever 1 hour ago

Most interesting phrase: "Keeping all four agents busy took a lot of mental bandwidth."

khoury 4 hours ago

Can't wait for someone to let Claude control a RuneScape character from scratch

itsgrimetime 19 minutes ago

I've done this! Given the right interface I was surprised at how well it did. Prompted it "You're controlling a character in Old School RuneScape, come up with a goal for yourself, and don't stop working on it until you've achieved it". It decided to fish for and cook 100 lobsters, and it did it pretty much flawlessly!

The biggest downside was its inability to see (literally). Getting lists of interactable game objects, NPCs, etc. was fine when it decided to do something that didn't require any real-time input. Sailing, or anything that required it to react to what's on screen, was pretty much impossible without more tooling to manage the reacting part for it (e.g. a tool to navigate automatically to some location).

ASpring 2 hours ago

People have been botting on Runescape since the early 2000s. Obviously not quite at the Claude level :). The botting forums were a group of very active and welcoming communities. This is actually what led me to Java programming and computer science more broadly--I wrote custom scripts for my characters.

I still have some parts of the old Rei-net forum archived on an external somewhere.

reactordev 4 hours ago

https://www.reddit.com/r/2007scape/comments/1qeh3nc/i_added_...

https://ubos.tech/mcp/runescape-mcp-server-rs-osrs/

ideashower 2 hours ago

Wouldn't that break Jagex's TOS though? Is there a way of getting caught?

AstroBen 2 hours ago

I imagine Jagex must be up there with the most sophisticated bot detection of anyone. It's been a thing for decades.

dpc050505 7 minutes ago

They detect bots but let a ton of them run free because any character having membership = revenue and an extremely significant chunk of active characters are bots. They nuked them all in 2011 I think and the game was nearly empty.

SirPugger's youtube channel has loads of videos monitoring various bot farms.

sriram_sun 2 hours ago

> "Where Claude excels:"

Am I reading a Claude generated summary here?

alt227 1 hour ago

I thought it sounded more like an ad for Claude written by Anthropic:

> "This was surprising, but fits with Claude's playful personality and flexible disposition."

vidarh 1 hour ago

This sounds as expected to me as a heavy user of Opus. Claude absolutely has a "personality" that is a lot less formal and more willing to "play along" with more creative tasks than Codex. If you want an agent that's prepared to just jump in, it's a plus. If you want an agent that will be careful, considered and plan things out meticulously, it's not always so great. I feel that when you want Claude to do repetitive, tedious tasks, you need to do more work to prevent it from getting "bored" and trying to take shortcuts or find something else to do, for example.

alt227 46 minutes ago

> when you want Claude to do repetitive, tedious tasks, you need to do more work to prevent it from getting "bored"

Is this sentence seriously about a computer? Have we gone so far that computers won't just do what we tell them to anymore?

mentos 3 hours ago

The opening paragraph I thought was the agent prompt haha

> The park rating is climbing. Your flagship coaster is printing money. Guests are happy, for now. But you know what's coming: the inevitable cascade of breakdowns, the trash piling up by the exits, the queue times spiraling out of control.

neom 3 hours ago

Wonder how it would do with Myst.

alt227 1 hour ago

Surely it must have digested plenty of walkthroughs for any game?

A linear puzzle game like that I would just expect the ai to fly through first time, considering it has probably read 30 years of guides and walkthroughs.

singpolyma3 54 minutes ago

The real test would be to try it on a new game of the same style and complexity

rnmmrnm 2 hours ago

this is cute, but i imagined prompting the ai for a loop-de-loop roller coaster. If this could build complex rides, it would be a game changer.

blibble 2 hours ago

yeah I was expecting it to... do something in the game? like build a ride

not just make up bullshit about events

skybrian 4 hours ago

Would a way to take screenshots help? It seems to work for browser testing.

joshribakoff 4 hours ago

I’ve been doing game development, and it starts to hallucinate more rapidly when it doesn’t understand things like the direction it’s placing things in or which way the camera is oriented.

Gemini models are a little bit better about spatial reasoning, but we’re still not there yet because these models were not designed to do spatial reasoning they were designed to process text

In my development, I also use the ascii matrix technique.
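A minimal sketch of an ASCII matrix with coordinate labels, assuming a hypothetical tile grid stored as a list of strings. Printing the indices along the edges lets a text-only model read an (x, y) position off the map instead of counting characters, and stating the axis convention explicitly is one way to reduce the orientation confusion described above:

```python
def render_with_axes(grid: list[str]) -> str:
    """Render a tile grid with x labels across the top and y labels down the left.

    Convention (an assumption worth stating in the prompt): x grows to the right,
    y grows downward, and grid[y][x] is the tile at (x, y).
    """
    if not grid:
        return ""
    width = max(len(row) for row in grid)
    label_w = len(str(len(grid) - 1))  # width of the widest y label
    # Only the last digit of each x index is shown, so every column stays one character wide.
    header = " " * (label_w + 1) + "".join(str(x % 10) for x in range(width))
    body = [f"{y:>{label_w}} {row}" for y, row in enumerate(grid)]
    return "\n".join([header, *body])

print(render_with_axes(["E##..", "..#..", "...#."]))
#   01234
# 0 E##..
# 1 ..#..
# 2 ...#.
```

Showing only the last digit of each x index keeps the columns aligned on wide maps, at the cost of the labels repeating every ten tiles.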

kleene_op 3 hours ago

Spatial awareness was also a huge limitation to Claude playing pokemon.

It really seems to me that the first AI company getting to implement "spatial awareness" vector tokens and integrating them neatly with the other conventional text, image and sound tokens will be reaping huge rewards. Some are already partnering with robot companies, it's only a matter of time before one of those gets there.

nszceta 3 hours ago

This is also my experience with attempting to use Claude and GLM-4.7 with OpenSCAD. Horrible spatial reasoning abilities.

hypercube33 3 hours ago

I disagree. With Opus, I'll screenshot an app, draw all over it like a child with MS Paint, and paste it into the chat; it seems to reasonably understand what I'm asking with my chicken scratch and dimensions.

As far as 3D goes, I don't have experience; it could be quite awful at that.

miohtama 3 hours ago

They would need a spatial-reasoning or layout-specific tool to translate to English and back.

falcor84 3 hours ago

I wonder if they could integrate a secondary "world model" trained/fine-tuned on Rollercoaster Tycoon to just do the layout reasoning, and have the main agent offload tasks to it.

joshcsimmons 2 hours ago

Interesting that this is on the ramp.com domain. I'm surprised that in this tech market they can pay devs to hack on Rollercoaster Tycoon. Maybe there's some crossover I'm missing, but it seems like a sweet gig, honestly.

emeril 1 hour ago

yeah really - ramp.com is a credit card/expense platform that surely loses money right now...

pretty heavy/slow javascript but pretty functional nonetheless...

HelloUsername 4 hours ago

*OpenRCT2

sodafountan 2 hours ago

This was an interesting application of AI, but I don't really think this is what LLMs excel at. Correct me if I'm wrong.

It was interesting that the poster vibe-coded (I'm assuming) the CTL from scratch; Claude was probably pretty good at doing that, and that task could likely have been completed in an afternoon.

Pairing the CTL with the CLI makes sense, as that's the only way to gain feedback from the game. Claude can't easily do spatial recognition (yet).

A project like this would entirely depend on the game being open source. I've seen some very impressive applications of AI online with closed-source games and entire algorithms dedicated to visual reasoning.

I'm still trying to figure out how this guy: https://www.youtube.com/watch?v=Doec5gxhT_U

Was able to have AI learn to play Mario Kart nearly perfectly. I find his work to be very impressive.

I guess because RCT2 is more data-driven than visually challenging, this solution works well, but having an LLM try to play a racing game sounds like it would be disastrous.

tadfisher 1 hour ago

Not sure if you clocked this, but the Mario Kart AI is not an LLM. It's a randomized neural net that was trained with reinforcement learning. Apologies if I misread.

sodafountan 43 minutes ago

Yeah, that was the point of my post. LLMs traditionally aren't used in gaming like this.

nacozarina 5 days ago

next up: Crusader Kings III

Deukhoofd 3 hours ago

Crusader Kings is a franchise where I could really see LLMs shine. One of the main current criticisms of the game is that there's a lack of events, and that they often don't really feel relevant to your character.

An LLM could potentially make events far more tailored to your character, and could respond to things happening in the world far more than the game currently does. It could create some really cool emergent gameplay.

Braini 2 hours ago

In general you are right, I expect something like this to appear in the future and it would be cool.

But isn't the criticism rather that there are too many (as you say, repetitive and not relevant) events? It's not like cool stories emerge from the underlying game mechanics anymore ("grand strategy"); instead, players have to click through these boring predetermined events again and again.

Deukhoofd 1 hour ago

You get too many events, but there aren't actually that many different events written, so you repeat the same ones over and over again. Eventually it just turns into the player clicking on the 'optimal' choice without actually reading the event.

mcphage 4 hours ago

> You’re right, I did accidentally slaughter all the residents of Béziers. I won’t do that again. But I think that you’ll find God knows his own.

Forgeties79 3 hours ago

Paradox future hire right here

azhenley