FAWK: LLMs can write a language interpreter (martin.janiczek.cz)
58 points | 2 hours ago | 9 comments
vidarh
51 minutes ago
It's a fun post, and I love language experiments with LLMs. (I'm close to hitting the weekly limit of my Claude Max subscription because I have a near-constantly running session working on my Ruby compiler. As long as it has a test suite to run, Claude can fix, albeit sometimes with messy code, issues that require complex tracing of backtraces with gdb, and it can fix complex parser interactions almost entirely unaided.)

But here's the Ruby version of one of the scripts:

    BEGIN {
      result = [1, 2, 3, 4, 5]
        .filter { |x| x % 2 == 0 }
        .map { |x| x * x }
        .reduce { |acc, x| acc + x }
      puts "Result: #{result}"
    }
The point is that running a script with the "-n" switch runs BEGIN/END blocks once and puts an implicit "while gets ... end" around the rest. Adding "-a" auto-splits each line, like awk. Adding "-p" also prints $_ at the end of each iteration.
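To make the shape of that concrete, here is roughly what those switches amount to, written out by hand. It is a sketch of the structure, not Ruby's literal expansion, and `count_even_lines` / `upcase_lines` are hypothetical helpers for illustration:

```ruby
require "stringio"

# Roughly the shape `ruby -n` gives a script with BEGIN/END blocks.
# A sketch of the structure, not Ruby's literal expansion;
# `count_even_lines` is a hypothetical example.
def count_even_lines(input)
  count = 0                       # BEGIN { ... }: runs once, before any input
  while (line = input.gets)       # -n: the implicit `while gets ... end`
    count += 1 if line.to_i.even? # the body runs once per line
  end
  "Even lines: #{count}"          # END { ... }: runs once, after the input
end

# With -p the loop also prints the (possibly modified) line after the body,
# so `ruby -pe '$_.upcase!'` behaves roughly like this hypothetical helper.
def upcase_lines(input, output)
  while (line = input.gets)
    line.upcase!                  # the -e body, acting on the current line
    output.print(line)            # the implicit print that -p adds
  end
end

puts count_even_lines(StringIO.new("1\n2\n3\n4\n")) # prints "Even lines: 2"
upcase_lines(StringIO.new("foo\nbar\n"), $stdout)   # prints "FOO" and "BAR"
```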

So here's a more typical Awk-like experience:

    ruby -pe '$_.upcase!' somefile.txt

($_ holds the whole line.) Or:

    ruby -F, -ane 'puts $F[1]'

This prints the second field: -F sets the default separator that split uses (here a comma), and -a adds an implicit $F = $_.split.
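The per-line splitting that `-F, -a` sets up can be sketched in plain Ruby as below. `second_fields` is a hypothetical helper; the `chomp` call stands in for adding `-l`, because without it the last field of each line keeps its trailing newline:

```ruby
# A sketch of the per-line splitting `ruby -F, -a` sets up: -F, makes split use
# "," and -a assigns the pieces to $F. `second_fields` is a hypothetical helper.
def second_fields(text, separator = ",")
  text.each_line.map do |line|
    fields = line.chomp.split(separator) # chomp mimics adding -l
    fields[1]                            # $F[1]: the second field, nil if absent
  end
end

p second_fields("a,b,c\nd,e,f\n") # prints ["b", "e"]

# Without chomp (i.e. without -l), the last field keeps its newline:
p "a,b\n".split(",")              # prints ["a", "b\n"]
```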
That's not to detract from what he's doing, because it's fun. But if your goal is just a better Awk, then Ruby is usually a better Awk, and so, for that matter, is Perl. For most things where an Awk script doesn't fit on the command line, the only real reason to use Awk is that it's more likely to be available.
qsort
1 hour ago
The money shot: https://github.com/Janiczek/fawk

It's a purely interpretive implementation of the kind you'd write in school; still, it's above and beyond anything I'd have any right to complain about.
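Assuming "purely interpretive" here means a tree-walking interpreter (the syntax tree is evaluated directly, with no compilation step), the core of one looks something like this minimal sketch. The node shapes are made up for illustration and are not FAWK's:

```ruby
# A minimal tree-walking interpreter: the program is kept as a nested AST and
# evaluated by recursing over it. The node shapes ([:num, n], [:add, l, r], ...)
# are hypothetical and not taken from FAWK.
def evaluate(node, env = {})
  case node[0]
  when :num then node[1]
  when :var then env.fetch(node[1])
  when :add then evaluate(node[1], env) + evaluate(node[2], env)
  when :mul then evaluate(node[1], env) * evaluate(node[2], env)
  when :let
    # [:let, name, value_expr, body]: bind name, then evaluate body with it
    _, name, value, body = node
    evaluate(body, env.merge(name => evaluate(value, env)))
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

# let x = 3 in x * x + 1
program = [:let, "x", [:num, 3], [:add, [:mul, [:var, "x"], [:var, "x"]], [:num, 1]]]
puts evaluate(program) # prints 10
```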

skydhash
1 hour ago
Commendable effort, but I expected at least a demo showcasing working code (even if it's hacky). It's like someone talking about a piece of sheet music without playing it once.
epolanski
1 hour ago
Even more, it's like talking about sheet music without even seeing the sheet itself.
slybot
1 hour ago
I did AoC 2021 up to day 10 using awk. It was fun but not easy, and I couldn't get any further: https://github.com/nusretipek/Advent-of-Code-2021
jamesu
52 minutes ago
A few months ago I used ChatGPT to rewrite a Bison-based parser as recursive descent and was pretty surprised how well it held up. I still needed to keep prompting the AI to fix things or add elements it skipped, and in the end I probably rewrote 20% of it because I wasn't happy with its strange use of C++ features, which made certain parts hard to follow.
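For readers who haven't met the technique: a recursive descent parser has one function per grammar rule, and the functions call each other directly instead of relying on tables generated by a tool like Bison. A minimal sketch for a hypothetical arithmetic grammar, evaluating as it parses (the grammar and class name are illustrative, not from the parser described above):

```ruby
require "strscan"

# A tiny recursive descent parser, one method per grammar rule:
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"
# The grammar and class name are hypothetical; it evaluates as it parses.
class ArithParser
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  def parse
    value = expr
    skip_space
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    value
  end

  private

  def skip_space
    @scanner.skip(/\s+/)
  end

  # Consumes `token` if it comes next; returns it, or nil if it doesn't.
  def accept(token)
    skip_space
    @scanner.scan(Regexp.new(Regexp.escape(token)))
  end

  def expr # loops instead of recursing on the left, so "1 - 2 - 3" is left-associative
    value = term
    loop do
      if accept("+") then value += term
      elsif accept("-") then value -= term
      else return value
      end
    end
  end

  def term
    value = factor
    loop do
      if accept("*") then value *= factor
      elsif accept("/") then value /= factor # integer division
      else return value
      end
    end
  end

  def factor
    skip_space
    if (digits = @scanner.scan(/\d+/))
      digits.to_i
    elsif accept("(")
      value = expr
      raise ArgumentError, "expected ')' at #{@scanner.pos}" unless accept(")")
      value
    else
      raise ArgumentError, "unexpected token at #{@scanner.pos}"
    end
  end
end

puts ArithParser.new("2 * (3 + 4) - 10 / 2").parse # prints 9
```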
artpar
1 hour ago
I wrote two

jslike (Acorn-based parser)

https://github.com/artpar/jslike

https://www.npmjs.com/package/jslike

wang-lang (I couldn't get ASI to work like JavaScript's in this nearley-based grammar)

https://www.npmjs.com/package/wang-lang

https://artpar.github.io/wang/playground.html

https://github.com/artpar/wang
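Part of why ASI is awkward in a grammar is that JavaScript defines it in terms of the parser: a semicolon is inserted before an "offending token" that no grammar production allows, when a line break precedes it (plus some restricted productions). A simpler, swapped-in alternative is Go's lexer-level rule: after a line's final token, insert a semicolon if that token could end a statement. A simplified sketch of the Go-style rule, with hypothetical and much coarser token classes than Go's:

```ruby
# A sketch of lexer-level semicolon insertion in the style of Go's rule
# (NOT JavaScript's parser-driven ASI): at a newline, emit ";" if the
# previous token could end a statement. Token classes are simplified and
# hypothetical: any word, ")", "]", "}", "++" or "--" counts as an ending.
ENDS_STATEMENT = ->(tok) { tok.match?(/\A(\w+|\)|\]|\}|\+\+|--)\z/) }

def tokenize_with_asi(source)
  tokens = []
  source.scan(/\n|\+\+|--|\w+|[^\s\w]/) do |tok|
    if tok == "\n"
      tokens << ";" if tokens.any? && ENDS_STATEMENT.call(tokens.last)
    else
      tokens << tok
    end
  end
  # End of input behaves like a final newline
  tokens << ";" if tokens.any? && ENDS_STATEMENT.call(tokens.last)
  tokens
end

# "+" at the end of line 2 cannot end a statement, so no ";" is inserted there
p tokenize_with_asi("x = 1\ny = x +\n2\n")
# prints ["x", "=", "1", ";", "y", "=", "x", "+", "2", ";"]
```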

TeodorDyakov
21 minutes ago
So you're using a tool to help you write code, because you don't enjoy coding, in order to make a tool used for coding (a computer language). Why?
killerstorm
12 minutes ago
Coding has many aspects: conceptual understanding of the problem domain, design, decomposition, etc., and then typing code and debugging. Can you imagine that a person might enjoy the conceptual part more and want to skip some of the typing exercises?
bgwalter
5 minutes ago
The whole blog post does not mention the word "grammar". As presented, it is example-based: the LLM spat out its plagiarized code and beat it into shape until the examples passed.

We do not know whether the implied grammar is conflict free. We don't know anything.

It certainly does not look like enjoying the conceptual part.
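"Conflict free" can be checked mechanically once a grammar is written down. As one simple illustration (a different, simpler notion than the LR shift/reduce conflicts Bison reports), here is a sketch of an LL(1) FIRST/FIRST check: two alternatives of the same rule conflict if they can start with the same token. It assumes a grammar with no empty productions, and the grammars are hypothetical:

```ruby
require "set"

# A sketch of one mechanical conflict check: LL(1) FIRST/FIRST conflicts in a
# grammar with no empty productions. Nonterminals are Symbols, terminals are
# Strings, and each alternative is an Array of symbols.
def first_sets(grammar)
  first = Hash.new { |hash, key| hash[key] = Set.new }
  changed = true
  while changed # repeat until no FIRST set grows (a fixpoint)
    changed = false
    grammar.each do |nonterminal, alternatives|
      alternatives.each do |symbols|
        head = symbols.first
        additions = head.is_a?(String) ? [head] : first[head].to_a
        before = first[nonterminal].size
        first[nonterminal].merge(additions)
        changed ||= first[nonterminal].size != before
      end
    end
  end
  first
end

# Pairs of alternatives of the same nonterminal whose FIRST sets overlap:
# with one token of lookahead, a predictive parser cannot choose between them.
def first_first_conflicts(grammar)
  first = first_sets(grammar)
  first_of = ->(symbols) { symbols.first.is_a?(String) ? Set[symbols.first] : first[symbols.first] }
  grammar.flat_map do |nonterminal, alternatives|
    alternatives.combination(2).filter_map do |a, b|
      overlap = first_of.call(a) & first_of.call(b)
      [nonterminal, a, b, overlap] unless overlap.empty?
    end
  end
end

# A left-recursive expression rule (hypothetical grammar) conflicts:
grammar = {
  expr: [[:expr, "+", :term], [:term]],
  term: [["num"], ["(", :expr, ")"]],
}
first_first_conflicts(grammar).each do |nonterminal, a, b, overlap|
  puts "#{nonterminal}: #{a.inspect} vs #{b.inspect} on #{overlap.to_a.inspect}"
end
```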

cl3misch
18 minutes ago
For the same reason we have Advent of Code: for fun!

I mean, he's not solving the puzzles with AI. He's creating his own toy language to solve the puzzles in.

Y_Y
1 hour ago
I've been trying to get LLMs to make Racket "hashlangs"† for years now, both for simple almost-lisps and for honest-to-god different languages, like C. It's definitely possible, raco has packages‡ for C, Python, J, Lua, etc.

So far I haven't been able to get a nice result from any of the obvious models; hopefully they're finally smart enough.

† https://williamjbowman.com/tmp/how-to-hashlang/

‡ https://pkgd.racket-lang.org/pkgn/search?tags=language

keepamovin
1 hour ago
Yes! I'm currently using Copilot + Antigravity to implement a language with ergonomic syntax and semantics that lowers cleanly to machine code targeting multiple platforms, with a focus on safety, determinism, auditability and fail-fast bugs. It's more work than I thought, but the LLMs are very capable.

I was dreaming of a JS-to-machine-code compiler, but then thought: why not just start from scratch and have what I want? It's a lot of fun.

lionkor
1 hour ago
Curious why you do this with AI instead of just writing it yourself?

You should be able to whip up a lexer, parser and compiler in a couple of weeks.

My_Name
30 minutes ago
Because he did it in a day, not a few weeks.

If I want to go from Bristol to Swindon, I could walk there in about 12 hours. It's totally possible to do it on foot. Or I could drive and be there in an hour, making it there and back, with a full work day in between, all in one day. Using the tool doesn't change what you can do; it speeds up getting to the end result.

bgwalter
23 minutes ago
There is no end result. It's a toy language based on a couple of examples, without a grammar, where apparently the LLM used its standard (plagiarized) parser/lexer code and iterated until the examples passed.

Automating one of the fun parts of CS is just weird.

So with this awesome "productivity" we now can have 10,000 new toy languages per day on GitHub instead of just 100?

TeodorDyakov
17 minutes ago
That was exactly my thought. Why automate the coding part to create something that will be used for coding (and which, going by the same logic, can itself be automated)? This makes zero sense.
epolanski
57 minutes ago
I'm not the previous user, but I imagine that weeks of investment might be a commitment one does not have.

I have implemented an interpreter for a very basic stack-based language (you can imagine it being one of the simplest interpreters you can have) and it took me a lot of time and effort to have something solid and functional.

Thus I can absolutely relate to the idea of having an LLM that has seen many interpreters lay the groundwork for you, letting you play with your ideas as quickly as possible while putting off delving into the details until necessary.
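For a sense of what "one of the simplest interpreters you can have" looks like, the core loop of a very basic stack-based interpreter can be sketched in a few dozen lines. The word set (+, -, *, dup, swap) is hypothetical, not the language described above, and a solid version would still need better errors, control flow and tests:

```ruby
# A minimal interpreter for a hypothetical stack-based language: each
# whitespace-separated word is either an integer (pushed onto the stack) or
# an operator that pops `arity` operands and pushes its results.
OPS = {
  "+"    => [2, ->(a, b) { [a + b] }],
  "-"    => [2, ->(a, b) { [a - b] }],
  "*"    => [2, ->(a, b) { [a * b] }],
  "dup"  => [1, ->(a) { [a, a] }],
  "swap" => [2, ->(a, b) { [b, a] }],
}.freeze

def run_stack_program(source)
  stack = []
  source.split.each do |word|
    if word.match?(/\A-?\d+\z/)
      stack.push(Integer(word))          # literals are pushed as-is
    elsif OPS.key?(word)
      arity, op = OPS[word]
      raise ArgumentError, "stack underflow at '#{word}'" if stack.size < arity
      args = stack.pop(arity)            # operands, in the order they were pushed
      stack.push(*op.call(*args))
    else
      raise ArgumentError, "unknown word '#{word}'"
    end
  end
  stack
end

p run_stack_program("2 3 + dup *") # prints [25]
```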

64718283661
53 minutes ago
What's the point of making something like this if you don't get to deeply understand what you're doing?
My_Name
34 minutes ago
What's the point of owning a car if you don't build it by hand yourself?

Anyway, all it will do is stop you from being able to run as well as you could back when you had to go everywhere on foot.

purple_turtle
30 minutes ago
What is the point of a car that changes colour to blue on Mondays and explodes on the first Friday of every year?

If neither you nor anyone else can fix it without spending more than it would cost to make a proper one?

ChrisGreenHeur
4 minutes ago
Code review exists.