Even with a citation, I'm not quite comfortable just reusing someone else's figures so many times when they're doing so much heavy lifting. But an explicit citation is the minimum.
I should probably draw my own.
https://github.com/munificent/craftinginterpreters/blob/mast...
Though now that I look at it, I apparently completely forgot to specify how the images should be licensed. Oops.
It's not a big deal and I really appreciate you reading and writing about the book, but I would prefer to not have the images reused without attribution.
I see that the license now includes images. I was unsure of how to have proper attribution and had just a link to a particular section of the book containing the image. I've updated it with a link to your profile, and now I should also add a link to the license.
There's at least one other Rust implementation of Lox that I know of (https://github.com/tdp2110/crafting-interpreters-rs) (no unsafe).
It's always interesting to see how different people approach the problems in their own language or in relative isolation. I agree with others here: the real value of working through the original book lies in avoiding copy and paste.
https://github.com/munificent/craftinginterpreters/wiki/Lox-...
I've also found the exercise of implementing Lox in another language to be highly instructive in learning how to write idiomatic code in that language. I continue to learn more about the best way to express patterns as I work on my own implementation. I'd recommend the journey to any professional developer for this side benefit alone.
> Lox must have the highest ratio of (implementations :
> production usage) of any language on the planet.
It's probably up there, for sure! But I'd guess that there are a million toy Lisp implementations, and more people are interested in writing a FORTH interpreter than actually using one in production. So I'd guess if we tried to get statistics it wouldn't be at the top. Though there's probably a similar claim to be made for the Monkey language from Thorsten Ball, via his books on interpreters and compilers.
Maybe! It's definitely getting there. I suspect "semi-arbitrary subset of C" still has me beat but who knows for how much longer.
On a more serious note, have you thought about trying to aim lightning at the same spot again and write another book about implementing something most programmers take for granted?
Just want to hopefully make you feel warm inside knowing you've made a really big difference to people's lives
Thanks for being you
I also wrote a post about it: https://ceronman.com/2021/07/22/my-experience-crafting-an-in...
I ended up having two implementations: One in purely safe Rust and another one with unsafe.
Note that if you're interested in the "object manager" approach mentioned, I did that in my safe implementation; you can take a look at https://github.com/ceronman/loxido
I'll +1 and say I highly recommend going through Crafting Interpreters, particularly as a way of building non-trivial programs in languages you are less familiar with. If you just follow the Java/C examples, you're tempted to lean into copy/pasting the samples, but figuring things out in other languages is a great experience.
I spent a longass time tracking down an issue with how the reference interpreter implementation defines tokens: https://github.com/munificent/craftinginterpreters/issues/11... which was frustrating but good debugging practice.
I'll echo this recommendation.
I haven't gone through Crafting Interpreters yet (it's on the bookshelf) but I did go through a big chunk of Building Git and instead of Ruby I did it in Rust, to get back into the language and go a little deeper.
It was really great, and it gave me a lot more than if I'd just copy-pasted the code, as you say.
There's really no perfect solution. Some readers prefer bottom up where they are given individual pieces they can understand which then get composed. Others prefer top down where they are given the high level problem being solved in terms of intermediate steps which then get incrementally defined. Some want to see code first so that they have something concrete to hang concepts off of. Others want concepts first and then to see the code as an illustration of it.
This is made even harder with Crafting Interpreters because the book's central conceit is that every line of code in the interpreters is shown in the book. There's nothing left to the reader. And, also, the book tries to give the user a program they can compile and run as often as possible as they work through things.
I did the best I could, but there are definitely some places where it's really hard to figure out a good order. Sometimes the book explicitly says "sorry, this won't make sense but bear with me and we'll circle back".
Also, at the macro level, the entire book is organized into two parts: the first one is higher-level and more concept-driven, and then the second part circles all the way back to build the entire language from scratch again, but at a much lower level of abstraction.
I appreciate your feedback. It's a hard problem and a skill I'm always trying to improve.
It's by far the best introductory book on this topic I've come across yet. It's a fun read and completely demystifies the "magic" of how interpreters work.
I think it also heavily depends on the reader. I also followed the book in a different language than Java, and only then did I run into the aforementioned problems.
And yeah, as someone who's trying to start writing a blog on ~technical topics, I get what you mean. Not every explanation can be linearized perfectly, because sometimes two concepts are coupled, motivating and influencing each other's design. Sometimes there are just, so to say, dependency cycles in the "knowledge graph". Can't turn that into a directed acyclic graph no matter how hard you try. The most prudent solution probably is just resolving the cycle iteratively by starting somewhere.
That's the second half of the book.
In the first half, what we gain from implementing it in Java is mostly the garbage collector. The rest of the static and runtime semantics basically have to be implemented from scratch on top of the JVM.
This was a deliberate choice. Most intro compiler/interpreter books don't do OOP. But most languages in wide use today are object oriented. That felt like a significant gap to me. One of the main goals with my book is to give people insight on how the languages they actually use work under the hood. Skipping objects and method dispatch would limit that.
My swing at Crafting Interpreters in Rust can be found here: https://github.com/mwerezak/sphinx-lang
Although, at the time I was really interested in Lua's syntax so for extra credit I designed a completely different syntax for Lox closer to Lua's.
I'd love to revisit this project sometime and implement a register-based VM similar to Lua's.
I did write a blog post about some of my design thoughts, although it doesn't dig into the technical guts: https://samsartor.com/guis-3/
Haha this is gold. We ALL do this ALL the time.
https://journal.stuffwithstuff.com/2023/10/19/does-go-have-s...
>I’ve been noodling on a static type system for my current hobby language.
As an inveterate language tinkerer for decades, Crafting Interpreters is like the intersection of Class A narcotic addiction, Feynman level understanding and religious enlightenment. It's in my top 3 favourite books of all time.
If you were to /happen/ to write an extension/follow up book that addressed static typing, then I can personally guarantee at least one sale.
Independent of that, thanks for the work as it stands. My copy is healthily dog-eared and still gets lugged around on holidays for refresh reading.
*source: trust me bro
Shameless plug: you may want to check out Loxcraft: https://github.com/ajeetdsouza/loxcraft
I too followed the path of "ignore safe Rust for maximum performance". It got pretty close to the C version, even beating it on some benchmarks.
Great journey. That's the thing about "unsafe" and "questionable parts"... if they really are questionable, then don't do 'em. In this case, if it is the case that holding a reference to a GC object guarantees that that object can't be freed, then it's not questionable. Proving that to be the case can be fun tho.
The question is: does my unsafe code allow violating the borrow rules?
Cell, RefCell, RwLock etc give you interior mutability, but there is no way to violate the borrow rules with them. On the surface, it looks like you are violating them because "I have a non-mut thing that I can mutate".
If you build a thing that allows two distinct &mut T's to the same thing, then you are basically asking for a bug. Don't use unsafe to break the rules. Use unsafe to enforce the rules in new ways.
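As a minimal illustration (the RefCell and vector here are just an example, not from the code under discussion): mutation happens through a shared binding, yet the one-writer rule still holds, checked at runtime instead of compile time.
```
use std::cell::RefCell;

fn main() {
    let value = RefCell::new(vec![1, 2, 3]);

    // Interior mutability: mutation through a shared (non-mut) binding.
    value.borrow_mut().push(4);

    // The borrow rules still hold, just checked at runtime:
    let reader = value.borrow();
    // value.borrow_mut(); // uncommenting this would panic: already borrowed
    println!("{:?}", *reader);
}
```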
Based on what I've read, it's entirely possible. If my Rust knowledge is correct, there are 2 things that have to be proven - the object's lifetime and the borrow rules.
Proving the lifetime can be done at compile-time based on what was discussed in these articles https://without.boats/tags/shifgrethor/. But that still leaves us with proving that we don't violate the borrow rules. AFAIK, the only way to do it in my implementation is with Cell and RefCell enforcing the borrow rules at runtime.
Before removing all the safety checks, I actually used Gc<RefCell<T>> for mutable objects, so the borrow rules were enforced but not the lifetime. However, I decided to remove RefCell<T> to have some fun and see how I could shoot myself in the foot.
Doing this properly means putting `GcData`'s fields (and/or `ObjFun`/`ObjClosure`/etc's fields) in plain `Cell`s, and using that for all mutation without ever forming any `&mut` references.
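A rough sketch of that shape, with invented field names standing in for `GcData`'s: every mutable field lives in a `Cell`, and all writes go through `get`/`set`, so no `&mut` to the object's interior ever exists.
```
use std::cell::Cell;

// Invented stand-in for a GC'd object's header: every mutable field
// lives in a Cell, so writes never require forming a `&mut`.
struct ObjHeader {
    marked: Cell<bool>,
    size: Cell<usize>,
}

fn mark(obj: &ObjHeader) {
    obj.marked.set(true); // mutation through a shared reference
}

fn main() {
    let obj = ObjHeader { marked: Cell::new(false), size: Cell::new(64) };
    mark(&obj);
    assert!(obj.marked.get());
    println!("size = {}", obj.size.get());
}
```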
If I remember correctly, there were some panics caused by RefCell borrows. But those got sorted out once I handled borrowing in the correct order.
What I would expect a production language implementation to do is build its own table that integrates its solution for interior mutability into its design. There are several other things (e.g. iterators, concurrent GC) that will also contribute to the need for a custom table, so (again for a production language) this is not too crazy.
And yeah, RefCells everywhere.
With the way Rust may perform optimizations on the assumption of no aliasing, I guess there's currently no way to avoid RefCell if we want to be 100% on the safe side.
Do you have any resources about Rust inlining and the issues it might cause? I'd love to read more about that.
- You can have any number of simultaneous readers,
- Or one writer and no readers.
- But if you ever break these rules, the world may burn.
Using unsafe and violating these rules is one of the cases where the Rust compiler can inflict worlds of misery: Incorrect code generation, CPUs seeing different versions of the same memory location, etc., depending on the exact context. Once you use "unsafe" and break the rules, Rust can be more dangerous than C. Rust even reserves the right to generate code using "noalias" (except doing so often triggers LLVM bugs, so it's usually turned off).
"Undefined behavior" means "anything at all can happen, and some of it is deeply weird and awful and counterintuitive."
You could sidestep the borrowing rules entirely by using std::cell::Cell in your heap objects, which is exactly what it exists for (it never hands out references, so there's nothing to alias). Or you could package everything inside a tiny core module and audit it extremely carefully.
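For the second option, the shape might look like this sketch (none of it is anyone's actual implementation): the raw pointers and all the unsafe live in one small module whose public API cannot be used to create aliasing `&mut` references, and only that module needs auditing.
```
mod gc_core {
    pub struct Gc<T> {
        ptr: *mut T, // invariant: points at a live allocation owned by the GC
    }

    impl<T> Gc<T> {
        pub fn new(value: T) -> Gc<T> {
            // Leaks in this sketch; a real collector would track and sweep it.
            Gc { ptr: Box::into_raw(Box::new(value)) }
        }

        pub fn get(&self) -> &T {
            // SAFETY: the module guarantees `ptr` stays live while handles exist,
            // and never hands out `&mut`, so this shared borrow cannot alias one.
            unsafe { &*self.ptr }
        }
    }
}

fn main() {
    let n = gc_core::Gc::new(42);
    println!("{}", n.get());
}
```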
It's been enabled for a couple years now.
If you write e.g.
```
let obj = some_object();
let len: &mut usize = &mut obj.len; // deref_mut
trigger_gc();
use(*len);
```
then you held a reference across the GC, and while it was mark/sweeping it created an aliasing `&mut` to `len`.
Inlining was mentioned just because it causes function bodies to be splatted together, and so puts code that is fine in isolation next to other code in a way that may allow Rust to observe UB: if `trigger_gc` were inlined, for example, then Rust has more information and code available at once, and might use that knowledge to apply optimizations that are invalid because you caused UB.
Actually, looking at your code, the much larger issue is that nothing stops you from doing
```
let obj = some_object();
let obj2 = obj.clone();
let len1 = &mut obj.len;
let len2 = &mut obj2.len;
let n = *len1;
*len2 = 0;
println!("{n}"); // what does this print?
```
because your DerefMut impls are just creating unconstrained borrows from pointer types that you can copy. This is almost certainly going to cause you bugs in the future, since it opens you up to accidentally writing iterator invalidation and other issues (and even worse than in C++, because C++ doesn't optimize references as heavily as Rust does borrows).
It's not just that it's possible to break the code; it's that the lack of any checks makes the breakage impossible to detect at either compile time or runtime, so it can "appear to work".
One way to solve it is to remove the DerefMut implementation and have it work more like Rc; that is, the user is forced to write `Gc<Cell<T>>` or `Gc<RefCell<T>>` if they want mutability. This solves the aliasing-borrows issue at the cost of extra runtime checks with RefCell (and still doesn't prevent you from unsoundly calling `sweep()` while holding a Gc).
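As a sketch of that API shape (the types here are hypothetical, and Rc merely stands in for a real GC pointer): Gc only hands out shared references, so anything mutable must be wrapped explicitly.
```
use std::cell::RefCell;
use std::ops::Deref;
use std::rc::Rc;

// Hypothetical handle type; Rc stands in for a real GC pointer here.
struct Gc<T>(Rc<T>);

impl<T> Deref for Gc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}
// Deliberately no DerefMut: to mutate, the user must reach for
// Gc<Cell<T>> or Gc<RefCell<T>>, so aliasing is checked at runtime.

fn main() {
    let list: Gc<RefCell<Vec<i32>>> = Gc(Rc::new(RefCell::new(vec![1, 2])));
    list.borrow_mut().push(3); // Deref to the RefCell, then a checked borrow
    println!("{:?}", list.borrow());
}
```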
Link: https://app.codecrafters.io/courses/interpreter/overview
(Disclaimer: I work for CodeCrafters, and built this challenge).
Technically, not always. You can allocate on the stack at runtime, see "alloca" in C.
But is alloca even that useful in practice? I struggle to find a good example.
What the compiler does at compile time is calculate the amount by which to decrement the CPU stack pointer register at runtime for all of the stack variables.
The compiler and alloca must work together, because optimization can easily break something here. For example, when a function returns, the stack pointer register had better be restored properly. Furthermore, in Rust you must not forget to call Drop::drop() for all types that have such a destructor.
It only pays off when all of these are true:

- you need a lot of memory
- it needs to be as fast as possible
- it's a leaf function that can never call another function
99.9% of the time those aren't all true.
It's not all that much worse than using a huge, static stack allocation. In fact, there are cases where alloca is safer since you can take into account how much stack memory you already have available before incrementing the stack pointer.
In any case it is up to you to do the stack check. Though alloca can easily be wrapped by a function running the check.
In Rust I would write a check-stack function that takes a function/closure as a parameter. Within the closure, you are guaranteed to have at least the requested stack space.
The compiler will inline it all nicely, while preserving the semantics of the stack allocation.
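Something very close to this already exists as the `stacker` crate; assuming it's added as a dependency, its `maybe_grow` might be used like this (the sizes and the recursive function are invented for illustration):
```
// If fewer than 64 KiB of stack remain, run the closure on a fresh
// 1 MiB stack segment; otherwise just run it on the current stack.
fn deeply_recursive(n: u64) -> u64 {
    stacker::maybe_grow(64 * 1024, 1024 * 1024, || {
        if n == 0 { 0 } else { 1 + deeply_recursive(n - 1) }
    })
}

fn main() {
    // Would blow the default stack without the guard above.
    println!("{}", deeply_recursive(1_000_000));
}
```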
Conceptually, if the compiler could just inline all of the child calls, it's fine. But if this call won't return for a long time, or may be called again (even worse), you're setting aside stack that you'll never get back. That's effectively a memory leak, and your program will crash.
```
#include <alloca.h>

void b(void);

void a(void) {
    // some_number stands in for a runtime-computed size
    void* mem = alloca(some_number);
    b();
}

void b(void) {
    a();
}
```
If a() is a leaf (or can be trivially inlined), it's fine! On the other hand, if a() calls something else, that's complicated; it's way, way harder to figure out why you're crashing. I think you'll segfault as a() eventually overwrites malloc'ed memory, but different systems are going to have different characteristics.
It's not that you can't, you just have to be real damn smart and make sure no one ever messes with your code.
Or you can just make sure it's a leaf. Much easier rule, much easier to explain.
It will run until it runs out of stack. Or it might just run forever because the compiler optimized it as an infinite loop.
In any case, same behavior.
Or am I crazy, and there is no point to being careful with alloca?
That is really the only difference between `int foo[N]` (with `N` fixed at compile time) and `alloca(n)`: alloca works with a value computed at runtime. And this value could come from an input.
You really do need confidence that the little helper can be fully inlined. If the helper is out of your control and someone greatly expands its responsibility, it can suck.
So, yeah, effectively a leaf.