While Claude Code might be the reason more people started triggering this bug, some of us were hitting it without ever having used Claude Code at all. Maybe the assumption about what makes a page non-standard isn't as black-and-white as presumed. And I wonder if the leak would have been triggered more often for people who use scrollback-limit = 0, or something very small.
Probably not a huge deal, but it does seem the fix will needlessly delete and recreate non-standard pages in the case where the new page needs to be non-standard, and the oldest one (that needs to be pruned) already is non-standard and could be reused.
This is addressed in the blog post.
It is how the PageList has always worked, and also how it worked before with the bug, because during capacity adjustment we would see the wrong size. This shouldn't change any perceived performance.
And as I note in the blog post, there are alternative approaches such as the one you suggested, but we don't have enough empirical data to support changing our viewpoint on that whereas our current viewpoint (standard sizes are common) is well supported by known benchmarks. I'm open to changing my mind here, but I didn't want to change worldviews AND fix the leak in the same go.
I'm feeling a bit lucky I was able to sneak in an issue during the beta phase, but it was a real reproducible one that led to a segfault.
However, I am somewhat surprised that the fix is reserved for a feature release in a couple of months. I would have expected this to be included in a bug fix release.
Kudos, that was a good read. Just remember that every time you do something novel, there’s potential for leaks :D
1|Flakes © 2|Installed © 3|Store © € 4|Security © €
──────────────────────────────────────────────────────────────
This works fine normally, but resizing the terminal would quickly trigger the crash - easy to avoid but still annoying!
I was already preparing myself to file a bug report with the easy repro, but this sounds suspiciously close to what the blog post is describing. Fingers crossed :)
(EDIT: HN filters unicode, booo :( )
I like using AI for visualizations because it is one-time use throwaway code, so the quality doesn't matter at all (above not being TOTALLY stupid), it doesn't need to be maintained. I review the end result carefully for correctness because it's on a topic I'm an expert of.
I produce non-reusable diagrams namespaced by blog post (so they're never used by any other post). I just sanity check that the implementation isn't like... mining bitcoin or leaking secrets (my personal site has no secrets to build) or something. After that, I don't care at all about that quality.
The information it conveys is the critical part, and diagrams like this make it so much more consumable for people.
GC languages are fast these days. If you don't want a runtime like C# (which has excellent performance) a language like Go would have worked just fine here, compiling to a small native binary but with a GC.
I don't really understand the aversion to GC's. In memory constrained scenarios or where performance is an absolute top priority, I understand wanting manual control. But that seems like a very rare scenario in user space.
There’s even a standard, non-unsafe API for leaking memory[1].
(What Rust does do is make it harder to construct programs that leak memory unintentionally. It’s possible but not guaranteed that a similar leak would be difficult to express idiomatically in Rust.)
[1]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.l...
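For anyone who hasn't seen it, here is a minimal sketch of the `Box::leak` API referenced above. The function name `leak_buffer` is made up for illustration; the point is only that this compiles with no `unsafe` anywhere.

```rust
/// Allocates a zeroed buffer and deliberately leaks it.
/// `leak_buffer` is a hypothetical helper, not part of any library.
fn leak_buffer(len: usize) -> &'static mut [u8] {
    // Box::leak consumes the Box and hands back a reference with a
    // 'static lifetime. The destructor never runs, so the allocation
    // is never returned to the allocator.
    Box::leak(vec![0u8; len].into_boxed_slice())
}
```

Calling `leak_buffer(4096)` in a loop grows memory without bound, and the compiler has no objection, because leaking is considered memory safe in Rust.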
Rust has Affine Types. This means Rust cares that for any value V of type T, Rust can see that we did not destroy V twice (or more often).
With Linear Types the compiler checks that you destroyed V exactly once, not less and not more.
However, one reason I don't end up caring about Leak Safety of this sort is that in fact users do not care that you didn't "leak" data in this nerd sense. In this nerd sense what matters is only leaks where we lost all reference to the heap data. But from a user's perspective it's just as bad if we did have the reference but we forgot - or even decided explicitly not - to throw it away and get back the RAM.
The obvious way to make this mistake "by accident" in Rust is to have two things which keep each other alive via reference counting and yet have been disconnected and forgotten by the rest of the system. A typical garbage collected language would notice that these are garbage and destroy them both, but Rust isn't a GC language of course. Calling Box::leak isn't likely to happen by accident (though you might mistakenly believe you will call it only once but actually use it much more often)
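A small sketch of that reference-counting cycle, in case it helps. `Node` and `drops_after_release` are names I invented for the example; it just counts how many destructors run once the rest of the program lets go of both values.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical node type: optionally holds a strong reference to a peer
// and bumps a shared counter when its destructor runs.
struct Node {
    peer: RefCell<Option<Rc<Node>>>,
    drops: Rc<Cell<u32>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Creates two nodes, optionally links them into a cycle, releases both
/// handles, and returns how many destructors actually ran.
fn drops_after_release(link_cycle: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    let a = Rc::new(Node { peer: RefCell::new(None), drops: Rc::clone(&drops) });
    let b = Rc::new(Node { peer: RefCell::new(None), drops: Rc::clone(&drops) });
    if link_cycle {
        // a keeps b alive and b keeps a alive.
        *a.peer.borrow_mut() = Some(Rc::clone(&b));
        *b.peer.borrow_mut() = Some(Rc::clone(&a));
    }
    // The rest of the system "forgets" both nodes here.
    drop(a);
    drop(b);
    drops.get()
}
```

Without the cycle both destructors run. With the cycle neither runs, because each strong count is still 1 after the handles are dropped. Making one of the two links a `std::rc::Weak` is the usual way out.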
I think the main part of Ghostty's design mentioned here that - as a Rust programmer - I think is probably a mistake is the choice to use a linked list. To me this looks exactly like it needs VecDeque, a circular buffer backed by a growable array type. Their "clever" typical case where you emit more text and so your oldest page is scrapped and re-used to form your newest page, works very nicely in VecDeque, and it seems like they never want the esoteric fast things a linked list can do, nor do they need multi-writer concurrency like the guts of an OS kernel, they want O(1) pop & push from opposite ends. Zig's Deque is probably that same thing but in Zig.
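To make the VecDeque suggestion concrete, here is a rough sketch of the "scrap the oldest page and reuse it as the newest" pattern. This is not Ghostty's code: `Page`, `Scrollback`, and `PAGE_ROWS` are all invented for illustration, and a real page holds much more than a list of strings.

```rust
use std::collections::VecDeque;

// Hypothetical page capacity, in rows.
const PAGE_ROWS: usize = 4;

struct Page {
    rows: Vec<String>,
}

struct Scrollback {
    pages: VecDeque<Page>,
    max_pages: usize,
}

impl Scrollback {
    fn new(max_pages: usize) -> Self {
        assert!(max_pages > 0);
        Scrollback { pages: VecDeque::with_capacity(max_pages), max_pages }
    }

    fn push_line(&mut self, line: &str) {
        // A fresh page is needed when there is none yet or the newest is full.
        let needs_page = self.pages.back().map_or(true, |p| p.rows.len() == PAGE_ROWS);
        if needs_page {
            let page = if self.pages.len() == self.max_pages {
                // At the limit: pop the oldest page (O(1)) and recycle its
                // allocation instead of freeing and reallocating.
                let mut oldest = self.pages.pop_front().expect("max_pages > 0");
                oldest.rows.clear();
                oldest
            } else {
                Page { rows: Vec::with_capacity(PAGE_ROWS) }
            };
            self.pages.push_back(page); // O(1) push at the other end
        }
        self.pages
            .back_mut()
            .expect("at least one page exists")
            .rows
            .push(line.to_string());
    }

    fn page_count(&self) -> usize {
        self.pages.len()
    }

    fn oldest_line(&self) -> Option<&str> {
        self.pages.front().and_then(|p| p.rows.first()).map(String::as_str)
    }
}
```

With `max_pages = 2`, pushing twelve lines leaves two pages, and the oldest surviving line is the fifth one pushed: the first page was recycled to hold lines nine through twelve.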
The way to solve this in Rust would be to put this logic in the drop and hide each page type in an enum. That way you can’t ever confuse the types or what happens when you drop.
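Roughly what I mean, as a sketch only. `Pool` here is a toy free list standing in for a real memory pool, and `STANDARD_SIZE`, `Backing`, and `Page` are hypothetical names, not anything from Ghostty. The key property is that the variant, not a size field, decides how the memory is released.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical standard page size in bytes.
const STANDARD_SIZE: usize = 1024;

/// Toy stand-in for a memory pool: a free list of fixed-size buffers.
#[derive(Default)]
struct Pool {
    free: Vec<Box<[u8]>>,
}

/// How a page's memory was obtained. Resizing metadata elsewhere
/// cannot change which variant this is.
enum Backing {
    Pooled(Box<[u8]>),
    Heap(Vec<u8>),
}

struct Page {
    backing: Option<Backing>, // Option so Drop can move the buffer out
    pool: Rc<RefCell<Pool>>,
}

impl Page {
    fn new(pool: &Rc<RefCell<Pool>>, size: usize) -> Page {
        let backing = if size <= STANDARD_SIZE {
            // Reuse a pooled buffer if one is free, else make a new one.
            let buf = pool
                .borrow_mut()
                .free
                .pop()
                .unwrap_or_else(|| vec![0u8; STANDARD_SIZE].into_boxed_slice());
            Backing::Pooled(buf)
        } else {
            Backing::Heap(vec![0u8; size])
        };
        Page { backing: Some(backing), pool: Rc::clone(pool) }
    }

    fn len(&self) -> usize {
        match &self.backing {
            Some(Backing::Pooled(buf)) => buf.len(),
            Some(Backing::Heap(buf)) => buf.len(),
            None => 0,
        }
    }
}

impl Drop for Page {
    fn drop(&mut self) {
        match self.backing.take() {
            // Pooled memory goes back to the pool.
            Some(Backing::Pooled(buf)) => self.pool.borrow_mut().free.push(buf),
            // Heap memory is freed by Vec's own destructor.
            Some(Backing::Heap(_)) | None => {}
        }
    }
}
```

Dropping one standard page and one oversized page leaves exactly one buffer on the pool's free list, and there is no code path where the oversized one is mistaken for a pooled one.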
As you're saying, the bug was the equivalent of an incorrectly written Drop implementation.
Nothing against Zig, and people not using Rust is just fine, but this is what happens when you want C-like feel for your language. You miss out on useful abstractions along with the superfluous ones.
"We don't need destructors, defer/errdefer is enough" is Zig's stance, and it was mostly OK.
Impossible to predict this kind of issue when choosing a project language (and it's already been discussed why Zig was chosen over Rust for Ghostty, which is fine!), so it's not a reason to always choose Rust over Zig, but sometimes that slightly annoying ceremony is useful!
Maybe some day I'll be smart enough to write Zig as a default over Rust, but until that day I'm going to pay the complexity price to get more safety and keep more safety mechanisms on the shotgun aimed at my foot. I've got plenty of other bugs I can spend time writing.
Another good example is the type vs type alias vs wrapper type debate. It's probably not reasonable to use a wrapper type every single time (e.g. num_seconds can probably be a u32 and not a Seconds type), but it's really a Rorschach test because some people lean towards one end versus the other for whatever reason, and the plusses/minuses are different depending on where you land on the spectrum.
[EDIT] also some good discussion here
https://ziggit.dev/t/zig-what-i-think-after-months-of-using-...
Calling unsafe mmap APIs not only is unlikely to run into the corner cases where unsafe Rust is tricky to get right, there’s “millions” of crates that offer safe APIs to do so and it’s fundamentally not hard to write it safely (it would be very hard to write it to have any issues).
And fundamentally I think Rust is much more likely to make high performance easier to reach, because the vast majority of safe code you write is amenable to the compiler performing safe optimizations that Zig just can’t do regarding pointer aliasing (or if it does, it brings all the risks of unsafe Rust when the user annotates something incorrectly).
This comment [0] by mitchellh on the corresponding lobste.rs submission discusses the choice of data structure a bit more:
> Circular buffer is a pretty standard approach to this problem. I think it's what most terminal emulators do.
> The reason I went with this doubly linked list approach with Ghostty is because architecturally it makes it easier for us to support some other features that either exist or are planned.
> As an example of planned, one of the most upvoted feature requests is the ability for Ghostty to persist scroll back across relaunch (macOS built-in terminal does this and maybe iTerm2). By using a paged linked list architecture, we can take pages that no longer contain the active area (and therefore are read-only) and archive them off the IO thread during destroy when we need to prune scroll back. We don't need to ever worry that the IO thread might circle around and produce a read/write data race.
> Or another example that we don't do yet, we can convert the format of scroll back history into a much more compressed form (maybe literally compressed memory using something like zstd) so we can trade off memory for cpu if users are willing to pay a [small, probably imperceptible] CPU time cost when you scroll up.
[0]: https://lobste.rs/s/vlzg2m/finding_fixing_ghostty_s_largest_...
I don't understand why that is the preferred fix. I would have solved it other ways:
1. When resizing the page, leave some flag of how it was allocated. This tagging is commonly done as the always 0 bits in size or address fields to save space.
2. Since the pool is a known size of contiguous memory, check if the memory to be freed is within that range
3. Make the size immutable. If you want to realloc, go for it, and have the memory manager handle that boundary for you.
Both of those not only maintain functionality which seems to have been lost with the feature reduction but also are more future proof to any other changes in size.
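For option 1, this is the kind of tagging I have in mind, sketched in Rust rather than Zig. Everything here is hypothetical (`TaggedSize`, `ALIGN`, `POOLED_FLAG`), and it assumes sizes are always a multiple of some alignment so the lowest bit is otherwise always zero.

```rust
// Hypothetical: all sizes are multiples of ALIGN, so bit 0 is free.
const ALIGN: usize = 4096;
const POOLED_FLAG: usize = 1;

/// A size field that also remembers how the memory was allocated.
#[derive(Clone, Copy)]
struct TaggedSize(usize);

impl TaggedSize {
    fn new(size: usize, pooled: bool) -> Self {
        assert!(size % ALIGN == 0, "size must be aligned so the low bit is free");
        TaggedSize(size | if pooled { POOLED_FLAG } else { 0 })
    }

    /// The real size, with the tag bit masked off.
    fn size(self) -> usize {
        self.0 & !POOLED_FLAG
    }

    /// Where the memory came from, independent of the current size.
    fn is_pooled(self) -> bool {
        (self.0 & POOLED_FLAG) != 0
    }

    /// Changes the recorded size while preserving the allocation tag.
    fn with_size(self, size: usize) -> Self {
        Self::new(size, self.is_pooled())
    }
}
```

The free path would branch on `is_pooled()` instead of comparing the size against the standard size, so changing the recorded size later cannot flip the decision.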
At the end of the day, #1 and #3 both probably add a fairly significant amount of code and complexity that it's not clear to me adds robustness or clarity. From the fix:
```
// If our first node has non-standard memory size, we can't reuse
// it. This is because our initBuf below would change the underlying
// memory length which would break our memory free outside the pool.
// It is easiest in this case to prune the node.
```
https://github.com/ghostty-org/ghostty/commit/17da13840dc71b...
#3, it seems, would require making a broader change. The size effectively is immutable now (assuming I'm understanding your comment correctly): non-standard pages never change size, they get discarded without trying to change their size.
#2 is interesting, but I think it won't work because the implementation of MemoryPool doesn't seem like it would make it easy to test ownership:
https://github.com/ghostty-org/ghostty/blob/17da13840dc71ba3...
You'd have to make some changes to be able to check the arena buffers, and that check would be far slower than the simple comparison.
#1 and #2 are fixes for breaking that implicit trust. #1 still trusts the metadata. #2 is what I'd consider the most robust solution: not only is it ideally trivial (just compare whether a pointer is within a range, assuming Zig can do that), it also doesn't rely on the metadata being correct. #3 prevents the desync.
I really don't understand the code base enough to say definitively that my ways work, which is I guess what I'm really looking for feedback on. Looking at the memorypool, I think you're right that my assumption of it being a simple contiguous array was incorrect.
ETA: I think I'm actually very wrong for #2. Color me surprised that the zig memory pool allocated each item separately instead of as one big block. Feels like a waste, but I'm sure they have their reasons. That's addCapacity in memory_pool.zig
Which is to say, I don't think it was actually being resized. I think it was the metadata for the page saying it had the (incorrect) standard size (and the incorrect handling after the metadata was changed).
23 minutes later I'm at +2
6 minutes after, +5 +4min now +6, another 20 minutes +8. I think I'm in the clear
This excellent write-up from mitchellh explains the issue in depth, and all his blog posts on building Ghostty are a recommended read on Ghostty's internals.
Similarly, these write-ups are a great read. Here is another one that documents a goroutine leak and how it was detected and fixed without restarting production. [0]
This is what most vibe-coders will NOT do when faced with a non-trivial issue, with a serious software product.
[0] https://skoredin.pro/blog/golang/goroutine-leak-debugging
A user came along and provided a reliable reproduction for me (last night) that allowed me to find and fix the issue. Simultaneously they found the same thing and produced a similar fix, which also helped validate both our approaches. So, we were able to move forward. I said in the linked comment that I believed the leak existed, just couldn't find it.
It also was fairly limited in impact. As far as Ghostty bugs go, the number of upvotes the bug report had (9) is very small. The "largest" in the title is with regards to the size of the leak in bytes, not the size of the leak in terms of reach.
As extra data to support this, this bug has existed for at least 3 years (since the introduction of this data structure in Ghostty during the private beta). The first time I even heard about it in a way where I can confidently say it was this was maybe 3 or 4 months ago. It was extremely rare. I think the recent rise in popularity of Claude Code in particular was bringing this to the surface more often, but never to the point it rose to a massively reported issue.
Like I said, this bug has existed for 3 years at this point and Ghostty is likely used by hundreds of thousands if not a million+ people daily (we don't have any analytics at all but have some side signals based on terminal reports from 3rd party CLIs). Trust me when I say that when there is a widespread issue, we hear it MUCH more loudly. :)
In the meantime they apparently got one (edit: per their sibling comment they got it yesterday evening) and were finally able to figure out the issue.
edit: https://github.com/ghostty-org/ghostty/discussions/10244 is where it was cracked.
As you can see, there's no hint or evidence they even are on HN let alone saw that discussion.
I have never found myself in the situation where my terminal emulator would be too slow and I'm using it for the majority of my day-to-day work.
I honestly never ran into a situation where I would have blamed the terminal emulator for being too slow.
> Ghostty is a terminal emulator that differentiates itself by being fast, feature-rich, and native. While there are many excellent terminal emulators available, they all force you to choose between speed, features, or native UIs. Ghostty provides all three.
> In all categories, I am not trying to claim that Ghostty is the best (i.e. the fastest, most feature-rich, or most native). But when I set out to create Ghostty, I felt all terminals made you choose at most two of these categories. I wanted to create a terminal that was competitive in all three categories and I believe Ghostty achieves that goal.
> Before diving into the details, I also want to note that Ghostty is a passion project started by Mitchell Hashimoto (that's me!). It's something I work on in my free time and is a labor of love. Please don't forget this when interacting with the project. I'm doing my best to make something great along with the lovely contributors, but it's not a full-time job for any of us.