If you really want arena-like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.
But like… the write-up completely missed that manual memory management does exist, that Golang considers it “unsafe”, and that this is a design principle of the language.
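For what it's worth, a minimal sketch of that idea (the `record` type and sizes are illustrative; real code must handle alignment and capacity checks):

```go
package main

import (
	"fmt"
	"unsafe"
)

// record is a hypothetical pointer-free type we want to place in the arena.
type record struct {
	id    uint64
	score float64
}

func main() {
	arena := make([]byte, 1<<16) // the "arena" is just a byte slice
	offset := 0

	// Bump allocation: reinterpret the next bytes as a *record.
	// NOTE: real code must respect alignment; offset 0 happens to be aligned.
	r := (*record)(unsafe.Pointer(&arena[offset]))
	offset += int(unsafe.Sizeof(record{}))

	r.id, r.score = 1, 0.5
	fmt.Println(*r, "bytes used:", offset)
}
```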
You could argue that C++ RAII overhead is “bounded performance” compared to C. Or that C’s stack frames are “bounded performance” compared to a full in-register assembly implementation of a hot loop.
But that’s bloody stupid. Just use the right tool for the job and know where the tradeoffs are, because there’s always something. The tradeoff boundary for an individual project or person is just arbitrary.
And it also completely ignores the fascinating world of “GC-free Java”, which more than a few of the clients I work with use: Java with garbage collection entirely disabled. It’s used in finance a lot.
Is it pretty? No.
Is it effective? Yes.
Regarding Go’s memory arenas: do you need to use memory arenas everywhere? Absolutely not. Most high-performance code has a concentrated hot path (like the tokenizer example that OP used). You just make that part reuse memory instead of alloc/dealloc and that’s it.
I'm not sure if these are just passersby, or people who actually use Go but have never strayed from the std lib.
The big miss of the OP is that it ignores the Go region proposal, which applies lessons learned from this project to solve the issue in a more tractable way. So while arenas won't be shipped as they were originally envisioned, that isn't to say no progress is being made.
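For illustration, the most common Go form of that reuse pattern is a `sync.Pool` of scratch buffers (the function and names here are hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch buffers so the hot path stops
// allocating a fresh buffer per message.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func processMessage(msg []byte) int {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // keep capacity, drop contents
		bufPool.Put(buf)
	}()
	buf.Write(msg) // hot-path work runs against the reused buffer
	return buf.Len()
}

func main() {
	fmt.Println(processMessage([]byte("hello")))
}
```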
If I’m doing that with a lot of ugly code - I might as well use idiomatic Zig with arenas. This is exactly the point the author tried to make.
Your last paragraph captures the tension perfectly. Go just isn’t the tool we thought it was for some jobs, and maybe that’s okay. If you’re going to count nanoseconds or measure total allocations, it’s better to stick to a non-GC language. Or a third option: write your hot loops in one such language, and continue using Go for everything else. Problem solved.
Go made it explicitly clear when it was released that it was designed to be a language that felt dynamically-typed, but with performance closer to statically-typed languages, for only the particular niche of developing network servers.
Which job that needs to be a network server, where a dynamically-typed language is appropriate, does Go fall short on?
One thing that has changed in the meantime is that many actually dynamically-typed languages have also figured out how to perform more like a statically-typed language. That might prompt you to just use one of those dynamically-typed languages instead, but I'm not sure that gives any reason to see Go as being less than it was before. It still fulfills the expectation of having performance more like a statically-typed language.
A job where nanosecond routing decisions need to be made.
Or use Go and write ugly code for those hot loops instead of introducing another language and build system. Then you can still enjoy the niceties of GC in other parts of your code.
Though it is my personal opinion that forcing a GC-based language to do a task best suited for manual memory management is like swimming against the tide. It’s doable but more challenging than it ought to be. I might even appreciate the challenge but the next person maintaining the code might not.
A word of caution: if you do this and then store pointers into that slice, the GC will likely not see them (as if you were just storing them as `uintptr`s).
No out pointers. If you can do that, you're fine.
If you really must store references into the same arena, better to use an offset from the start of the arena. Then it doesn't matter whether allocations are moved around.
Only if the type is not itself a pointer and does not contain any inner pointers.
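A sketch of that offset idea (illustrative names, not a library API):

```go
package main

type ref int // byte offset from the start of the arena

type arena struct{ buf []byte }

func (a *arena) alloc(n int) ref {
	off := len(a.buf)
	a.buf = append(a.buf, make([]byte, n)...) // may move the backing array
	return ref(off)
}

func (a *arena) at(r ref, n int) []byte {
	return a.buf[r : int(r)+n] // resolve the offset on every access
}

func main() {
	a := &arena{}
	r := a.alloc(8)
	copy(a.at(r, 8), "deadbeef")
	_ = a.alloc(1 << 20) // force reallocation; r remains valid
	_ = a.at(r, 8)
}
```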
Otherwise the garbage collector will bite you hard.
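A small demo of the bite, under the byte-slice-arena assumption from upthread: the GC treats `[]byte` as pointer-free data, so a pointer smuggled in via unsafe does not keep its target alive.

```go
package main

import (
	"runtime"
	"unsafe"
)

type node struct{ payload [64]byte }

func main() {
	arena := make([]byte, 64) // the GC sees this as pointer-free data

	// Smuggle a *node into the byte arena. This reference is invisible
	// to the collector, so it does not keep the node alive.
	p := new(node)
	*(**node)(unsafe.Pointer(&arena[0])) = p
	p = nil

	runtime.GC() // the node may be reclaimed right here...

	// ...making this a use-after-free waiting to happen.
	recovered := *(**node)(unsafe.Pointer(&arena[0]))
	_ = recovered
}
```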
The runtime only exposes a small subset of what it uses internally, and there's no stable ABI for runtime internals. If you're lucky enough to get big and have friends, they might not break you (some internal linkage is being preserved), but in the general case, for a general user: nope. Updates might make your code untenable.
> If you really want arena-like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.
AIUI the prior proposals still provided automated lifetime management, though that's related to several of the standing concerns. You can't match that from Go "userspace": finalizers don't get executed on a deterministic schedule. Put simply: that's not the same thing.
As someone else points out, this is also much more fraught with error than just typing what you described. On top of the GC issue pointed out already, you'll also hit memory-model considerations if you're doing any concurrency, which, if you actually needed to do this, surely you are. Once you're doing that you'll run into the issue, if you're trying to compete with systems languages, that Go only provides a subset of the platform's available memory model; in the simplest form it only offers acq/rel atomic semantics. It also doesn't expose any notion of what thread you're running on (which can change arbitrarily) or even which goroutine you're running on. This limits your design space quite significantly and bounds your performance for high-frequency small-region operations. I'd actually hazard an educated guess that an arena written as you casually suggest would perform extremely poorly at any meaningful scale (let's say >=32 cores, still fairly modest).
> You could argue that C++ RAII overhead is “bounded performance” compared to C. Or that C’s stack frames are “bounded performance” compared to a full in-register assembly implementation of a hot loop.

> But that’s bloody stupid. Just use the right tool for the job and know where the tradeoffs are, because there’s always something. The tradeoff boundary for an individual project or person is just arbitrary.

Sure, reductio ad absurdum. Though I typically would optimize against the (systems language) compiler long before I drop to assembly; it's 2025, systems compilers are great and have many optimizations, intrinsics, and hints.
> Man this person is mediocre at best.
Harsh; I think the author is fine, really. I think their most significant error isn't in missing or not discussing difficult other things they could do with Go, it's seemingly having been under the misconception, prior to the Arena proposal, that Go actually cedes control for lower-level optimization. It doesn't, it never has, and it likely never will (it will gain other semi-generalized internal optimizations over time; lots of work goes into that).
In some cases you can hack some in on your own, but Go is not well placed as a "systems language" if by that you mean something like "competitive efficiency at upper- or lower-bound scale tasks"; it is much better placed as a framework for writing general-purpose servers at middle scales. It's best placed on systems that don't have batteries and that have plenty of RAM. It'll provide you with a decent opportunity to scale up and then out in that space, as long as you pay attention to how you're doing along the way. It'll hurt if you need to target state-of-the-art efficiency at the extreme ends, and very likely block you wholesale.
I'm glad Go folks are still working on ideas to try to find a way for applications to get some more control over allocations. I'm also not expecting a solution that solves my deepest challenges anytime soon though. I think they'll maybe solve some server cases first, and that's probably good, that's Go's golden market.
The other is that almost no library is written in such a way that buffer reuse is possible (looking at you, typical Kafka clients that throw off a buffer of garbage per message, and protobuf). The latter could be fixed if people paid more attention to returning buffers to the caller.
Granted the latter leads to more verbose code and chaining of several calls is no longer possible.
But I am puzzled that even performance-oriented libraries both in Go and Rust still prefer to allocate the results themselves.
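To be fair, the Go standard library does have pockets of the caller-allocates style; strconv's Append functions take a caller-owned destination, which is roughly the shape being asked for:

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	buf := make([]byte, 0, 64)
	for i := 0; i < 3; i++ {
		buf = buf[:0] // reuse the same backing array each iteration
		buf = strconv.AppendInt(buf, int64(i), 10)
		fmt.Println(string(buf))
	}
}
```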
It may be that you can "just" build one, but you can't "just" use it and expect any of the available libraries and built-ins to work with it.
How many things would you have to "just" rewrite?
(Yes: I used arenas a lot when I was shipping C code; they're a very easy way to get big speed boosts out of code that does a lot of malloc).
One could imagine a language that allows syntax like CallWithArena(functionPointer, someSetupInfo) and any standard library allocation therein would use the arena, releasing on completion or error.
Languages like Python and modern Java would typically use a thread/virtualthread/greenlet-local variable to track the state for this kind of pattern. The fact that Go explicitly avoids this pattern is a philosophical choice, and arguably a good one for Go to stick to, given its emphasis on avoiding the types of implicit "spooky action at a distance" that often plague hand-rolled distributed systems!
But the concept of arenas could still apply in an AlternateLowLevelLanguage where a notion of scoped/threaded context is implicit and language-supported, and arena choice is tracked in that context and treated as a first-class citizen by standard libraries.
> Operations such as new, free and delete by default will use context.allocator, which can be overridden by the user. When an override happens all called procedures will inherit the new context and use the same allocator.
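Go, by contrast, has no implicit context, so the closest userspace approximation is explicit plumbing through context.Context. A sketch (the arena type and helpers are hypothetical, and this is exactly the verbosity Odin's implicit context avoids):

```go
package main

import "context"

type arena struct{ buf []byte }

func (a *arena) alloc(n int) []byte {
	off := len(a.buf)
	a.buf = append(a.buf, make([]byte, n)...) // bump-style growth
	return a.buf[off:]
}

type ctxKey struct{}

func withArena(ctx context.Context, a *arena) context.Context {
	return context.WithValue(ctx, ctxKey{}, a)
}

func arenaFrom(ctx context.Context) *arena {
	if a, ok := ctx.Value(ctxKey{}).(*arena); ok {
		return a
	}
	return &arena{} // caller forgot to attach one; fall back
}

func handler(ctx context.Context) {
	scratch := arenaFrom(ctx).alloc(128) // every callee resolves the same arena
	_ = scratch
}

func main() {
	handler(withArena(context.Background(), &arena{}))
}
```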
[0] https://standards.scheme.org/corrected-r7rs/r7rs-Z-H-6.html#...
This is something I look forward to exploring later in my current pet project. Right now it has possibly the stupidest GC (it just tracks C++ 'new'-allocated objects) but is set up for drop-in arena allocation with placement new, so we'll see how much that matters later on. There are two allocation patterns: statements and whatnot get compiled to static continuation graphs which push and pop secondary continuations and Value objects to do the deed, so I believe the second part, with the rapid temporary object creation, will see the most benefit.
Anyhoo, slightly different pattern, where the main benefits will most likely come from cache locality or whatever, assuming I can even make a placement-new arena allocator that beats the performance of the regular C++ new. Never know, it might even add more overhead than just tracking a bunch of raw C++ pointers, as I can't imagine there's even a drop of performance which C++ new left on the table?
There are plenty of specialisations that get more performance out, e.g. multi-threaded code in NUMA architectures.
The same ones you'd have to rewrite to use the (experimental) arenas implementation found in the standard library. While not the only reason, this is the primary reason why it was abandoned; it didn't really fit into the ecosystem the way users would expect — you included, apparently.
"Memory regions" is the successor, which is trying to tackle the concerns you have. Work on it is ongoing.
I'm still optimistic about potential improvements. (Granted, I doubt there will be anything landing in the near future beyond what the author has already mentioned.)
For example, there is an ongoing discussion on "memory regions" as a successor to the arena concept, without the API "infection" problem:
Somehow concluding that "By killing Memory Arenas, Go effectively capped its performance ceiling" seems quite misguided.
With time Go is also getting knobs, and it turns out various GC algorithms are actually useful.
Also, I kind of foresee they will discover there are reasons why multiple GC algorithms are desired and used in other programming ecosystems, so the older one might stay anyway.
Since Zig built up its standard library such that you always pass an allocator, it avoided the problem that the article mentions about trying to retrofit Go's standard library to work with an arena allocator.
Although that wasn't the case for IO in Zig: the most recent work has actually been reworking the standard library so that you explicitly pass an IO interface the way you pass an allocator.
But it's still a young language so it's still possible to rework it.
I really do enjoy using the arena allocator. It makes things really easy if your program follows a cyclical pattern where you allocate a bunch of memory and then, when you're done, just free the entire arena.
Simple arenas are easy enough to write yourself, even if it does make for unidiomatic code, as the author points out. Pretty much anything that allocates tons of slices sees a huge performance bump from doing this. I -would- like that ability in an easier fashion.
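For the curious, one such hand-rolled arena in safe Go, a chunked slab that hands out pointers (illustrative, not a library API):

```go
package arenas

// Arena hands out *T values from chunked slabs; dropping the Arena (and
// the pointers it produced) frees everything at once via the GC.
type Arena[T any] struct {
	slab []T
}

func (a *Arena[T]) New() *T {
	if len(a.slab) == cap(a.slab) {
		// Current slab is full: start a fresh one. Earlier slabs stay
		// alive exactly as long as pointers into them do.
		a.slab = make([]T, 0, 1024)
	}
	a.slab = a.slab[:len(a.slab)+1]
	return &a.slab[len(a.slab)-1]
}
```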
On the other hand, new users will abuse arenas and use them everywhere because "I read they are faster", leading to way worse code quality and bugs overall.
I do agree it would become infectious. Once people get addicted to microbenchmarking code and seeing arenas come out a bit faster in whatever test they're running, they're going to ask that all commonly-used allocating functions (especially everything in http and json) gain the ability to use arenas, which may make the code more Zig-like. Not a dig at Zig, but that would either make the language rather unwieldy or double the number of functions in every package, as far as I can see.
You can write an arena yourself, but it is useless if the language doesn't let you integrate it, e.g. allocate objects and vars on that arena.
I'm naively thinking: the performance bottleneck is not with tracking allocations but with constantly freeing them and then reallocating. Let the GC track allocations, but prevent it from doing anything else so long as the process is under its pre-allocated memory limit. When resumed, it will free unreferenced memory. That way, the program can suspend GC before a performance-sensitive block and resume it afterwards. APIs don't need to change at all that way.
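Go's runtime already exposes knobs that approximate this. A sketch (the limit value is a placeholder to tune):

```go
package main

import "runtime/debug"

func hotSection() {
	// ... allocation-heavy, performance-sensitive work ...
}

func main() {
	// Hard backstop so a runaway hot section can't OOM the process:
	// the GC still kicks in if the heap approaches ~1 GiB.
	debug.SetMemoryLimit(1 << 30)

	old := debug.SetGCPercent(-1) // suspend proportional GC
	hotSection()
	debug.SetGCPercent(old) // resume; the next cycle reclaims the garbage
}
```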
There are Odin, Zig, or Jai (likely next year) as new kids on the block, and alternatives to the cancer that is Rust or the more mainstream C, C++, Java, or even C#.
Go definitely does not have to try and replace any of them. Go has its own place and has absolutely no reason to be fearful of becoming obsolete.
After all, in rare/extreme cases, one can always allocate a big array and use flatbuffers for the data structures to put into it.
There are some interesting proposals on short-term allocations, being able to specify that a local allocation will not leak.
Most recently, I've been fighting with the ChaCha20-Poly1305 implementation, because someone in their 'wisdom' added a requirement for contiguous memory to the implementation, including extra space for a tag. Both ChaCha20 and Poly1305 are streaming algorithms, but the Go authors decided 'you cannot be trusted'; here's a safe one-shot interface for you to use.
Go really needs a complete overhaul of their Standard Library to fix this, but I can't see this ever getting traction due to the focus on not breaking anything.
Go really is a great language, but it should include performance / minimising the GC burden as a key design consideration for its APIs.
JSON's just a nightmare though. The inane legacy of UCS-2/UTF-16 got baked into Unicode, and UTF-16 escapes into JSON.
There are obviously other factors in play as well, or languages that are really good at both but weak in other areas (like adoption and mind share) would dominate. And I for sure don't see a lot of Crystal around.
The post could have also mentioned that the Go project hasn't given up. There are alternatives being explored to accomplish the same outcome. But, as before, that would invalidate the basis of the post and the sunk cost fallacy cannot stand the idea of having to throw the original premise into the trash.
For those interested, here's an article where Miguel Young implements a Go arena: https://mcyoung.xyz/2025/04/21/go-arenas/. I couldn't find references to Go's own experimental arena API in this article, which is a shame, since it'd be interesting if this knowledgeable author had weighed them against each other. IIUC, Miguel's version and the Go experimental version do have some important differences even apart from the API. IIRC, the Go experimental version doesn't avoid garbage collection; its main performance benefit is that the Go runtime's view of allocated memory is decreased as soon as `arena.Free` is called. This delays triggering the garbage collector (meaning it will run less frequently, saving cycles).
It's interesting the author decides to describe Rust in this way, but then spends the next 90% of the article lambasting the Go authors for having the restraint to not turn Go into Rust.
Arenas are simple to write, and if you need one, there are a lot of implementations available. If you want the language to give you complete flexibility on memory allocations then Go is the wrong language to use. Rust and Zig are right there; you pay upfront for that power with "difficult syntax".
I'm completely okay with that. In fact I much prefer it.
Writing high performance code is expensive in any language. Expensive in terms of development time, maintenance cost, and risk. It doesn't really matter what language we are talking about. The language usually isn't the limiting factor. Performance is usually lost in the design stage - when people pick the wrong strategies for solving a particular problem.
Not all code needs to be as fast as it can be. The priority for any developer should always be:
1. Correct
2. Understandable
3. Performant
If you haven't achieved 1, then 2 and 3 don't matter. At all. If you haven't achieved 2, then the lifetime cost and risk introduced by your code may not be acceptable. When I was inexperienced I only focused on 3. The code needed to be fast. I didn't care if it was impossible for others to maintain. That works if you want no help. Ever. But that isn't how you create lasting value.

Good programmers achieve all three and respect the priority. The programmers you don't really want on your team only focus on 3. Their code will be OK in the short term, but in the long term it tends to be a liability. I have seen several commercial products have to rewrite huge chunks of code that was impenetrable to anyone but the original author. And I have seen original authors break under the weight of their own code because they could no longer reason about what it does.
Go tries to not be complex. That is its strength. Introducing complexity that isn't needed by the vast majority of developers is a very bad idea.
If I need performance Go can't deliver there are other languages I could turn to. So far I haven't needed to.
(From the other comment I surmise that there are plenty of tricks one can use in Go to solve scenarios where you need to resort to trickery to get higher performance for various cases. So it seems that what you are asking for isn't even needed)
I think a core thing that's missing is that code that performs well is (IME) also the simplest version of the thing. By that, I mean you'll be:
- Avoiding virtual/dynamic dispatch
- Moving what you can up to compile time
- Setting limits on sizing (e.g. if you know that you only need to handle N requests, you can allocate the right size at start-up rather than dynamically sizing; see the sketch below)
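A sketch of that last point (maxRequests is an assumed application-specific limit):

```go
package main

// maxRequests is an assumed application limit; knowing it up front lets
// everything be sized once at startup instead of resized dynamically.
const maxRequests = 1024

type request struct{ payload [256]byte }

var (
	slots [maxRequests]request               // all backing storage, one flat allocation
	free  = make(chan *request, maxRequests) // fixed-capacity free list
)

func main() {
	for i := range slots {
		free <- &slots[i] // every slot starts out available
	}
}
```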
Realistically for a GC language these points are irrelevant w.r.t. performance, but by following them you'll still end up with a simpler application than one that has no constraints and hides everything behind a runtime-resolved interface.
Also, if someone can understand the code, they can optimize it if needed. So in a way, trying to express oneself clearly and simply can be a way to help optimization later.
But maybe it's like exceptions, where people get involved with a project originally written by people who misused all sorts of language constructs and came away thinking the language was awful, or don't learn idiomatic usage or something.
I agree with the author that Go is getting squeezed, but it has its use cases. "COBOL of cloud native" implies it's not selected for new things, but I reach for it frequently (Go > Java for "enterprise software" backends, Go > others for CLI tools, and obviously the cloud native / Terraform / CI ecosystem, etc.).
However in "best of suite" world, ecosystem interop matters. C <> Go is a pain point. As is WASM <> Go. Both make me reach for Rust.
Some should, maybe. But Go said right from day one that it doesn't aspire to be anything more than a language that appears dynamically-typed with static-type performance for the creation of network servers. It has no reason to. It was very much built for a specific purpose.
It has found other uses, but that was a surprise to its creators.
> Go is getting squeezed
Is it? I don't really see anything new that is trying to fill the same void. There are older solutions that are still being used, but presumably more would use them if Go hadn't been invented. So it is Go doing the squeezing, so to speak.
Yes, it got rid of its rough edges. People look at it positively solely because it has become more familiar to mainstream OOP languages. But it has no identity anymore. It is still simpler for the web than most competitors, but it doesn't matter, because you install 30 packages for a hello world anyway. The community doesn't want simplicity; they want easy, with glorious-looking code.
The irony is that PHP is perceived as more attractive by coders, but it's so generic now that a newbie is unlikely to choose it.
C: So powerful you can shoot your foot off!
Rust: Now that you've shot your foot off, let's not do that a second time.
Javascript: It runs on the server and in the browser.
Typescript: It runs on the server and in the browser, now with types!
In contrast,
PHP: Not sure if I want to be a templating language or general-purpose programming language.
Regardless, you’re thinking of Perl/CGI. PHP did attract the Perl crowd away from Perl, but it wasn’t for that reason. That was already the norm.
In Rust, can the lifetime of objects be tied to that of the arena to prevent this?
Asking as a C/C++ programmer with not much Rust experience.
But Bump::reset() takes a &mut self, while Bump::alloc() takes a &self reference and gives back a &mut T reference of the same lifetime. In Rust, &mut references are exclusive, so creating one for Bump::reset() ends the lifetime of all the old &self references, and thus all the old &mut T references you obtained from Bump::alloc(). Ergo, once you call Bump::reset(), none of the contained objects are accessible anymore. The blogpost at [2] gives a few other crates with this same &self -> &mut T pattern.
Meanwhile, some crates such as slab [1] effectively give you a numeric key or token to access objects, and crates differ in whether they have protections to guarantee that keys are unique even if objects are removed. All UAF protection must occur at runtime.
[0] https://docs.rs/bumpalo/3.19.0/bumpalo/struct.Bump.html
You could ask the programmer to mark some call stack as arena-allocated and redirect all allocations there while it's active, then move everything that is still live once you leave the arena-marked call stack (should be cheap if the live set is small, expensive but still safe otherwise).
Okay, this is just Lisp being Lisp, but it's still an example...
You could probably also do it without moving actually, it just gets a little more complex.
> They have been trying to prove they can achieve Arena-like benefits with, for example, improved GC algorithms, but all have failed to land
The new Green Tea GC from Go 1.25 [0]:
> Instead of scanning objects we scan whole pages. Instead of tracking objects on our work list, we track whole pages. We still need to mark objects at the end of the day, but we’ll track marked objects locally to each page, rather than across the whole heap.

Sounds like a similar direction: "let's work with many objects at once". They mention better cache-friendliness and all.

That is a problem, and the biggest reason why the arenas proposal was abandoned. But if you were willing to accept that tradeoff in order to use the Go built-in arenas, why wouldn't you also be willing to do so for your own arenas implementation?
> If there was a paradigm of libraries not doing allocations and requiring the caller to allocate this wouldn't be such an issue.
I suppose that is what was at the heart of trying out arenas in an "official" capacity: To see if everyone with bespoke implementations updated them to use a single Go-blessed way to share around. But there was no sign of anyone doing that, so maybe it wasn't a "big miss" after all. Doesn't seem like there was much interest in collaborating on libraries using the same interface. If you're going to keep your code private, you can do whatever you want.
I think the deeper issue is that Go's garbage collector is just not performant enough. And at the same time, Go is massively parallel with a shared-everything memory model, so as heaps get bigger, the impact of the imperfect GC becomes more and more noticeable.
Java also had this issue, and they spent decades on tuning collectors. Azul even produced custom hardware for it, at one point in time. I don't think Go needs to go in that direction.
Sorry, but it doesn't seem that difficult (famous last words). Add a new implicit parameter to all objects, just like "this", called "thisArena". When a call to any allocation is made, pass "thisArena" implicitly, unless something else is passed explicitly.
That way the arena is viral all the way down and you can create sub-arenas. It also does not require actually passing the arena as parameter.
You don't even need to rewrite any code, just recompile it.
- You need a pointer to the allocator (presumably you’d want to leave room for types beyond arenas). That’s 8 bytes of extra size on every object.
- You need to dynamically dispatch on this pointer for every allocation. Dynamic dispatch is a lot cheaper than it used to be on modern architectures, but it’s a barrier for basic inlining, which is a big deal when alloc() for an arena is otherwise just a pointer bump.
This really isn't very accurate. It is for Python, but JavaScript is massively performant. It's so performant that you can write game loops in it, provided you work around the garbage collector, which, as noted, is a foible Go shares.
The solution is the same, to pre-allocate memory.
There is now a new ABI proposal that should work across Python implementations, proposed by PyPy, but uptake seems slow.
https://discuss.python.org/t/c-api-working-group-and-plan-to...
https://doc.pypy.org/en/latest/extending.html
With a good enough JIT, native libraries wouldn't be needed to the extent they are.
Minimizing allocation inside a loop is not a huge insight, nor very rare, in any language including Python.
The speed-up was impossible to measure because deallocation that used to take up to 30 seconds (especially after repeat cycles of allocating/deallocating) was now instant.
Even though we had very little experience, it was trivial to do in C. IMO it's critical for a performance-oriented language to make using multiple allocators convenient. GC is a known performance killer, but so is malloc in some circumstances.
There is almost never a step three.
But if there is, it's this: Step three: measure.
Now enter a loop of "try something, measure, go to step 2".
Of the things you can try, optimizing GC overhead is but one of many options. Arenas are but one of many options for how to do that.
And the thing about performance optimizations are that they can be intensely local. If you can remove 100% of the allocations on just the happy path inside of one hot loop in your code, then when you loop back to step two, you might find you are done. That does not require an arena allocator with global applicability.
Go gives realistic programmers the right tools to succeed.
And Go's limitations give people like the author plenty of ammunition to fight straw men that don't exist. Too bad.
since we still have the tracing overhead and the same lifetimes, we haven't really gained that much by having manual memory management.
D's best take at this is a compile-time assert that basically forbids us from allocating GC memory in the affected region (please correct me if I'm wrong), but that is pretty limited.
does anyone else have a good narrative for how this would work?
I'd have thought that allocating a block of memory per GC type would work. As per Rust, you can use mainly one type of GC, with a smaller section for e.g. cyclic data allocated in a region, which can be torn down when no longer in use.
If you think about it like a kernel, you can have manual management in the core (eg. hard-realtime stuff), and GC in userland. The core can even time-slice the GC. Forth is particularly amenable as it uses stacks, so you can run with just that for most of the time.
From a quick search, _An Implementation of Scoped Memory for Real-Time Java_ (https://people.csail.mit.edu/rinard/paper/emsoft01.pdf) provides a decent overview:
> Real-Time Java extends this memory model to support two new kinds of memory: immortal memory and scoped memory. Objects allocated in immortal memory live for the entire execution of the program. The garbage collector scans objects allocated in immortal memory to find (and potentially change) references into the garbage collected heap but does not otherwise manipulate these objects.
> Each scoped memory conceptually contains a preallocated region of memory that threads can enter and exit. Once a thread enters a scoped memory, it can allocate objects out of that memory, with each allocation taking a predictable amount of time. When the thread exits the scoped memory, the implementation deallocates all objects allocated in the scoped memory without garbage collection. The specification supports nested entry and exit of scoped memories, which threads can use to obtain a stack of active scoped memories. The lifetimes of the objects stored in the inner scoped memories are contained in the lifetimes of the objects stored in the outer scoped memories. As for objects allocated in immortal memory, the garbage collector scans objects allocated in scoped memory to find (and potentially change) references into the garbage collected heap but does not otherwise manipulate these objects.
> The Real-Time Java specification uses dynamic access checks to prevent dangling references and ensure the safety of using scoped memories. If the program attempts to create either 1) a reference from an object allocated in the heap to an object allocated in a scoped memory or 2) a reference from an object allocated in an outer scoped memory to an object allocated in an inner scoped memory, the specification requires the implementation to throw an exception.
Phil Wadler delivered a design within the Go team's goals.
The Java team told the Go team nothing; the Go team themselves acknowledged their anti-generics bias:
> [...] They are likely the two most difficult parts of any design for parametric polymorphism. In retrospect, we were biased too much by experience with C++ without concepts and Java generics. We would have been well-served to spend more time with CLU and C++ concepts earlier.
https://go.googlesource.com/proposal/+/master/design/go2draf...
I have no idea what you think is on the other side of that exception. Please clarify.
Especially when it doesn't even begin to address the concern. The disconnect is not in where Go generics are limited, but where you find the exception. There is nothing that I can find to "except against". What was "Except" in reference to?
ArrayPool<T>

.NET has value types, explicit stack allocation, low-level unsafe programming C-style, and manual memory management as well.
You can either do the whole boilerplate manually with the Panama set of APIs, or write a C header file and let jextract do the boilerplate generation.
They and their friends spoke disdainfully of the "short toothbrush brigade". These were the climbers who sawed the handles off their toothbrushes, to save like four grammes in their backpack weight. Massively inconveniencing themselves but they sure were a teaspoon lighter!
This feels like that. Really do you think that playing childish pranks on the garbage collector is going to speed up anything? Pick a faster sorting algorithm or something.