Tail Call Recursion in Java with ASM (2023) (unlinkedlist.org)
96 points | 2 days ago | 7 comments
fsckboy
1 day ago
[-]
the "lambda the ultimate" papers and the birth of scheme was a loong time ago, so it grates on my ears to hear this topic presented as "an optimization". Yes, it is sometimes an optimization a compiler can make, but the idea is much better presented as a useful semantic of a language.

in the same way that passing parameters to a subfunction "creates" a special set of local variables for the subfunction, the tail recursion semantic updates this set of local variables in an especially clean way for loop semantics, allowing "simultaneous assignment" from old values to new ones.

(yes, it would be confusing with side-effecting C/C++ operators like ++, because then you'd need to know the order of evaluation or know not to do that, but those are already issues in those languages quite apart from tail recursion)

because it's the way I learned it, I tend to call the semantic "tail recursion" and the optimization "tail call elimination", but since other people don't use the same terms it's somewhat pointless; still, I do like to crusade for awareness of the semantic beyond the optimization. If it's an optimization, you can't rely on it, because you could blow the stack on large loops. If it's a semantic, you can rely on it.

(the semantic is not entirely "clean" either. it's a bit of a subtle point that you need to return the tail call's value directly, unchanged, or it's not a tail call. fibonacci is the sum of the current value and the next one, so the naive version is not a tail call unless you somewhat carefully arrange the values you pass/keep around. also worth pointing out that all "tail calls" are up for consideration, not just recursive ones)
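
to make the fibonacci point concrete, here is a rough Java sketch (my own illustration, not from the article): carrying the current and next values as parameters puts the recursive call in tail position, though javac/the JVM won't eliminate it on their own.

    class Fib {
        static long fib(long n) {
            return fib(n, 0, 1);
        }

        // a = fib(i), b = fib(i + 1); each step "simultaneously assigns" (a, b) <- (b, a + b)
        static long fib(long n, long a, long b) {
            if (n == 0) return a;
            return fib(n - 1, b, a + b); // tail position: the callee's result is returned as-is
        }
    }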

reply
ekimekim
1 day ago
[-]
In a weird way it kinda reminds me of `exec` in sh (which replaces the current process instead of creating a child process). Practically, there's little difference between these two scripts:

    #!/bin/sh
    foo
    bar
vs

    #!/bin/sh
    foo
    exec bar
And you could perhaps imagine a shell that does "tail process elimination" to automatically perform the latter when you write the former.

But the distinction can be important due to a variety of side effects, and if you could only achieve it by carefully following a pattern that the shell might or might not recognize, that would be very limiting.

reply
nagaiaida
1 day ago
[-]
this is pretty much exactly how my "forth" handles tail call elimination, and it's the main thing that has earned the scare quotes so far, since it shifts the mental burden to being aware of this when writing code that manipulates the return stack.

as you imply towards the end, i'm not confident this is a trick you can get away with as easily without the constraints of concatenative programming to railroad you into it being an easily recognizable pattern for both the human and the interpreter.

reply
LeFantome
1 day ago
[-]
One of the issues with Java is that there are two levels of language. You compile Java into Java bytecode, which is further compiled into native machine code. There is no concept of tail calls in Java bytecode, so it is difficult to propagate the semantics: it has to be the programmer or the compiler that bakes the tail call optimization into the generated intermediate bytecode before that is further compiled.

.NET is an interesting contrast. The equivalent of Java bytecode in .NET (CIL) does have the concept of tail calls. This allows a functional language like F# to be compiled to the intermediate form without losing the tail call concept. It is still up to the first-level compiler, though. C#, for example, does not support tail calls even though its intermediate target (CIL) does.

reply
ghoul2
1 day ago
[-]
Sigh. I have been kicking this horse forever as well: an "optimization" implies just a performance improvement.

Tail call elimination, if it exists in a language, allows coding certain (even infinite) loops as recursion - making loop data flow explicit, easier to analyze, and, at least in theory, easier to vectorize/parallelize, etc.

But if a language/runtime doesn't do tail call elimination, then you CAN'T code up loops as recursion, as you would be destroying your stack. So the WAY you code and structure it must be different.

It's NOT an optimization.

I have no idea who even came up with that expression.

reply
ameliaquining
1 day ago
[-]
I mean, in the particular case demonstrated in this blog post it can only be an optimization, because semantically guaranteeing it would require language features that Java doesn't have.
reply
bradley13
2 days ago
[-]
Every compiler should recognize and optimize for tail recursion. It's not any harder than most other optimizations, and some algorithms are far better expressed recursively.

Why is this not done?

reply
SkiFire13
2 days ago
[-]
In general, tail recursion destroys stacktrace information, e.g. if f calls g which tail calls h, and h crashes, you won't see g in the stacktrace, and this is bad for debuggability.
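
To illustrate (hypothetical names; stock Java keeps every frame, this just shows what elimination would discard):

    class Trace {
        static void f() { g(); }
        static void g() { h(); } // the call to h() is in tail position
        static void h() { throw new IllegalStateException("boom"); }

        public static void main(String[] args) { f(); }
        // Today the stacktrace reads h <- g <- f <- main. If g's tail call to h
        // were eliminated, g's frame would be reused and the trace would read
        // h <- f <- main, hiding how h was actually reached.
    }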

In lower level languages there are also a bunch of other issues:

- RAII can easily make calls that appear to be in tail position not actually tail calls, because destructors implicitly run after the call;

- there can be issues when reusing the stack frame of the caller, especially with caller-cleanup calling conventions;

- the compiler needs to prove that no pointers to the stack frame of the function being optimized have escaped, otherwise it would be reusing the memory of live variables which is illegal.

reply
chowells
1 day ago
[-]
I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had. Tail recursion is equivalent to looping. People should stop complaining when it gives them the same information as looping.
reply
roenxi
1 day ago
[-]
> I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had.

That is a common criticism. You're referring to the functional programmers. They would typically argue that building up state based on transient loop variables is a mistake. The body of a loop ideally should be (at the time any stack trace gets thrown) a pure function of constant values and a range that is being iterated over while being preserved. That makes debugging easier.

reply
ameliaquining
1 day ago
[-]
I mean, if I were doing an ordinary non-recursive function call that just happened to be in tail position, and it got eliminated, and this caused me to not be able to get the full stack trace while debugging, I might be annoyed.

In a couple languages I've seen proposals to solve this problem with a syntactic opt-in for tail call elimination, though I'm not sure whether any mainstream language has actually implemented this.

reply
chowells
1 day ago
[-]
Language designers could keep taking ideas from Haskell, and allow functions to opt in to appearing in stack traces. Give the programmer control, and all.
reply
SamLL
1 day ago
[-]
Kotlin has a syntactic opt-in for tail call elimination (the "tailrec" modifier).
reply
michaelmrose
1 day ago
[-]
reply
vbezhenar
1 day ago
[-]
Some of the issues are partially alleviated by using a limited form of tail recursion optimization. You mark a function with a tailrec keyword, and the compiler verifies that the function calls itself as the last statement. You also wouldn't expect a complete stack trace from such a function. At the same time, it probably covers 90% of the recursive algorithms that would benefit from tail recursion.
reply
LeFantome
1 day ago
[-]
That is what Clojure does I believe.
reply
hyperbrainer
2 days ago
[-]
AFAIK Zig is the only somewhat big and well-known low-level language with TCO. Obviously, Haskell/OCaml and the like support it and are decently fast too, but systems programming languages they are not.
reply
vlovich123
2 days ago
[-]
For a guarantee:

https://crates.io/crates/tiny_tco

https://crates.io/crates/tco

As an optimization, my understanding is that GCC and LLVM implement it, so Rust, C, and C++ also have it implicitly as an optimization that may or may not apply to your code.

But yes, Zig does have formal language syntax for guaranteeing tail calls at the language level (which I agree is the right way to expose this optimization).

reply
SkiFire13
2 days ago
[-]
Zig's TCO support is not much different from Clang's `[[clang::musttail]]` in C++. Both have the big restriction that the two functions involved are required to have the same signature.
reply
hyperbrainer
2 days ago
[-]
> Both have the big restriction that the two functions involved are required to have the same signature.

I did not know that! But I am a bit confused, since I don't really program in either language. Where exactly in the documentation could I read more about this? Or see more examples?

The language reference for @call [0] was quite unhelpful to my untrained eye.

[0] https://ziglang.org/documentation/master/#call

reply
SkiFire13
1 day ago
[-]
Generally I also find Zig's documentation pretty lacking, so instead I try looking for the relevant issues/PRs. In this case I found comments on this issue [1] which seem to still hold true. That same issue also links to the relevant LLVM/Clang issue [2], and the same restriction is also being proposed for Rust [3]. That is where I first learned about it, and it prompted me to investigate whether Zig also suffers from the same issue.

[1]: https://github.com/ziglang/zig/issues/694#issuecomment-15674... [2]: https://github.com/llvm/llvm-project/issues/54964 [3]: https://github.com/rust-lang/rfcs/pull/3407

reply
ufo
1 day ago
[-]
This limitation is to ensure that the two functions use the exact same calling convention (input & output registers, and values passed via stack). It can depend on the particular architecture.
reply
Thorrez
2 days ago
[-]
C++:

> All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade)

https://stackoverflow.com/questions/34125/which-if-any-c-com... (2008)

reply
hyperbrainer
2 days ago
[-]
I couldn't actually figure out whether this TCO being done "fairly well" is a guarantee or simply an optimization, as in Rust (I am referring to the language's native support, not what crates allow).
reply
Thorrez
4 hours ago
[-]
When that SO answer was written, it was not a guarantee.

You can now get a guarantee by using non-standard compiler attributes:

https://clang.llvm.org/docs/AttributeReference.html#musttail

https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html...

reply
johnisgood
1 day ago
[-]
Depends on what you mean by "systems programming", you can definitely do that in OCaml.
reply
pjmlp
1 day ago
[-]
reply
hyperbrainer
1 day ago
[-]
I know of these. Almost added a disclaimer too -- that was not my point, as I am sure you understand. Also, OCaml has a GC, making it unsuitable for many applications common to systems programming.
reply
vlovich123
2 days ago
[-]
My bigger issue with tail call optimization is that you really want it to be enforceable, since if you accidentally deoptimize it for some reason you can end up blowing the stack at runtime. Usually a failure to optimize some pattern doesn't have such a drastic effect - normally the code just runs more slowly. So tail calls are one of those special optimizations you want a language annotation for, so that if it fails you get a compiler error (and similarly you may want it applied even in debug builds).
reply
_old_dude_
2 days ago
[-]
Parroting something I heard at a Java conference several years ago: tail recursion removes stack frames, but the security model is based on stack frames, so it has to be a JVM optimization, not a compiler optimization.

I've no idea if this still holds now that the security manager is being removed.

reply
smarks
2 days ago
[-]
The security manager was removed (well, “permanently disabled”) in Java 24. As you note, the permissions available at any given point can depend on the permissions of the code on the stack, and TCO affects this. Removal of the SM thus removes one impediment to TCO.

However, there are other things still in the platform for which stack frames are significant. These are referred to as “caller sensitive” methods. An example is Class.forName(). This looks up the given name in the classloader of the class that contains the calling code. If the stack frames were shifted around by TCO, this might cause Class.forName() to use the wrong classloader.
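
A small sketch of that concern (hypothetical class, not from the JDK sources): the one-argument Class.forName resolves the name against the classloader of the class whose code called it, and here that call happens to sit in tail position.

    class PluginHost {
        static Class<?> loadPlugin(String className) throws ClassNotFoundException {
            // Caller-sensitive: resolved against PluginHost's defining classloader.
            // If this tail call were eliminated and PluginHost's frame replaced by
            // its caller's, Class.forName would see a different calling class and
            // could consult a different classloader.
            return Class.forName(className);
        }
    }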

No doubt there are ways to overcome this — the JVM does inlining after all — but there’s work to be done and problems to be solved.

reply
thfuran
1 day ago
[-]
Is there? As you say, there's already inlining, and I don't see how tco presents a harder case for that.
reply
smarks
19 hours ago
[-]
There are similarities in the problems, but there are also fundamental differences. With inlining, the JVM can always decide to deoptimize and back out the inlining without affecting the correctness of the result. But it can't do that with tail calls without exposing the program to a risk of StackOverflowError.

We've been using TCO here ("tail call optimization") but I recall Guy Steele advocating for calling this feature TCE ("elimination") because programs can rely on TCE for correctness.

reply
javier2
1 day ago
[-]
In theory, if all you do is implement algorithms, this sounds fine. But most apps implement horrible business processes, so what would one do with missing stacktraces? Maybe in languages that can mark functions as pure.
reply
cempaka
2 days ago
[-]
Very nice article demonstrating a neat use of ASM bytecode. The Java language devs are also working on Project Babylon (code reflection), which will bring additional techniques to manipulate the output from the Java compiler: https://openjdk.org/projects/babylon/articles/code-models
reply
gavinray
2 days ago
[-]
This was delivered in JDK 24 as the "Class-File API"

https://openjdk.org/jeps/484

reply
algo_trader
2 days ago
[-]
Can this improve/replace AspectJ and similar instrumentation? We do lots of instruction-level modifications.
reply
1932812267
2 days ago
[-]
Scala has been using this technique for years with its scala.annotation.tailrec annotation. Regardless, it's cool to see this implemented as a bytecode pass.
reply
gavinray
2 days ago
[-]
Kotlin as well, with the "tailrec" keyword, e.g. "tailrec fun fibonacci()"

https://kotlinlang.org/docs/functions.html#tail-recursive-fu...

Kotlin also has a neat other tool, "DeepRecursiveFunction<T, R>" that allows defining deep recursion that is not necessarily tail-recursive.

Really useful if you wind up with a problem that is most cleanly solved with mutual recursion or similar:

https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-deep-r...

reply
deepsun
2 days ago
[-]
Interesting, does it depend on Kotlin compiler or it can be implemented in Java as well?
reply
gavinray
1 day ago
[-]
The "DeepRecursiveFunction<T,R>" could be implemented in Java. The Kotlin implementation leverages Kotlin's native coroutines and uses continuations.

It'd require a bit of engineering to get something working in native Java I'd imagine, even with the new JDK Structured Concurrency API offering you a coroutines alternative.

On the other hand, "tailrec" is a keyword and implemented as a compiler optimization.

The closest I've seen in Java is a neat IntelliJ plugin that has a transformation to convert recursive method calls into imperative loops with a stack frame.

This transformation and the resulting tool came out of someone's thesis; it's pretty cool:

https://github.com/andreisilviudragnea/remove-recursion-insp...

reply
ncruces
1 day ago
[-]
It's been a long time since I've messed with Java bytecode [1], but shouldn't the private method call use INVOKESPECIAL?

In general I don't think you can do this to INVOKEVIRTUAL (or INVOKEINTERFACE) as it covers cases where your target is not statically resolved (virtual/interface calls). This transformation should be limited to INVOKESTATIC and INVOKESPECIAL.

You also need many more checks to make sure you can apply the transformation, like ensuring the call site is not covered by a try block; otherwise it is not semantics-preserving.
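
As a rough sketch of the kind of guard I mean, using ASM's visitor API (illustrative only, not the article's code): only flag self-calls made through statically resolved opcodes, and leave the actual rewrite, the "immediately followed by the matching return" check, and the try-block check to a later pass.

    import org.objectweb.asm.MethodVisitor;
    import org.objectweb.asm.Opcodes;

    class TailCallCandidateChecker extends MethodVisitor {
        private final String owner, name, desc; // identity of the method being rewritten
        boolean sawCandidate;

        TailCallCandidateChecker(MethodVisitor mv, String owner, String name, String desc) {
            super(Opcodes.ASM9, mv);
            this.owner = owner;
            this.name = name;
            this.desc = desc;
        }

        @Override
        public void visitMethodInsn(int opcode, String o, String n, String d, boolean itf) {
            // Only INVOKESTATIC and INVOKESPECIAL resolve their target statically;
            // INVOKEVIRTUAL/INVOKEINTERFACE may dispatch to an override, so turning
            // them into a jump to the top of this method could change behaviour.
            boolean staticallyResolved =
                    opcode == Opcodes.INVOKESTATIC || opcode == Opcodes.INVOKESPECIAL;
            boolean selfCall = o.equals(owner) && n.equals(name) && d.equals(desc);
            if (staticallyResolved && selfCall) {
                sawCandidate = true; // a later pass could rewrite this call into a GOTO
            }
            super.visitMethodInsn(opcode, o, n, d, itf); // pass the instruction through unchanged
        }
    }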

1: https://jauvm.blogspot.com/

reply
lukaslalinsky
1 day ago
[-]
I never understood the need for tail recursion optimization in imperative languages. Sure, you need it in FP if you don't have loops and recursion is your only option, but what is the benefit of recursive algorithms that could benefit from tail optimization (i.e. recursive loops) in a language like Java?
reply
droideqa
2 days ago
[-]
Cool, now ABCL can have TCO!
reply
1932812267
2 days ago
[-]
This isn't a _general_ tail call optimization--just tail recursion. The issue is that this won't support mutual tail recursion.

e.g.:

    (defun func-a (x)
      (func-b (- x 34)))

    (defun func-b (x)
      (cond ((<= 0 x) x)
            ('t (func-a (- x 3)))))

Because func-a and func-b are different (JVM) functions, you'd need an inter-procedural goto (i.e. a tail call) in order to natively implement this.

As an alternative, some implementations will use a trampoline. func-a and func-b return a _value_ which says what function to call (and what arguments) for the next step of the computation. The trampoline then calls the appropriate function. Because func-a and func-b _return_ instead of actually calling their sibling, the stack depth is always constant, and the trampoline takes care of the dispatch.
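
For contrast, a minimal trampoline sketch in Java (illustrative names, not from the article): each function returns a description of the next call instead of making it, and a driver loop runs the steps at constant stack depth.

    import java.util.function.Supplier;

    // A Step either carries a finished result or a thunk producing the next Step.
    final class Step<T> {
        final T result;               // non-null when the computation is finished
        final Supplier<Step<T>> next; // non-null when there is more work to do

        private Step(T result, Supplier<Step<T>> next) { this.result = result; this.next = next; }
        static <T> Step<T> done(T result) { return new Step<>(result, null); }
        static <T> Step<T> call(Supplier<Step<T>> next) { return new Step<>(null, next); }

        // The trampoline loop: constant stack depth no matter how many steps run.
        T run() {
            Step<T> s = this;
            while (s.next != null) s = s.next.get();
            return s.result;
        }
    }

    class MutualRecursion {
        // Mutually recursive even/odd written as trampoline steps.
        static Step<Boolean> isEven(long n) { return n == 0 ? Step.done(true) : Step.call(() -> isOdd(n - 1)); }
        static Step<Boolean> isOdd(long n) { return n == 0 ? Step.done(false) : Step.call(() -> isEven(n - 1)); }

        public static void main(String[] args) {
            System.out.println(MutualRecursion.isEven(1_000_000).run()); // true, no StackOverflowError
        }
    }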

reply
knome
1 day ago
[-]
Sounds like a manual form of Clojure's recur.

https://clojuredocs.org/clojure.core/recur

reply
1932812267
1 day ago
[-]
Clojure's loop/recur is specifically tail recursion, like Scala's tailrec or the optimization described in the blog post. It doesn't use trampolines to enable tail calls that aren't tail recursion.
reply
dapperdrake
2 days ago
[-]
Finally.

The ANTLR guys went through terrible contortions for their parsers.

Never felt like working those details out for ABCL.

reply