[0] https://github.com/oxidecomputer/omicron/issues/9259
[1] https://rfd.shared.oxide.computer/rfd/397
> Why does this situation suck? It’s clear that many of us haven’t been aware of cancellation safety and it seems likely there are many cancellation issues all over Omicron. It’s awfully stressful to find out while we’re working so hard to ship a product ASAP that we have some unknown number of arbitrarily bad bugs that we cannot easily even find. It’s also frustrating that this feels just like the memory safety issues in C that we adopted Rust to get away from: there’s some dynamic property that the programmer is responsible for guaranteeing, the compiler is unable to provide any help with it, the failure mode for getting it wrong is often undebuggable (by construction, the program has not done something it should have, so it’s not like there’s a log message or residual state you could see in a debugger or console), and the failure mode for getting it wrong can be arbitrarily damaging (crashes, hangs, data corruption, you name it). Add on that this behavior is apparently mostly undocumented outside of one macro in one (popular) crate in the async/await ecosystem and yeah, this is frustrating. This feels antithetical to what many of us understood to be a core principle of Rust, that we avoid such insidious runtime behavior by forcing the programmer to demonstrate at compile-time that the code is well-formed
The new write-up from OP is that you can "forget" a future (or just hold onto it longer than you meant to), in which case the code in the async function stops running but the destructors are NOT executed.
Both of these behaviors are allowed by Rust's fairly narrow definition of "safety" (which allows memory leaks, deadlocks, infinite loops, and, obviously, logic bugs), but I can see why you'd be disappointed if you bought into the broader philosophy of Rust making it easier to write correct software. Even the Rust team themselves aren't immune -- see the "leakpocalypse" from before 1.0.
If you're relying for global correctness on some future being continuously polled, you should just be spawning async tasks instead. Then the runtime takes care of the polling for you, you can't just neglect it - unless the whole thread is blocked, which really shouldn't happen. "Futures" are intentionally a lower-level abstraction than "async runtime tasks".
Look at .NET, it took almost a decade to sort out async/await across all platform and language layers, and even today there are a few gotchas.
https://github.com/gerardo-lijs/Asynchronous-Programming
Rust still has a similar path to travel, with async traits, better Pin ergonomics, async lambdas, async loops, ... (yes, I know some of them have been dealt with).
It does feel like deadlocks are still generally possible in Rust concurrency, right? I understand the feeling here that it feels like ... uhh... RAII-style _something_ should be preventing this, because it feels like statically we should be able to identify this issue in this simple case.
I still have a hard time understanding how much of this is incidental and how much of this is just downstream of the Rust/Tokio model not having enough to work on here.
Something like Actors, on top of Tokio, would be one way: https://ryhl.io/blog/actors-with-tokio/
In a rational world, this works. In a prejudiced world, devs fight against locks in actor models.
Hence why I had to roll my own …
Maybe actor abstractions end up compiling away fairly nicely in Rust though!
I mean, is there any generic computation model where you can't have deadlocks? Even with stuff like actors you can trivially have cycles and now your blocking primitive is just different (not CPU-level), and we call it a livelock, but it's fundamentally the same.
Doesn't help in this case, but it does suggest that we might be able to do better.
When using “intra-task” concurrency, you really have to ensure that none of the futures are starving.
Spawning tasks should probably be the default. For timeouts use tokio::select! but make sure all pending futures are owned by it. I would never recommend FuturesUnordered unless you really test all edge-cases.
The OS can detect this and make T_low "inherit" the priority of T_high. I wonder if there is a similar idea possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", then poll that future instead. I would guess detecting the "can't run" case would require quite a bit of overhead, but maybe it can be done.
I think an especially difficult factor is that you don't even need to use a direct await.
    // `mut` is required because the select! arm polls `&mut future1`.
    let mut future1 = do_async_thing("op1", lock.clone()).boxed();
    tokio::select! {
        _ = &mut future1 => {
            println!("do_stuff: arm1 future finished");
        }
        _ = sleep(Duration::from_millis(500)) => {
            // No .await, but both will futurelock on future1.
            tokio::select! {
                _ = do_async_thing("op2", lock.clone()) => {},
                _ = do_async_thing("op3", lock.clone()) => {},
            };
        }
    };
I.e. the "can't run" detector needs to determine that no other task will run the future, and that the future isn't in the current set of things being polled by this task.
Something like this could make sense for Tokio tasks. (I don't know how complicated their task scheduler is; maybe it already does stuff like this?) But it's not possible for futures within a task, as in this post. This goes all the way back to the "futures are inert" design of async Rust: You don't necessarily need to communicate with the runtime at all to create a future or to poll it or to stop polling it. You only need to talk to the runtime at the task level, either to spawn new tasks, or to wake up your own task. Futures are pretty much just plain old structs, and Tokio doesn't know how many futures my async function creates internally, any more than it knows about my integers or strings or hash maps.
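To make the "futures are inert" point concrete, here is a minimal std-only sketch (no Tokio; `NoopWake` and `inert_demo` are names made up for illustration). Creating the future runs nothing, and a single hand-driven poll is what actually executes the body:

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns (body ran before any poll, body ran after one poll).
fn inert_demo() -> (bool, bool) {
    let ran = Cell::new(false);
    // Creating the future is just building a struct; the body has not run.
    let fut = async {
        ran.set(true);
    };
    let before = ran.get();

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    // Only an explicit poll makes progress; no runtime is involved here.
    let done = matches!(fut.as_mut().poll(&mut cx), Poll::Ready(()));
    assert!(done);
    (before, ran.get())
}

fn main() {
    let (before, after) = inert_demo();
    println!("ran before poll: {before}, ran after poll: {after}");
}
```

Nothing outside this function knows the future exists, which is why a runtime cannot see a nested future that has stopped being polled.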
Go, unlike Rust, does not really have a notion of intra-task concurrency; goroutines are the fundamental unit of concurrency and parallelism. So, the Go runtime can reason about dependencies between goroutines quite easily, since goroutines are the things which it is responsible for scheduling. The fact that channels are a language construct, rather than a library construct implemented in the language, is necessary for this too. In (async) Rust, on the other hand, tasks are the fundamental unit of parallelism, but not of concurrency; concurrency emerges from the composition of `Future`s, and a single task is a state machine which may execute any number of futures concurrently (but not in parallel), by polling them until they cannot proceed without waiting and then moving on to poll another future until it cannot proceed without waiting. But critically, this is not what the task scheduler sees; it interacts with these tasks as a single top-level `Future`, and is not able to look inside at the nested futures they are composed of.
This specific failure mode can actually only happen when multiple futures are polled concurrently but not in parallel within a single Tokio task. So, there is actually no way for the Tokio scheduler to have insight into this problem. You could imagine a deadlock detector in the Tokio runtime that operates on the task level, but it actually could never detect this problem, because when these operations execute in parallel, it actually cannot occur. In fact, one of the suggestions for how to avoid this issue is to select over spawned tasks rather than futures within the same task.
I haven't yet read a way to prove it's correct, or even to reasonably prove a given program's use is not going to block.
With more traditional threads my mental model is that _everything_ always has to be interrupt-able, have some form of engineer chosen timeout for a parallel operation, and address failure of operation in design.
I never see any of that in the toy examples that are presented as educational material. Maybe Rust's async also requires such careful design to be safely utilized.
Really beautiful language design imo. Does a great job avoiding the typelevel brainfuck problem I have with Haskell.
From that it also follows that it may not be too fruitful to try to tackle every domain there is with a single language only.
(With that said, I absolutely love sync Rust, and Go is definitely not a good example of an elegantly designed language, I am talking in a more general way here)
Yeap. And this footgun is yet another addition to the long list of reasons why I consider the Rust async model with its "inert" futures managed in user space a fundamentally flawed un-Rusty design.
It's not at all obvious that Rust's is the only possible design that would work here. I strongly suspect it is not.
In fact, early Rust did some experimentation with exactly the sort of stack layout tricks you would need to approach this differently. For example, see Graydon's post here about the original implementation of iterators, as lightweight coroutines: https://old.reddit.com/r/ProgrammingLanguages/comments/141qm...
A few bare metal projects use stackless coroutines (technically resumable functions) for concurrency, but it has turned out to be a much smaller use-case than anticipated. In practice C and C++ coroutines are really not worth the pain that they are to use, and Rust async has mostly taken off with heavy-duty executors like Tokio that very much don't target tiny #![no_std] 16-bit microcontrollers.
The Kernel actually doesn't use resumable functions for background work, it uses kernel threads. In the wider embedded world threads are also vastly more common than people might think, and the really low-end uniprocessor systems are usually happy to block. Since these tiny systems are not juggling dozens of requests per second that are blocking on I/O, they don't gain that much from coroutines anyways.
We mostly see bigger Rust projects use async when they have to handle concurrent requests that block on IO (network, FS, etc), and we mostly observe that the ecosystem is converging on tokio.
Threads are not free, but most embedded projects today that process requests in parallel — including the kernel — are already using them. Eager futures are more expensive than lazy futures, and less expensive than threads. They strike an interesting middle ground.
Lazy futures are extremely cheap at runtime. But we're paying a huge complexity cost in exchange that benefits a very small user-base that hasn't really fully materialized as we hoped it would.
Well, no, at the time of the design of Rust's async MVP, everyone was pretty well aware that the vast majority of the users would be writing webservers, and that the embedded use case would be a decided minority, if it ever existed at all. That Embassy exists, and that its ecosystem is as vibrant as it is, is, if anything, an unexpected triumph.
But regardless of how many people were actually expected to use it in practice, the underlying philosophy remained thus: there exist no features of Rust-the-language that are incompatible with no_std environments (e.g. Rust goes well out of its way, and introduces a lot of complexity, to make things like closures work given such constraints), and it would be exceptional and unprecedented for Rust to violate this principle when it comes to async.
With my C++ background, I'm very much at home with that philosophy, but I think there is room for nuance in how strictly orthodox we are.
C++ does have optional language features that introduce some often unwelcome runtime overhead, like RTTI and unwinding.
Rust does not come configured for freestanding environments out of the box either. Like C++, you are opting out of language features like unwinding as well as the standard library when going freestanding.
I want to affirm that I'm convinced Rust is great for embedded. It's more that I mostly love async when I get to use it for background I/O with a full fledged work stealing thread-per-core marvel of engineering like tokio!
In freestanding Rust the I/O code is platform specific, suddenly I'd have to write the low-level async code myself, and it's not clear this makes the typical embedded project that much higher performance, or all that easy to maintain.
So, I don't want to say anything too radical. But I think the philosophy doesn't have to be as clear-cut as "no language feature may ever be incompatible with no_std". Offering a std-only language feature is not necessarily closing a door to embedded. We sort of already make opt-out concessions to have a friendlier experience for most people.
(Apologies for the wall of text)
The problem is that the particular interface Rust chose for controlling dispatch is not granular enough. When you are doing your own dispatch, you only get access to separate tasks, but for individual futures you are at the mercy of combinators like `select!` or `FuturesUnordered` that only have a narrow view of the system.
A better design would continue to avoid heap allocations and allow you to do your own dispatch, but operate in terms of individual suspended leaf futures. Combinators like `join!`/`select!`/etc. would be implemented more like they are in thread-based systems, waiting for sub-tasks to complete, rather than being responsible for driving them.
For better or worse eager dispatch I think generally implies also not being able to cancel futures since ownership is transferred to the executor rather than being retained by your code.
What people care about are semantics. async/await leaks implementation details. One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM, which might require some feature work to support a deeper level of integration of async without losing what benefits the current implementation provides. Rust has a few warts like this where semantics are stilted in order to confine the implementation work to the high-level Rust compiler.
Yes, I totally agree, and this is sort of what I imagine a better design would look like.
> One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM
This I would argue is simply a failure of imagination. All you need from the LLVM layer is tail calls, and then you can manage the stack layout yourself in essentially the same way Rust manages Future layout.
You don't even need arbitrary tail calls. The compiler can limit itself to the sorts of things LLVM asks for (specific calling convention, matching function signatures, etc.) when transferring control between tasks, because it can store most of the state in the stack that it laid out itself.
To explain: generally speaking, stackless coroutine async only needs coloring because the coroutines are actually “independent stack”-less. What they actually do is share the stack for their local state. This forces async function execution to proceed in LIFO order, so that you do not blow away the stack of the async function executing immediately after, which demands state machine transforms to be safe. This is why you need coloring, unlike stackful coroutine models, which can execute, yield, and complete in arbitrary order since their local state is preserved in a safe location.
I have a blog series that goes into the concrete details if you like: https://jacko.io/async_intro.html
There's more nuance than this. You can keep polling futures as often as you want. When an async fn gets converted into the state machine, yielding is just expressed as the poll fn returning as not ready.
So it is actually possible for "a little bit" of work to happen, although that's limited and gets tricky because the way wakers work ensures that normally futures only get polled by the runtime when there's actually work for them to do.
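A small std-only sketch of what "yielding is just the poll fn returning not ready" looks like when written by hand. `Countdown` is a hypothetical future invented for this example, and the loop at the bottom stands in for a runtime:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that returns Pending `remaining` times before completing.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        // "Yielding" is nothing more than returning Pending. We also ask to be
        // polled again, which is what a real runtime would act on.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Polls in a loop, ignoring wake-ups, and counts the polls needed.
fn polls_until_ready(start: u32) -> u32 {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(Countdown { remaining: start });
    let mut polls = 0;
    loop {
        polls += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            return polls;
        }
    }
}

fn main() {
    println!("polls needed: {}", polls_until_ready(3));
}
```

Polling more often than the waker asks for is legal, as the loop shows; it just usually finds nothing new to do.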
It is good that not every language gives you this much control and gives some easier options for when those are adequate, but it is also good that there is some set of decent languages that do give you this degree of control for when it is necessary, and it is good that we are not surrendering that space to just C and/or C++. Unfortunately such control comes with footguns, at least over certain spans of time. Perhaps someone will figure out a way to solve this problem in Rust in the future.
The point of Erlang/Elixir is that it is as performant as possible, and Erlang's history is a testament to this. BEAM is wonderful, and really fast, along with the languages on it being ergonomic (OTP behaviors, supervisors, etc.).
Now BEAM is far from the only runtime juggling that many processes, but it remains a relatively slow VM. I rule-of-thumb it at 10x slower than C, making it a medium performance VM at best, and you want to watch your abstraction layers in those nicer languages like Gleam because further multiplicative slowdowns can really start to bite.
The first serious Go program I wrote was a replacement for something written in Erlang, there was no significant architectural improvement in the rewrite (it was already reasonably well-architected), and from the first deployment, we went from 4 systems, sometimes struggling with the load spikes, to where just one could handle it all, even with BEAM being over a decade more mature and the Go clustering code being something I wrote over a few weeks rather than battle tested, optimized code.
BEAM is good at managing concurrency, but it is slowish in other ways. It's better than the dynamic scripting languages like Python by a good amount but it is not performance-competitive with a modern compiled language.
Thread 1 acquires A. Thread 2 acquires B. Thread 1 tries to acquire B. Thread 2 tries to acquire A.
In this case, the role "A" is being played by the front of the Mutex's lock queue. Role "B" is being played by the Tokio's actively executed task.
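For reference, the classic two-lock cycle described above can be reproduced with plain threads. This is only a sketch of the thread version (not of the Tokio case), and it uses `try_lock` so that the program reports the cycle instead of hanging; `hold_then_try` and `lock_order_cycle` are made-up names:

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread::{self, JoinHandle};

/// Locks `first`, waits until the other thread holds its lock too, then
/// reports whether `second` was unavailable.
fn hold_then_try(
    first: Arc<Mutex<()>>,
    second: Arc<Mutex<()>>,
    barrier: Arc<Barrier>,
) -> JoinHandle<bool> {
    thread::spawn(move || {
        let _held = first.lock().unwrap();
        barrier.wait(); // both threads now hold their first lock
        // With a blocking `lock()` here, both threads would wait forever.
        let blocked = second.try_lock().is_err();
        barrier.wait(); // keep holding until the other thread has tried
        blocked
    })
}

/// Returns true if each thread found the other's lock already taken.
fn lock_order_cycle() -> bool {
    let a = Arc::new(Mutex::new(()));
    let b = Arc::new(Mutex::new(()));
    let barrier = Arc::new(Barrier::new(2));

    // Thread 1 takes A then wants B; thread 2 takes B then wants A.
    let t1 = hold_then_try(a.clone(), b.clone(), barrier.clone());
    let t2 = hold_then_try(b, a, barrier);
    let blocked1 = t1.join().unwrap();
    let blocked2 = t2.join().unwrap();
    blocked1 && blocked2
}

fn main() {
    println!("cycle observed: {}", lock_order_cycle());
}
```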
Based on this understanding, I agree that the surprising behavior is due to Tokio's Mutex/Lock Queue implementation. If this was an OS Mutex, and a thread waiting for the Mutex can't wake for some reason, the OS can wake a different thread waiting for that Mutex. I think the difficulty in this approach has to do with how Rust's async is implemented. My guess is the algorithm for releasing a lock goes something like this:
1. Pop the head of the wait queue.
2. Poll the top level tokio::spawn'ed task of the Future that is holding the Mutex.
What you want is something like this
For each Future in the wait queue (Front to Back):
    Poll the Future.
    If Success - Break
???Something if everything fails???
The reason this doesn't work has to do with how futures compose. Futures compile to states within a state machine. What happens when a future polled within the wait queue completes? How is control flow handed back to the caller?
I guess you might be able to have some fallback that polls the futures independently then polls the top level future to try and get things unstuck. But this could cause confusing behavior where futures are being polled even though no code path within your code is await'ing them. Maybe this is better though?
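Here is a toy model of the hand-off behaviour I'm guessing at, using only std. To be clear, `FairLock` and `Acquire` are invented for this sketch and are not Tokio's implementation; the assumption is simply that releasing the lock hands ownership to the head of a FIFO queue. If that head waiter is a future nobody polls, every other waiter is stuck:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy fair lock: release hands ownership straight to the first waiter.
#[derive(Default)]
struct FairLock {
    owner: Option<u32>,
    waiters: VecDeque<u32>,
}

impl FairLock {
    fn release(&mut self) {
        self.owner = self.waiters.pop_front();
    }
}

/// Future that completes once waiter `id` owns the lock.
struct Acquire {
    id: u32,
    queued: bool,
    lock: Rc<RefCell<FairLock>>,
}

impl Future for Acquire {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let lock = self.lock.clone();
        let mut l = lock.borrow_mut();
        if l.owner == Some(self.id) {
            return Poll::Ready(()); // ownership was handed to us
        }
        if l.owner.is_none() && l.waiters.is_empty() {
            l.owner = Some(self.id); // uncontended fast path
            return Poll::Ready(());
        }
        if !self.queued {
            l.waiters.push_back(self.id);
            self.queued = true;
        }
        Poll::Pending
    }
}

/// Returns (f2 stuck while f1 is unpolled, f1 ready once polled, f2 ready after).
fn handoff_demo() -> (bool, bool, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let lock = Rc::new(RefCell::new(FairLock::default()));
    lock.borrow_mut().owner = Some(0); // someone else holds the lock

    let mut f1 = Box::pin(Acquire { id: 1, queued: false, lock: lock.clone() });
    let mut f2 = Box::pin(Acquire { id: 2, queued: false, lock: lock.clone() });
    assert!(f1.as_mut().poll(&mut cx).is_pending()); // f1 joins the queue
    lock.borrow_mut().release(); // ownership is handed to f1, which nobody polls

    // Polling only f2 never makes progress: the lock already belongs to f1.
    let stuck = (0..100).all(|_| f2.as_mut().poll(&mut cx).is_pending());

    // Things move again only once f1 is polled and later releases.
    let f1_ready = f1.as_mut().poll(&mut cx).is_ready();
    lock.borrow_mut().release();
    let f2_ready = f2.as_mut().poll(&mut cx).is_ready();
    (stuck, f1_ready, f2_ready)
}

fn main() {
    println!("{:?}", handoff_demo());
}
```

In this model the lock itself has no way to poll `f1`; it only records who owns it, which is the composition problem described above.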
I fully agree that this and the cancellation issues discussed before can lead to surprising issues even for seasoned Rust experts. But I’m not sure what really can be improved under the main operating model of async Rust (every future can be dropped).
But compared to working with callbacks the amount of surprising things is still rather low :)
On the other hand, it also wasn't our coworker who had written the code where we found the bug who was to blame, either. It wasn't a case of sloppy programming; he had done everything correctly and put the pieces together the way you were supposed to. All the pieces worked as they were supposed to, and his code seemed to be using them correctly, but the interaction of these pieces resulted in a deadlock that it would have been very difficult for him to anticipate.
So, our conclusion was, wow, this just kind of sucks. Not an indictment of async Rust as a whole, but an unfortunate emergent behavior arising from an interaction of individually well-designed pieces. Just something you gotta watch out for, I guess. And that's pretty sad to have to admit.
But it still suggests that `tokio::select` is too powerful. You don't need to get rid of `tokio::select`, you just need to consider creating a less powerful mechanism that doesn't risk exhibiting this problem. Then you could use that less powerful mechanism in the places where you don't need the full power of `tokio::select`, thereby reducing the possible places where this bug could arise. You don't need to get rid of the fully powerful mechanism, you just need to make it optional.
The ways I can think of for making select!() safer all involve runtime checks and allocations (possibly this is just a failure of my imagination!). But if that's the case, I would find it bothersome if our basic async building blocks like select/timeout in practice turn out to require more expensive runtime checks or allocations to be safe.
We have a point in the async design space where we pay a complexity price, but in exchange we get really neat zero-cost futures. But I feel like we only get our money's worth if we can actually statically prove that correct use won't deadlock, without the expensive runtime checks! Otherwise, can we afford to spend all this complexity budget?
The implementation of select!() does feel way too powerful in a way (it's a whole mini scheduler that creates implicit future dependencies hidden from the rest of the executor, and then sometimes this deadlocks!). But the need is pretty foundational, it shows up everywhere as a building block.
There is no real difference between a deadlock caused by a single thread acquiring the same non-reentrant lock twice and a single thread with two virtual threads where the first thread calls the code of the second thread inside the critical section. They are the same type of deadlock caused by the same fundamental problem.
>Remember too that the Mutex could be buried beneath several layers of function calls in different modules or packages. It could require looking across many layers of the stack at once to be able to see the problem.
That is a fundamental property of mutexes. Whenever you have a critical section, you must be 100% aware of every single line of code inside that critical section.
>There’s no one abstraction, construct, or programming pattern we can point to here and say "never do this". Still, we can provide some guidelines.
The programming pattern you're looking for is guaranteeing forward progress inside critical sections. Only synchronous code is allowed to be executed inside a critical section. The critical section must be as small as possible. It must never be interrupted, ever.
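A minimal sketch of that pattern with std threads (`add_all` is a made-up example function): do the work outside the lock, and keep the critical section to one short synchronous statement with nothing in it that can be suspended.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Sums the squares of `inputs` using one thread per input.
fn add_all(inputs: Vec<u64>) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|n| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                // The (potentially slow) work happens outside the lock.
                let squared = n * n;
                // Critical section: one synchronous statement. The guard is a
                // temporary that is dropped at the end of this line.
                *total.lock().unwrap() += squared;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("total = {}", add_all(vec![1, 2, 3]));
}
```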
Sounds like a pain in the ass, right? That's right, locks are a pain in the ass.
No, just have select!() on a bunch of owned Futures return the futures that weren't selected instead of dropping them. Then you don't lose state. Yes, this is awkward, but it's the only logically coherent way. There is probably some macro voodoo that makes it ergonomic. But even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.
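Roughly what I mean, as a std-only sketch. `Select2`, `After` and `block_on` are hypothetical helpers written for this example (as far as I know, `futures::future::select` already has this shape, returning the unfinished future alongside the winner's output):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

type BoxFut<T> = Pin<Box<dyn Future<Output = T>>>;

/// Hypothetical two-way select that returns the loser instead of dropping it.
struct Select2<T> {
    a: Option<BoxFut<T>>,
    b: Option<BoxFut<T>>,
}

impl<T> Future for Select2<T> {
    type Output = (T, BoxFut<T>);

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self;
        let polled_a = this.a.as_mut().expect("polled after completion").as_mut().poll(cx);
        if let Poll::Ready(v) = polled_a {
            this.a = None;
            return Poll::Ready((v, this.b.take().expect("polled after completion")));
        }
        let polled_b = this.b.as_mut().expect("polled after completion").as_mut().poll(cx);
        if let Poll::Ready(v) = polled_b {
            this.b = None;
            return Poll::Ready((v, this.a.take().expect("polled after completion")));
        }
        Poll::Pending
    }
}

/// Test future: returns Pending `polls_left` times, then yields `value`.
struct After {
    polls_left: u32,
    value: u32,
}

impl Future for After {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polls_left == 0 {
            return Poll::Ready(self.value);
        }
        self.polls_left -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Busy-polling stand-in for a runtime.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return v;
        }
    }
}

fn select_demo() -> (u32, u32) {
    let slow: BoxFut<u32> = Box::pin(After { polls_left: 3, value: 10 });
    let fast: BoxFut<u32> = Box::pin(After { polls_left: 1, value: 20 });
    let (first, rest) = block_on(Select2 { a: Some(slow), b: Some(fast) });
    // `rest` is the unfinished `slow` future, with its progress intact.
    let second = block_on(rest);
    (first, second)
}

fn main() {
    println!("{:?}", select_demo());
}
```

The caller now has to decide explicitly what happens to `rest`, which is the awkward part, and as noted this does nothing about a returned future that is then never polled.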
For the real root cause: https://news.ycombinator.com/item?id=45777234
How does that prevent this kind of deadlock? If the owned future has acquired a mutex, and you return that future from the select so that it might be polled again, and the user assigns it to a variable, then the future that has acquired the mutex but has not completed is still not dropped. This is basically the same as polling an `&mut future`, but with more steps.
Like I said, it doesn't:
> even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.
It fixes this:
> However you also wouldn’t be able to use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.
If you want to fix the root cause, see https://news.ycombinator.com/item?id=45777234
Ever since I started using Erlang it felt like I finally found 'the right way' when before then I did a lot of work with sockets and asynchronous worker threads. But even though it usually worked as advertised it had a large number of really nasty pitfalls which the actor model seemed to - effortlessly - sidestep.
So I'm seriously wondering what the motivation was. I get why JS uses async, there isn't any other way there, by the time they added async it was too late to change the fundamentals of the language to such a degree. But Rust was a clean slate.
You can write code using the actor model with Tokio. But it's not natural to do so.
1. embedded hardware, like you mentioned
2. high-performance stuff
3. "embedding" in the cross-language sense, with foreign function calls
Of course the "don't use a lot of resources" thing that makes Rust/C/C++ good for tiny hardware also tends to be helpful for performance on bigger iron. Similarly, the "don't assume much about your runtime" thing that's necessary for bare metal programming also helps a lot with interfacing with other languages. And "run on a GPU" is kind of all three of those things at once.
So yeah, which of those concerns was async Rust really designed around? All of them I guess? It's kind of like, once you put on the systems programming goggles for long enough, all of those things kind of blend together?
Yes, all of them. Futures needed to work on embedded platforms (so no allocation), needed to be highly optimizable (so no virtual dispatch), and need to act reasonably in the presence of code that crosses FFI boundaries (so no stack shenanigans). Once you come to terms with these constraints--and then add on Rust's other principles regarding guaranteed memory safety, references, and ownership--there's very little wiggle room for any alternative designs other than what Rust came up with. True linear types could still improve the situation, though.
Speaking of which, I'm kind of surprised we landed on a Waker design that requires/hand-rolls virtual dispatch. Was there an alternate universe where every `poll()` function was generic on its Waker?
.await(DEADLINE) (where deadline is any non 0 unit, and 0 is 'reference defined' but a real number) should have been the easy interface. Either it yields a value or it doesn't, then the programmer has to expressly handle failure.
Deadline would only be the minimum duration after which the language, when evaluating the future / task, would return the empty set/result.
This appears to be misunderstanding how futures work in Rust. The language doesn't evaluate futures or tasks. A future is just a struct with a poll method, sort of like how a closure in Rust is just a struct with a call method. The await keyword just inserts yield points into the state machine that the language generates for you. If you want to actually run a future, you need an executor. The executor could implement timeouts, but it's not something that the language could possibly have any way to enforce or require.
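As an illustration of where a deadline would have to live, here is a sketch of a tiny single-future executor built only on std. `block_on_with_deadline` and `ThreadWaker` are names invented for this example, not an existing API; the point is that the timeout is a policy of whoever calls `poll`, not of the language:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::{Duration, Instant};

/// Waker that unparks the thread running the executor loop.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `fut` on the current thread and gives up once `timeout` has elapsed.
/// On timeout the future is dropped, i.e. cancelled, when this function returns.
fn block_on_with_deadline<F: Future>(fut: F, timeout: Duration) -> Option<F::Output> {
    let deadline = Instant::now() + timeout;
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return Some(value);
        }
        let now = Instant::now();
        if now >= deadline {
            return None;
        }
        // Sleep until woken or until the deadline, whichever comes first.
        thread::park_timeout(deadline - now);
    }
}

fn main() {
    let quick = block_on_with_deadline(async { 7 }, Duration::from_millis(50));
    let never = block_on_with_deadline(std::future::pending::<()>(), Duration::from_millis(20));
    println!("quick = {quick:?}, never = {never:?}");
}
```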
At least for Linux, offhand, popular task scheduler frequencies used to be 100 and 1000hz.
Looks like the Kernel's tracking that for tasks:
https://www.kernel.org/doc/html/latest/scheduler/sched-desig...
"In CFS the virtual runtime is expressed and tracked via the per-task p->se.vruntime (nanosec-unit) value."
I imagine the .vruntime struct field is still maintained with the newer "EEVDF Scheduler".
...
A Userspace task scheduler could similarly compare the DEADLINE against that runtime value. It would still reach that deadline after the minimum wait has passed, and thus be 'background GCed' at a time of the language's choice.
Getting back to Rust, even if not natural, I agree with the parent that the actor model is simply the better paradigm. Zero runtime allocation should still be possible, you just have to accept some constraints.
I think async looks simple because it looks like writing imperative code; unfortunately it is just obfuscating the complex reality underlying. The actor model makes things easier to reason about, even if it looks more complicated initially.
Question is - when will Zig become mature enough to become a legit choice next to say, Go or Rust?
I mean for a regular dev team, not necessarily someone who works deeply along with Andrew Kelley etc like Tigerbeetle.
I tend to write most of my async Rust following the actor model and I find it natural. Alice Ryhl, a prominent Tokio contributor, has written about the specific patterns:
So it's a pity that Rust's async design tried so hard to avoid any explicit allocations rather than using an explicit allocator that embedded code can use to preallocate and reuse objects.
There is a lot riding on that “just”. Hardware stacks are very, very unlike heap memory allocators in pretty much every possible way other than “both systems provide access to memory.”
Tons and tons of embedded code assumes the stack is, indeed, a hardware stack. It’s far from trivial to make that code “just use a dummy/static allocator with the same api as a heap”; that code may not be in Rust, and it’s ubiquitous for embedded code to not be written with abstractions in front of its allocator—why would it do otherwise, given that tons of embedded code was written for a specific compiler+hardware combination with a specific (and often automatic or compiler-assisted) stack memory management scheme? That’s a bit like complaining that a specific device driver doesn’t use a device-agnostic abstraction.
And then the need for the runtime to poll futures means that async in Rust requires a non-trivial runtime, which goes against the desire to avoid abstractions in embedded code.
Async without polling, while stack-unfriendly, requires less runtime. And if Rust supported type-safe region-based allocators, where a bunch of things are allocated one by one and then released at once, it could be a better fit for the embedded world.
This problem does not happen with a custom allocator where things to allocate are of roughly the same size and allocator uses same-sized cells to allocate.
that said there are a lot of parts of a lot of programs where a fully inlined and shake optimized async state machine isn't so critical.
it's reasonable to want a mix, to use async which can be heavily compiler optimized for performance sensitive paths, and use higher level abstractions like actors, channels, single threaded tasks, etc for less sensitive areas.
in a single threaded fully cooperative environment you could ensure this by implication of only one coroutine running at a time, removing data races, but retaining logical ones.
if you want to eradicate logical races, or have actual parallel computation, then the source data must be copied into the message, or the content of the message be wrapped in a lock or similar.
in almost all practical scenarios this means the data source copies data into messages.
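a small sketch of that shape using std threads and channels rather than Tokio tasks (`Msg` and `run_actor` are made-up names). the actor is the only owner of its state, and the sender copies its data into the message:

```rust
use std::sync::mpsc;
use std::thread;

/// Messages own their data; nothing is shared with the sender.
enum Msg {
    Add(Vec<u64>),
    Total(mpsc::Sender<u64>),
}

fn run_actor() -> u64 {
    let (tx, rx) = mpsc::channel::<Msg>();

    // The actor thread is the sole owner of `total`, so no lock is needed.
    let actor = thread::spawn(move || {
        let mut total = 0u64;
        for msg in rx {
            match msg {
                Msg::Add(values) => total += values.iter().sum::<u64>(),
                Msg::Total(reply) => {
                    let _ = reply.send(total);
                }
            }
        }
    });

    let source = vec![1, 2, 3];
    // The data source copies its data into the message and keeps its own copy.
    tx.send(Msg::Add(source.clone())).unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Total(reply_tx)).unwrap();
    let total = reply_rx.recv().unwrap();

    drop(tx); // closing the channel ends the actor's loop
    actor.join().unwrap();
    total
}

fn main() {
    println!("total = {}", run_actor());
}
```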
This actually caused some issues with Rust in the kernel because moving large structs could cause you to run out of the small amount of stack space available on kernel threads (they only allocate 8-16KB of stack compared to a typical 8MB for a userspace thread). The pinned-init crate is how they ended up solving this [1].
all of the complexity comes in when more than one part of the code is interested in the state at the same time, which is what this thread is about.
I'm not the right person to write a tl;dr, but here goes.
For actors, you're basically talking about green threads. Rust had a hard constraint that calls to C not have overhead and so green threads were out. C is going to expect an actual stack so you have to basically spin up a real stack from your green-thread stack, call the C function, then translate it back. I think Erlang also does some magic where it will move things to a separate thread pool so that the C FFI can block without blocking the rest of your Erlang actors.
Generally, async/await has lower overhead because it gets compiled down to a state machine and event loop. Languages like Go and Erlang are great, but Rust is a systems programming language looking for zero cost abstractions rather than just "it's fast."
To some extent, you can trade overhead for ease. Garbage collectors are easy, but they come with overhead compared to Rust's borrow checker method or malloc/free.
To an extent it's about tradeoffs and what you're trying to make. Erlang and Go were trying to build something different where different tradeoffs made sense.
EDIT: I'd also note that before Go introduced preemption, it too would have "pitfalls". If a goroutine didn't trigger a stack reallocation (like function calls that would make it grow the stack) or do something that would yield (like blocking IO), it could starve other goroutines. Now Go does preemption checks so that the scheduler can interrupt hot loops. I think Erlang works somewhat similarly to Rust in scheduling in that its actors have a certain budget, every function call decrements their budget, and when they run out of budget they have to yield back to the scheduler.
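To make the budget idea concrete, here is a minimal sketch in std-only Rust. It is not how Erlang, Go, or Tokio implement it; `YieldOnce`, `sum_with_budget`, and `run_counting_polls` are names made up for illustration, and the "scheduler" is just a loop that polls.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; fine here because the driver below polls in a loop.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Future that returns Pending once, handing control back to whoever polls it.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Sums 1..=n, yielding every `budget` iterations (hypothetical budget scheme).
/// `budget` must be greater than zero.
async fn sum_with_budget(n: u64, budget: u32) -> u64 {
    let mut total = 0;
    let mut left = budget;
    for i in 1..=n {
        total += i;
        left -= 1;
        if left == 0 {
            // Budget exhausted: give the "scheduler" a chance to run something else.
            YieldOnce(false).await;
            left = budget;
        }
    }
    total
}

/// Drives the future to completion and counts how many polls that took.
fn run_counting_polls(n: u64, budget: u32) -> (u64, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(sum_with_budget(n, budget));
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(total) = fut.as_mut().poll(&mut cx) {
            return (total, polls);
        }
    }
}

fn main() {
    let (total, polls) = run_counting_polls(10, 4);
    println!("total = {total}, polls = {polls}");
}
```

With a budget of 4 and 10 iterations, the loop yields twice, so the driver gets control back between chunks of work instead of only at the end.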
But Go does compile down to machine code, so that's why until it did pre-emption it needed that yield or hook.
Come to think of it: it is strange that such quota management isn't built into the CPU itself. It seems like a very logical thing to do. Instead we rely on hardware interrupts for pre-emption and those are pretty fickle. It also means that there is a fixed system wide granularity for scheduling.
Interrupts are at the most basic level an electrical signal to the CPU to tell it to load a new address into the next instruction pointer after pushing the current one and possibly some other registers onto the stack. That means you don't actually know when they will happen and they are transparent to the point that those two instructions that you put right after one another are possibly detoured to do an unknown amount of work in some other place.
Any kind of side effect from that detour (time spent, changes made to the state of the machine) has the potential to screw up the previously deterministic path that you were on.
To make matters worse, there are interrupts that can interrupt the detour in turn. There are ways in which you can tell the CPU 'not now' and there are ways in which those can be overridden. If you are lucky you can uniquely identify the device that caused the interrupt to be triggered. But this isn't always the case and given the sensitivity of the inputs involved it isn't rare at all that your interrupt will trigger without any ground to do so. If that happens and the ISR is not written with that particular idea in mind you may end up with a system in an undefined state.
Interrupts are a very practical mechanism. But they're also a nightmare to deal with in the otherwise orderly affairs of computing and troubleshooting interrupt related issues can eat up days, weeks or even months if you are really unlucky.
You can do straight line, single threaded, non concurrent code in an actor model. Mostly, that's what most of the actor code will look like. Get a message, update local state in a straight forward way, send a response, repeat.
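A rough sketch of that shape, using only std threads and channels. The `Msg` enum and `spawn_counter` are invented for this example; a real actor library would look different.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages the actor understands (hypothetical protocol for this sketch).
enum Msg {
    Add(i64),
    /// Ask for the current total; the reply goes back on the enclosed channel.
    Get(mpsc::Sender<i64>),
}

/// Spawns an actor that owns its state and handles one message at a time.
fn spawn_counter() -> (mpsc::Sender<Msg>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0i64; // local state, never shared
        for msg in rx {
            // Get a message, update local state, send a response, repeat.
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    let _ = reply.send(total);
                }
            }
        }
        // The loop ends when every sender has been dropped.
    });
    (tx, handle)
}

fn demo() -> i64 {
    let (tx, handle) = spawn_counter();
    tx.send(Msg::Add(2)).unwrap();
    tx.send(Msg::Add(40)).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Get(reply_tx)).unwrap();
    let total = reply_rx.recv().unwrap();
    drop(tx); // lets the actor's loop finish
    handle.join().unwrap();
    total
}

fn main() {
    println!("total = {}", demo());
}
```

The actor body is plain sequential code; all the concurrency lives in the channel.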
Rust async is, however, very useful in single-core embedded systems that don't have an operating system with preemptive multitasking, where one thread of execution is all you ever get. It's nice to have a way to express that you might be doing multiple things concurrently in an event-driven way without having to have an OS to manage preemptive multitasking.
I guess cancellation is really two different things, which usually happen at the ~same time, but not in this case: 1) the future stops getting polled, and 2) the future gets dropped. In this example the drop is delayed, and because the future is holding a guard,* the delay has side effects. So the future "has been cancelled" in the sense that it will never again make forward progress, but it "hasn't been cancelled yet" in the sense that it's still holding resources. I wonder if it's practical to say "make sure those two things always happen together"?
* Technically a Tokio-internal `Acquire` future that owns a queue position to get a guard, but it sounds like the exact same bug could manifest after it got the guard too, so let's call it a guard.
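A small std-only sketch of that gap between "stopped being polled" and "dropped". The `Guard` type, the `Rc<Cell<bool>>` standing in for a lock, and `YieldOnce` are all made up for illustration; this is not Tokio's mutex.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once with a waker that does nothing.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Toy guard: the flag is true while the "lock" is held, cleared on drop.
struct Guard(Rc<Cell<bool>>);
impl Guard {
    fn acquire(lock: &Rc<Cell<bool>>) -> Guard {
        assert!(!lock.get(), "lock already held");
        lock.set(true);
        Guard(Rc::clone(lock))
    }
}
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(false);
    }
}

async fn hold_across_await(lock: Rc<Cell<bool>>) {
    let _guard = Guard::acquire(&lock);
    YieldOnce(false).await; // the guard lives inside the future across this await
}

/// Returns (held while the future is parked, held after the future is dropped).
fn demo() -> (bool, bool) {
    let lock = Rc::new(Cell::new(false));
    let mut fut = Box::pin(hold_across_await(Rc::clone(&lock)));
    assert!(poll_once(fut.as_mut()).is_pending());
    // (1) Nobody polls `fut` from here on, but it still exists.
    let held_while_parked = lock.get();
    // (2) Only the drop runs the guard's destructor.
    drop(fut);
    let held_after_drop = lock.get();
    (held_while_parked, held_after_drop)
}

fn main() {
    println!("{:?}", demo());
}
```

Between steps (1) and (2) the future is cancelled in the first sense but not the second, which is the window the comment describes.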
Holding a lock while waiting for IO can destroy a system's performance. With async Rust, we can prevent this by making the MutexGuard !Send, so it cannot be held across an await. Specifically, because it is !Send, it cannot be stored in the Future [2], so it must be dropped immediately, freeing the lock. This also prevents Futurelock deadlock.
This is how I wrote safina::sync::Mutex [0]. I did try to make it Send, like Tokio's MutexGuard, but stopped when I realized that it would become very complicated or require unsafe.
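The same idea can be sketched with `std::sync::Mutex`, whose guard is also !Send. This is not Safina's code; `require_send`, `bump_then_wait`, `YieldOnce`, and `block_on` are helpers invented for the example, with `require_send` standing in for the Send bound a multi-threaded executor would impose.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, just enough to run the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Stand-in for the `Send` bound that multi-threaded executors put on spawned futures.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

async fn bump_then_wait(counter: Arc<Mutex<u32>>) -> u32 {
    let snapshot = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard is dropped here, before the await below
    YieldOnce(false).await;
    snapshot
}

// A version that keeps the guard alive across the await still compiles on its
// own, but passing its future to `require_send` is rejected by the compiler,
// because the future would then contain a `MutexGuard`, which is not Send:
//
// async fn hold_guard(counter: Arc<Mutex<u32>>) -> u32 {
//     let guard = counter.lock().unwrap();
//     YieldOnce(false).await;
//     *guard
// }

fn demo() -> u32 {
    let counter = Arc::new(Mutex::new(41));
    let fut = require_send(bump_then_wait(Arc::clone(&counter)));
    block_on(fut)
}

fn main() {
    println!("counter = {}", demo());
}
```

Note that the check only bites where a Send bound is actually required; a single-threaded executor would accept the commented-out version.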
> You could imagine an unfair Mutex that always woke up all waiters and let them race to grab the lock again. That would not suffer from risk of futurelock, but it would have the thundering herd problem plus all the liveness issues associated with unfair synchronization primitives.
Thundering herd is when clients overload servers. This simple Mutex has O(n^2) runtime: every task must acquire and release the mutex, which adds all waiting tasks to the scheduler queue. In practice, scheduling a task is very fast (~600ns). As long as polling the lock-mutex-future is fast and you have <500 waiting tasks, then the O(n^2) runtime is fine.
Performance is hard to predict. I wrote Safina using the simplest possible implementations and assumed they would be slow. Then I wrote some micro-benchmarks and found that some parts (like the async Mutex) actually outperform Tokio's complicated versions [1]. I spent days coding optimizations that did not improve performance (work stealing) or even reduced performance (thread affinity). Now I'm hesitant to believe assumptions and predictions about performance, even if they are based on profiling data.
[0] https://docs.rs/safina/latest/safina/sync/struct.MutexGuard....
[1] https://docs.rs/safina/latest/safina/index.html#benchmark
[2] Multi-threaded async executors require futures to be Send.
To clarify, I do still think it's probably wise to prefer using a mutex whose LockGuard is not Send. If you're in an async context though, it seems clearly preferable to use a mutex that lets you await on lock instead of possibly blocking. Looks like that's what Safina gives you.
It does bring to mind the point though - does it really make sense to call all of these things Mutexes? Most Mutexes, including the one in std, seem relatively simplistic, with no provision for exactly what happens if multiple threads/tasks are waiting to acquire the lock. It's as if they're designed for the case where multiple threads rarely or never need the thing at once, but we have to guard against it just to be certain. The case where a resource is in high demand by a bunch of threads, where we expect a lot of threads to spend a lot of time waiting for the lock, so that it actually matters which requesters get the lock in what order, seems different enough that it maybe ought to have a different name, and more flexibility in choosing the algorithm that controls the lock order.
In the real world, the futurelock could occur even with very short locks, it just wouldn't be so deterministic. Having a minimal reproducer that you have to run a thousand times and it will maybe futurelock doesn't really make for a good example :)
You have to explain the problem properly then. The problem here has nothing to do with duration whatsoever so don't bring that up. The problem here is that if you acquire a lock, you're inside a critical section. Critical sections have a programming paradigm that is equivalent to writing unsafe Rust. You're not allowed to panic inside unsafe Rust or inside critical sections. It's simply not allowed.
You're also not allowed to interrupt the critical section by something that does not have a hard guarantee that it will finish. This rules out await inside the critical section. You're not allowed to do await. It's simply not allowed. The only thing you're allowed to do is execute an instruction that guarantees that N-1 instructions are left to be executed, where N is a finite number. Alternatively you do the logical equivalent. You have a process that has a known finite bound on how long it will take to execute and you are waiting for that external process.
After that process has finished, you release the lock. Then you return to the scheduler and execute the next future. The next future cannot be blocked because the lock has already been released. It's simply impossible.
You now have to explain how the impossible happened. After all, by using the lock you've declared that you took all possible precautions to avoid interrupting the critical section. If you did not, then you deserve any bugs coming your way. That's just how locks are.
https://play.rust-lang.org/?version=stable&mode=debug&editio...
It still futurelocks.
> After that process has finished, you release the lock. Then you return to the scheduler and execute the next future. The next future cannot be blocked because the lock has already been released. It's simply impossible.
This is true with threads and with tasks that only ever poll futures sequentially. It is not true in the various cases mentioned in this RFD (notably `tokio::select!`, but also others). Intuitively: when you have one task polling on multiple futures concurrently, you're essentially adding another layer to the scheduler (kernel thread scheduler, tokio task scheduler, now some task is acting as its own future scheduler). The problem is it's surprisingly easy to (1) not realize that and (2) accidentally have that "scheduler" not poll the next runnable future and then get stuck, just like if the kernel scheduler didn't wake up a runnable thread.
Same thing with task preemption, though that one has less organizational impact.
In general, getting something to perform well enough on specific tasks is a lot easier than performing well enough on tasks in general. At the same time, most tasks have kinda specific needs when you start looking at them..
https://notes.eatonphil.com/2024-08-20-deterministic-simulat...
I hope something like this becomes popular in the Rust/Tokio space. It seems like Turmoil is that?
Sadly, the only solution I know of is to use an explicit cancellation signal, and to modify ~everything to work with it. In that world, almost all async functions would need to accept a cancellation parameter of some sort, like a Go Context or like the tokio-util CancellationToken, and explicitly check it every time they await a function. The new select!-equivalent would need to signal cancellations and then keep polling all unfinished cancellation-aware futures in a loop until they finished, and maybe immediately drop all non-aware futures to prevent futurelock. The entire Tokio API would need to be wrapped to take into account cancellation tokens, as well as any other async library you would want to use.
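Roughly what that would look like, as a std-only sketch. `CancelFlag` here is a made-up stand-in built on an `AtomicBool`, not the tokio-util API, and the function only checks the flag before each await, which is the discipline described above.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once with a waker that does nothing.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical cancellation signal shared between caller and callee.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);
impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(u32),
    Cancelled { steps_done: u32 },
}

/// Runs `total` steps, checking the flag before each await.
async fn run_steps(total: u32, cancel: CancelFlag) -> Outcome {
    for step in 0..total {
        if cancel.is_cancelled() {
            // Cleanup can run here, because the future is still being polled.
            return Outcome::Cancelled { steps_done: step };
        }
        YieldOnce(false).await;
    }
    Outcome::Finished(total)
}

/// The caller signals cancellation, then keeps polling until the future acknowledges it.
fn demo() -> Outcome {
    let cancel = CancelFlag::default();
    let mut fut = Box::pin(run_steps(3, cancel.clone()));
    assert!(poll_once(fut.as_mut()).is_pending());
    assert!(poll_once(fut.as_mut()).is_pending());
    cancel.cancel();
    loop {
        if let Poll::Ready(outcome) = poll_once(fut.as_mut()) {
            return outcome;
        }
    }
}

fn main() {
    println!("{:?}", demo());
}
```

The cost is visible even at this size: every function needs the extra parameter, and the caller has to keep polling after it has asked for cancellation.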
A lot of work, and you would need to do something if cancel-aware futures get dropped anyway. What a mess.
We are (on brand?) going to do a podcast episode on this on Monday[1]; ahead of that conversation I'm going to get a clip of that video out, just because it's interesting to see the team work together to debug it.
Its footgun-y nature has been known for years (IIRC even the first version of the tokio documentation warned against that) and as such I don't really understand why people are still using it. (For context I was the lead of a Rust team working on a pretty complex async networking program and we had banned select! very early in the project and never regretted this decision once).
> The lock is given to future1
> future1 cannot run (and therefore cannot drop the Mutex) until the task starts running it.
This seems like a contradiction to me. How can future1 acquire the Mutex in the first place, if it cannot run? The word "given" is really odd to me.
Why would do_async_thing() not immediately run the prints, return, and drop the lock after acquiring it? Why does future1 need to be "polled" for that to happen? I get that due to the select! behavior, the result of future1 is not consumed, but I don't understand how that prevents it from releasing the mutex.
It's more typical in my experience that the act of granting the lock to a thread is what makes it runnable, and it runs right then. Having to take some explicit second action to make that happen seems fundamentally broken to me...
EDIT: Rephrased for clarity.
`future1` did run for a bit, and it got far enough to acquire the mutex. (As the article mentioned, technically it took a position in a queue that means it will get the mutex, but that's morally the same thing here.) Then it was "paused". I put "paused" in scare quotes because it kind of makes futures sound like processes or threads, which have a "life of their own" until/unless something "interrupts" them, but an important part of this story is that Rust futures aren't really like that. When you get down to the details, they're more like a struct or a class that just sits there being data unless you call certain methods on it (repeatedly). That's what the `.await` keyword does for you, but when you use more interesting constructs like `select!`, you start to get more of the details in your face.
It's hard to be more concrete than that without getting into an overwhelming amount of detail. I wrote a set of blog posts that try to cover it without hand-waving the details away, but they're not short, and they do require some Rust background: https://jacko.io/async_intro.html
If I'm writing bare metal code for e.g. a little cortex M0, I can very much see the utility of this abstraction.
But it seems like an absolutely absurd exercise for code running in userspace on a "real" OS like Linux. There should be some simpler intermediate abstraction... this seems like a case of forcing a too-complex interface on users who don't really require it.
On the other hand, some direct uses of futures are reminiscent of the tendency to obsess over ownership and borrowing to maximize sharing, when you could just use .clone() and it wouldn’t make any practical difference. Because Rust is so explicit, you can see the overhead so you want to minimize it.
> But it seems like an absolutely absurd exercise for code running in userspace on a "real" OS like Linux
Clearly you have a point here, which is why these blog posts are making an impact. That said, one counterpoint is, have you ever wished you could kill a thread? The reason there are so many old Raymond Chen "How many times does it have to be said: Never call TerminateThread" blog posts, is that lots of real world applications really desperately want to call TerminateThread, and it's hard to persuade them to stop! The ability to e.g. put a timeout on any async function call is basically this same superpower, without corrupting your whole process (yay), but still with the unavoidable(?) difficulty of thinking about what happens when random functions give up halfway through.
> It's more typical in my experience that the act of granting the lock to a thread is what makes it runnable, and it runs right then.
This gets at why this felt like a big deal when we ran into this. This is how it would work with threads. Tasks and futures hook into our intuitive understanding of how things work with threads. (And for tasks, that's probably still a fair mental model, as far as I know.) But futures within a task are different because of the inversion of control: tasks must poll them for them to keep running. The problem here is that the task that's responsible for polling this future has essentially forgotten about it. The analogous thing with threads would seem to be something like if the kernel forgot to enqueue some runnable thread on a run queue.
So async Rust introduces an entire novel class of subtle concurrent programming errors? Ugh, that's awful.
> The analogous thing with threads would seem to be something like if the kernel forgot to enqueue some runnable thread on a run queue.
Yes. But I've never written code in a preemptible protected mode environment like Linux userspace where it is possible to make that mistake. That's nuts to me.
From my POV this seems like a fundamental design flaw in async rust. Like, on a bare metal thing I expect to deal with stuff like this... but code running on a real OS shouldn't have to.
To keep it in perspective, though: we've been operating a pretty good size system that's heavy on async Rust for a few years now and this is the first we've seen this problem. Hitting it requires a bunch of things (programming patterns and runtime behavior) to come together. It's really unfortunate that there aren't guard rails here, but it's not like people are hitting this all over the place.
The thing is that the alternatives all have tradeoffs, too. With threaded systems, there's no distinction in code between stuff that's quick vs. stuff that can block, and that makes it easy to accidentally do time-consuming (blocking) work in contexts that don't expect it (e.g., a lock held). With channels / message passing / actors, having the receiver/actor go off and do something expensive is just as bad as doing something expensive with a lock held. There are environments that take this to the extreme where you can't even really block or do expensive things as an actor, but there the hidden problem is often queueing and backpressure (or lack thereof). There's just no free lunch.
I'd certainly think carefully in choosing between sync vs. async Rust. But we've had a lot fewer issues with both of these than I've had in my past experience working on threaded systems in C and Java and event-oriented systems in C and Node.js.
In Python, I often use the Trio library, which offers "structured concurrency": tasks are (only) spawned into lexical scopes, and they are all completed (waited for) before that scope is left. That includes waiting for any cancelled tasks (which are allowed to do useful async work, including waiting for any of their own task scopes to complete).
Could Rust do something like that? It's far easier to reason about than traditional async programs, which seems up Rust's street. As a bonus it seems to solve this problem, since a Rust equivalent would presumably have all tasks implicitly polled by their owning scope.
A task is the thing that drives progress by polling some futures. But one of those futures may want to handle polling for other futures that it made, which is where this arises.
As the article says, one option is to spawn everything as a task, but that doesn't solve all problems, and precludes some useful ways of using futures.
A task is a different construct and usually tied to the runtime. If you look at the suggestions in the RFD they call out using a task explicitly instead of polling a future in place.
There's some debate to be had over what constitutes "cancellation." The article and most colloquial definitions I've heard define it as a future being dropped before being polled to completion. Which is very clean - if you want to cancel a future, just drop it. Since Rust strongly encourages RAII, cleanup can go in drop implementations.
A much tougher definition of cancellation is "the future is never polled again" which is what the article hits on. The future isn't dropped but its poll is also unreachable, hence the deadlock.
Yes, this is correct. However, many of the use cases for select rely on the fact that it doesn't run all the tasks to completion. I've written many a select! statement to implement timeouts or other forms of intentionally preempting a task. Sometimes I want to cancel the task and sometimes I want to resume it after dealing with the condition that caused the preemption -- so the behavior in the article is very much an intentional feature.
> even if the async runtime is careful, is it still possible to create and fail to poll a raw Future by accident?
This is also the case. There's nothing magic about a future; it's just an ordinary object with a poll function. Any code can create a future and do whatever it likes with it; including polling it a few times and then stopping.
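For anyone who hasn't seen one written by hand, a minimal sketch. `Countdown` is a made-up type; the point is only that a future is a plain value and polling is a plain method call.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A hand-written future: just a struct plus a `poll` method.
struct Countdown {
    remaining: u32,
    polls: u32,
}

impl Future for Countdown {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.polls += 1;
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Ask to be polled again; nothing forces the caller to honor this.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Polls the future three times and then simply stops.
/// Returns (number of polls seen, whether work is still outstanding).
fn demo() -> (u32, bool) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Countdown { remaining: 5, polls: 0 };
    for _ in 0..3 {
        assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    }
    // No more polls. `fut` is still alive and unfinished; it is just data now.
    (fut.polls, fut.remaining > 0)
}

fn main() {
    println!("{:?}", demo());
}
```

Nothing in the language complains that the future was abandoned halfway.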
Despite being included as part of Tokio, select! does not interact with the runtime or need any kind of runtime support at all. It's an ordinary function that creates a future which waits for the first of several "child" futures to complete; similar functions are also provided in other prominent ecosystem crates besides Tokio and can be implemented in user code as well.
That seems like a different requirement than "all arguments are tasks". If I understand it right (and quite possibly I don't), making them all tasks means that they are all polled and therefore continue progressing until they are dropped. It doesn't mean that select! would have to run them all the way to completion.
> making them all tasks means that they are all polled and therefore continue progressing until they are dropped. It doesn't mean that select! would have to run them all the way to completion.
This is exactly correct, but oftentimes the reason you're using select is because you don't want to run the futures all the way to completion. In my experience, the most common use cases for select are:
- An event handler loop that receives input from multiple channels. You could replace this with multiple tasks, one reading from each channel; but this could potentially mess with your design for queueing / backpressure -- often it's important for the loop to pause reading from the channels while processing the event.
- An operation that's run with a timeout, or a shutdown event from a controlling task. In this case I want the future to be dropped when the task is cancelled.
The example in the original post was the second case: an operation with a timeout. They wanted the operation to be cancelled when the timeout expired, but because the select statement borrowed the future, it only suspended the future instead of cancelling it. This is a very common code pattern when calling select! in a loop, when you want a future to be resumed instead of restarted on the next loop iteration -- it's very intentional that select! allows you to use either way, because you often want either behavior.
This problem would have been avoided by taking the future by value instead of by reference.
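Here is a std-only sketch of the by-value vs. by-reference difference, using a tiny two-way select written in user code. `first_of`, `slow_op`, `Guard`, and `YieldTimes` are invented for the example, the "timeout" is just a future that finishes sooner, and this is not how `tokio::select!` is implemented.

```rust
use std::cell::Cell;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, just enough to run the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns Pending the given number of times, then Ready.
struct YieldTimes(u32);
impl Future for YieldTimes {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            Poll::Ready(())
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}

/// Toy guard: clears the "held" flag when dropped.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(false);
    }
}

/// Takes the toy lock and then needs many polls to finish.
async fn slow_op(lock: Rc<Cell<bool>>) {
    lock.set(true);
    let _guard = Guard(Rc::clone(&lock));
    YieldTimes(10).await;
}

/// Minimal two-way select: resolves to true if `a` finishes first, false if `b` does.
async fn first_of<A, B>(mut a: A, mut b: B) -> bool
where
    A: Future<Output = ()> + Unpin,
    B: Future<Output = ()> + Unpin,
{
    poll_fn(move |cx| {
        if Pin::new(&mut a).poll(cx).is_ready() {
            return Poll::Ready(true);
        }
        if Pin::new(&mut b).poll(cx).is_ready() {
            return Poll::Ready(false);
        }
        Poll::Pending
    })
    .await
}

/// Returns (lock held after by-value select, lock held after by-reference select).
fn demo() -> (bool, bool) {
    // By value: the losing future is owned by the select and dropped with it.
    let lock = Rc::new(Cell::new(false));
    let op = Box::pin(slow_op(Rc::clone(&lock)));
    assert!(!block_on(first_of(op, YieldTimes(2))));
    let held_by_value = lock.get();

    // By reference: the loser outlives the select and keeps its guard.
    let lock = Rc::new(Cell::new(false));
    let mut op = Box::pin(slow_op(Rc::clone(&lock)));
    assert!(!block_on(first_of(&mut op, YieldTimes(2))));
    let held_by_ref = lock.get();

    (held_by_value, held_by_ref)
}

fn main() {
    println!("{:?}", demo());
}
```

The by-reference call compiles because `&mut F` is itself a future when `F` is `Future + Unpin`, which is what makes both styles possible from the same select.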
There's a lot of talk about Rust's await implementation, but I don't really think that's the issue here. After all, Rust doesn't guarantee convergence. Tokio, on the other hand (being a library that handles multi-threading), should (at least when using its own constructs, e.g. the `select!` macro).
So, since the crux of the problem is the `tokio::select!` macro, it seems like a pretty clear tokio bug. Side note, I never looked at it before, but the macro[1] is absolutely hideous.
[1] https://docs.rs/tokio/1.34.0/src/tokio/macros/select.rs.html
In the case of `select!`, it is a direct consequence of the ability to poll a `&mut` reference to a future in a `select!` arm, where the future is not dropped should another future win the "race" of the select. This is not really a choice Tokio made when designing `select!`, but is instead due to the existence of implementations of `Future` for `&mut T: Future + Unpin`[1] and `Pin<T: Future>`[2] in the standard library.
Tokio's `select!` macro cannot easily stop the user from doing this, and, furthermore, the fact that you can do this is useful --- there are many legitimate reasons you might want to continue polling a future if another branch of the select completes first. It's desirable to be able to express the idea that we want to continually drive one asynchronous operation to completion while periodically checking if some other thing has happened and taking action based on that, and then continue driving forward the ongoing operation. That was precisely what the code in which we found the bug was doing, and it is a pretty reasonable thing to want to do; a version of the `select!` macro which disallows that would limit its usefulness. The issue arises specifically from the fact that the `&mut future` has been polled to a state in which it has acquired, but not released, a shared lock or lock-like resource, and then another arm of the `select!` completes first and the body of that branch runs async code that also awaits that shared resource.
If you can think of an API change which Tokio could make that would solve this problem, I'd love to hear it. But, having spent some time trying to think of one myself, I'm not sure how it would be done without limiting the ability to express code that one might reasonably want to be able to write, and without making fundamental changes to the design of Rust async as a whole.
[1] https://doc.rust-lang.org/stable/std/future/trait.Future.htm... [2]: https://doc.rust-lang.org/stable/std/future/trait.Future.htm...
It feels like a lot of the way Rust untangles these tricky problems is by identifying slightly more contextful abstractions, though at the cost of needing more scratch space in the mind for various methods
1. Create future A.
2. Poll future A at least once but not provably poll it to completion and also not drop it. This includes selecting it.
3. Pause yourself by awaiting anything that does not involve continuing to poll A.
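The three steps above can be sketched without any runtime at all. This is a toy model, assuming a boolean flag as the "lock" and a bounded polling loop in place of a real task; `hold_lock`, `wait_for_lock`, and `polls_until_ready` are made-up names.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Toy guard: clears the "held" flag when dropped.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(false);
    }
}

/// Future A: takes the lock, then needs one more poll to release it.
async fn hold_lock(lock: Rc<Cell<bool>>) {
    lock.set(true);
    let _guard = Guard(Rc::clone(&lock));
    YieldOnce(false).await;
}

/// Future B: waits until the lock is free.
async fn wait_for_lock(lock: Rc<Cell<bool>>) {
    while lock.get() {
        YieldOnce(false).await;
    }
}

/// Polls up to `limit` times; returns how many polls it took, or None if still pending.
fn polls_until_ready<F: Future>(mut fut: Pin<&mut F>, limit: u32) -> Option<u32> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    for n in 1..=limit {
        if fut.as_mut().poll(&mut cx).is_ready() {
            return Some(n);
        }
    }
    None
}

/// Returns (result of awaiting B while A is parked, result after A is dropped).
fn demo() -> (Option<u32>, Option<u32>) {
    let lock = Rc::new(Cell::new(false));
    // 1. Create future A.
    let mut a = Box::pin(hold_lock(Rc::clone(&lock)));
    // 2. Poll A once, neither finishing nor dropping it. It now holds the lock.
    assert_eq!(polls_until_ready(a.as_mut(), 1), None);
    // 3. "Await" B without ever polling A again: B can never finish.
    let mut b = Box::pin(wait_for_lock(Rc::clone(&lock)));
    let stuck = polls_until_ready(b.as_mut(), 1000);
    // Dropping A releases the lock, and B completes on its next poll.
    drop(a);
    let after_drop = polls_until_ready(b.as_mut(), 1000);
    (stuck, after_drop)
}

fn main() {
    println!("{:?}", demo());
}
```

In a real program step 3 is an `.await` with no poll limit, so instead of returning `None` the task just hangs.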
I’m struggling a bit to imagine the scenario in which it makes sense to pause a coroutine that you depend on in the middle like this. But I also don’t immediately see a way to change a language like Rust to reliably prevent doing this without massively breaking changes. See my other comment :)
Although the design of the `tokio::select!` macro creates ways to run into this behavior, I don't believe the problem is specific to `tokio`. Why wouldn't the example from the post using Streams happen with any other executor?
For example, I can just do `loop { }` which the language is perfectly okay with letting me do anywhere in my code (and essentially hanging execution). But if I'm using a library and I'm calling `innocuous()` and there's a `loop { }` buried somewhere in there, that is (in my opinion) the library's responsibility.
N.B. I don't know enough about tokio's internals to suggest any changes and don't want to pretend like I'm an expert, but I do think this caveat should be clearly documented and a "safe" version of `select!` (which wouldn't work with references) should be provided.
one of the key advertised selling points in some of the other runtimes was specifically around behavior of tasks on drop of their join handles for example, for reasons closely related to this post.
In my mind futurelock is similar to keeping a sync lock across an await point. We have nothing right now to force a drop and I think the solution to that problem would help here.
[1] timestamped: https://youtu.be/zrv5Cy1R7r4?t=1067
So maybe all that is needed is a lint that warns if you keep a Future (or a reference to one) across an await point? The Future you are awaiting wouldn't count of course. Is there some case where this doesn't work?
Fundamentally, if you have two coroutines (or cooperatively scheduled threads or whatever), and one of them holds a lock, and the other one is awaiting the lock, and you don’t schedule the first one, you’re stuck.
I wonder if there’s a form of structured concurrency that would help. If I create two futures and start both of them (in Rust this means polling each one once) but do not continue to poll both, then I’m sort of making a mistake.
So imagine a world where, to poll a future at all, I need to have a nursery, and the nursery is passed in from my task and down the call stack. When I create a future, I can pass in my nursery, but that future then gets an exclusive reference to my nursery until it’s complete or cancelled. If I want to create more than one future that are live concurrently, I need to create a FutureGroup (that gets an exclusive reference to my nursery) and that allows me to create multiple sub-nurseries that can be used to make futures but cannot be used to poll them; instead I poll the FutureGroup.
(I have yet to try using an async/await system or a reactor or anything of the sort that is not very easy to screw up. My current pet peeve is this pattern:
data = await thingy.read()
What if thingy.read() succeeds but I am cancelled? This gets nasty in most programming languages. Python: the docs on when I can get cancelled are almost nonexistent, and it’s not obviously possible to catch the CancelledError such that I still have data and can therefore save it somewhere so it’s not lost. Rust: what if thingy thinks it has returned the data but I’m never polled again? Maybe this can’t happen if I’m careful, but that requires more thought than I’m really happy with.)
And it looks like it's still just an unaddressed well known problem [2].
Honestly, once the Mozilla sackening of rust devs happened it seems like the language has been practically rudderless. The RFC system seems almost dead as a lot of the main contributors are no longer working on rust.
This initiative hasn't had motion since 2021. [3]
[1] https://rust-lang.github.io/async-fundamentals-initiative/ro...
[2] https://rust-lang.github.io/async-fundamentals-initiative/
[3] https://github.com/rust-lang/async-fundamentals-initiative
I think "practically rudderless" here is fairly misinformed and a little harmful/rude to all the folks doing tons of great work still.
It's a shame there are some stale pages around and so on, but they're not good measures of the state of the project or ecosystem.
The problem of holding objects across async points is also partially implemented in this unstable lint marker which is used by some projects: https://dev-doc.rust-lang.org/unstable-book/language-feature...
You also get a similar effect in multi-threaded runtimes by not arbitrarily making everything in your object model Send and instead designing your architecture so that most things between wake-ups don't become arbitrarily movable references.
These aren't perfect mitigations, but some tools.
That great work is mostly opaque on the outside.
What's been noticeable as an observer is that a lot of the well known names associated with rust no longer work on it and there's been a large amount of turnover around it.
That manifests in things like this case where work was in progress up until ~2021 and then was ultimately backburnered while the entire org was reshuffled. (I'd note the dates on the MCP as Feb 2024).
I can't tell exactly how much work or what direction it went in from 2021 to 2024 but it does look apparent that the work ultimately got shifted between multiple individuals.
I hope rust is in a better spot. But I also don't think I was being unfair in pointing out how much momentum got wrecked when Mozilla pulled support.
That's not always super visible if you're not following the working groups or in contact with folks working on the stuff. It's entirely fair that they're prioritizing getting work done over explaining low level language challenges to everyone everywhere.
I think you're seeing a lack of data and trying to use that as a justification to fit a story that you like, more than seeing data that is derivative of the story that you like. Of course some people were horribly disrupted by the changes, but language usage also expanded substantially during and since that time, and there are many team members employed by many other organizations, and many independents too.
And there are more docs, anyway:
https://rust-lang.github.io/rust-project-goals/2024h2/async.... https://rust-lang.github.io/rust-project-goals/2025h1/async.... https://rust-lang.github.io/rust-project-goals/2025h2/field-... https://rust-lang.github.io/rust-project-goals/2025h2/evolvi... https://rust-lang.github.io/rust-project-goals/2025h2/goals....
The discarded futures will never be run again.
Normally when a future is discarded, it's dropped. When a future holding a lock is dropped, the lock is released. But here the code passes a borrow of the future to select, so the discarded future is not dropped and keeps holding the lock.
So it leaves a future that holds a lock that will never run again.
It's hard to verify these protocols and very easy to write something fragile.
I'm going to write down the order of events.
1. Background task takes the lock and holds it for 5 seconds.
2. Async Thing 1 tries to take the lock, but must wait for background task to release it. It is next in line to get the lock.
3. We fire off a goroutine that's just sleeping for a second.
4. Select wants to find a channel that is finished. The sleepChan finishes first (since it's sleeping for 1 second) while Async Thing 1 is still waiting 4 more seconds for the lock. So select will execute the sleepChan case.
5. That case fires off Async Thing 2. Async Thing 2 is waiting for the lock, but it is second in line to get the lock after Async Thing 1.
6. Async Thing 1 gets the lock and is ready to write to its channel - but the main is paused trying to read from c2, not c1. Main is "awaiting" on c2 via "<-c2". Async Thing 1 can't give up its lock until it writes to c1. It can't write to c1 until c1 is "awaited" via "<-c1". But the program has already gone into the other case and until the sleepChan case finishes, it won't try to await c1. But it will never finish its case because its case depends on c1 finishing first.
You can use buffered channels in Go so that Async Thing 1 can write to c1 without main reading from it, but as the article notes you could use join_all in Rust.
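A minimal sketch of the buffered-channel variant, assuming the same shape as the program in this comment: the sender holds the mutex until its send returns. `sendUnderLock` and `runBuffered` are names made up for this sketch. With capacity 1, op1 can deposit its result and unlock even though main is busy waiting on c2.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sendUnderLock takes the lock, sends its label, and only unlocks
// (via defer) after the send has returned.
func sendUnderLock(c chan<- string, label string, lock *sync.Mutex) {
	lock.Lock()
	defer lock.Unlock()
	c <- label // with capacity 1 this returns even if nobody is receiving yet
}

// runBuffered returns the labels in the order main receives them.
func runBuffered() []string {
	var lock sync.Mutex
	c1 := make(chan string, 1) // buffered: the only change from the unbuffered version
	c2 := make(chan string, 1)

	go sendUnderLock(c1, "op1", &lock)
	time.Sleep(10 * time.Millisecond) // crude ordering, for illustration only

	// Start op2 and wait on c2 before c1 has been read.
	go sendUnderLock(c2, "op2", &lock)
	got := []string{<-c2}
	got = append(got, <-c1)
	return got
}

func main() {
	fmt.Println(runBuffered())
}
```

This only removes the coupling between "holding the lock" and "someone is reading my channel"; it says nothing about the Rust case, where the stuck party is a future that is no longer being polled.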
But the issue is that you're saying with "select" in either Go or Rust "get me the first one that finishes" and then in the branch that finishes first, you are awaiting a lock that will get resolved when you read the other branch. It just doesn't feel like something that is Rust specific.
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        lock := sync.Mutex{}
        c1 := make(chan string)
        c2 := make(chan string)
        sleepChan := make(chan bool)
        go start_background_task(&lock)
        time.Sleep(1 * time.Millisecond) // make sure it schedules start_background_task first
        go do_async_thing(c1, "op1", &lock)
        go func() {
            time.Sleep(1 * time.Second)
            sleepChan <- true
        }()
        for range 2 {
            select {
            case msg1 := <-c1:
                fmt.Println("In the c1 case")
                fmt.Printf("received %s\n", msg1)
            case <-sleepChan:
                fmt.Println("In the sleepChan case")
                go do_async_thing(c2, "op2", &lock)
                // "awaiting" on c2 here, but c1's lock won't be given up until we read it
                fmt.Printf("received %s\n", <-c2)
            }
        }
        fmt.Println("all done")
    }

    // Holds the lock for 5 seconds so that op1 has to queue behind it.
    func start_background_task(lock *sync.Mutex) {
        fmt.Println("starting background task")
        lock.Lock()
        fmt.Println("acquired background task lock")
        defer lock.Unlock()
        time.Sleep(5 * time.Second)
        fmt.Println("dropping background task lock")
    }

    // Takes the lock, then sends on an unbuffered channel; the deferred
    // Unlock only runs once someone has received from c.
    func do_async_thing(c chan string, label string, lock *sync.Mutex) {
        fmt.Printf("%s: started\n", label)
        lock.Lock()
        fmt.Printf("%s: acquired lock\n", label)
        defer lock.Unlock()
        fmt.Printf("%s: done\n", label)
        c <- label
    }

Meanwhile in Rust it looks like it took thousands of dollars in engineering time to find the issue.
This is the programming equivalent of using welding (locks) to make a chain loop; you've just done it with the two-link case that is impossible in 3D space.
As with the sin of .await(no deadline), the sin here is not adding a deadline.
async code is so so so much more complex. It’s so hard to read and rationalize. I could not follow this post. I tried. But it’s just a full extra order of complexity.
Which is a shame because async code is supposed to make code simpler! But I’m increasingly unconfident that’s true.
A fair lock[1] is designed to wake up the longest-waiting task, since it got to the queue first and might otherwise be starved if the algorithm doesn't guarantee it gets to the head of the queue.
BUT CRITICALLY: a future isn't a task. It's not a thread, it's not guaranteed to "run". It's just a flag that gets set somewhere. But it can consume that wakeup nonetheless. So it's possible to "wake up"[2] a future that isn't actually being polled, and won't be until something else finishes, where that something else is itself waiting on the resource that just tried to wake it up.
I don't see that these concepts are ever going to work together. You can't have locks generating wakeup events that aren't consumed. If you're going to use them with async, you need to do something like a broadcast to guarantee that every waiter sees an event.
Stated differently: the lock is signalling an edge-triggered interrupt, but rust async demands level sensitivity.
[1] In one sense of fair. There are others, like "switch now" vs. "defer context switch", but that's not relevant here.
[2] Which doesn't actually wake anything up, thus the bug.
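The broadcast idea can be sketched in Go, where closing a channel behaves like a level-triggered signal: every waiter sees it, including ones that only start looking afterwards. `gate` and `countWoken` are hypothetical names for this sketch, and a one-shot gate is of course not a reusable lock.

```go
package main

import (
	"fmt"
	"sync"
)

// gate is a hypothetical one-shot broadcast. Once the channel is closed,
// every receive on it completes immediately, now or later.
type gate struct {
	once sync.Once
	ch   chan struct{}
}

func newGate() *gate { return &gate{ch: make(chan struct{})} }

// Open signals all current and future waiters; calling it twice is harmless.
func (g *gate) Open() { g.once.Do(func() { close(g.ch) }) }

// Wait returns a channel that is readable once the gate is open.
func (g *gate) Wait() <-chan struct{} { return g.ch }

// countWoken starts n waiters, opens the gate once, and counts how many woke.
func countWoken(n int) int {
	g := newGate()
	var wg sync.WaitGroup
	woken := make(chan int, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-g.Wait()
			woken <- id
		}(i)
	}
	g.Open()
	wg.Wait()
	<-g.Wait() // a late waiter still observes the signal
	return len(woken)
}

func main() {
	fmt.Println(countWoken(5))
}
```

The contrast with a fair mutex is that no single waiter can swallow the notification: there is no "next in line" whose inattention blocks everyone else.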
> The behavior of tokio::select! is to poll all branches' futures only until one of them returns `Ready`. At that point, it drops the other branches' futures and only runs the body of the branch that’s ready.
This is, unfortunately, doing what it's supposed to do: acting as a footgun.
The design of tokio::select!() implicitly assumes it can cancel tasks cleanly by simply dropping them. We learned the hard way back in the Java days that you cannot kill threads cleanly all the time. Unsurprisingly, the same thing is true for async tasks. But I guess every generation of programmers has to re-learn this lesson. Because, you know, actually learning from history would be too easy.
Unfortunately there are a bunch of footguns in tokio (and async-std too). The state-machine transformation inside rustc is a thing of beauty, but the libraries and APIs layered on top of that should have been iterated many more times before being rolled out into widespread use.
Crucially, however, because Futures have no independent existence, they can be indefinitely paused if you don't actively and repeatedly .poll() them, which is the moral equivalent of cancelling a Java Thread. And this is represented in language state as a leaked object, which is explicitly allowed in safe Rust, although the language still takes pains to avoid accidental leakage. The only correct way to use a future is to poll it to completion or drop it.
The problem is that in this situation, tokio::select! only borrows the future and thus can't drop it. It also doesn't know that dropping the Future does nothing, because borrows of futures are still futures so all the traits still match up. It's a combination of slightly unintuitive core language design and a major infrastructure library not thinking things out.
Then you could merge a `Stream<A>` and `Stream<B>` into a `Stream<Either<A,B>>` and pull from that. Since you're dealing with owned streams, dropping the stream forces some degree of cleanup. There are still ways to make a mess, but they take more effort.
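As a rough analogy only (Go channels are not Rust streams, and `Either`, `merge` and `drain` are names invented here), merging two typed sources into one tagged stream might look like the sketch below. The consumer pulls from a single place instead of selecting over several.

```go
package main

import (
	"fmt"
	"sync"
)

// Either is a hypothetical tagged union: IsLeft says which field is meaningful.
type Either[A, B any] struct {
	IsLeft bool
	Left   A
	Right  B
}

// merge forwards both inputs into one output and closes the output
// once both inputs have been closed and drained.
func merge[A, B any](as <-chan A, bs <-chan B) <-chan Either[A, B] {
	out := make(chan Either[A, B])
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for a := range as {
			out <- Either[A, B]{IsLeft: true, Left: a}
		}
	}()
	go func() {
		defer wg.Done()
		for b := range bs {
			out <- Either[A, B]{Right: b}
		}
	}()
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// drain feeds two slices through merge and counts items from each side.
func drain(nums []int, words []string) (lefts, rights int) {
	as := make(chan int)
	bs := make(chan string)
	go func() {
		defer close(as)
		for _, n := range nums {
			as <- n
		}
	}()
	go func() {
		defer close(bs)
		for _, w := range words {
			bs <- w
		}
	}()
	for e := range merge[int, string](as, bs) {
		if e.IsLeft {
			lefts++
		} else {
			rights++
		}
	}
	return lefts, rights
}

func main() {
	fmt.Println(drain([]int{1, 2, 3}, []string{"a", "b"}))
}
```

The analogy breaks down on cleanup: Go has no drop, so if the consumer stops reading early the forwarding goroutines here would simply block forever, whereas the point above is that dropping an owned stream forces some cleanup.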
....................................
Ratelimited, so I have to reply to mycoliza with an edit here: That example calls `do_thing()`, whose body does not appear anywhere in the webpage. Use better identifiers.
If you meant `do_stuff()`, you haven't replaced select!() with streams, since `do_stuff()` calls `select!()`.
The problem is `select!()`; if you keep using `select!()` but just slather on a bunch of streams that isn't going to fix anything. You have to get rid of select!() by replacing it with streams.
Perhaps the full example should have been reproduced in the RFD for clarity…
> This RFD describes futurelock: a type of deadlock where a resource owned by Future A is required for another Future B to proceed, while the Task responsible for both Futures is no longer polling A. Futurelock is a particularly subtle risk in writing asynchronous Rust.
I was honestly wondering how you could possibly cause this in any sane code base. How can an async task hold a lock and keep it open? It sounds illogical, because critical sections are meant to be short and never interrupted by anything. You're also never allowed to panic, which means you have to write no-panic Rust code inside a critical section. Critical sections are very similar to unsafe blocks, but with the caveat that they cannot cause a complete takeover of your application.
So how exactly did they bring about the impossible? They put an await call inside the critical section. The part of the code base that is not allowed to be subject to arbitrary delays. Massive facepalm.
When you invoke await inside a critical section, you're essentially saying "I hereby accept that this critical section will last an indeterminate amount of time, I am fully aware of what the code I'm calling is doing and I am willing to accept the possibility that the release of the lock may never come, even if my own code is one hundred percent correct, since the await call may contain an explicit or implicit deadlock"
I'm not sure where you got the impression that the example code was where we found the problem. That's a minimal reproducer trying to explain the problem from first principles because most people look at that code and think "that shouldn't deadlock". It uses a Mutex because people are familiar with Mutexes and `sleep` just to control the interleaving of execution. The RFD shows the problem in other examples without Mutexes. Here's a reproducer that futurelocks even though nobody uses `await` with the lock held: https://play.rust-lang.org/?version=stable&mode=debug&editio...
> I was honestly wondering how you could possibly cause this in any sane code base.
The actual issue is linked at the very top of the RFD. In our cases, we had a bounded mpsc channel used to send messages to an actor running in a separate task. That actor was working fine. But the channel did become briefly saturated (i.e., at capacity) at a point where someone tried to send on it via a `tokio::select!` similar to the one in the example.
Structured concurrency will always win IMO.
Deadlocks can happen anywhere? You can replicate this pattern in golang.
Then maybe you should take a moment to pick more descriptive identifiers than future1, future2, future3, do_stuff, and do_async_thing. This coding style is atrocious.
> In this case, what’s dropped is &mut future1. But future1 is not dropped, so the actual future is not cancelled.
I always said if your code uses locks or atomics, it's wrong. Everyone says I'm wrong but you get things like what's described in the article. I'd like to recommend a solution but there's pretty much no reasonable way to implement multi-threading when you're not an expert. I heard Erlang and Elixir are good but I haven't tried them so I can't really comment.
Ok so say you are simulating high energy photons (x-rays) flowing through a 3d patient volume. You need to simulate 2 billion particles propagating through the patient in order to get an accurate estimation of how the radiation is distributed. How do you accomplish this without locks or atomics without the simulation taking 100 hours to run? Obviously it would take forever to simulate 1 particle at a time, but without locks or atomics the particles will step on each others' toes when updating radiation distribution in the patient. I suppose you could have 2 billion copies of the patient's volume in memory and each particle gets its own private copy and then you merge them all at the end...
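A middle ground between one shared volume and two billion private copies is one private tally per worker, merged at the end. The sketch below assumes a toy `deposit` function standing in for real particle transport; `simulate` and all parameters are made up for illustration. The hot loop touches no locks or atomics directly, though `sync.WaitGroup` does use them internally.

```go
package main

import (
	"fmt"
	"sync"
)

// deposit is a stand-in for real particle transport: it picks a voxel
// deterministically from the particle id and adds one unit of dose.
func deposit(tally []float64, particle int) {
	tally[(particle*7919)%len(tally)] += 1.0
}

// simulate splits particles across workers. Each worker writes only to its
// own tally; the tallies are summed after every worker has finished.
func simulate(particles, voxels, workers int) []float64 {
	partial := make([][]float64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		partial[w] = make([]float64, voxels)
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for p := w; p < particles; p += workers {
				deposit(partial[w], p)
			}
		}(w)
	}
	wg.Wait()

	// Single-threaded merge: safe because all workers are done.
	total := make([]float64, voxels)
	for _, t := range partial {
		for i, v := range t {
			total[i] += v
		}
	}
	return total
}

func main() {
	total := simulate(100000, 64, 8)
	sum := 0.0
	for _, v := range total {
		sum += v
	}
	fmt.Println(sum)
}
```

Memory cost is workers × voxels rather than particles × voxels, which may or may not be acceptable for a real patient volume.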
I'm saying if you're not writing multi-threaded code every day, use a library. It can use atomics/locks but you shouldn't use them directly. If the library is designed well it'd be impossible to deadlock.
With a library that encapsulates a low number of patterns (like message passing) you'll be very limited. If you never start learning about lower level multi-threading issues you'll never learn it. And it's not _that_ hard.
I'm not writing multi-threaded code every day (by far), but often enough that I can write useful things (using shared memory, atomics, mutexes, condition variables, etc). And I'm looking forward to learning more, better understanding various issues, and learning new patterns.
Why atomics?
Just say no to atomics (unless they're hidden in a well written library)
With a little bit of experience and a bit of care, multithreading isn't _that_ hard. You just need to design for it. You can reduce the number of critical pieces.