Avoid Async Rust At All Costs

Avoid async in your codebase and in your dependencies at all costs.

Reasons:

  1. Leaky abstraction problem which leads to "async contamination".
  2. Violation of the zero-cost abstractions principle.
  3. Major degradation in developer's productivity.
  4. Most advertised benefits are imaginary, too expensive (unless you are FAANG) or can be achieved without async.

Alternatives:

  1. Event loop using kernel mechanism - kqueue/epoll.
  2. Threads.
  3. Threads + event loop.
  4. Golang/Erlang.
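
As a minimal sketch of alternative 2, here is a thread-per-connection echo server using nothing but the standard library (the echo protocol and all names are made up for the example):

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// One blocking OS thread per connection: no runtime, no colors.
fn handle(stream: TcpStream) {
    let mut reader = BufReader::new(stream.try_clone().unwrap());
    let mut stream = stream;
    let mut line = String::new();
    while reader.read_line(&mut line).unwrap() > 0 {
        stream.write_all(line.as_bytes()).unwrap(); // echo the line back
        line.clear();
    }
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    thread::spawn(move || {
        for stream in listener.incoming() {
            let stream = stream.unwrap();
            thread::spawn(move || handle(stream));
        }
    });

    // Exercise the server from the same process.
    let mut client = TcpStream::connect(addr).unwrap();
    client.write_all(b"ping\n").unwrap();
    let mut reply = String::new();
    BufReader::new(client).read_line(&mut reply).unwrap();
    assert_eq!(reply, "ping\n");
}
```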

The Long Story

Async Rust is an objectively bad language feature that actively harms an otherwise good language.

A lot has been written on the subject. I will try to present a structured narrative while referring to the original sources of ideas where possible. But before we begin, a short intro to Async Rust is needed.

Official Rust documentation defines Async Rust as:

concurrent programming model that lets you run a large number of concurrent tasks on a small number of OS threads ...through the async/await syntax

In other words, it is a form of M:N cooperative scheduling in userland where the developer is responsible for ...well, cooperating with the scheduler. Most of the time this means correctly yielding execution control back to the scheduler.
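
To make "cooperating with the scheduler" concrete, here is a minimal hand-written future, in plain std Rust, that yields control exactly once and asks to be polled again (YieldNow and FlagWaker are names invented for this sketch):

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A task that cooperates: it yields control back once, then completes.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // tell the scheduler to poll us again
            Poll::Pending
        }
    }
}

// A toy waker that just records that a wake-up was requested.
struct FlagWaker(AtomicBool);
impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let flag = Arc::new(FlagWaker(AtomicBool::new(false)));
    let waker = Waker::from(flag.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(YieldNow { yielded: false });

    // First poll: the task cooperates by yielding (Pending) and requesting a re-poll.
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Pending);
    assert!(flag.0.load(Ordering::SeqCst));
    // Second poll: the task completes.
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(()));
}
```

Every await point in an async fn compiles down to exactly this kind of poll/Pending dance; the difference is that the compiler writes the state machine for you.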

If you can't get rid of the feeling that you've heard this somewhere before: in the C programming language the developer is responsible for manually managing memory correctly. (sincerely yours, Cap Obvious :)

Async provides significantly reduced CPU and memory overhead, especially for workloads with a large amount of IO-bound tasks, such as servers and databases. All else equal, you can have orders of magnitude more tasks than OS threads, because an async runtime uses a small amount of (expensive) threads to handle a large amount of (cheap) tasks.

Behind an attractive sales pitch there are several important questions that any engineer needs to ask:

  • What is the overhead exactly and how "significant" is its significant reduction?
  • Is it free? What price do I pay for this?
  • In what situations will I observe the advertised benefits?
  • In what situations will the benefits be worth the cost?
  • Am I in such situation?

If you are new to async in Rust, I can recommend a good deep dive here.

The C10K Problem

In 1999 Dan Kegel coined the term C10K Problem. Ten thousand concurrent connections may not seem like a big deal today, and the number itself is somewhat arbitrary. Indeed, the hardware of 1999 was much weaker than today's. But what is even more important is the idea that hardware is capable of delivering better performance on IO-bound tasks if special care is taken to avoid overhead. A lot of the optimizations that his article looks at were not considered premature. And even in 1999 Dan said this about cooperative M:N models:

There is a choice when implementing a threading library: you can either put all the threading support in the kernel (this is called the 1:1 threading model), or you can move a fair bit of it into userspace (this is called the M:N threading model). At one point, M:N was thought to be higher performance, but it's so complex that it's hard to get right, and most people are moving away from it.

Many operating systems have tried M:N scheduling models, and all of them use the 1:1 model today. It is tempting to jump to the premature conclusion that the M:N model is inferior to 1:1, but then how come the M:N model is used in Golang and Erlang, two languages known for their superior concurrency features?

The Coloring Problem

In 2015 Robert Nystrom wrote his famous What Color Is Your Function blog post, where he explains the fundamental problem with cooperative scheduling models. He used colors to represent two different types of functions that can't interact with each other normally. This article deserves to be included in the basic school curriculum, it is that good. Go read it now, if you haven't already.

The function coloring problem is an extremely bad leaky abstraction that breaks the zero-overhead principle that Rust so proudly advertises. Async effectively "infects" your code such that everything needs to be aware of async. This effect naturally follows from the rules of interaction between sync and async functions. The only way to "plug" the leak is to tell the async runtime to emulate sync behaviour by calling block_on (or a similar function) so that it polls all the futures until the one you are interested in returns. This is what the postgres crate and many others have done.

This helps to stop the "infection", but still violates the zero-overhead promise. There is no way to use async code for free. The postgres crate comes with the Tokio runtime and there is no way to disable it; you will still pay with longer compile times and the performance overhead of running a userland scheduler. Only this time you won't know who you are paying until you open the hood and look inside your dependency graph.
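
As an illustration, here is a minimal sketch of what such a block_on "plug" does under the hood, built only on the standard library (fetch_answer is a hypothetical async fn standing in for a real query; real runtimes like Tokio do far more than this):

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Drive a future to completion on the current thread,
// emulating sync behaviour -- the "plug" described above.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // sleep until woken
        }
    }
}

async fn fetch_answer() -> u32 {
    42 // stand-in for an actual async database call
}

fn main() {
    // A sync caller can now use the async fn without "going async" itself.
    assert_eq!(block_on(fetch_answer()), 42);
}
```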

But are colors the only reason that makes async a leaky abstraction?

The Human Problem

Golang and Erlang successfully employ preemptive M:N models and there are no coloring problems in those languages. The leakiness of async comes from the "cooperative" design choice. One might be tempted to ask: "who is failing to cooperate with whom?"

Some might say humans are bad at performing tedious tasks, indicating that it is a problem between a human and a scheduler. And they would be right. But I think there is a bigger problem: cooperation between one developer and another. You see, in most places on Earth humans need money to exist, and money is usually made by doing useful work. Helping a computer do what it can do without you is not what they consider to be "useful work". [1]

It happens to be very difficult to negotiate a reasonable scope for async. One developer decides to use an infectious, leaky API to solve their problem, and now everybody who touches this code with a 10ft pole pays the price. Since application developers are paid to solve business problems and not scheduler problems, this immediately translates into questions like:

  • Is the choice of that person helping me do what I need to do?
  • Is this async stuff worth the price?

And if the answer happens to be negative then the next question immediately follows:

  • Why is that person affecting my productivity?

And it so happens that there is no good way to plug the leak. The only option is to abandon the leaky ship and build another one.

I believe this is the worst effect of Async Rust: community and ecosystem fragmentation. Now every API needs to be implemented twice. And even things like keyword generics are not going to save the day. To preserve backwards compatibility this feature will be opt-in, and there will always be that guy who wrote his opinionated library in a chosen color and actively ignores the rest.

C has a similar human-to-human problem, only with memory management. There is no good way to negotiate the ownership of memory between one developer and another. As a result, both async and manual memory management compose poorly.

The Rust Problem

On top of general design problems there are Rust specific problems.

Matt Kline brilliantly captured the gist of it: Async Rust Is A Bad Language

pain.await

On one hand, futures in Rust are exceedingly small and fast, thanks to their cooperatively scheduled, stackless design. But unlike other languages with userspace concurrency, Rust tries to offer this abstraction while also promising the programmer total low-level control.

There’s a fundamental tension between the two, and the poor async Rust programmer is perpetually caught in the middle, torn between the language’s design goals and the massively-concurrent world they’re trying to build. Rust attempts to statically verify the lifetime of every object and reference in your program, all at compile time. Futures promise the opposite: that we can break code and the data it references into thousands of little pieces, runnable at any time, on any thread, based on conditions we can only know once we’ve started! A future that reads data from a client should only run when that client’s socket has data to read, and no lifetime annotation will tell us when that might be.

General design problems do not come alone. They are accompanied by immature implementations. Here @WormRabbit on Reddit writes:

  • Still no async traits.

  • Still no async closures. Quite painful when you need to move stuff into it.

  • Still no async iterators. Working with Streams is painful, the terminology is inconsistent, many iterator methods are missing.

  • Pin is a huge ball of complexity dumped into the language, and it's basically useless outside of writing async (i.e. if you think it will help with your self-referential/non-movable type, think again). Anything meaningful done with it requires unsafe. At least there are pin and pin_project macros which automate some of it.

  • Basically all fundamental async stuff is still in crates and not in libstd.

  • No way to abstract over executors, leading to ecosystem split and de-facto monopoly of Tokio. If you aren't Google, writing a new executor isn't worth the hell of rewriting the whole ecosystem around it, so Rust could just go with a built-in executor to the same effect, saving people from a lot of pain.

  • No way to abstract over sync/async, leading to ecosystem split and infectious async.

  • Yes, basically the whole ecosystem from libstd upwards needs to be rewritten for async. Even bloody serde and rand.

  • select! macro is a mess.

  • Debugging and stacktraces are useless for async.

  • Generators are still not stable. Personally, for me pretty state machine syntax is like 95% benefits of async, but I'm forced to drag all the mess of executors and async IO with it.

  • Implicit Send/Sync propagation of async fn types is a mess.

  • Lack of async Drop is a huge pain point.

  • Future cancellation in general is a mess.

Or here Tima Kinsart shows you yet more problems: Rust Is Hard, Or: The Misery of Mainstream Programming

I can go on, but I think you get the idea. Let's ask another question instead.

Is the pain worth it?

The (Absence Of) Performance Problem

One of the main reasons why people think they need to use async in Rust is to make their I/O bound application go fast. But they rarely ask themselves these questions:

  • Do I have performance problem?
  • Did async solve my problem?

If you haven't answered the first question, the second question becomes impossible to answer. An even more interesting question to ask is:

  • Do I have C10K problem?

Rust gives you plenty of performance to begin with, and you need to push modern hardware pretty far before context switching or process control block (PCB) size becomes your problem. Modern hardware is capable of running a lot more than 10k threads. My computer has 2045 threads running right now, and this is just my Firefox browser, terminal+neovim, and a handful of system services. As you can see, the system is 99.2% idle (it is an average 8-core AMD Zen processor).

$ procstat -t -a | wc -l
    2045

$ top
last pid: 45104;  load averages:  0.81,  0.46,  0.29                                                                                                  up 3+12:24:47  20:23:34
101 processes: 1 running, 92 sleeping, 4 stopped, 4 zombie
CPU:  0.1% user,  0.0% nice,  0.0% system,  0.7% interrupt, 99.2% idle

But we need to be more specific to support this claim.

I will quote this GitHub project, which did some benchmarking on the subject:

A context switch takes around 0.2µs between async tasks, versus 1.7µs between
kernel threads. But this advantage goes away if the context switch is due to
I/O readiness: both converge to 1.7µs.

IO-bound workloads are the main reason for async to exist, but here we see that async provides no performance improvement over threads when the context switch happens due to IO. I will leave the question "What is the value of a context switch in userspace if it was not due to IO? Are you just artificially slowing down your program?" to the reader.

Creating a new task takes ~0.3µs for an async task, versus ~17µs for a new
kernel thread.

Ok, spawning tasks is faster, but unless you spawn hundreds of thousands of tasks per second it is not going to be your problem. I will go out on a limb and say that if you do need to continuously spawn this many tasks, you have a design problem, not a performance problem.
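
For perspective, the thread-spawn cost is easy to measure yourself with nothing but the standard library. The exact figure will vary with OS and hardware, and the assertion bound below is a deliberately loose assumption:

```rust
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let n: u32 = 1_000;
    let start = Instant::now();
    // Spawn n trivial threads, then wait for all of them.
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(|| {})).collect();
    for h in handles {
        h.join().unwrap();
    }
    let per_thread = start.elapsed() / n;
    println!("spawn+join per thread: {:?}", per_thread);
    // Extremely loose sanity bound; real numbers are usually tens of microseconds.
    assert!(per_thread < Duration::from_millis(10));
}
```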

And yes, 0.2µs vs 1.7µs is an order of magnitude difference. But don't over-index on ratios. 2µs is pretty damn fast. There are 1000 microseconds in one millisecond. How much money are these extra 2 microseconds going to make you? How much time are you going to spend writing and debugging async code?

Memory consumption per task (i.e. for a task that doesn't do much) starts at
around a few hundred bytes for an async task, versus around 20KiB (9.5KiB user,
10KiB kernel) for a kernel thread. This is a minimum: more demanding tasks will
naturally use more.

It's no problem to create 250,000 async tasks, but I was only able to get my
laptop to run 80,000 threads (4 core, two way HT, 32GiB).

A 4-core laptop runs 80K threads. We don't have C10K problems any more. It is not a problem at all to get an 8- or 16-core server these days; it is low-tier commodity hardware. Maybe we have a C200K problem. Do you really have over a hundred thousand concurrent connections to a single server? What will you do if AWS schedules maintenance or your hardware dies? At this point you need to start thinking about horizontal scaling, because losing 100k clients is going to hurt. And when you do scale horizontally, the number of connections per server goes down. You want to keep this number small because you don't want to keep all your eggs in one basket.

20 KiB vs 200-400 bytes is also an orders-of-magnitude difference. But even 100,000 threads will take 1953 MiB, which pales in comparison to how much memory your browser consumes. If you really need to go from 80K to 200K threads, you will need about 2 GiB more RAM. On AWS the price difference from t3.small to t3.medium is about $15/mo, which will give you the desired upgrade. Is $15 going to break your bank? You likely spend more on coffee each day. And you will need a lot more coffee when you add async to your codebase :)
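
The arithmetic behind the 1953 MiB figure is easy to verify (the 20 KiB per-thread cost is the benchmark's number, not a universal constant):

```rust
fn main() {
    let threads: u64 = 100_000;
    let per_thread_kib: u64 = 20; // ~20 KiB per kernel thread, per the benchmark
    let total_mib = threads * per_thread_kib / 1024; // KiB -> MiB
    assert_eq!(total_mib, 1953); // just under 2 GiB
}
```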

How many servers do you need to run before an extra 2 GiB per server becomes a meaningful sum? How much larger does this sum need to get before you decide to allocate a couple of developers to fix it? I don't know exactly, but it is much larger than what any single developer makes in a year.

So who does benefit from async? I really don't know. I can speculate that large companies that have economies of scale do, or businesses where 2 microseconds make or break the bank.

Non-Existing Problems

Is there anything else to async in Rust? Maybe..?

Yoshua Wuyts argues here that async enables timers, signals, and cancellation as forms of structured concurrency. But he gracefully omits the fact that all of these features have been available from the operating system for decades. See the man pages for signal (POSIX), timerfd and epoll (Linux), and kqueue (BSD).

At this point I will happily use pthreads to cancel tasks if it helps me avoid async.
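
For instance, a timer plus a cancellation point is one std channel away in threaded Rust. A sketch, where the simulated work and the deadline are made-up numbers:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel();

    // A worker doing some slow, blocking job.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // simulated blocking I/O
        let _ = tx.send(42u32);
    });

    // A timer and a cancellation point, with no async runtime involved:
    // give up on the worker if it misses the deadline.
    match rx.recv_timeout(Duration::from_millis(2000)) {
        Ok(v) => assert_eq!(v, 42),
        Err(mpsc::RecvTimeoutError::Timeout) => panic!("worker missed the deadline"),
        Err(e) => panic!("worker died: {e}"),
    }
}
```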

Conclusion

In the beginning I asked these questions:

  • What is the overhead exactly and how "significant" is its significant reduction?
  • Is it free? What price do I pay for this?
  • In what situations will I observe the advertised benefits?
  • In what situations will the benefits be worth the cost?
  • Am I in such situation?

And now I can confidently answer them all. The overhead of the traditional threaded approach is not significant for most people. You need to be an Amazon/Google-sized company, or do something really special, to observe the benefits associated with that overhead reduction.

Chances are you are not that kind of business. And chances are you don't have performance problems that async can fix for you.

Async comes with a heavy price that, all things considered, is not worth paying.


  1. (Unless you are a kernel developer working on a scheduler subsystem. And even then you are not working in pure cooperative mode: hardware timers preempt you, and you preempt user threads.)