What If We Pretended That a Task = Thread?

Published: May 22, 2023, 9:12 p.m.

In my previous post I made a fairly inaccurate attempt at describing one of the primary problems plaguing the current async ecosystem, along with an idea for solving it. Redditors criticized me, and rightfully so - the post was written in one go without giving it time to rest, and the solution was not optimal. So I thought about it more, and I think I am on to something.

Recap

The core premise of my original post was that Send and Sync are constructs that work at thread boundaries and don't map well to async tasks (top-level futures). I also stated that while these traits are core to memory safety, they introduce arbitrary limitations in an async context, such as being unable to use fast single-threaded abstractions like Rc and RefCell in multi-threaded runtimes, even though no actual safety problems are present.
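To make the limitation concrete, here is a minimal sketch that uses only the standard library. The require_send helper is a hypothetical stand-in for the Send bound that a multi-threaded spawn function such as tokio::spawn places on its argument, and block_on is a deliberately naive executor added only so that the futures can be run. Neither is taken from a real runtime.

```rust
use std::future::{ready, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the `Send` bound of a multi-threaded `spawn`.
fn require_send<F: Future + Send>(future: F) -> F {
    future
}

/// Holds an `Arc` across an await point: the future is `Send`.
pub fn arc_future() -> impl Future<Output = u32> + Send {
    let shared = Arc::new(41);
    require_send(async move {
        ready(()).await; // `shared` stays alive across this await
        *shared + 1
    })
}

/// Holds an `Rc` across an await point: the future is `!Send`.
pub fn rc_future() -> impl Future<Output = u32> {
    let local = Rc::new(41);
    // Wrapping this block in `require_send(...)` is rejected by the compiler,
    // even though the `Rc` never leaves the future.
    async move {
        ready(()).await;
        *local + 1
    }
}

/// Waker that does nothing; enough for futures that never really suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive single-threaded executor: polls in a loop until the future is ready.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Both futures do the same work when driven with block_on; only the Arc version would be accepted by a spawn function that requires Send.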

The core link was equating threads in the Rust async context to CPU cores in the OS scheduling context - it makes as much sense for an async task to be aware of the underlying thread executing it as it does for an OS thread to be aware of which CPU core it's executing on. You may want to know and control the behavior in performance-critical scenarios, but for the vast majority of applications it doesn't matter, and you certainly don't want to associate data with a particular lower-level execution unit (a thread for tasks, a CPU core for threads). So my questions were: why does tokio::spawn need a Send future? Why would you need to specify Send-ness in async traits? Why is this complexity forced on the end user? And finally, can we hide this complexity at the runtime level?

The main reason a runtime can't move !Send + !Sync tasks across multiple threads is that they may access Thread Local Storage (TLS) in their code, allowing one to clone Rc objects into potentially different threads and break the safety guarantees. The idea for solving this was to create a new AsyncSend trait and have a future implement it unless raw pointers are involved or some of its code accesses TLS. The problem with that approach is that it introduces an additional trait and requires developing a whole new function effect system, which would be incredibly complicated to pull off. Can we do something simpler and better?

Let's Pretend

Let's pretend a task is a thread! It's that simple! While a Thread and a Task have some key differences, they both define a unit of execution, and it makes little sense for an async task to concern itself with the lower level unit of execution - a thread.

In my original idea the async runtime would have been responsible for promising the compiler that it was holding top-level futures (tasks), meaning a runtime would probably have to use unsafe to move !Send futures across threads. What if, instead, an async runtime was responsible for setting up a fake Thread for each task? This part wouldn't even need to be unsafe; the only unsafe part would be moving the tasks across threads, using a wrapper structure, like so:

struct Task<F> {
    future: F,
    thread: TaskThread,
}

// SAFETY: relies on the runtime keeping the future and its fake thread together.
unsafe impl<F: Future> Send for Task<F> {}
unsafe impl<F: Future> Sync for Task<F> {}

So long as the future and the thread are linked to each other and there is proper synchronization transferring ownership across actual OS threads, the task should be Send. And even this Task definition could live within the standard library!
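As a rough illustration of the wrapper idea, the sketch below polls a !Send future on a fresh OS thread for every step. It is a simplification under stated assumptions: the TaskThread field is left out, CountTo, poll_once and run_across_threads are hypothetical names invented for this example, and the unsafe impl is only sound here because the Rc never escapes the future. Without the TLS masking described in this post, such a blanket impl would not be sound in general.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Hypothetical future that privately owns an `Rc`, which makes it `!Send`.
pub struct CountTo {
    counter: Rc<Cell<u32>>,
    target: u32,
}

impl CountTo {
    pub fn new(target: u32) -> Self {
        CountTo { counter: Rc::new(Cell::new(0)), target }
    }
}

impl Future for CountTo {
    type Output = u32;

    // Each poll bumps the counter; the future finishes once it reaches `target`.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let next = self.counter.get() + 1;
        self.counter.set(next);
        if next < self.target {
            cx.waker().wake_by_ref();
            Poll::Pending
        } else {
            Poll::Ready(next)
        }
    }
}

/// Simplified wrapper: only the future, no fake thread attached.
pub struct Task<F> {
    future: Pin<Box<F>>,
}

// SAFETY: an assumption, not a proof. The task is only ever touched by one OS
// thread at a time, and nothing inside the future shares its `Rc` with the outside.
unsafe impl<F: Future> Send for Task<F> {}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

impl<F: Future> Task<F> {
    pub fn new(future: F) -> Self {
        Task { future: Box::pin(future) }
    }

    /// Polls the future once and returns its output if it has finished.
    pub fn poll_once(&mut self) -> Option<F::Output> {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        match self.future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => Some(value),
            Poll::Pending => None,
        }
    }
}

/// Drives the task to completion, running every poll on a new OS thread.
pub fn run_across_threads(target: u32) -> u32 {
    let mut task = Task::new(CountTo::new(target));
    loop {
        let handle = thread::spawn(move || {
            let output = task.poll_once();
            (task, output)
        });
        let (returned, output) = handle.join().unwrap();
        if let Some(value) = output {
            return value;
        }
        task = returned;
    }
}
```

Joining the thread before the next poll provides the synchronization that hands ownership from one OS thread to the next; a real runtime would rely on its task queue for that.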

Now, you're probably confused by the meaning of a fake thread. A fake thread corresponds to a unique per-task ThreadId that is loaded in by the async runtime. It masks thread::current and is used by LocalKey (TLS) to provide per-task static data. Code-wise, you'd end up with a distinction between OS threads and task threads, like so:

// Fake thread data for an async task
pub struct TaskThread {
    inner: TaskInner,
    // This may contain actual TLS data.
    context: Pin<Box</* ... */>>,
}

// This is how a `Thread` is currently defined in STD
pub struct Thread {
    inner: Pin<Arc<Inner>>,
}

// Inner is a struct in STD, but we may want to have an enum-based distinction here
enum Inner {
    Os(OsInner),
    Task(TaskInner),
}

struct OsInner {
    name: Option<CString>, // Guaranteed to be UTF-8
    id: ThreadId,
    parker: Parker,
}

struct TaskInner {
    name: Option<String>,
    id: ThreadId,
    // Thread::unpark would simply call Waker::wake_by_ref on a task.
    waker: Waker,
}

To "enter" the fake task thread, the async runtime would call thread::enter_task:

// Runtime gets a task to run. Internally it may steal the task from another thread - it's a runtime implementation detail.
let mut task = self.tasks.get()?;

// Create a future `Context` that attaches the internally stored waker.
let mut cx = task.task_context();

// Enters the task - panics if called from within a task, although that check may not even be necessary.
thread::enter_task(&mut task, |future| {
    match future.poll(&mut cx) {
        /* ... */
    }
});
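The per-task storage behind enter_task can be approximated in user code today. The sketch below is one possible shape, not how the standard library would do it: while a task is entered, its private storage is parked in a real thread-local slot, and it is handed back to the task afterwards so that it travels with the task. TaskThread here is a heavily reduced stand-in (an id plus a map of named counters), and task_local_add, current_task_id and demo are hypothetical helpers.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::thread;

/// Reduced stand-in for the fake thread: an id plus per-task "TLS" counters.
pub struct TaskThread {
    id: u64,
    locals: HashMap<&'static str, i64>,
}

impl TaskThread {
    pub fn new(id: u64) -> Self {
        TaskThread { id, locals: HashMap::new() }
    }
}

/// State of the task currently entered on this OS thread.
struct Active {
    id: u64,
    locals: HashMap<&'static str, i64>,
}

thread_local! {
    static ACTIVE: RefCell<Option<Active>> = RefCell::new(None);
}

/// Installs the task's storage on the current OS thread, runs `body`,
/// then moves the storage back into the task.
/// Note: no unwind guard, so the storage is lost if `body` panics.
pub fn enter_task<R>(task: &mut TaskThread, body: impl FnOnce() -> R) -> R {
    let locals = std::mem::take(&mut task.locals);
    ACTIVE.with(|slot| {
        let mut slot = slot.borrow_mut();
        assert!(slot.is_none(), "enter_task called from within a task");
        *slot = Some(Active { id: task.id, locals });
    });
    let result = body();
    let active = ACTIVE
        .with(|slot| slot.borrow_mut().take())
        .expect("task state vanished while the task was running");
    task.locals = active.locals;
    result
}

/// Id of the task entered on this OS thread, if any.
pub fn current_task_id() -> Option<u64> {
    ACTIVE.with(|slot| slot.borrow().as_ref().map(|active| active.id))
}

/// Adds `delta` to a named per-task counter and returns the new value.
pub fn task_local_add(key: &'static str, delta: i64) -> i64 {
    ACTIVE.with(|slot| {
        let mut slot = slot.borrow_mut();
        let active = slot.as_mut().expect("not inside a task");
        let value = active.locals.entry(key).or_insert(0);
        *value += delta;
        *value
    })
}

/// Two tasks share OS threads, yet each keeps its own counter.
pub fn demo() -> (i64, i64) {
    let mut a = TaskThread::new(1);
    let mut b = TaskThread::new(2);
    enter_task(&mut a, || task_local_add("hits", 1));
    enter_task(&mut b, || task_local_add("hits", 10));
    // Resume task `a` on a different OS thread; its counter follows it.
    let a_total = thread::spawn(move || enter_task(&mut a, || task_local_add("hits", 1)))
        .join()
        .unwrap();
    let b_total = enter_task(&mut b, || task_local_add("hits", 10));
    (a_total, b_total)
}
```

The point of the design is that the data is keyed by the task rather than by the OS thread, so whatever thread resumes the task sees the same values.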

With this, you no longer need a Send bound in the async spawn function, you no longer need to specify AsyncTrait::some_function(): Send, and generally speaking, as a user, your life becomes much easier! However, it also moves Rust further away from bare metal. This can be solved relatively easily with the following two APIs:

// Get the underlying OS thread
let os_thread = thread::os_current();

// Access `LocalKey` in the OS thread context
GLOBAL_DATA.os_with(|data| {
    // The function is virtually the same as `LocalKey::with`, except for the following differences:
    // - The closure must be `Send`. This prevents leaking `!Send` data outside the closure with assignment.
    // - The closure must return `Send` data. This prevents leaking `!Send` data outside the closure through return.
    // !Send argument is allowed, because that is the point of TLS, and poses no risk if it's not being leaked.
});
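The bounds described in those comments can be tried out today with an extension trait on LocalKey. This is only a sketch of the signature: OsWith is a hypothetical trait, and since today's LocalKey::with is already OS-thread-local, the body simply forwards to it. The GLOBAL_DATA contents and the record helper are made up for the example.

```rust
use std::cell::RefCell;
use std::thread::LocalKey;

/// Hypothetical extension trait carrying the proposed `os_with` bounds.
pub trait OsWith<T: 'static> {
    fn os_with<F, R>(&'static self, f: F) -> R
    where
        F: FnOnce(&T) -> R + Send,
        R: Send;
}

impl<T: 'static> OsWith<T> for LocalKey<T> {
    fn os_with<F, R>(&'static self, f: F) -> R
    where
        F: FnOnce(&T) -> R + Send,
        R: Send,
    {
        // The closure may borrow the `!Send` value, but it can neither capture
        // nor return anything that is `!Send`.
        self.with(f)
    }
}

thread_local! {
    static GLOBAL_DATA: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

/// Pushes a value into the OS-thread-local vector and returns its new length.
pub fn record(value: u32) -> usize {
    // The closure captures only a `u32` and returns a `usize`, both `Send`.
    // The argument `&RefCell<Vec<u32>>` is `!Send`, which is allowed.
    GLOBAL_DATA.os_with(move |data| {
        data.borrow_mut().push(value);
        data.borrow().len()
    })
}
```

A closure that tried to assign an Rc to a captured Option<Rc<_>> would capture a `&mut` to a !Send type, making the closure itself !Send, so the call would be rejected at compile time.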

Given that Thread would not necessarily refer to an OS thread anymore, we may also want to rename the structure to Task, which I believe should be doable through an edition change. The os_with function in LocalKey would also be faster than the regular task-local with function. Users may choose one over the other, but the beautiful thing is that neither of them would prevent async runtimes from moving tasks around. We may also invert this completely, and have the following TLS API:

GLOBAL_DATA.with(|data| {
    // This closure needs to be `Send`, and return data must also be `Send`.
    // If the conditions mismatch, compiler directs users to use `LocalKey::task_with`.
});

GLOBAL_DATA.task_with(|data| {
    // This is the same as current `LocalKey::with` - no restrictions on sendability, but
    // the data would be per-task, and the implementation may be slower than `LocalKey::with`.
});

A third alternative would be to keep the thread and async parts separate - have both Task and Thread, with a higher-level abstraction linking async tasks and OS threads together. This would keep Threads very explicitly linked to OS-backed threads, while TLS could be redefined to be both Task and Thread Local Storage. These details are certainly important to get right, and I don't have all the answers, but whichever way we go, it won't prevent async runtimes from transferring tasks around.

Conclusion

Sending thread-unsafe futures poses no obvious risk in the vast majority of cases; however, combining them with TLS is risky, because TLS may allow one to leak data across multiple futures and, consequently, multiple threads. This can be solved by redefining TLS as "Task Local Storage" and Thread as a Task. With this, async runtimes would be able to move !Send futures across different threads, thus removing the need to distinguish between Send and !Send spawn functions.

I asked GPT-4 for their thoughts, and this is what they had to say:

... while your proposed solution sounds promising and could potentially simplify async programming in Rust, it would be prudent to seek further reviews and perhaps implement a proof-of-concept to further validate your ideas.

Thus, I'd really like to hear your thoughts!