Epoll vs. io_uring in Linux
Key takeaways
- The first version of Tiny Gate was super simple, worker-based, and it basically worked well.
- When I first started developing for Linux, epoll was a new feature, and basically it had no real alternatives.
- The kernel consumes submissions from memory shared between your app and the kernel, and posts completions back into that same shared memory - both live in ring buffers, hence the name.
First, I want to tell you how I got to this point and why I started researching different options for handling asynchronous I/O on Linux… Last year, my students and I built a reverse proxy server called Tiny Gate. It was super simple, worker-based, and it basically worked well. Of course, I didn't expect it to be very fast, but it was an educational project, and since we'd made a real, kind of production-ready tool, I was really proud of it.
But my students weren't as happy as I was - they wanted to build something genuinely useful, and they were really disappointed that our product had strong architectural limits and couldn't outperform titans like nginx and haproxy. So they literally forced me to research together how those tools work under the hood and how to handle asynchronous I/O to cut down on the heavy overhead…
Long story short, we made a second version of TinyGate, based on epoll. It still lost to nginx/haproxy in benchmarks, but it was dramatically faster than the first version. But epoll isn't perfect either (as I'll explain below), and we eventually switched to io_uring, which meant rewriting the project from scratch, again… So it's a really interesting topic, and today I'll share an overview of the two mechanisms Linux gives you for asynchronous I/O.
When I first started developing for Linux, epoll was a new feature, and basically it had no real alternatives. Everyone used it to manage asynchronous execution - there was no other choice. The problem is, epoll relies heavily on syscalls: it tells you when I/O is possible, but you still have to call read()/write() yourself afterward - that's an epoll_wait() plus a read() or write(), so up to two syscalls per I/O event, on top of the one-time epoll_ctl() registration. Each of these syscalls forces a transition between user and kernel mode, which adds up to HUGE overhead once you're handling a lot of connections. But we have a solution! About 17 years after epoll landed in the Linux kernel (2002), io_uring appeared (2019)! Instead of telling you when I/O is possible, it tells you when I/O is done - no readiness check followed by a separate read()/write(), and far fewer syscalls.
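To make the epoll pattern concrete, here is a minimal sketch of the "wait for readiness, then do the I/O yourself" sequence. It assumes Linux and uses a pipe instead of a socket so it runs on its own; the helper name epoll_read_once is made up for this illustration and is not something from Tiny Gate.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers the read end of a pipe with epoll,
 * makes it readable, waits for the readiness event, then reads.
 * Returns the number of bytes read, or -1 on any error. */
static ssize_t epoll_read_once(char *buf, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;       /* notify when the fd becomes readable */
    ev.data.fd = fds[0];

    ssize_t n = -1;
    /* One-time registration, then make the pipe readable. */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "ping", 4) == 4) {
        struct epoll_event ready;
        /* Syscall 1: epoll_wait() only reports that a read is possible. */
        if (epoll_wait(ep, &ready, 1, 1000) == 1 &&
            (ready.events & EPOLLIN)) {
            /* Syscall 2: the data transfer itself is still up to us. */
            n = read(ready.data.fd, buf, cap);
        }
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling epoll_read_once(buf, sizeof buf) with a buffer of at least 4 bytes should return 4 and leave "ping" in the buffer. In a real server the epoll_wait() call sits in a loop and can return many ready descriptors at once, but every one of them still needs its own read() or write().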
The kernel consumes submissions from memory shared between your app and the kernel, and posts completions back into that same shared memory - both live in ring buffers, hence the name. The catch: by default you still have to call io_uring_enter() to tell the kernel to go check the submission queue - but one call can submit a whole batch of operations and reap a whole batch of completions, instead of one syscall pair per operation like with epoll + read(). If you want close to zero syscalls during steady state, there's IORING_SETUP_SQPOLL, which spins up a dedicated kernel thread that polls the submission queue for you - at the cost of that thread burning CPU (more on this below).
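The two-ring idea is easier to see in a toy model. The sketch below is NOT the real io_uring ABI: every name in it (toy_ring, toy_submit, toy_enter, toy_reap) is invented for illustration, the "kernel" is just a function in the same process, and the fake operation simply doubles a number. It only shows how head/tail indices on a submission ring and a completion ring let one "enter" call handle a whole batch.

```c
#define RING_SIZE 8u                 /* power of two, so (index & mask) wraps */
#define RING_MASK (RING_SIZE - 1u)

struct toy_sqe { int op_id; int value; };   /* a queued request */
struct toy_cqe { int op_id; int result; };  /* a finished request */

struct toy_ring {
    struct toy_sqe sq[RING_SIZE];
    struct toy_cqe cq[RING_SIZE];
    unsigned sq_head, sq_tail;  /* app advances tail, "kernel" advances head */
    unsigned cq_head, cq_tail;  /* "kernel" advances tail, app advances head */
    unsigned enter_calls;       /* how many pretend syscalls were made */
};

/* App side: queue a request by writing into the ring. No "syscall" here.
 * Returns 0 on success, -1 if the submission ring is full. */
static int toy_submit(struct toy_ring *r, int op_id, int value)
{
    if (r->sq_tail - r->sq_head == RING_SIZE)
        return -1;
    r->sq[r->sq_tail & RING_MASK] = (struct toy_sqe){ op_id, value };
    r->sq_tail++;
    return 0;
}

/* Stand-in for io_uring_enter(): the "kernel" drains every pending
 * submission and posts one completion for each. Returns how many
 * requests it processed. The toy is single-threaded; a real ring shared
 * with the kernel also needs memory barriers, which are omitted here. */
static unsigned toy_enter(struct toy_ring *r)
{
    unsigned done = 0;
    r->enter_calls++;
    while (r->sq_head != r->sq_tail &&
           r->cq_tail - r->cq_head < RING_SIZE) {
        struct toy_sqe s = r->sq[r->sq_head & RING_MASK];
        r->sq_head++;
        r->cq[r->cq_tail & RING_MASK] =
            (struct toy_cqe){ s.op_id, s.value * 2 };
        r->cq_tail++;
        done++;
    }
    return done;
}

/* App side: pop one completion straight from the ring, again without a
 * "syscall". Returns 0 on success, -1 if no completion is waiting. */
static int toy_reap(struct toy_ring *r, struct toy_cqe *out)
{
    if (r->cq_head == r->cq_tail)
        return -1;
    *out = r->cq[r->cq_head & RING_MASK];
    r->cq_head++;
    return 0;
}
```

With a zero-initialized toy_ring, four toy_submit() calls followed by a single toy_enter() leave four completions waiting and enter_calls at 1. In this model SQPOLL would amount to something else calling toy_enter() in a loop, so the app never has to.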