# Developing Fiber Scheduler for Ruby 3

## Ruby 3 Fiber Scheduler

I wrote an article in July 2020, Ruby 3 Fiber changes preview (in Chinese),
and followed up with another post in August, A Walkthrough of Ruby 3 Scheduler.
Ruby 3 has gone through several releases in the months since, including ruby-3.0.0-preview1, ruby-3.0.0-preview2, and ruby-3.0.0-rc1,
which brought many improvements to the Fiber Scheduler API.

But as I mentioned before, what Ruby 3 implements is only the interface.
Nothing is scheduled unless a scheduler implementation is provided.

I have been very busy with work and study over the past four months,
and only in recent days found the time to catch up with the API.

GitHub: Evt

## Use of Fiber Scheduler

Suppose we have a pair of fds created by IO.pipe. When we write Hello World to one of them, we can read it from the other end of the pipe.
The code would look like this:
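A minimal sketch of such a program, using only the standard IO.pipe API (the method name `pipe_roundtrip` is illustrative, not taken from the original listing):

```ruby
# Write a short message into one end of a pipe, then read it from the other.
def pipe_roundtrip(message)
  rd, wr = IO.pipe
  wr.write(message) # would block forever if the message exceeded the pipe buffer
  wr.close          # closing the write end lets the reader see EOF
  rd.read
ensure
  [rd, wr].each { |io| io.close if io && !io.closed? }
end

puts pipe_roundtrip("Hello World")
```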

This program has many limitations. For example, you can’t write a string longer than the pipe buffer.
Since the other side is not reading at the same time, the write blocks forever if the string is too long.
You also have to write first; otherwise the read blocks forever.
Of course, we could use multi-threading to solve this problem.
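As a rough illustration of the threaded approach (the method name `threaded_roundtrip` is hypothetical), the writer can run in its own thread so that reading and writing overlap and the message may exceed the pipe buffer:

```ruby
# Move a message through a pipe with the writer running in a separate thread.
def threaded_roundtrip(message)
  rd, wr = IO.pipe
  writer = Thread.new do
    wr.write(message) # may block on a full pipe; the main thread keeps draining it
    wr.close
  end
  data = rd.read      # returns once the writer has closed its end
  writer.join
  data
ensure
  [rd, wr].each { |io| io.close if io && !io.closed? }
end

puts threaded_roundtrip("x" * 1_000_000).bytesize
```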

But as we all know, using threads to solve I/O problems is inefficient.
OS context switches are slow, and fair thread scheduling is still a hard problem in operating systems.
An I/O problem is not CPU-bound, so all we need is to suspend the task and wait for the proper callback.
This is exactly what the Ruby 3 scheduler is for.

In general, async functions require keywords such as callback, async, or await.
But this is not necessary in Ruby 3.
Ruby 3 identifies the common situations where async behavior is needed: I/O multiplexing, waiting on processes, kernel sleep, and mutexes.
It exposes hooks for all of these so that a scheduler can improve performance without any new keywords.
My project evt is such a scheduler, implementing the Ruby 3 Scheduler interface.

Compared with the simple example above, here is an example of an HTTP/1.1 server:

As we can see, the code is almost the same as synchronous code.
All you need to do is set up the scheduler with Fiber.set_scheduler,
and wrap the work you would otherwise hand to threads in Fiber.schedule.
Finally, call scheduler.run to start the scheduler.
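These three steps can be tried without any gem. The sketch below applies them to the earlier pipe example, using `TinySelectScheduler`, a hypothetical and deliberately tiny stand-in built on IO.select. It is not how evt is implemented: it ignores timeouts, supports one waiting fiber per IO, and is not thread-safe. It assumes Ruby 3.0 or later on a Unix-like system, where pipes are non-blocking by default.

```ruby
# TinySelectScheduler is a hypothetical teaching stand-in, not evt's code.
class TinySelectScheduler
  def initialize
    @readers = {}  # io => fiber waiting until the io is readable
    @writers = {}  # io => fiber waiting until the io is writable
    @sleepers = [] # [wake_time, fiber] pairs registered by Kernel#sleep
    @ready = []    # fibers released through #unblock
  end

  # Hook used by Fiber.schedule: create and start a non-blocking fiber.
  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)
    fiber.resume
    fiber
  end

  # Hook called when an I/O operation would block (the timeout is ignored here).
  def io_wait(io, events, _timeout)
    @readers[io] = Fiber.current if events.anybits?(IO::READABLE)
    @writers[io] = Fiber.current if events.anybits?(IO::WRITABLE)
    Fiber.yield
    events
  end

  # Hook called by Kernel#sleep; a sleep without a duration is never woken here.
  def kernel_sleep(duration = nil)
    @sleepers << [now + duration, Fiber.current] if duration
    Fiber.yield
  end

  # Hooks called by Mutex, Queue and similar primitives (the timeout is ignored).
  def block(_blocker, _timeout = nil)
    Fiber.yield
  end

  def unblock(_blocker, fiber)
    @ready << fiber
  end

  # Called by Ruby when the scheduler is replaced or the thread exits.
  def close
    run
  end

  # Event loop: resume fibers as their I/O or timers become ready.
  def run
    loop do
      @ready.shift.resume until @ready.empty?
      break if @readers.empty? && @writers.empty? && @sleepers.empty?

      wake_at = @sleepers.map(&:first).min
      timeout = wake_at && [wake_at - now, 0].max
      readable, writable = IO.select(@readers.keys, @writers.keys, nil, timeout)
      readable&.each { |io| @readers.delete(io)&.resume }
      writable&.each { |io| @writers.delete(io)&.resume }

      due, @sleepers = @sleepers.partition { |time, _| time <= now }
      due.each { |_, fiber| fiber.resume }
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

def scheduled_roundtrip(message)
  rd, wr = IO.pipe
  received = nil
  scheduler = TinySelectScheduler.new
  Fiber.set_scheduler(scheduler)        # 1. install the scheduler
  Fiber.schedule { received = rd.read } # 2. the reader parks until data arrives
  Fiber.schedule do                     #    the writer parks whenever the pipe is full
    wr.write(message)
    wr.close
  end
  scheduler.run                         # 3. drive both fibers to completion
  received
ensure
  Fiber.set_scheduler(nil)
  [rd, wr].each { |io| io.close if io && !io.closed? }
end

puts scheduled_roundtrip("Hello World")
puts scheduled_roundtrip("x" * 1_000_000).bytesize
```

Unlike the blocking version, the reader is scheduled before anything has been written and the message is larger than the pipe buffer, yet neither side gets stuck, and no thread is created.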

## Backend support

### io_uring Support

Not only the Ruby API but also my scheduler has received many updates in recent months, especially better support for I/O multiplexing backends.
io_uring has been part of the Linux kernel since 5.1.
Since io_uring can reduce the number of syscalls and can issue direct iov calls, it can achieve better performance than epoll,
so supporting io_uring is important.
Direct iov support requires some further changes to the Ruby Fiber scheduler.
These changes were introduced by ioquatix in Ruby 3.0.0-preview2.
What we need to implement comes in two parts.
One of them is the epoll-compatible API:

The other part is direct iov support:

But in some cases, the iov path is not called. I’m still tracking down the bug, but at least the performance is very close to epoll.

### IOCP Support

Another problem is supporting Windows IOCP. I tried to implement something like this:

But the I/O scheduler receives the wrong pointers in the callback. After some research, I found that to support IOCP, the I/O has to be initialized with the FILE_FLAG_OVERLAPPED flag.
This may need some changes in Ruby’s win32/win32.c.
But at least I solved the problems with the IO.select fallback.
That is acceptable for now, since hardly anybody cares about Windows performance in production…

### kqueue Improvements

Another improvement concerns kqueue on macOS.
kqueue on FreeBSD is good, but its performance on macOS is really weird.
Since all of our I/O registrations are one-shot, I used the one-shot mode of kqueue to reduce the number of syscalls.

### Overall

In the end, we support almost all I/O multiplexing backends of the most widely used operating systems:

| | Linux | Windows | macOS | FreeBSD |
| --- | --- | --- | --- | --- |
| io_uring | ✅ (See 1) | | | |
| epoll | ✅ (See 2) | | | |
| kqueue | | | ✅ (⚠️ See 5) | |
| IOCP | | ❌ (⚠️ See 3) | | |
| Ruby (IO.select) | ✅ Fallback | ✅ (⚠️ See 4) | ✅ Fallback | ✅ Fallback |

1. when liburing is installed
2. when kernel version >= 2.6.8
3. WOULD NOT WORK until FILE_FLAG_OVERLAPPED is included in I/O initialization process.
4. Some I/Os are not able to be nonblock under Windows. See Scheduler Docs.
5. kqueue performance in Darwin is very poor. MAY BE DISABLED IN THE FUTURE.

## Benchmark

How is the overall performance?

The benchmark was run with v0.2.2 of the scheduler and Ruby 3.0.0-rc1.
See evt-server-benchmark for the test code; the test runs against a single-threaded server.

The test command is wrk -t4 -c8192 -d30s http://localhost:3001.

The file descriptor limit was raised to the maximum on all systems.

| OS | CPU | Memory | Backend | req/s |
| --- | --- | --- | --- | --- |
| Linux | Ryzen 2700x | 64GB | epoll | 54680.08 |
| Linux | Ryzen 2700x | 64GB | io_uring | 50245.53 |
| Linux | Ryzen 2700x | 64GB | IO.select (using poll) | 44159.23 |
| macOS | i7-6820HQ | 16GB | kqueue | 37855.53 |
| macOS | i7-6820HQ | 16GB | IO.select (using poll) | 28293.36 |

Very impressive. The improvements come from several sources.
Current async frameworks like Falcon use nio4r.
The backend of nio4r is libev.
The performance of libev is only average because it is designed for extreme compatibility.
Existing async frameworks also require a lot of meta-programming.
This extension, by contrast, is written almost entirely in C, with only the features the scheduler needs.

Compared with my previous tests on preview 1, this version uses persistent connections, and Ruby’s non-blocking I/O has also received many fixes.
Together, these changes make performance 10 times better than what we had 3 months ago.

wrk is very sensitive to errors, and the parser in the benchmark is incorrect, so it cannot close the socket properly. I ported my Midori to a Ruby 3 Scheduler project, where performance reaches 247k req/s with kqueue and 647k req/s with epoll, more than 100 times faster than blocking I/O.

## Combining with Ractor

I also wrote a post about Ractor in November, Ruby 3 Ractor Dev Guide (in Chinese).
Combining Fiber with Ractor is an interesting idea. We have two routes for that:

1. Accept connections in the main Ractor and dispatch the requests to sub-Ractors. After the results are transferred back, return them from the main Ractor with the scheduler.
2. Use the Linux SO_REUSEPORT feature to let all Ractors listen on the same port at the same time, which is very easy to fit into existing server architectures.
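
The socket side of the second route can be sketched as follows. This only shows two listeners sharing one port through SO_REUSEPORT; the helper name `reuseport_listener` is hypothetical, no Ractor is involved, and it assumes a platform that defines Socket::SO_REUSEPORT (such as Linux 3.9 or later).

```ruby
require "socket"

# Open a TCP listener on 127.0.0.1 with SO_REUSEPORT set before bind.
def reuseport_listener(port)
  sock = Socket.new(:INET, :STREAM)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEPORT, true)
  sock.bind(Addrinfo.tcp("127.0.0.1", port))
  sock.listen(128)
  sock
end

first = reuseport_listener(0)      # port 0 lets the kernel pick a free port
port = first.local_address.ip_port
second = reuseport_listener(port)  # a second listener bound to the same port
puts "two listeners share port #{port}"
first.close
second.close
```

Each worker, whether a process or eventually a Ractor, would own one such listener, and the kernel distributes incoming connections among them.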

Unfortunately, neither of these works correctly at the moment. Some Fiber features are not available in Ractor.
I suppose this is a bug, and I have submitted a patch, GitHub #3971.
According to my previous benchmarks, Ractor may improve performance by about 4 times by using multiple cores.

But since API servers are usually stateless, the same improvement can be achieved with multiple processes.
Ractor’s major contribution may be lower memory consumption.

I will test it again with future Ruby 3.0 updates.

## Conclusion

We achieved a 10 times performance improvement compared with preview 1, and are almost 36 times faster than blocking I/O. The major performance issue of Ruby servers is blocking I/O rather than VM performance.
With an I/O scheduler included, we can bring the I/O performance of Ruby 3 into a new era.
The remaining work is to wait for updates to some C extension libraries, such as database connections.
Then, if you use an async scheduler with a Fiber-based Web server like Falcon,
you don’t have to change anything in your business code to get performance improvements of dozens of times.

Let’s continue happy programming with Ruby.