This is the first post in a series on asynchronous programming. The whole series explores a single question: What is asynchrony? When I first started digging into this, I thought I had a solid grasp of it. Turns out, I didn't know the first thing about asynchrony. So, let's dive in together!
Whole series:
- Asynchronous Programming. Blocking I/O and non-blocking I/O
- Asynchronous Programming. Threads and processes
- Asynchronous Programming. Cooperative multitasking
- Asynchronous Programming. Await the Future
- Asynchronous Programming. Python3.5+
In this post, we'll be focusing on networking, but you can easily apply these concepts to other input/output (I/O) operations, such as substituting sockets with file descriptors. This post is language-agnostic, though examples will be in Python. (What can I say? I love Python!)
One way or another, when you have a question about blocking or non-blocking calls, it's most often related to I/O. In today's world of microservices and serverless functions, the prime example is request processing. We can immediately imagine that you, dear reader, are on a website. Your browser (or the application where you're reading these lines) is the client, and somewhere (in the depths of AWS, where this happens to be hosted) a server is busy sending back the words you're reading.
For any interaction, the client and server first establish a connection. While we won't dig into the depths of the 7-layer OSI model, here's what you need to know: on both sides (client and server), there are special connection endpoints known as sockets. The server binds its socket to an address and listens on it; the client connects to that address. Both sides then read from and write to the connection through their sockets.

As the server processes requests (converting Markdown to HTML, finding images, etc.), it can face different types of latency, illustrated in the table below.
| System Event | Actual Latency | Scaled Latency |
|---|---|---|
| One CPU cycle | 0.4 ns | 1 s |
| Level 1 cache access | 0.9 ns | 2 s |
| Level 2 cache access | 2.8 ns | 7 s |
| Level 3 cache access | 28 ns | 1 min |
| Min memory access (DDR DIMM) | ~100 ns | 4 min |
| Intel Optane DC persistent memory access | ~350 ns | 15 min |
| Intel Optane DC SSD I/O | < 10 μs | 7 hrs |
| NVMe SSD I/O | ~25 μs | 17 hrs |
| SSD I/O | 50-150 μs | 1.5-4 days |
| Rotational disk I/O | 1-10 ms | 1-9 months |
| Internet SF to NYC | 65 ms | 5 years |
Notice the vast difference between CPU speed and network latency! A single CPU cycle takes 0.4 ns, while a coast-to-coast network round trip takes 65 ms: roughly eight orders of magnitude. It turns out that with I/O-heavy applications, the processor often sits idle, waiting for data. This type of application is called I/O-bound. In high-performance applications, this is a serious bottleneck, and we'll explore solutions for it next.
Organizing I/O: Blocking vs. Non-Blocking
There are two ways to organize I/O (my examples are Linux-based): blocking and non-blocking. There are also two modes of I/O operation: synchronous and asynchronous. Combined, they make up the possible I/O models, and each model has usage patterns that suit particular applications. Here I will demonstrate the difference between the two ways of organizing I/O.
Blocking I/O
With blocking I/O, when a client connects to the server, the socket and its associated thread are blocked until data is ready for reading. This data is placed in the network buffer until it is all read and ready for processing. Until the operation completes, the server can't do anything but wait. By default, TCP sockets are set to blocking mode.
Here's a simple Python example. The client sends messages in a loop with a 50ms interval:
```python
import socket
import sys
import time

def main() -> None:
    host = socket.gethostname()
    port = 12345
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect once, then keep sending over the same connection
        sock.connect((host, port))
        while True:
            data = sys.argv[1].encode()
            sock.send(data)
            time.sleep(0.05)  # 50 ms between messages

if __name__ == "__main__":
    assert len(sys.argv) > 1, "Please provide a message"
    main()
```
Here we send a message to the server every 50 ms in an endless loop. Imagine that this client-server communication consists of downloading a big file; it takes some time to finish.
The server code:
```python
import socket

def main() -> None:
    host = socket.gethostname()
    port = 12345
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")
        while True:
            conn, addr = sock.accept()  # accepting the incoming connection, blocking
            print('Connected by ' + str(addr))
            while True:
                data = conn.recv(1024)  # receiving data, blocking
                if not data:
                    break
                print(data)
            conn.close()

if __name__ == "__main__":
    main()
```
I am running this in separate terminal windows with several clients as:
$ python client.py "client N"
And server as:
$ python server.py
Here we just listen to the socket and accept incoming connections. Then we try to receive data from this connection.
In this setup, the server is blocked by a single client connection! If you run a second client, its messages won't be processed until the first one disconnects. Try this example to see what's happening.
What is going on here?
The send() call copies the data into the kernel's socket send buffer, and the kernel transmits it across the network into the server's receive buffer. When the server issues the read system call, the application blocks and control switches to the kernel. If no data has arrived yet, the kernel puts the process to sleep; once data is available, the kernel copies it into the user-space buffer and wakes the process up again. This repeats for each portion of data.
Now in order to handle two clients with this approach, we need to have several threads, i.e. to allocate a new thread for each client connection. We will get back to that soon.
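The thread-per-client idea can be sketched like this (a minimal illustration, not production code; unlike the server above, each handler echoes the data back so a client can observe that it is being served while other clients are connected):

```python
import socket
import threading

def handle_client(conn: socket.socket) -> None:
    # Each client gets its own thread, so a blocking recv()
    # here stalls only this thread, not the whole server.
    with conn:
        while True:
            data = conn.recv(1024)  # blocking, but only for this client
            if not data:
                break
            conn.sendall(data)  # echo back so the client sees progress

def serve(sock: socket.socket) -> None:
    while True:
        conn, _addr = sock.accept()  # blocks only between connections
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()
```

Each connection now has a dedicated thread to block in, which is exactly the cost of this approach: one thread per client.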
Non-blocking I/O
However, there is also a second option: non-blocking I/O. The difference is obvious from its name: instead of blocking, every call returns immediately. The request is queued in the kernel and the function returns right away; the actual I/O is then processed at some later point.
By setting a socket to non-blocking mode, you can effectively poll it. If you try to read from a non-blocking socket and there is no data, it will return an error code (EAGAIN or EWOULDBLOCK).
Actually, this style of polling is a bad idea. If your program sits in a tight loop querying the socket for data, it burns expensive CPU time doing nothing useful: the application busy-waits until the data is available instead of doing other work while the operation is performed in the kernel. A more elegant way to check whether data is readable is select().
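The error-code behaviour is easy to observe. Here is a minimal sketch (not part of the client/server example above) using `socket.socketpair()`, which simply hands us two already-connected sockets:

```python
import errno
import socket

# Two already-connected sockets; make the reading end non-blocking.
reader, writer = socket.socketpair()
reader.setblocking(False)

try:
    reader.recv(1024)  # no data yet: fails immediately instead of blocking
except BlockingIOError as e:
    # Python surfaces EAGAIN/EWOULDBLOCK as BlockingIOError
    print(e.errno in (errno.EAGAIN, errno.EWOULDBLOCK))  # True

writer.send(b"hello")
print(reader.recv(1024))  # b'hello': data is available now, recv succeeds

reader.close()
writer.close()
```

Busy-polling means repeating that `try/except` in a loop until data shows up, which is exactly the CPU waste described above.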
Let us go back to our example with the changes on the server:
```python
import select
import socket

def main() -> None:
    host = socket.gethostname()
    port = 12345
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setblocking(0)
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")
        # sockets from which we expect to read
        inputs = [sock]
        outputs = []
        while inputs:
            # wait for at least one of the sockets to be ready for processing
            readable, writable, exceptional = select.select(inputs, outputs, inputs)
            for s in readable:
                if s is sock:
                    conn, addr = s.accept()
                    conn.setblocking(0)  # the new connection is non-blocking too
                    inputs.append(conn)
                else:
                    data = s.recv(1024)
                    if data:
                        print(data)
                    else:
                        inputs.remove(s)
                        s.close()

if __name__ == "__main__":
    main()
```
Now if we run this code with more than one client, you will see that the server is no longer blocked by a single client: it handles messages from all of them, as the printed output shows. Again, I suggest that you try this example yourself.
What's going on here?
Here the server does not wait for all the data to be written to the buffer. When we make a socket non-blocking by calling setblocking(0), it never waits for an operation to complete. So when we call recv, it returns to the caller immediately. The main mechanical difference is that send, recv, connect and accept can return without having done anything at all.
With this approach, we can perform multiple I/O operations on different sockets from the same thread concurrently. But since we don't know whether a socket is ready for an I/O operation, we would have to ask each socket the same question over and over, essentially spinning in an infinite loop (this non-blocking but still synchronous approach is called I/O multiplexing).
To get rid of this inefficient loop, we need a readiness notification mechanism: one that lets us check the readiness of all sockets at once and tells us which ones are ready for a new I/O operation, without each being asked individually. When any of the sockets is ready, we perform the queued operations and then return to the blocked state, waiting for the sockets to become ready for the next I/O operation.
There are several such readiness mechanisms; they differ in performance and detail, but usually the details are hidden "under the hood" and not visible to us.
Keywords to search:
Notifications:
- Level Triggering (state)
- Edge Triggering (state changed)
Mechanics:
- `select()`, `poll()`, `epoll()`, `kqueue()`
- `EAGAIN`, `EWOULDBLOCK`
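In Python, these mechanisms are wrapped by the stdlib `selectors` module: `DefaultSelector` transparently picks epoll on Linux or kqueue on BSD/macOS, falling back to `select()` elsewhere. Here is a sketch of our server's core in that callback style (a minimal illustration, not the article's original code):

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll/kqueue/select, whichever is available

def accept(sock: socket.socket) -> None:
    conn, _addr = sock.accept()
    conn.setblocking(False)
    # register the new connection; the callback is stored as `data`
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn: socket.socket) -> None:
    data = conn.recv(1024)
    if data:
        print(data)
    else:
        sel.unregister(conn)
        conn.close()

def serve(sock: socket.socket) -> None:
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, accept)
    while True:
        # block until at least one registered socket is ready,
        # then dispatch to the callback stored at registration time
        for key, _mask in sel.select():
            key.data(key.fileobj)
```

To use it, bind and listen on a socket exactly as in the `select()` version above, then pass it to `serve()`.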