Also worth checking out libxev[1] by Mitchell Hashimoto. It’s a Zig-based event loop (similar to libuv) inspired by TigerBeetle’s implementation.
[1] https://github.com/mitchellh/libxev
> You can switch a file descriptor into non-blocking mode so the call won’t block while data you requested is not available. But system calls are still expensive, incurring context switches and cache misses. In fact, networks and disks have become so fast that these costs can start to approach the cost of doing the I/O itself. For the duration of time a file descriptor is unable to read or write, you don’t want to waste time continuously retrying read or write system calls.
O_NONBLOCK basically doesn't do anything for file-based file descriptions: a regular file is always considered "ready" for I/O.
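You can see this directly. A minimal sketch in C (assuming a readable ./test.txt exists; the path is just a placeholder): an empty pipe honors the flag and read() fails with EAGAIN, while a regular file opened with O_NONBLOCK just returns data:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];
        int p[2];

        /* An empty pipe honors O_NONBLOCK: read() fails with EAGAIN. */
        pipe(p);
        fcntl(p[0], F_SETFL, O_NONBLOCK);
        ssize_t n = read(p[0], buf, sizeof(buf));
        printf("pipe: n=%zd (%s)\n", n, n < 0 ? strerror(errno) : "ok");

        /* A regular file ignores the flag: read() may briefly block on
           the disk, but it returns data rather than failing with EAGAIN. */
        int fd = open("./test.txt", O_RDONLY | O_NONBLOCK);
        n = read(fd, buf, sizeof(buf));
        printf("file: n=%zd (%s)\n", n, n < 0 ? strerror(errno) : "ok");

        close(fd); close(p[0]); close(p[1]);
        return 0;
    }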
Is that true for all file abstractions? What happens with NFS?
Think about it: what does it mean for a file to be ready? Sockets and pipes are stream abstractions: to be ready means there is data to read or space to write.
But for files, data is always available to read (unless the file is empty) or write (unless the disk is full). Even if you somehow interpret readiness as the backing pages being loaded in the page cache, files are random access, so which pages (i.e. which specific offset and length) you are interested in can't be expressed via a simple fd-based poll-like API (Linux tried to make splice work for this use case, but it didn't work out).
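For instance, poll() with a zero timeout reports a regular file ready immediately, regardless of what's in the page cache (a minimal sketch, again assuming a placeholder ./test.txt exists):

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        struct pollfd pfd = {
            .fd = open("./test.txt", O_RDWR),
            .events = POLLIN | POLLOUT,
        };
        /* Timeout 0: don't wait at all. */
        int n = poll(&pfd, 1, 0);
        /* Prints POLLIN|POLLOUT right away; "ready" tells you nothing
           about whether any particular offset's pages are cached. */
        printf("poll=%d revents=%#x\n", n, (unsigned)pfd.revents);
        close(pfd.fd);
        return 0;
    }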
Don’t block devices have a scheduler with a queue under the hood? Couldn’t that queue become full when writing?
(This is a genuine question)
I’m pretty sure spinning HDDs can have rather complex controllers that actually try to optimize access at the block level by minimizing how far the read head needs to travel. So yeah, there are some buffers in there.
from open(2):

> Note that this flag has no effect for regular files and block devices; that is, I/O operations will (briefly) block when device activity is required, regardless of whether O_NONBLOCK is set. Since O_NONBLOCK semantics might eventually be implemented, applications should not depend upon blocking behavior when specifying this flag for regular files and block devices.
When I’m already using something like io_uring, I don’t need any I/O abstraction.
BTW, most applications are totally fine with the plain UNIX file APIs.
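For comparison, this is roughly what a single read looks like against the raw liburing API (a minimal sketch, assuming liburing is installed and you link with -luring; ./test.txt is a placeholder):

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);   /* 8-entry SQ/CQ rings */

        int fd = open("./test.txt", O_RDONLY);
        char buf[4096];

        /* Queue a read at offset 0 and submit it to the kernel. */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);

        /* Wait for the completion; res is the byte count (or -errno). */
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }

Submission and completion are decoupled, so the same few calls scale to batching many operations per syscall.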