What's the purpose of the first argument to the select system call?

Question

From man select

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

nfds is the highest-numbered file descriptor in any of the three sets, plus 1.

What is the purpose of nfds, when we already have readfds, writefds and exceptfds, from which the file descriptors can be determined?

I was about to ask on SO, but it's more centralized here, and [C API calls are considered on-topic](http://meta.unix.stackexchange.com/q/314/250). — phunehehe, Feb 21 '11 at 04:59

Mikel · Accepted Answer · 2011-02-21T21:37:50.363

In "Advanced Programming in the UNIX Environment", W. Richard Stevens says it is a performance optimization:

By specifying the highest descriptor we're interested in, the kernel can avoid going through hundreds of unused bits in the three descriptor sets, looking for bits that are turned on.

(1st edition, page 399)
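To make the quote concrete, here is a minimal sketch of a caller passing the highest descriptor plus 1. `wait_readable` is a hypothetical helper name made up for illustration, not something from APUE or the man page, and the sketch assumes a POSIX system.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): returns 1 if fd becomes
 * readable within timeout_ms, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;  /* an fd_set cannot represent this descriptor */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest-numbered descriptor in any set, plus 1,
     * so descriptors above fd never need to be examined. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;

    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

With only one descriptor in the set, `nfds` is simply `fd + 1`. A program watching several descriptors would track the largest one it has added to any of the three sets.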

If you are doing any kind of UNIX systems programming, the APUE book is highly recommended.

UPDATE

An fd_set is usually able to track up to 1024 file descriptors.

The most efficient way to track which fds are set to 0 and which are set to 1 would be a bitset, so each fd_set would consist of 1024 bits.

On a 32-bit system, a long int (or "word") is 32 bits, so each fd_set is 1024 / 32 = 32 words.

If nfds is something small, such as 8 or 16, which it would be in many applications, the kernel only needs to look inside the first word, which should clearly be faster than looking inside all 32.

(See FD_SETSIZE and __NFDBITS from /usr/include/sys/select.h for the values on your platform.)
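The word arithmetic above can be sketched in a few lines of C. Both function names are hypothetical and this is not the kernel's code. It only assumes the bitset is stored as an array of `unsigned long` words, which is common but not guaranteed.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: how many long-sized words a scan has to touch
 * to cover descriptors 0 .. nfds-1. */
size_t words_to_scan(int nfds)
{
    const size_t bits_per_word = sizeof(unsigned long) * CHAR_BIT;

    if (nfds <= 0)
        return 0;
    return ((size_t)nfds + bits_per_word - 1) / bits_per_word;
}

/* Hypothetical helper: count the bits set among descriptors
 * 0 .. nfds-1, assuming one bit per descriptor packed into words. */
int count_set_below(const unsigned long *words, int nfds)
{
    const size_t bits_per_word = sizeof(unsigned long) * CHAR_BIT;
    int count = 0;
    int fd;

    for (fd = 0; fd < nfds; fd++) {
        if (words[(size_t)fd / bits_per_word] &
            (1UL << ((size_t)fd % bits_per_word)))
            count++;
    }
    return count;
}
```

For `nfds` of 8 or 16, `words_to_scan` returns 1 whether words are 32 or 64 bits wide. For 1024 it returns 32 with 32-bit words and 16 with 64-bit words.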

UPDATE 2

As to why the function signature isn't

int select(fd_set *readfds, int nreadfds,
           fd_set *writefds, int nwritefds,
           fd_set *exceptfds, int nexceptfds,
           struct timeval *timeout);

My guess is it's because the code tries to keep all the arguments in registers, so the CPU can work on them faster, and if it had to track an extra 2 variables, the CPU might not have enough registers.

So in other words, select is exposing an implementation detail so that it can be faster.
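For what it's worth, the three-count signature can be layered on top of the real one by taking the maximum of the three counts. `select3` and `max3` below are hypothetical names used only to illustrate that point. This is a sketch, not how any libc or kernel does it.

```c
#include <stddef.h>
#include <sys/select.h>

/* Largest of three ints. */
static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Hypothetical wrapper with one count per set. Each count is
 * "highest descriptor in that set, plus 1", or 0 for an unused set. */
int select3(fd_set *readfds, int nreadfds,
            fd_set *writefds, int nwritefds,
            fd_set *exceptfds, int nexceptfds,
            struct timeval *timeout)
{
    /* The real select() takes one bound covering all three sets. */
    int nfds = max3(nreadfds, nwritefds, nexceptfds);

    return select(nfds, readfds, writefds, exceptfds, timeout);
}
```

The wrapper throws away the per-set precision, so the kernel still scans all three sets up to the same bound.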

BSD 4.4 Lite select source code (select and selscan functions)
Linux 2.6.37 select source code (do_select and max_select_fd functions)

That, or the more recent [The Linux Programming Interface](http://www.man7.org/tlpi) — chris, Feb 21 '11 at 09:44
APUE was updated recently too. Second edition: http://www.amazon.com/gp/aw/d.html/ref=aw_d_detail?pd=1&a=0201433079 — Mikel, Feb 21 '11 at 09:56
@chris I will check Linux Programming Interface out. Thanks. — Mikel, Feb 21 '11 at 10:06
Thanks for the info, I will check on the books when I grab some time. — phunehehe, Feb 21 '11 at 13:00
APUE 2nd Ed: June 27, 2005 (covers linux-2.4.22) TLPI: October 2010 (covers linux-2.6.35) — chris, Feb 21 '11 at 16:35
With fixed-sized descriptor sets at around 1024, the optimization is pointless. Even a completely naive iteration over 1024 bits with optimizations disabled is at least 30 times faster than one usermode-kernelmode-usermode transition. — Petr Skocik, Apr 03 '20 at 10:09

score 6 · Answer 2 · answered Feb 21 '11 at 05:04

I don't know for sure, since I'm not one of the designers of select(), but I'd say it's a performance optimization. The calling function knows how many file descriptors it put in the read, write and except FDs, so why should the kernel figure it out again?

Remember that in the early 80s, when select() was introduced, they didn't have multi-gigahertz, multi-processor machines to work with. A 25 MHz VAX was pretty doggone fast. Plus, you wanted select() to work fast if it could: if some I/O was waiting for the process, why make the process wait?

To your argument I would say we need `nreadfds`, `nwritefds` and `nexceptfds` instead of just one `nfds`. – phunehehe Feb 21 '11 at 05:46
Maybe it's so that `nfds` can go in a register for faster access. If it had to track three numbers, along with all the other arguments, maybe the CPU would not have enough registers. Of course, the kernel could have created its own `nfds` based on your hypothetical 3 variables. So my guess is it's exposing an implementation detail to gain efficiency. – Mikel Feb 21 '11 at 21:29
@Mikel, phunehehe: Separate `nfds` arguments would bring very little gain. Most of the time, the process has opened very few file descriptors relative to `FD_SETSIZE`. A typical case might have (4,4,2) out of 1024; making the kernel check (4,4,4) is a big win over (1024,1024,1024), but optimizing down to (4,4,2) would be near-useless. – Gilles 'SO- stop being evil' Feb 21 '11 at 21:56
@Gilles: the gain would be a cleaner API. (As it is, either the programmer has to do the extra work to calculate `nfds`, or be lazy and call `select(FD_SETSIZE, ...)`, which would be slower.) – Mikel Feb 21 '11 at 22:38
OTOH, tracking only one max variable could be easier for the programmer too. – Mikel Feb 21 '11 at 22:44
@Mikel: The single-`nfds` API is cleaner on all counts: there are fewer arguments, and neither side needs to precisely track which fd is used in which direction. Passing the largest fd used anywhere in the program is almost always ok. – Gilles 'SO- stop being evil' Feb 21 '11 at 23:00
@Gilles: "cleaner" will depend on the use case. Both ways seem to have advantages. It would be good to discuss in more detail with you, but a 600-character comments box is not the place. ;-) – Mikel Feb 22 '11 at 02:25
