
Taking inspiration from this blog post, I'm playing around with Linux device drivers (which I'm studying from ).

The read field of the file_operations associated with the driver is initialized to the function below:

static ssize_t mychardev_read(struct file *file, char __user *buf, size_t count, loff_t *offset)
{
    const char *data = "Hello from the kernel world!\n";
    size_t datalen = strlen(data);

    printk("MYCHARDEV: mychardev_read was called with count equal to %zu\n", count);

    if (count > datalen) {
        count = datalen;
    }

    if (copy_to_user(buf, data, count)) {
        return -EFAULT;
    }

    return count;
}

My understanding is that, when user space requests a given amount of data from the device created by this driver, the transfer happens in batches of at most datalen bytes, i.e. the length of the data string, which in this case is 29 (including the trailing \n). This is confirmed by the fact that executing the following commands in the shell

$ head -c 5 /dev/mychardev-0
$ head -c 29 /dev/mychardev-0
$ head -c 30 /dev/mychardev-0
$ head -c 32 /dev/mychardev-0

results in these lines in the output of dmesg (assuming I have obvious printk calls in the open and release methods):

[10782.052736] MYCHARDEV: Device open
[10782.052743] MYCHARDEV: mychardev_read was called with count equal to 5
[10782.052751] MYCHARDEV: Device close
[10868.275577] MYCHARDEV: Device open
[10868.275585] MYCHARDEV: mychardev_read was called with count equal to 29
[10868.275598] MYCHARDEV: Device close
[10830.796417] MYCHARDEV: Device open
[10830.796424] MYCHARDEV: mychardev_read was called with count equal to 30
[10830.796438] MYCHARDEV: mychardev_read was called with count equal to 1
[10830.796443] MYCHARDEV: Device close
[10878.414433] MYCHARDEV: Device open
[10878.414441] MYCHARDEV: mychardev_read was called with count equal to 32
[10878.414455] MYCHARDEV: mychardev_read was called with count equal to 3
[10878.414460] MYCHARDEV: Device close
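
The pattern in that log can be reproduced with a small userspace model. This is only a sketch: `model_read` and `head_like` are hypothetical names, `memcpy` stands in for `copy_to_user`, and it assumes that `head -c N` simply keeps asking for the bytes it is still missing, which is what the log suggests but which I have not checked against the `head` sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Userspace stand-in for mychardev_read: hands out at most the whole
 * message per call and never reports end-of-file. */
static ssize_t model_read(char *buf, size_t count)
{
    static const char data[] = "Hello from the kernel world!\n";
    size_t datalen = strlen(data); /* 29 bytes, trailing newline included */

    assert(buf != NULL);
    if (count > datalen)
        count = datalen;
    memcpy(buf, data, count); /* copy_to_user() in the real driver */
    return (ssize_t)count;
}

/* Hypothetical model of "head -c want": request the bytes still missing
 * until all of them have arrived. Each requested count is stored in
 * requests[]; the return value is the number of read calls made. */
static size_t head_like(size_t want, size_t *requests, size_t max_calls)
{
    char buf[64];
    size_t calls = 0;

    assert(want <= sizeof(buf));
    while (want > 0 && calls < max_calls) {
        requests[calls++] = want;
        want -= (size_t)model_read(buf, want);
    }
    return calls;
}
```

Under these assumptions, asking for 30 bytes produces one request for 30 (answered with 29 bytes) followed by one request for the remaining 1, and asking for 32 produces requests for 32 and then 3.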

What I don't understand is why the following command

$ cat /dev/mychardev-0 | head -c 5

results in

[11186.036107] MYCHARDEV: Device open
[11186.036120] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036131] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036136] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036141] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036145] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036150] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036154] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036159] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036163] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036168] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036172] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036177] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036181] MYCHARDEV: mychardev_read was called with count equal to 8192
[11186.036187] MYCHARDEV: Device close

In principle, I understand that cat may be requesting data from the driver in batches of several bytes at a time (apparently 8192 in this case), rather than byte by byte, for the sake of performance. (Or should I say the shell runtime requests it on behalf of cat? Or something else?)

But I don't understand why it is called so many times.
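
For reference, here is a rough userspace model of the loop a `cat`-like program presumably runs. It is a sketch under assumptions, not the coreutils code: `endless_read` and `cat_like` are hypothetical names, the 8192-byte buffer is taken from the log above, and the `writes_before_failure` parameter is an invented stand-in for "how many writes succeed before the reader of the pipe goes away".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef ssize_t (*read_fn)(char *buf, size_t count);

/* Stand-in for mychardev_read: always returns data, never 0 (EOF). */
static ssize_t endless_read(char *buf, size_t count)
{
    static const char data[] = "Hello from the kernel world!\n";
    size_t datalen = strlen(data);

    if (count > datalen)
        count = datalen;
    memcpy(buf, data, count);
    return (ssize_t)count;
}

/* Hypothetical cat-like loop: read a block, forward it, repeat.
 * The loop ends only when read reports EOF/error or a write fails.
 * Returns the number of read calls that were issued. */
static size_t cat_like(read_fn rd, size_t writes_before_failure)
{
    char buf[8192];
    size_t reads = 0;

    assert(rd != NULL);
    for (;;) {
        ssize_t n = rd(buf, sizeof(buf));
        reads++;
        if (n <= 0)
            break; /* EOF or error: never happens with endless_read */
        if (writes_before_failure == 0)
            break; /* the write failed, e.g. the pipe reader has exited */
        writes_before_failure--;
    }
    return reads;
}
```

In this model the number of reads is decided entirely by when the write side fails, so it would be expected to vary from run to run; `cat_like(endless_read, 12)` issues 13 reads, which happens to match the log above.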

Enlico
  • If you want to go inside what happens in `cat`, then it would be the C library that handles I/O between the application code and the kernel system call interface. The shell isn't involved anywhere near that. Then again, while glibc on Linux commonly reads in blocks of 8192, the GNU coreutils version of `cat` often reads in blocks of 128 kB. Which might mean it also just makes the system calls itself, which would make sense in that it just copies data around without looking at it, so it doesn't really need the buffering etc. provided by the C library. – ilkkachu Nov 13 '22 at 19:47
  • You could run that `cat` or whatever under `strace` to see the system calls it makes, at least that'd tell you if the requests the driver sees match those the kernel gets, or if there's something in the kernel that splits large requests to multiple smaller ones. Though there's exactly 13 calls to your driver there in that log, and that seems an odd number, not matching the common case of a power-of-two buffer size. – ilkkachu Nov 13 '22 at 19:51
  • No, indeed the number of those calls actually changes. – Enlico Nov 13 '22 at 20:09
  • First, although read gets called with a count of 8192, only 29 bytes are returned, so cat will keep calling read in order to fill whatever input buffer it is using. Second, cat on its own will never complete: without the pipe to head, it will just output an infinite stream of bytes, since you always return data from the read and never EOF (try it). But once head has enough to fill its input buffer, it will exit, and the next time cat tries to write to the pipe, the write will either cause a SIGPIPE signal to be sent or return EPIPE; [see this](https://unix.stackexchange.com/a/433394/36176). – Murray Jensen Nov 14 '22 at 13:56
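
The "never EOF" point in the last comment can be illustrated with a userspace model of a read method that tracks the file offset and returns 0 once the message has been consumed. This is a sketch only: `model_read_with_eof` is a hypothetical name, `memcpy` stands in for `copy_to_user`, and `off_t` stands in for the kernel's `loff_t`. In a real driver the kernel helper `simple_read_from_buffer` is, as far as I know, the usual way to get this behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Userspace model of a read method that honours the file offset:
 * it copies what is left of the message, advances *offset, and
 * returns 0 (end-of-file) once everything has been handed out. */
static ssize_t model_read_with_eof(char *buf, size_t count, off_t *offset)
{
    static const char data[] = "Hello from the kernel world!\n";
    size_t datalen = strlen(data);
    size_t pos;

    assert(buf != NULL && offset != NULL && *offset >= 0);
    pos = (size_t)*offset;

    if (pos >= datalen)
        return 0; /* nothing left: report EOF so the reader stops */
    if (count > datalen - pos)
        count = datalen - pos;

    memcpy(buf, data + pos, count); /* copy_to_user() in a real driver */
    *offset += (off_t)count;
    return (ssize_t)count;
}
```

With a read like this, a reader that asks for 8192 bytes gets 29 on the first call and 0 on the second, so a loop that stops at EOF terminates after two calls instead of running until its output pipe breaks.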
