I'm looking for some clarity in terms of the system calls. I tried avoiding the vDSO by calling a system call clock_gettime directly. Here is a snippet of a program named sys_clock_gettime.
int clock = CLOCK_MONOTONIC;
struct timespec ts;
if (syscall(__NR_clock_gettime, clock, &ts) == -1){
exit(-1);
}
This program takes the name of a clock (CLOCK_BOOTTIME, CLOCK_MONOTONIC, etc...) and a number of calls to make. For instance, the snippet below would make 100 calls to clock_gettime for the CLOCK_BOOTTIME clock.
./sys_clock_gettime CLOCK_BOOTTIME 100
I tried using ltrace to figure out what was going on:
ltrace -cS ./sys_clock_gettime CLOCK_BOOTTIME 100
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
77.41 0.006887 68 100 syscall
14.32 0.001274 12 100 clock_gettime
1.79 0.000159 159 1 brk
1.17 0.000104 104 1 exit
1.16 0.000103 51 2 strcmp
0.82 0.000073 14 5 mmap
0.65 0.000058 14 4 mprotect
0.61 0.000054 54 1 exit_group
0.51 0.000045 45 1 atoi
0.31 0.000028 14 2 openat
0.30 0.000027 13 2 fstat
0.29 0.000026 13 2 close
0.19 0.000017 17 1 munmap
0.17 0.000015 15 1 access
0.16 0.000014 14 1 read
0.15 0.000013 13 1 arch_prctl
------ ----------- ----------- --------- --------------------
100.00 0.008897 225 total
For some reason there is a syscall entry as well as a clock_gettime entry. You can tell that the syscall is a library entry, while clock_gettime is a system call entry because if you disable the -S mode in ltrace it won't show up. If I'm interested in timing the execution of the system call, should I be adding the syscall and clock_gettime information? For some reason, this only happened when I ran this in Amazon Linux 2, when running in an Ubuntu VM in my own computer, I only get a SYS_clock_gettime entry, as I was expecting.
Now, compare this with another program, let's call it vdso_clock_gettime, that takes the same arguments.
ltrace -cS ./vdso_clock_gettime CLOCK_BOOTTIME 100
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
85.82 0.004340 43 100 clock_gettime
3.08 0.000156 156 1 brk
2.02 0.000102 51 2 strcmp
1.90 0.000096 96 1 exit
1.38 0.000070 14 5 mmap
1.13 0.000057 14 4 mprotect
0.97 0.000049 49 1 exit_group
0.93 0.000047 47 1 atoi
0.59 0.000030 15 2 openat
0.51 0.000026 13 2 fstat
0.51 0.000026 13 2 close
0.32 0.000016 16 1 access
0.32 0.000016 16 1 munmap
0.26 0.000013 13 1 read
0.26 0.000013 13 1 arch_prctl
------ ----------- ----------- --------- --------------------
100.00 0.005057 125 total
If I don't add the syscall and clock_gettime entries from the first program's output, then the total time required to call the system call clock_gettime is actually lower (12 usecs/call) than if I called the vDSO version (43 usecs/call) in the second program. That wouldn't make sense, in my opinion. It would only make sense if the first program's calls would take ~80 usecs/call (the addition of the two entries), as the vDSO is supposed to accelerate this.
I know that in this system (kernel 5.10), clock_gettime is accelerated via the vDSO, because when I run the snippet below, I get no output:
strace -e clock_gettime,gettimeofday -- date >/dev/null
+++ exited with 0 +++
I'm not sure if this is relevant, but the clock source is set to tsc. Any ideas of what is going on here?
Edit: Looking at the actual trace, I am starting to thing that the call to syscall includes both.
syscall(228, 7, 0x7ffd270f9c80, 0x1999999999999999 <unfinished ...>
clock_gettime@SYS(7, 0x7ffd270f9c80, 0x1999999999999999, 0x7ffd270fa70a) = 0
<... syscall resumed> )
I also experimented with nanosleep and if I counted both the syscall and nanosleep calls, the time of sleep would double, and that wouldn't make sense. Am I correct?