Timing system call with ltrace: Adding `syscall` + `clock_gettime`?

Question

I'm looking for some clarity in terms of the system calls. I tried avoiding the vDSO by calling a system call clock_gettime directly. Here is a snippet of a program named sys_clock_gettime.

int clock = CLOCK_MONOTONIC;
struct timespec ts;
if (syscall(__NR_clock_gettime, clock, &ts) == -1){
  exit(-1); 
}

This program takes the name of a clock (CLOCK_BOOTTIME, CLOCK_MONOTONIC, etc...) and a number of calls to make. For instance, the snippet below would make 100 calls to clock_gettime for the CLOCK_BOOTTIME clock.

./sys_clock_gettime CLOCK_BOOTTIME 100

I tried using ltrace to figure out what was going on:

ltrace -cS ./sys_clock_gettime CLOCK_BOOTTIME 100
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 77.41    0.006887          68       100 syscall
 14.32    0.001274          12       100 clock_gettime
  1.79    0.000159         159         1 brk
  1.17    0.000104         104         1 exit
  1.16    0.000103          51         2 strcmp
  0.82    0.000073          14         5 mmap
  0.65    0.000058          14         4 mprotect
  0.61    0.000054          54         1 exit_group
  0.51    0.000045          45         1 atoi
  0.31    0.000028          14         2 openat
  0.30    0.000027          13         2 fstat
  0.29    0.000026          13         2 close
  0.19    0.000017          17         1 munmap
  0.17    0.000015          15         1 access
  0.16    0.000014          14         1 read
  0.15    0.000013          13         1 arch_prctl
------ ----------- ----------- --------- --------------------
100.00    0.008897                   225 total

For some reason there is a syscall entry as well as a clock_gettime entry. You can tell that the syscall is a library entry, while clock_gettime is a system call entry because if you disable the -S mode in ltrace it won't show up. If I'm interested in timing the execution of the system call, should I be adding the syscall and clock_gettime information? For some reason, this only happened when I ran this in Amazon Linux 2, when running in an Ubuntu VM in my own computer, I only get a SYS_clock_gettime entry, as I was expecting.

Now, compare this with another program, let's call it vdso_clock_gettime, that takes the same arguments.

ltrace -cS ./vdso_clock_gettime CLOCK_BOOTTIME 100
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 85.82    0.004340          43       100 clock_gettime
  3.08    0.000156         156         1 brk
  2.02    0.000102          51         2 strcmp
  1.90    0.000096          96         1 exit
  1.38    0.000070          14         5 mmap
  1.13    0.000057          14         4 mprotect
  0.97    0.000049          49         1 exit_group
  0.93    0.000047          47         1 atoi
  0.59    0.000030          15         2 openat
  0.51    0.000026          13         2 fstat
  0.51    0.000026          13         2 close
  0.32    0.000016          16         1 access
  0.32    0.000016          16         1 munmap
  0.26    0.000013          13         1 read
  0.26    0.000013          13         1 arch_prctl
------ ----------- ----------- --------- --------------------
100.00    0.005057                   125 total

If I don't add the syscall and clock_gettime entries from the first program's output, then the total time required to call the system call clock_gettime is actually lower (12 usecs/call) than if I called the vDSO version (43 usecs/call) in the second program. That wouldn't make sense, in my opinion. It would only make sense if the first program's calls would take ~80 usecs/call (the addition of the two entries), as the vDSO is supposed to accelerate this.

I know that in this system (kernel 5.10), clock_gettime is accelerated via the vDSO, because when I run the snippet below, I get no output:

strace -e clock_gettime,gettimeofday -- date >/dev/null
+++ exited with 0 +++

I'm not sure if this is relevant, but the clock source is set to tsc. Any ideas of what is going on here?

Edit: Looking at the actual trace, I am starting to thing that the call to syscall includes both.

syscall(228, 7, 0x7ffd270f9c80, 0x1999999999999999 <unfinished ...>
clock_gettime@SYS(7, 0x7ffd270f9c80, 0x1999999999999999, 0x7ffd270fa70a) = 0
<... syscall resumed> )

I also experimented with nanosleep and if I counted both the syscall and nanosleep calls, the time of sleep would double, and that wouldn't make sense. Am I correct?

score 1 · Accepted Answer · answered Feb 27 '22 at 13:23

I suspect some of the confusion is down to naming. On my system, forcing a system call produces a trace in this style:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 84.10    0.005402          54       100 syscall
 10.14    0.000651           6       100 SYS_clock_gettime
  1.87    0.000120          17         7 SYS_mmap
  0.97    0.000062          62         1 SYS_brk
  0.72    0.000046          11         4 SYS_mprotect
  0.47    0.000030          30         1 SYS_access
  0.45    0.000029          29         1 SYS_read
  0.37    0.000024          12         2 SYS_fstat
  0.31    0.000020          10         2 SYS_openat
  0.26    0.000017          17         1 SYS_munmap
  0.23    0.000015           7         2 SYS_close
  0.11    0.000007           7         1 SYS_arch_prctl
------ ----------- ----------- --------- --------------------
100.00    0.006423                   222 total

Using the vDSO produces a trace in this style:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 93.17    0.004158          41       100 clock_gettime
  1.66    0.000074          74         1 SYS_access
  1.61    0.000072          10         7 SYS_mmap
  0.92    0.000041          10         4 SYS_mprotect
  0.72    0.000032          16         2 SYS_openat
  0.56    0.000025          25         1 SYS_brk
  0.36    0.000016           8         2 SYS_close
  0.34    0.000015           7         2 SYS_fstat
  0.34    0.000015          15         1 SYS_munmap
  0.18    0.000008           8         1 SYS_read
  0.16    0.000007           7         1 SYS_arch_prctl
------ ----------- ----------- --------- --------------------
100.00    0.004463                   122 total

In the first case, the program calls the C library syscall function, which calls the clock_gettime system call; in the second case, the program calls the C library clock_gettime function, which calls the vDSO. The two clock_gettime calls are different, but they appear identical in your traces.

As I understand it, the time reported for syscall does include the time reported for SYS_clock_gettime; this also means that the percentage display is misleading. This isn’t covered in the documentation and I haven’t read all the source code so I may be wrong.

Another point to consider is that the times measured by ltrace include ltrace’s overhead. Other profiling tools such as perf stat show that the vDSO is indeed faster:

             42.13 msec task-clock                #    0.988 CPUs utilized          
                 4      context-switches          #    0.095 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                44      page-faults               #    0.001 M/sec                  
       138,771,425      cycles                    #    3.294 GHz                      (71.53%)
        38,690,379      instructions              #    0.28  insn per cycle           (71.57%)
         7,599,904      branches                  #  180.377 M/sec                    (81.14%)
           300,729      branch-misses             #    3.96% of all branches          (75.76%)

       0.042636471 seconds time elapsed

       0.015510000 seconds user
       0.027143000 seconds sys

typically for the system call version (over 100,000 calls), and

              4.84 msec task-clock                #    0.815 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                48      page-faults               #    0.010 M/sec                  
         6,257,648      cycles                    #    1.294 GHz                      (50.28%)
         9,019,370      instructions              #    1.44  insn per cycle         
         1,925,045      branches                  #  398.113 M/sec                  
             1,594      branch-misses             #    0.08% of all branches          (49.72%)

       0.005934008 seconds time elapsed

       0.003025000 seconds user
       0.003025000 seconds sys

for the vDSO version.

Timing system call with ltrace: Adding `syscall` + `clock_gettime`?

1 Answers1