I have a command-line script that performs an API call and updates a database with the results.

I have a limit of 5 API calls per second with the API provider. The script takes more than 0.2 seconds to execute.

  • If I run the command sequentially, it will not run fast enough and I will only be making 1 or 2 API calls per second.
  • If I run the command sequentially, but simultaneously from several terminals, I might exceed the 5 calls / second limit.

Is there a way to orchestrate threads so that my command-line script is executed almost exactly 5 times per second?

For example, something that would run with 5 or 10 threads, where no thread executes the script if a previous thread executed it less than 200 ms ago.

BenMorel
  • All of the answers depend on the assumption that your script will finish in the order it is called. Is it acceptable for your use case if they finish out of order? – Cody Gustafson May 03 '16 at 22:27
  • @CodyGustafson It is perfectly acceptable if they finish out of order. I don't believe there is such an assumption in the accepted answer, at least? – BenMorel May 03 '16 at 23:15
  • What happens if you exceed the number of calls per second? If the API provider throttles, you don't need any mechanism at your end... do you? – Floris May 04 '16 at 02:08
  • @Floris They will return an error message that will translate into an exception in the SDK. First of all, I doubt the API provider will be happy if I generate 50 throttle messages per second (you're supposed to act upon such messages accordingly), and secondly, I'm using the API for other purposes at the same time, so I don't want to reach the limit, which is actually slightly higher. – BenMorel May 04 '16 at 10:20

5 Answers

On a GNU system and if you have pv, you could do:

cmd='
   that command | to execute &&
     as shell code'

yes | pv -qL10 | xargs -n1 -P20 sh -c "$cmd" sh

The -P20 tells xargs to run at most 20 instances of $cmd at the same time.

-L10 limits the rate to 10 bytes per second; since yes outputs 2 bytes per line (y plus a newline), that's 5 lines per second.

If your $cmds become too slow and cause the 20 limit to be reached, then xargs will stop reading until at least one $cmd instance returns. pv will still carry on writing to the pipe at the same rate, until the pipe gets full (which on Linux, with a default pipe size of 64KiB, will take almost 2 hours).

At that point, pv will stop writing. But even then, when xargs resumes reading, pv will try and catch up and send all the lines it should have sent earlier as quickly as possible so as to maintain a 5 lines per second average overall.

What that means is that, as long as it's possible to meet that 5-runs-per-second-on-average requirement with 20 processes, it will do so. However, when the limit is reached, the rate at which new processes are started will be driven not by pv's timer but by the rate at which earlier cmd instances return. For instance, if 20 have been running for 10 seconds and 10 of them finish at the same time, then 10 new ones will be started at once.

Example:

$ cmd='date +%T.%N; exec sleep 2'
$ yes | pv -qL10 | xargs -n1 -P20 sh -c "$cmd" sh
09:49:23.347013486
09:49:23.527446830
09:49:23.707591664
09:49:23.888182485
09:49:24.068257018
09:49:24.338570865
09:49:24.518963491
09:49:24.699206647
09:49:24.879722328
09:49:25.149988152
09:49:25.330095169

On average, it will run 5 times per second, even if the delay between two runs is not always exactly 0.2 seconds.

With ksh93 (or with zsh if your sleep command supports fractional seconds):

typeset -F SECONDS=0
n=0; while true; do
  your-command &
  sleep "$((++n * 0.2 - SECONDS))"
done

That puts no bound on the number of concurrent your-commands though.
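If a bound is needed as well, a plain bash sketch (assuming bash ≥ 4.3 for wait -n) could combine the 200 ms timer with a job cap. Here sleep 0.5 is a stand-in for your-command, and the 10-run limit replaces an endless loop so the demo terminates:

```shell
#!/bin/bash
# Sketch: start a job roughly every 200 ms, with at most 20 running at once.
# "sleep 0.5" stands in for your-command; 10 runs here instead of "while true".
max_jobs=20
started=0
while [ "$started" -lt 10 ]; do
  while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
    wait -n               # bash >= 4.3: block until any one background job exits
  done
  sleep 0.5 &             # stand-in for your-command
  started=$((started + 1))
  sleep 0.2               # the pacing timer
done
wait                      # let the remaining jobs finish
echo "started $started jobs"
```

Unlike the pv approach, the sleep 0.2 here does not compensate for the time spent starting each job, so the real rate drifts slightly below 5 per second.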

Stéphane Chazelas
  • After a little bit of testing, the `pv` command seems to be exactly what I was looking for, couldn't hope better! Just on this line: `yes | pv -qL10 | xargs -n1 -P20 sh -c "$cmd" sh`, isn't the last `sh` redundant? – BenMorel May 03 '16 at 09:17
  • 1
    @Benjamin That second `sh` is for the `$0` in your `$cmd` script. It's also used in error messages by the shell. Without it, `$0` would be `y` from `yes`, so you'd get error messages like `y: cannot execute cmd`... You could also do `yes sh | pv -qL15 | xargs -n1 -P20 sh -c "$cmd"` – Stéphane Chazelas May 03 '16 at 09:27
  • I'm struggling to decompose the whole thing into understandable pieces, TBH! In your example, you have removed this last `sh`; and in my tests, when I remove it, I can see no difference! – BenMorel May 03 '16 at 09:36
  • @Benjamin. It's not critical. It will only make a difference if your `$cmd` uses `$0` (why would it?) and for error messages. Try for instance with `cmd=/`; without the second `sh`, you'd see something like `y: 1: y: /: Permission denied` instead of `sh: 1: sh: /: Permission denied` – Stéphane Chazelas May 03 '16 at 09:43
  • I'm having an issue with your solution: it works fine for a few hours, then at some point it just exits, without any error. Could this be related to the pipe getting full, having some unexpected side effects? – BenMorel May 10 '16 at 22:48
  • @Benjamin, `yes` wouldn't terminate unless killed (like with SIGPIPE if pv dies). pv would only die on eof (yes terminates) or if killed. xargs would only die on eof (pv terminates), or if killed, or if cmd returns with exit status 255 (in which case I would expect a message about it). It mostly points to something getting killed. Does `ulimit -a` show unusually low limits? If run from bash, what does `echo "${PIPESTATUS[@]}"` or `echo $pipestatus` in zsh show after the pipeline returns? – Stéphane Chazelas May 17 '16 at 20:01
  • Oh this makes sense, then. PHP exits with status 255 when the script errors because of a max execution time exceeded. This is surely what happened on a few occasions! Thanks for the clarification. – BenMorel May 17 '16 at 20:19
  • @Benjamin, in that case, you can replace `$cmd` with `$cmd || echo >&2 "cmd failed with exit status $?"`, so that `sh`'s exit status be 0, but you still get a message about cmd failing. – Stéphane Chazelas May 17 '16 at 21:24
  • Will that prevent it from exiting, as well? – BenMorel May 17 '16 at 22:52
Simplistically, if your command takes less than 1 second, you can just start 5 commands each second. Obviously, this is very bursty.

while sleep 1
do    for i in {1..5}
      do mycmd &
      done
done

If your command might take more than 1 second and you want to spread out the commands you can try

while :
do    for i in {0..4}
      do  sleep .$((i*2))
          mycmd &
      done
      sleep 1 &
      wait
done

Alternatively, you can have 5 separate loops that run independently, with a 1 second minimum.

for i in {1..5}
do    while :
      do   sleep 1 &
           mycmd &
           wait
      done &
      sleep .2
done
meuh
  • 49,672
  • 2
  • 52
  • 114
  • Quite nice solution as well. I like the fact that it's simple and is exactly 5 times per second, but it has the disadvantage of starting 5 commands at the same time (instead of every 200ms), and maybe lacks the safeguard of having at most n threads running at a time! – BenMorel May 03 '16 at 09:51
  • @Benjamin I added a 200ms sleep in the loop of the second version. This second version cannot have more than 5 cmds running at a time, as we only ever start 5, then wait for them all. – meuh May 03 '16 at 10:14
  • The issue is, you cannot have more than 5 per second started; if all of the scripts suddenly take more than 1s to execute, then you're far away from reaching the API limit. Plus, if you wait for them all, a *single* blocking script would block all the others? – BenMorel May 03 '16 at 10:24
  • @Benjamin So you can run 5 independent loops, each with a minimum sleep of 1 second, see 3rd version. – meuh May 03 '16 at 10:41
With a C program, you can, for example, use a thread that sleeps for 0.2 seconds inside a while loop:

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

pthread_t tid;

void *doSomeThing(void *arg) {
    while (1) {
        /* execute my command here */
        usleep(200000); /* sleep() only takes whole seconds; usleep() takes microseconds */
    }
    return NULL;
}

int main(void)
{
    int err = pthread_create(&tid, NULL, &doSomeThing, NULL);
    if (err != 0) {
        printf("\ncan't create thread :[%s]", strerror(err));
        return 1;
    }
    printf("\nThread created successfully\n");

    pthread_join(tid, NULL); /* without this, main returns and the process exits immediately */
    return 0;
}

See create a thread for how to create a thread (this is the link the code above is based on).

Couim
  • Thanks for your answer, although I was ideally looking for something that would not involve C programming, but only using existing Unix tools! – BenMorel May 03 '16 at 09:52
  • Yeah, the stackoverflow answer to this might for example be to use a token bucket shared between multiple worker threads, but asking on Unix.SE suggests more of a "Power user" rather than "programmer" approach is wanted :-) Still, `cc` is an existing Unix tool, and this isn't a lot of code! – Steve Jessop May 03 '16 at 11:30
Using node.js, you can start a single thread that executes the bash script every 200 milliseconds, no matter how long the response takes to come back, because the response arrives through a callback function.

var exec = require('child_process').exec;

setInterval(function () {
        exec('fullpath to bash script',
                function (error, stdout, stderr) {
                        console.log('stdout: ' + stdout);
                        console.log('stderr: ' + stderr);
                        if (error !== null) {
                                console.log('exec error: ' + error);
                        }
                });
}, 200);

This JavaScript runs every 200 milliseconds, and the response is received through the callback function function (error, stdout, stderr).

This way, you can guarantee that it never exceeds 5 calls per second, regardless of how slow or fast the command executes, or how long it has to wait for a response.

jcbermu
  • I like this solution: it starts *exactly* 5 commands per second, at regular intervals. The only drawback I can see is that it lacks a safeguard of having at most n processes running at a time! If this is something you could include easily? I'm not familiar with node.js. – BenMorel May 03 '16 at 09:55
I've used Stéphane Chazelas' pv-based solution for some time, but found out that it exited randomly (and silently) after some time, anywhere from a few minutes to a few hours. -- Edit: The reason was that my PHP script occasionally died because of a max execution time exceeded, exiting with status 255.

So I decided to write a simple command-line tool that does exactly what I need.

Achieving my original goal is as simple as:

./parallel.phar 5 20 ./my-command-line-script

It starts almost exactly 5 commands per second, unless there are already 20 concurrent processes, in which case it skips the next execution(s) until a slot becomes available.

This tool is not sensitive to a status 255 exit.
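For reference, the xargs behaviour behind those silent exits, and the || workaround suggested in the comments on the accepted answer, can be reproduced in a few lines (echo stands in for the real command):

```shell
# A child exiting with status 255 makes xargs abort immediately:
out=$(printf '1\n2\n3\n' | xargs -n1 sh -c 'echo "run $0"; exit 255' 2>/dev/null)
echo "$out"    # only "run 1"; runs 2 and 3 never start

# Workaround: run the command in a subshell and mask the status,
# so sh itself exits 0 and xargs keeps going:
ok=$(printf '1\n2\n3\n' | xargs -n1 sh -c \
    '(echo "run $0"; exit 255) || echo "run $0 failed with status $?"')
echo "$ok"     # all three runs, each followed by a failure message
```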

BenMorel