How to calculate say output duration?

Question

My goal is to calculate how long it takes to output text to audio when using the say command.

For example, say will speak in real time:

$ say -v Alex "Hello there"

I can then combine say with time to answer the question in the text, although we have to wait until the end of the actual audio output:

$ time say -v Alex "Hello there. How long will this take?"

real    0m2.993s
user    0m0.006s
sys     0m0.009s

Is there a way to calculate how long it will take to output any say command without actually executing it? How?
If not, how can I use grep to pull out the real line?

I'm trying something like this:

$ time say -v Alex "Hello there. How long will this take?" | grep "^real   .*$"

But of course there is no result.

Is the output not being passed to grep, does grep not work for this multi-line output, or did I use the wrong pattern matching?

If grep won't work, what will?

UPDATE #1

Actually what I think I'm really looking for is the duration of the generated audio file that results from say.

@Peschke `grep "real"` didn't work for me so I tried something else — kraftydevil, Jul 19 '18 at 04:34

slm · Accepted Answer · 2018-07-19T05:59:13.240

Timing a run of `say`

Is there a way to calculate how long it will take to output any say command without actually executing it? How?

I see no way to accomplish this using any switches provided by the say command.

If not, how can I use grep to pull out the real line?

To parse the time output you can do the following:

$ ( time say -v Alex "Hello there. How long will this take?" ) |& grep real
real    0m2.987s

Alternatively:

$ ( time say -v Alex "Hello there. How long will this take?" ) 2>&1 | grep real
real    0m2.987s

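In the above we've wrapped the time ... command in a subshell and then redirected both STDOUT and STDERR (|&) to grep. This is needed because time is a shell keyword: it times the whole pipeline and writes its report to the shell's STDERR, so a plain pipe never hands that report to grep. The 2>&1 form does the same thing in situations where |& doesn't work for your particular version of Bash (|& was added in Bash 4).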
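If you want the number rather than the raw line, the report can be converted to seconds. The following is only a sketch: real_to_seconds is a name I made up, and it assumes the default Bash report format (real 0m2.993s) with a period as the decimal separator.

```shell
# Convert the "real" line printed by bash's `time` keyword
# (e.g. "real    0m2.993s") into plain seconds.
# Reads the full `time` report on stdin; other lines are ignored.
real_to_seconds() {
    awk '$1 == "real" {
        # "0m2.993s" -> part[1] = "0" (minutes), part[2] = "2.993" (seconds)
        split($2, part, "[ms]")
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# Usage on macOS (not run here, since it needs `say`):
#   ( time say -v Alex "Hello there. How long will this take?" ) 2>&1 | real_to_seconds
```

Fed the report shown in the question, this should print the real time as a bare number that is easy to reuse in a script.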

/dev/null

Incidentally, if you use the -o <file> argument, say writes the audio to a file instead of playing it, which makes the translation of text to audio finish much sooner. Since we don't actually want the audio file here, we're directing it to /dev/null instead:

$ ( time say -v Alex "Hello there. How long will this take?" -o /dev/null ) |& grep real
real    0m0.310s

Alternatively:

$ ( time say -v Alex "Hello there. How long will this take?" -o /dev/null ) 2>&1 | grep real
real    0m0.283s

Notice how much faster it is when the speakers aren't involved: the difference is the time spent playing the audio through the audio I/O in real time. Directing to a file skips playback, so this figure reflects only the time to do the translation, not how long the speech lasts.

Calculating the audio's duration

To determine the duration of the audio file that say produces, you can do the following:

$ say -v Alex "Hello there. How long will this take?" -o a.aiff && \
    ffmpeg -i a.aiff 2>&1 | grep Duration && rm a.aiff
  Duration: 00:00:02.85, start: 0.000000, bitrate: 364 kb/s

Here we can see that the duration of the resulting audio is 2.85 seconds.
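The same steps can be wrapped up so the result comes back in seconds. This is a rough sketch, not a polished tool: duration_to_seconds and say_duration are names I made up, the temporary directory layout is my own choice, and it assumes ffmpeg prints a Duration: HH:MM:SS.xx line as in the output above.

```shell
# Extract the "Duration: HH:MM:SS.xx" field from ffmpeg's banner
# (read on stdin) and print it as seconds.
duration_to_seconds() {
    sed -n 's/.*Duration: \([0-9:.]*\).*/\1/p' |
        awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical wrapper: synthesize to a temporary AIFF file, measure it,
# then clean up. The body runs in a subshell so its variables stay local.
say_duration() (
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/say.XXXXXX") || exit 1
    say "$@" -o "$tmpdir/out.aiff" &&
        ffmpeg -i "$tmpdir/out.aiff" 2>&1 | duration_to_seconds
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)

# Usage on macOS (not run here, since it needs `say` and `ffmpeg`):
#   say_duration -v Alex "Hello there. How long will this take?"
```

Because say still has to generate the audio, this doesn't avoid running it, but it finishes much sooner than waiting for the speech to play.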

Further improvements?

I looked into piping the output from say directly into ffmpeg, but say apparently cannot do this. Others have come to the same conclusion, per the Ask Q&A titled: How to pipe output of 'say' to another command.

References

Do parentheses really put the command in a subshell?

The problem with outputting to `/dev/null` is that since we no longer have to wait for the audio to be output in real time we are welcome to generate it as quickly as possible. — Ignacio Vazquez-Abrams, Jul 19 '18 at 04:46
@IgnacioVazquez-Abrams - that's true, I wasn't sure if the OP wanted the time it took for `say` to actually do the translation of text to speech and utter it, or just the time to do the translation. — slm, Jul 19 '18 at 04:48