I've found a solution that doesn't work for me:

Monitoring the microphone level with a command line tool in Linux (Super User): https://superuser.com/questions/306701/monitoring-the-microphone-level-with-a-command-line-tool-in-linux

The problem is that they use Maximum amplitude to detect sound. For me, however, its value is always the same, whether the recorded audio contains only silence or some sounds. For example:

10 sec of silence (Can be downloaded here: http://denis-aristov.ucoz.com/en/test-mic-silence.wav ):

$ arecord -f S16_LE -D hw:2,0 -d 10 /tmp/test-mic-silence.wav

$ sox -t .wav /tmp/test-mic-silence.wav -n stat
Samples read:             80000
Length (seconds):     10.000000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.202792
Mean    amplitude:     0.009146
RMS     amplitude:     0.349978
Maximum delta:         0.913849
Minimum delta:         0.000000
Mean    delta:         0.001061
RMS     delta:         0.005564
Rough   frequency:           20
Volume adjustment:        1.000

10 sec with some sounds (Can be downloaded here: http://denis-aristov.ucoz.com/en/test-mic-sounds.wav ):

$ arecord -f S16_LE -D hw:2,0 -d 10 /tmp/test-mic-sounds.wav

$ sox -t .wav /tmp/test-mic-sounds.wav -n stat
Samples read:             80000
Length (seconds):     10.000000
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.185012
Mean    amplitude:     0.010225
RMS     amplitude:     0.334286
Maximum delta:         1.999969
Minimum delta:         0.000000
Mean    delta:         0.006213
RMS     delta:         0.057844
Rough   frequency:          220
Volume adjustment:        1.000

What is the difference? Which values should I use for sound detection? Or do I have to set something up because something isn't working correctly?

I've just tried another computer (a notebook with a built-in microphone). I recorded two WMA files (with and without sounds) using Windows "Sound Recorder", converted them to WAV files with Audacity, and got the following outputs. The maximum amplitudes differ this time:

With sounds:

$ sox -t .wav /tmp/mic-sounds.wav -n stat
Samples read:            581632
Length (seconds):      6.594467
Scaled by:         2147483647.0
Maximum amplitude:     0.999969
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000015
Mean    norm:          0.013987
Mean    amplitude:     0.000062
RMS     amplitude:     0.065573
Maximum delta:         1.999969
Minimum delta:         0.000000
Mean    delta:         0.011242
RMS     delta:         0.047009
Rough   frequency:         5031
Volume adjustment:        1.000

Without sounds:

$ sox -t .wav /tmp/mic-silence.wav -n stat
Samples read:            372736
Length (seconds):      4.226032
Scaled by:         2147483647.0
Maximum amplitude:     0.029022
Minimum amplitude:    -0.029114
Midline amplitude:    -0.000046
Mean    norm:          0.005082
Mean    amplitude:    -0.000053
RMS     amplitude:     0.006480
Maximum delta:         0.030487
Minimum delta:         0.000000
Mean    delta:         0.005815
RMS     delta:         0.007285
Rough   frequency:         7891
Volume adjustment:       34.348

Could this indicate a problem with the microphone on the other computer?

ka3ak
  • Looks like something is wrong. Have you tried to listen to the recordings of silence? Aren't they just pure noise? – Michal Polovka May 14 '17 at 08:45
  • Of course I heard them after recording. The second one contained the sounds. – ka3ak May 14 '17 at 09:09
  • Look at the first recording ("10 secs of silence") in an audio editor, e.g. `audacity`. You'll see a DC (very low frequency) component when the level goes from 1 at 0 secs to -1 at 1 secs to 0.5 at 1.5 secs, and then falls down to near zero near the end. Did you plug in the mic during that time? If yes, you need to wait ca. 10 seconds before the amplitude settles, then measure. If not, you need to filter out the DC component somehow. `sox` has several filters you can try. – dirkt May 14 '17 at 17:57
  • @dirkt I should have mentioned that I want to distinguish between audio with some sounds and audio without them using a shell script only. The microphone was plugged in all the time. What is DC? – ka3ak May 16 '17 at 08:24
  • You can use the `sox` filters from a shell script without problems. Try e.g. `highpass 100`, that filters out most of it except for the initial jump. DC = direct current. If the mic was plugged in all the time, something strange is going on, or it recorded the final keypress or something. Try to record, say, 3x 10-sec samples of silence one after the other, first directly, then from a single script, and see in which ones you get this effect (or upload them so I can have a look). – dirkt May 16 '17 at 09:14
  • @dirkt There was no final key press as the arecord records 10 seconds without user interruption. I also don't understand what you mean with "effect". For me everything looks fine without any effects. I've uploaded 2 files. If you hear them you'll notice that one of them doesn't contain any sounds while the other does. I've also posted corresponding sox outputs for them. Do you mean with "effect" that some sox output is unexpected? Then what is expected in your opinion? – ka3ak May 16 '17 at 11:41
  • I meant the key press (`Enter`) that occurs when you start the program. Have you looked at the "silence" file in Audacity as I recommended? It *doesn't* look fine. Look at it, then do `sox test-mic-silence.wav highpass.wav highpass 100` (that applies the filter as effect), then look at `highpass.wav`. Maybe that clears things up. – dirkt May 16 '17 at 13:11
  • @dirkt Thanks. You were right. I opened test-mic-silence.wav in audacity and cut off a few milliseconds at the beginning. After that Maximum amplitude was reduced to 0.465576. There must be a key press or another sound when the recording started. – ka3ak May 16 '17 at 16:52
  • @dirkt However, it wasn't caused by a key press. I tried to record with a delay, $ sleep 15; arecord..., but the result had the same Maximum amplitude of 0.999969. There must be some distortion when arecord starts recording. It would be great to know how to work around this. – ka3ak May 16 '17 at 17:03
  • If you'd actually do the 3-in-sequence recording, both manually and from a script like I suggested, we may get a hint where this distortion comes from, by looking at the shape of the distortion in the subsequent recordings. I'm not suggesting things for fun: You don't debug by scratching your head and *knowing* the solution, you debug by making tests to see if you can narrow down the problem. – dirkt May 17 '17 at 05:22
  • @dirkt In my task I can ignore the first 50-100 ms of a recording. The remaining time seems to be without any distortions. – ka3ak May 19 '17 at 14:32
  • The remaining time definitely is not without distortions, at least in the one recording of silence you made available. Just *look* at the curve. It takes several seconds until it settles near zero, where it should be. Must I draw some pictures, and explain what the curve actually means? – dirkt May 19 '17 at 15:30
  • @dirkt Yes. 2-3 seconds should be enough. I can even ignore a minute after start in my task. – ka3ak May 19 '17 at 15:36
  • @dirkt You can post an answer based on your recommendation in the third comment where you mention higher amplitude at the beginning. I'll accept it. As I wrote previously I can ignore even the first minute of a recording. I tested it and it worked. – ka3ak Jan 23 '19 at 19:37

1 Answer

(Answer based on various comments, as this method seems to be acceptable, and comments are not guaranteed to stay.)

Look at the first recording ("10 secs of silence") in an audio editor, e.g. `audacity`. You'll see a DC (very low frequency) component: the level goes from 1 at 0 secs to -1 at 1 sec to 0.5 at 1.5 secs, and then falls to near zero towards the end. Did you plug in the mic during that time? If yes, you need to wait ca. 10 seconds for the amplitude to settle, then measure. If not, you need to filter out the DC component (direct current, that is, a constant voltage offset) somehow. `sox` has several filters you can try.

You can use the `sox` filters from a shell script without problems. Try e.g. `highpass 100`, which filters out most of it except for the initial jump.

If filtering out DC components is too much effort, you can also ignore the initial part, and use the remaining part as it is.
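
As a rough sketch of how both suggestions might be combined in a shell script: the helpers below apply `highpass 100`, then drop the initial part with `trim`, and read the `Maximum amplitude` line from the `stat` output. The function names, the 2-second skip and the 0.1 threshold are my own assumptions for illustration, not tested values; pick a threshold by measuring your own silent recordings.

```shell
#!/bin/sh
# Hypothetical helpers; names, skip time and threshold are assumptions.

# Print the "Maximum amplitude" value from `sox ... stat` output on stdin.
max_amplitude() {
    awk -F: '/^Maximum amplitude/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed (exit status 0) if amplitude $1 is greater than threshold $2.
above_threshold() {
    awk -v a="$1" -v t="$2" 'BEGIN { exit !(a + 0 > t + 0) }'
}

# Measure file $1: high-pass at 100 Hz, skip the first $2 seconds,
# then print the maximum amplitude of what remains.
# sox writes the stat report to stderr, hence the 2>&1.
measure() {
    sox "$1" -n highpass 100 trim "$2" stat 2>&1 | max_amplitude
}
```

Usage could look like this (needs `sox` installed):

    amp=$(measure /tmp/test-mic-silence.wav 2)
    if above_threshold "$amp" 0.1; then echo "sound"; else echo "silence"; fi

The filter comes before `trim` so that both the initial jump and the filter's own start-up transient fall into the discarded part.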

dirkt