How to fake a microphone stream
Recently, I was benchmarking a Speech to Text (STT) module at work, and we were particularly interested in the latency of the `us-central` location. We first spun up an empty server in `us-central`, then installed and set up the STT module and the benchmarking scripts. To keep network latency from affecting the numbers, we wanted the input audio to originate from the machine itself. To be thorough, we were interested in the performance of ‘speech in real time’ rather than in results obtained by reading directly from an audio file (the results do vary quite a bit). I figured I could do this without changing much of the STT module’s code by using a virtual microphone that streams data from an audio file at real-time speed. This way, I could run the stream in a loop for thousands of iterations to get statistically relevant metrics.
Now, a virtual Linux server does not have a physical sound card, as it is a software emulation of a physical server. To verify, let’s install `alsa-utils` by running `sudo apt install alsa-utils`, which gives us access to `arecord` (it’s needed again later anyway). `arecord` is a simple command-line sound file recorder. It supports several file formats and multiple sound cards with multiple devices.
When we run `arecord test.wav -t wav -f s16_LE -c 1 -r 16000`, `arecord` is supposed to find our microphone, connect to it, start a stream, and record the audio in s16le format, with a single channel and a sample rate of 16000 Hz, to the `./test.wav` file. We see this error instead: `ALSA lib confmisc.c.855:(parse_card) cannot find card '0'`. There it is: no sound card.
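The same check can be done without attempting a recording. The sketch below assumes that ALSA lists detected cards in `/proc/asound/cards` in its usual layout (an index followed by a bracketed card id); the function names `parse_cards` and `has_sound_card` are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: on Linux, ALSA exposes the detected cards in this procfs file.
CARDS_FILE = Path("/proc/asound/cards")


def parse_cards(text):
    """Return (index, id) pairs for each card found in /proc/asound/cards text."""
    cards = []
    for line in text.splitlines():
        # Card lines look like " 0 [PCH            ]: HDA-Intel - HDA Intel PCH".
        match = re.match(r"\s*(\d+)\s+\[(\S+)\s*\]:", line)
        if match:
            cards.append((int(match.group(1)), match.group(2)))
    return cards


def has_sound_card(cards_file=CARDS_FILE):
    """True if at least one card is listed; False if none, or the file is missing."""
    try:
        return bool(parse_cards(cards_file.read_text()))
    except OSError:
        return False
```

On a server like the one described here, `has_sound_card()` would be expected to return `False`.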
Creating the virtual microphone
First, we install PulseAudio: `sudo apt install pulseaudio`. It is a sound server system that performs multiple functions, but we are particularly interested in `module-pipe-source`, a PulseAudio module that creates a virtual audio source by reading audio data from a FIFO (first-in, first-out) file.
Below is a bash script named `virtmic.sh`. It can be made executable by running `chmod +x ./virtmic.sh`.
#!/bin/bash
# This script will create a virtual microphone for PulseAudio to use and set it as the default device.
# Load the "module-pipe-source" module to read audio data from a FIFO special file.
pactl load-module module-pipe-source source_name=virtmic file=/home/user/virtmic format=s16le rate=16000 channels=1
# Set the virtmic as the default source device.
pactl set-default-source virtmic
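# Create a file that will set the default source device to virtmic for all PulseAudio client applications.
# Make sure the config directory exists first, otherwise the redirect below fails.
mkdir -p /home/user/.config/pulse
echo "default-source = virtmic" > /home/user/.config/pulse/client.conf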
Running `./virtmic.sh` creates a virtual audio source named `virtmic`, which acts as a virtual microphone. Its FIFO file is created in the directory `/home/user/`, and when we do an `ls` there we can see it as a file named `virtmic`.
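To confirm that the file really is a named pipe rather than a regular file, a small check like the following can help. It is a minimal sketch using only the standard library; the helper name `is_fifo` is made up for illustration, and `/home/user/virtmic` is the path used in `virtmic.sh`.

```python
import os
import stat


def is_fifo(path):
    """Return True if path exists and is a FIFO (named pipe)."""
    try:
        return stat.S_ISFIFO(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


# Example: is_fifo("/home/user/virtmic") after running ./virtmic.sh
```

If this returns `False` after the script has run, the module most likely did not load, and `stream.sh` would end up writing into an ordinary file instead of the virtual microphone.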
Running the stream
To run the stream, we install ffmpeg: `sudo apt install ffmpeg`. `ffmpeg` is a command-line tool that is used to convert multimedia files between formats.
Below is a bash script named `stream.sh`. It can be made executable by running `chmod +x ./stream.sh`.
#!/bin/sh
# Write the audio file to the named pipe virtmic. This will block until the named pipe is read.
ffmpeg -re -i verloop.wav -f s16le -ar 16000 -ac 1 - > /home/user/virtmic
On running `./stream.sh`:
- `-re` sets the read rate to 1, telling ffmpeg to read the input at its native frame rate, i.e. real-time speed (this is what we want!).
- `-i verloop.wav` is the input audio file we want to run in the stream. We can `rsync` this from our local machine.
- `-f s16le`, `-ar 16000`, and `-ac 1` are the audio format, sample rate, and number of channels respectively. These have to match what we set in the `virtmic.sh` bash script.
- `- > /home/user/virtmic` redirects the output to the FIFO file (the virtual microphone), essentially populating our stream.
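The format, rate, and channel count have to match on both sides because raw `s16le` data has no header: the reader cannot discover these values from the bytes themselves. The arithmetic is simple enough to sketch. The helper names below are made up for illustration, and the defaults mirror the values used in `virtmic.sh` and `stream.sh`.

```python
def bytes_per_second(rate=16000, channels=1, sample_width=2):
    """Data rate of raw PCM; s16le uses 2 bytes per sample."""
    return rate * channels * sample_width


def pcm_duration_seconds(num_bytes, rate=16000, channels=1, sample_width=2):
    """How long a raw PCM buffer of num_bytes plays at the given format."""
    return num_bytes / bytes_per_second(rate, channels, sample_width)
```

With these settings, a real-time stream pushes 32000 bytes into the FIFO per second, so a 10-second clip amounts to 320000 bytes of raw audio.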
Now, we can test this by opening another pane if you have `tmux` running. The moment we run `./stream.sh`, we can switch over to the other pane and run `arecord test.wav -t wav -f s16_LE -c 1 -r 16000`. Once the stream ends, stop `arecord`, download the new `test.wav` that is created on the server to your local machine, and take a listen. It should be the same audio as the input audio file.
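Besides listening, the recording can be sanity-checked programmatically. The sketch below uses Python's standard `wave` module to read the header of a WAV file; `wav_info` and `matches_virtmic_format` are hypothetical helper names, and the defaults mirror the format used in this post.

```python
import wave


def wav_info(path):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), wav.getsampwidth(), rate, frames / rate


def matches_virtmic_format(path, rate=16000, channels=1, sample_width=2):
    """True if the WAV header matches the format the virtual microphone uses."""
    found_channels, found_width, found_rate, _ = wav_info(path)
    return (found_channels, found_width, found_rate) == (channels, sample_width, rate)
```

Comparing the duration reported for `test.wav` with that of the input file gives a rough indication of whether the whole stream was captured, keeping in mind that the recording may start slightly late.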
Now, we can use Python’s `subprocess` module to call `./stream.sh` and run it inside a background thread. It can be looped over for as many iterations as required alongside the main program, which in my case was the STT module and the benchmark scripts. Here’s a Python snippet:
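from subprocess import call
import threading


def run_audio_in_thread():
    """Start ./stream.sh in a daemon thread and return the thread."""

    def run_audio():
        # Blocks until ffmpeg has streamed the whole file into the FIFO.
        call("./stream.sh")

    thread = threading.Thread(target=run_audio)
    # Daemon threads do not keep the main program alive on exit.
    thread.daemon = True
    thread.start()
    return thread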
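The looping mentioned above could be sketched roughly as follows. This is not the benchmark code from the original setup; `stream_in_background` and its parameters are made up for illustration, and it simply runs the stream command a fixed number of times in one background thread.

```python
import subprocess
import threading


def stream_in_background(command, iterations, results):
    """Run command `iterations` times in a daemon thread.

    Each return code is appended to `results`; the loop stops early
    if a run fails, so a broken stream does not go unnoticed.
    """

    def worker():
        for _ in range(iterations):
            completed = subprocess.run(command)
            results.append(completed.returncode)
            if completed.returncode != 0:
                break

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A call such as `stream_in_background(["./stream.sh"], 1000, codes)` would then feed the virtual microphone while the main program consumes it, and `thread.join()` waits for the last iteration to finish.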
Cleanup
Below is a bash script named `cleanup.sh`. It can be made executable by running `chmod +x ./cleanup.sh`.
#!/bin/bash
# Uninstall the virtual microphone.
pactl unload-module module-pipe-source
rm /home/user/.config/pulse/client.conf
Running `./cleanup.sh` unloads the PulseAudio module and essentially removes the virtual microphone. Once this script has been run, `./virtmic.sh` has to be run again before `./stream.sh` can be re-run.
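Since setup and cleanup always come as a pair, one option is to wrap them in a context manager so the cleanup runs even if the benchmark raises an exception. This is a sketch under the assumption that both scripts sit in the current directory; `virtual_microphone` is a hypothetical name.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def virtual_microphone(setup_cmd=("./virtmic.sh",), cleanup_cmd=("./cleanup.sh",)):
    """Create the virtual microphone on entry and remove it on exit."""
    # Fail loudly if the setup script does not succeed.
    subprocess.run(list(setup_cmd), check=True)
    try:
        yield
    finally:
        # Always attempt the cleanup, even after an error in the body.
        subprocess.run(list(cleanup_cmd), check=False)
```

The benchmark would then run inside `with virtual_microphone(): ...`, and the module is unloaded when the block exits.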