How to fake a microphone stream

Posted Dec 1, 2024

Recently, I was benchmarking a Speech-to-Text (STT) module at work, and we were particularly interested in its latency in the us-central location. We first spun up an empty server in us-central, then set up and installed the STT module and the benchmarking scripts. To keep network latency from affecting the numbers, we wanted the input audio to originate from the machine itself. To be thorough, we were interested in the performance of ‘speech in real time’ as opposed to measuring results by reading directly from an audio file (the results do vary quite a bit). I figured I could do this without changing much of the STT module’s code by using a virtual microphone that streams data from an audio file at real-time speed. This way, I could run the stream in a loop for thousands of iterations to get statistically relevant metrics.

Now, a virtual Linux server does not have a physical sound card, as it is a software emulation of a physical server. To verify this, let’s install alsa-utils by running sudo apt install alsa-utils, which gives us arecord (we will need it again later anyway). arecord is a simple command-line sound file recorder. It supports several file formats and multiple sound cards with multiple devices. When we run arecord test.wav -t wav -f s16_LE -c 1 -r 16000, arecord is supposed to find our microphone, connect to it, start a stream, and record the audio to ./test.wav in s16le format with a single channel and a sample rate of 16000 Hz. We see this error instead: ALSA lib confmisc.c:855:(parse_card) cannot find card '0'. There it is: no sound card.

Creating the virtual microphone

First, we install PulseAudio: sudo apt install pulseaudio. It is a sound server system that performs multiple functions, but we are particularly interested in module-pipe-source, a PulseAudio module that creates a virtual audio source by reading audio data from a FIFO (first-in, first-out) special file, also known as a named pipe.
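To make the FIFO idea concrete, here is a minimal Python sketch of a named pipe in isolation, with no PulseAudio involved. The function name fifo_round_trip and the file name demo_fifo are made up for illustration. It assumes a Linux or other POSIX system where os.mkfifo is available.

```python
import os
import tempfile
import threading


def fifo_round_trip(payload: bytes) -> bytes:
    """Send payload through a temporary named pipe and return what the reader receives."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, "demo_fifo")  # hypothetical name for this demo
        os.mkfifo(fifo_path)

        def writer():
            # Opening a FIFO for writing blocks until some reader opens the other end.
            with open(fifo_path, "wb") as pipe:
                pipe.write(payload)

        thread = threading.Thread(target=writer)
        thread.start()

        # The reader plays the role that PulseAudio plays for the virtual microphone.
        with open(fifo_path, "rb") as pipe:
            received = pipe.read()  # returns once the writer closes its end

        thread.join()
        return received
```

Bytes come out in the order they went in, which is why a stream of raw audio samples written to the pipe can be consumed as if it came from a capture device.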

Below is a bash script named: virtmic.sh. It can be made executable by running: chmod +x ./virtmic.sh

#!/bin/bash

# This script will create a virtual microphone for PulseAudio to use and set it as the default device.

# Load the "module-pipe-source" module to read audio data from a FIFO special file.
pactl load-module module-pipe-source source_name=virtmic file=/home/user/virtmic format=s16le rate=16000 channels=1

# Set the virtmic as the default source device.
pactl set-default-source virtmic

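# Create a file that will set the default source device to virtmic for all PulseAudio client applications.
# The config directory may not exist yet on a fresh server, so create it first.
mkdir -p /home/user/.config/pulse
echo "default-source = virtmic" > /home/user/.config/pulse/client.conf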

Running the script with ./virtmic.sh creates a virtual audio source named virtmic, which acts as a virtual microphone. Its backing FIFO is created at /home/user/virtmic, and an ls of /home/user/ shows it as a file named virtmic.
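If the benchmark is driven from Python, it may be worth confirming that the pipe exists before streaming into it. The following is a small sketch, not part of the original setup. The helper name is_fifo is an assumption, and the path in the usage note is the one used in virtmic.sh.

```python
import os
import stat


def is_fifo(path: str) -> bool:
    """Return True if path exists and is a FIFO (named pipe), False otherwise."""
    try:
        # os.stat only inspects metadata, so it does not block on the pipe.
        return stat.S_ISFIFO(os.stat(path).st_mode)
    except FileNotFoundError:
        return False
```

For example, is_fifo("/home/user/virtmic") should be true after ./virtmic.sh has run, assuming the module loaded successfully.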

Running the stream

To run the stream, we install ffmpeg: sudo apt install ffmpeg. ffmpeg is a command-line tool for converting multimedia files between formats.

Below is a bash script named: stream.sh. It can be made executable by running: chmod +x ./stream.sh

#!/bin/sh

# Write the audio file to the named pipe virtmic. This will block until the named pipe is read.
ffmpeg -re -i verloop.wav -f s16le -ar 16000 -ac 1 - > /home/user/virtmic

The options used in ./stream.sh do the following:

-re sets readrate to 1, telling ffmpeg to read the input at native frame rate i.e. real-time speed (This is what we want!)
-i verloop.wav is the input audio file we want to run in the stream. We can rsync this from our local machine.
-f s16le, -ar 16000, -ac 1 are the audio format, sample rate and number of channels respectively. These have to match the values we set in the virtmic.sh bash script.
- > /home/user/virtmic writes the output to stdout (the lone -) and redirects it to the FIFO file (virtual microphone), essentially populating our stream.
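As a sanity check on these settings, the data rate of the stream follows directly from the format, rate and channel count. The sketch below is plain arithmetic with made-up helper names. It assumes raw PCM with no header, which is what -f s16le produces.

```python
def pcm_bytes_per_second(rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Bytes of raw PCM produced per second of audio."""
    return rate_hz * channels * (bits_per_sample // 8)


def pcm_duration_seconds(num_bytes: int, rate_hz: int = 16000,
                         channels: int = 1, bits_per_sample: int = 16) -> float:
    """Length in seconds of num_bytes of raw PCM (defaults match virtmic.sh)."""
    return num_bytes / pcm_bytes_per_second(rate_hz, channels, bits_per_sample)


# s16le at 16000 Hz mono: 16000 samples/s * 1 channel * 2 bytes = 32000 bytes/s
print(pcm_bytes_per_second(16000, 1, 16))  # 32000
```

With -re, ffmpeg should therefore write roughly 32000 bytes into the pipe per second, so a 10-second clip takes about 10 seconds per iteration.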

Now we can test this from a second terminal pane (for example, in tmux). As soon as we run ./stream.sh in one pane, we switch over to the other pane and run arecord test.wav -t wav -f s16_LE -c 1 -r 16000. Once the stream ends, stop arecord, download the newly created test.wav from the server to your local machine, and take a listen. It should be the same audio as the input audio file.
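Before downloading the file, a quick check on the server can confirm that the recording has the expected format. This is a sketch using Python's standard wave module. The function names describe_wav and matches_virtmic_format are made up here, and the expected values are the ones from virtmic.sh.

```python
import wave


def describe_wav(source):
    """Read basic properties of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_virtmic_format(info) -> bool:
    """True if the file is 16000 Hz, mono, 16-bit, as configured in virtmic.sh."""
    return (info["rate_hz"] == 16000
            and info["channels"] == 1
            and info["sample_width_bytes"] == 2)
```

Calling matches_virtmic_format(describe_wav("test.wav")) gives a quick yes or no. The reported duration will usually not match the input clip exactly, since arecord is started and stopped by hand.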

Now we can use Python’s subprocess module to call ./stream.sh and run it in a background thread. It can be looped for as many iterations as required alongside the main program, which in my case was the STT module and the benchmark scripts. Here’s a Python snippet:

from subprocess import call
import threading

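def run_audio_in_thread():
    """Start ./stream.sh in a background daemon thread and return the thread."""
    def run_audio():
        # Blocks inside the thread until ffmpeg has streamed the whole file.
        call("./stream.sh")
    thread = threading.Thread(target=run_audio)
    # Daemon threads do not keep the main program alive on exit.
    thread.daemon = True
    thread.start()
    return thread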
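For the looping part, one possible arrangement is to run the stream repeatedly inside a single background thread and record how long each pass took. This is a sketch under assumptions, not the benchmark code itself. The name stream_in_background is hypothetical, and the command is passed in as a list so it can be ["./stream.sh"] or anything else.

```python
import subprocess
import threading
import time


def stream_in_background(command, iterations):
    """Run command `iterations` times in a daemon thread.

    Returns the thread and a list that fills with per-iteration durations in seconds.
    """
    durations = []

    def worker():
        for _ in range(iterations):
            start = time.monotonic()
            # check=True raises if the command exits non-zero, which ends the loop early.
            subprocess.run(command, check=True)
            durations.append(time.monotonic() - start)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, durations
```

A call such as thread, durations = stream_in_background(["./stream.sh"], 1000) starts the loop. The main program can then run the STT module, and thread.join() waits for the last iteration to finish.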

Cleanup

Below is a bash script named: cleanup.sh. It can be made executable by running: chmod +x ./cleanup.sh

#!/bin/bash

# Uninstall the virtual microphone.
pactl unload-module module-pipe-source
rm /home/user/.config/pulse/client.conf

Running ./cleanup.sh unloads the PulseAudio module, which removes the virtual microphone. After running it, ./virtmic.sh has to be run again before ./stream.sh can be re-run.
