C++ audio

A big part of my current work project, developing model train decoders, is audio processing on tiny embedded platforms. An external flash memory on the decoder can be loaded with wave files which then play in certain situations (e.g. a whistle, a generator starting, drive noise, …) to emulate a real locomotive as realistically as possible. Doing audio is usually considered a “soft” real-time task. Nobody’s going to die if you screw things up, but miss a single sample in a buffer of 1024 and your speaker will punish you with a loud crack. Deeply embedded audio solutions are usually 100% in-house developments, but I still wanted my code to be cross-platform with a single (obvious) exception: audio output.

The way audio output works on modern microcontrollers is by utilizing either a digital serial audio bus like I2S or a digital-to-analog converter, depending on whether the external amplifier is digital or analog. From a software perspective there ain’t much difference because you point your device to some memory location containing the next chunk of audio data and say “here, take this and tell me when you’re ready again.” Copying the chunk sample-by-sample to the actual output is done by DMA, so there is no additional processor load for that.
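The hand-off described above is often built as a ping-pong (double) buffer: the DMA streams one half while the CPU refills the other. What follows is only a host-side sketch of that idea, not code for any particular microcontroller. `PingPong`, `on_half_done` and `kChunk` are hypothetical names, and the "interrupt" is just an ordinary function call here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical chunk size in samples; a real project picks this per latency budget.
constexpr std::size_t kChunk = 4;

// Two chunks back to back: the (simulated) DMA reads one while we fill the other.
struct PingPong {
  std::array<std::int16_t, 2 * kChunk> buf{};
  std::size_t playing{0};       // Index (0 or 1) of the half the DMA is reading
  std::int16_t next_sample{0};  // Stand-in for a real audio source

  // Fill one half with the next kChunk samples from the source.
  void fill(std::size_t half) {
    assert(half < 2);
    for (std::size_t i = 0; i < kChunk; ++i)
      buf[half * kChunk + i] = next_sample++;
  }

  // Would be called from the DMA "transfer complete" interrupt:
  // the DMA moves on to the other half and we refill the one it just left.
  void on_half_done() {
    std::size_t const finished = playing;
    playing ^= 1;
    fill(finished);
  }

  // Pointer to the half that is currently being played.
  std::int16_t const* current() const { return &buf[playing * kChunk]; }
};
```

Before playback starts both halves get filled once; after that every completed transfer triggers exactly one refill, which is the "tell me when you’re ready again" part.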

And on x86? Well, turns out it pretty much works the same. Of course there’s a whole lot more going on under the hood and you definitely want to use a library but I had to write surprisingly little code in order to adapt my audio engine.

The libraries I tried are PortAudio and RtAudio. Both are surprisingly easy to use and have great documentation. Recently there’s also been some effort by Timur Doumler to get audio into the standard library but it’s not quite there yet. I still recommend taking a look at some of his talks (e.g. C++ in the Audio Industry and Audio in standard C++) as they contain some of the best explanations of how low-level audio processing works.

Let’s take a look at how to play some raw audio data (e.g. wave files) with PortAudio and RtAudio. There are basically only two things required:

  1. Initialize the library and open a stream
  2. Write a callback to pass to the library

Although RtAudio claims to be C++ it looks more like C with classes and exceptions. So it’s no surprise that the initialization code for both libraries is almost identical. For both examples I pass a wave file to main(), read the number of channels, bit depth and sample rate from its header and open a stream according to those attributes.
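Reading those three attributes could look roughly like the sketch below. It assumes the canonical 44-byte PCM header with the `fmt ` chunk directly after `RIFF`/`WAVE` (real files may carry extra chunks in between). `WavHeader`, `parse_wav_header` and the `sample_rate` field name are placeholders, not necessarily the names used in the actual examples.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header struct holding only what the stream setup needs.
struct WavHeader {
  std::uint16_t channels{};
  std::uint32_t sample_rate{};
  std::uint16_t bit_depth{};
};

// Read little-endian integers byte by byte so host endianness doesn't matter.
inline std::uint16_t read_u16(unsigned char const* p) {
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_u32(unsigned char const* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Parse a canonical 44-byte PCM wave header.
// Returns false if the data doesn't look like one; `out` is only written on success.
inline bool parse_wav_header(unsigned char const* data,
                             std::size_t size,
                             WavHeader& out) {
  if (size < 44)
    return false;
  if (std::memcmp(data, "RIFF", 4) != 0 ||
      std::memcmp(data + 8, "WAVE", 4) != 0 ||
      std::memcmp(data + 12, "fmt ", 4) != 0)
    return false;
  if (read_u16(data + 20) != 1)  // 1 = uncompressed PCM
    return false;
  out.channels = read_u16(data + 22);
  out.sample_rate = read_u32(data + 24);
  out.bit_depth = read_u16(data + 34);
  return true;
}
```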

// Initialize PortAudio
err = Pa_Initialize();
if (err != paNoError)
  goto error;

// Default output device
outputParameters.device = Pa_GetDefaultOutputDevice();
if (outputParameters.device == paNoDevice) {
  fprintf(stderr, "Error: No default output device.\n");
  goto error;
}

// Set stream parameters
outputParameters.channelCount = wav_header.channels;
outputParameters.sampleFormat = wav_header.bit_depth == 8 ? paUInt8 : paInt16;
outputParameters.suggestedLatency = Pa_GetDeviceInfo(outputParameters.device)->defaultLowOutputLatency;
outputParameters.hostApiSpecificStreamInfo = NULL;

// Open stream
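err = Pa_OpenStream(&stream,
                    NULL,  // No input
                    &outputParameters,
                    wav_header.sample_rate,  // Field name assumed
                    paFramesPerBufferUnspecified,  // Assumed: let PortAudio pick the buffer size
                    paNoFlag,  // Assumed: no stream flags
                    paCallback,
                    &wav_header);  // User data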
if (err != paNoError)
  goto error;

// Initialize RtAudio
RtAudio audio;

// Check if audio device is available
if (audio.getDeviceCount() < 1) {
  std::cout << "\nNo audio devices found!\n";
  return 1;  // Assumed: leave main() with an error code
}

// Let RtAudio print messages to stderr
audio.showWarnings(true);

// Set stream parameters
unsigned int bufferFrames{256};
RtAudio::StreamParameters oParams;
oParams.deviceId = audio.getDefaultOutputDevice();
oParams.nChannels = wav_header.channels;
oParams.firstChannel = 0;

RtAudio::StreamOptions options;

try {
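  // Open stream
  audio.openStream(&oParams,
                   NULL,  // No input
                   wav_header.bit_depth == 8 ? RTAUDIO_SINT8 : RTAUDIO_SINT16,
                   wav_header.sample_rate,  // Field name assumed
                   &bufferFrames,
                   &rtCallback,
                   &wav_header,  // User data (assumed, mirroring the PortAudio call)
                   &options,
                   NULL);  // No error callback

  // Start stream
  audio.startStream();
} catch (RtAudioError& e) {
  goto cleanup;
}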

One of the few differences between the libraries seems to be that RtAudio does not support unsigned 8-bit playback whereas PortAudio does (not that this matters much today).
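Since 8-bit PCM in wave files is stored unsigned (silence sits at 128), such data needs a conversion before it can be played through a signed 8-bit format. A minimal sketch, with `u8_to_s8` being a hypothetical helper name: subtracting 128 recentres the samples around zero.

```cpp
#include <cstddef>
#include <cstdint>

// Convert unsigned 8-bit PCM (0..255, silence at 128)
// to signed 8-bit PCM (-128..127, silence at 0).
inline void u8_to_s8(std::uint8_t const* in, std::int8_t* out, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    out[i] = static_cast<std::int8_t>(static_cast<int>(in[i]) - 128);
}
```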

Noticed the paCallback and rtCallback parameters I passed in the snippets above? Here are their signatures.

static int paCallback(void const* input,
                      void* output,
                      unsigned long frameCount,
                      PaStreamCallbackTimeInfo const* timeInfo,
                      PaStreamCallbackFlags statusFlags,
                      void* userData) {}

static int rtCallback(void* outputBuffer,
                      void* inputBuffer,
                      unsigned int nFrames,
                      double streamTime,
                      RtAudioStreamStatus status,
                      void* userData) {}

Apart from the time and status info you could pretty much swap those between libraries. As far as the implementation goes, the user must ensure that whatever data needs to play next gets copied to the void* output. This is also where things get inevitably ugly. Either you pass your data as void* or you use globals…
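One way to keep the ugliness contained is to do the cast exactly once and hand off to typed code. The sketch below is deliberately library-agnostic so it compiles without either library installed. `Playback` and `fill_output` are hypothetical names, and the idea is that both paCallback and rtCallback would simply forward their output pointer, frame count and user data to it. It assumes interleaved 16-bit samples.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical playback state handed to the library as void* user data.
struct Playback {
  std::int16_t const* samples{};  // Interleaved 16-bit PCM
  std::size_t total_frames{};     // Length of the clip in frames
  std::size_t position{};         // Next frame to play
  unsigned channels{};
};

// Copy the next `frames` frames to `output` and pad with silence once the clip ends.
// Returns true while there is still audio left for the next call.
inline bool fill_output(void* output, std::size_t frames, void* userData) {
  auto* pb = static_cast<Playback*>(userData);  // The one unavoidable cast
  auto* out = static_cast<std::int16_t*>(output);

  std::size_t const remaining = pb->total_frames - pb->position;
  std::size_t const to_copy = std::min(frames, remaining);

  std::memcpy(out,
              pb->samples + pb->position * pb->channels,
              to_copy * pb->channels * sizeof(std::int16_t));
  // Zero-fill whatever we could not provide so the tail is silence, not garbage.
  std::memset(out + to_copy * pb->channels,
              0,
              (frames - to_copy) * pb->channels * sizeof(std::int16_t));

  pb->position += to_copy;
  return pb->position < pb->total_frames;
}
```

In a PortAudio callback the result would map to returning paContinue or paComplete; in RtAudio, returning a non-zero value from the callback tells the library to stop the stream.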

Either way here are the two examples on my GitHub account:

Both require C++17 and the filesystem part of the standard library to compile.