A simple method of measuring audio latency

By wrung out by Nirbheek

In my previous blog post, I talked about how I improved the latency of GStreamer's default audio capture and render elements on Windows. An important part of any such work is a way to accurately measure the latencies in your audio path. Ideally, one would use a mechanism that can track your buffers and give you a detailed breakdown of how much latency each component of your system adds. For instance, with an audio pipeline like this: audio-capture → filter1 → filter2 → filter3 → audio-output

If you use GStreamer, you can use the latency tracer to measure how much latency filter1 adds, filter2 adds, and so on.

However, sometimes you need to measure latencies added by components outside of your control, for instance the audio APIs provided by the operating system, the audio drivers, or even the hardware itself. In that case it's really difficult, bordering on impossible, to do an automated breakdown.

But we do need some way of measuring those latencies, and I needed that for the aforementioned work. Maybe we can get an aggregated (total) number?

There's a simple way to do that if we can create a loopback connection in the audio setup. What's a loopback you ask?

Ouroboros snake biting its tail
Essentially, if we can redirect the audio output back to the audio input, that's called a loopback. The simplest way to do this is to connect the speaker-out/line-out to the microphone-in/line-in with a two-sided 3.5mm jack.
photo of male-to-male 3.5mm jack connecting speaker-out to mic-in
Now, when we send an audio wave down to the audio output, it'll show up on the audio input.

Hmm, what if we store the current time when we send the wave out, and compare it with the current time when we get it back? Well, that's the total end-to-end latency!

If we send out a wave periodically, we can measure the latency continuously, even as things are switched around or the pipeline is dynamically reconfigured. Some of you may notice that this is somewhat similar to how the `ping` command measures latencies across the Internet.
screenshot of ping to 192.168.1.1
Just like a network connection, the loopback connection can be lossy or noisy, f.ex. if you use loudspeakers and a microphone instead of a wire, or if you have (ugh) noise in your line. But unlike network packets, we lose all context once the waves leave our pipeline and we have no way of uniquely identifying each wave. So the simplest reliable implementation is to have only one wave traveling down the pipeline at a time. If we send a wave out, say, once a second, we can wait about one second for it to show up, and otherwise presume that it was lost.

That's exactly how the audiolatency GStreamer plugin that I wrote works! Here you can see its output while measuring the combined latency of the WASAPI source and sink elements:

The first measurement will always be wrong because of various implementation details in the audio stack, but the next measurements should all be correct. This mechanism does place an upper bound on the latency that we can measure, and on how often we can measure it, but it should be possible to take more frequent measurements by sending a new wave as soon as the previous one was received (with a 1 second timeout). So this is an enhancement that can be done if people need this feature.

Hope you find the element useful; go forth and measure!