An Analysis of Sample Rate Conversion in Sox

Sox contains three different sample-rate conversion algorithms:
  1. Linear Interpolation: missing samples are linearly interpolated from existing samples.
  2. Band-Limited Interpolation: missing samples are interpolated using a Kaiser-windowed sinc function from existing samples.
  3. Polyphase Filtering: traditional DSP interpolation and decimation mechanisms implemented efficiently.
The question that arises is: Which of these algorithms is the best for converting my data?

Unfortunately, the answer is not obvious, especially because some of the effects suffered from several implementation flaws in older versions (before sox 12.16). The resample effect had trouble calculating the correct cut-off frequency, which led to aliasing when downsampling and excessive low-pass filtering when upsampling. This problem has been fixed in the version used here. The polyphase effect has been improved, too, and gives much better results than in previous versions.

To address the signal-processing issues, I performed a series of experiments using Sox 12.17.3 and Matlab under Linux.  The script of these experiments is available, should you wish to recreate them.

These experiments are intended to find the effects of these differing techniques on the spectrum of the signal processed, and what drawbacks (if any) exist.

Table of Contents:

  1. Upsampling
    1. Random Noise
      1. Linear Interpolation
      2. Band-Limited Interpolation
      3. Polyphase Filtering
    2. Sine Waves
      1. Linear Interpolation
      2. Band-Limited Interpolation
      3. Polyphase Filtering
  2. Downsampling
    1. Random Noise
      1. Linear Interpolation
      2. Band-Limited Interpolation
      3. Polyphase Filtering
    2. Sine Waves
      1. Linear Interpolation
      2. Band-Limited Interpolation
      3. Polyphase Filtering
  3. Summary

Ideal Sample-Rate Conversion

Under ideal mathematical conditions, sample-rate conversion from a lower sampling rate to a higher sampling rate reproduces the original spectrum of the signal at the new sampling rate, with no energy in the frequency domain between the original highest frequency (Forig / 2) and the new highest frequency (Fnew / 2), where Forig and Fnew are the original and new sampling rates. There is no loss of information from the original signal.

Sample-rate conversion from a higher sampling rate to a lower sampling rate removes signal energy from Fnew / 2 to Forig / 2, thereby reducing the signal content, and creates a new signal at the new sampling rate.

To evaluate different sample-rate conversion techniques, the output of each is compared to this ideal. The key features for up-conversion are:


Upsampling From 8 kHz to 44.1 kHz

The experiment uses two mathematical signal sources: random noise, which has a "flat" spectrum (constant energy at all frequencies), and a sequence of sine waves spaced at 500 Hz, which allows a much better feel for the shape of filter distortions and also more readily shows the presence of aliasing artifacts (aliasing is the presence of energy from high frequency regions in low frequency regions due to the removal or insertion of samples).
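The two signal sources described above can be sketched in a few lines of Python. This is only an illustration, not the original Matlab script: the function names, the amplitudes, the random seed and the one-second duration are assumptions made for the example. The optional extra tone mirrors the additional 3.9 kHz sine wave used in the sine-wave test.

```python
import math
import random

def make_noise(num_samples, amplitude=0.5, seed=0):
    """White noise: independent uniform samples in [-amplitude, amplitude]."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(num_samples)]

def make_sine_comb(num_samples, sample_rate, spacing_hz=500.0, extra_hz=()):
    """Sum of equal-amplitude sines at multiples of spacing_hz below Nyquist."""
    nyquist = sample_rate / 2.0
    freqs = []
    k = 1
    while k * spacing_hz < nyquist:
        freqs.append(k * spacing_hz)
        k += 1
    freqs.extend(f for f in extra_hz if f < nyquist)
    scale = 1.0 / len(freqs)  # keeps the sum within [-1, 1]
    signal = [
        scale * sum(math.sin(2.0 * math.pi * f * n / sample_rate) for f in freqs)
        for n in range(num_samples)
    ]
    return freqs, signal

def to_int16(signal):
    """Quantize floats in [-1, 1] to 16-bit integers, as stored in a PCM .WAV file."""
    return [max(-32768, min(32767, int(round(x * 32767)))) for x in signal]

# One second of each source at an 8 kHz sampling rate.
freqs, comb = make_sine_comb(8000, 8000, extra_hz=(3900.0,))
noise = make_noise(8000)
comb_pcm = to_int16(comb)
noise_pcm = to_int16(noise)
```

The rounding in to_int16 is the truncation step that sets the noise floor of any test run through 16-bit files.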

The signal source is originally at 8 kHz sampling rate, and written to a 16-bit, single channel .WAV file (PCM encoding). Sox is used to perform sample rate conversion to 44.1 kHz using each of the sample-rate conversion algorithms.

Each of the resulting up-sampled files is loaded into Matlab using a function that reads the binary format into memory, and then a 64k-point FFT is performed. The resulting spectrum is plotted in dB against the new sampling rate.
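The analysis step can be sketched as follows. This is a plain radix-2 FFT in Python rather than the Matlab code actually used; the function names and the normalization (a sine of amplitude 1 reads 0 dB) are assumptions chosen for the example, and windowing is left out for brevity.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrum_db(samples, sample_rate, nfft=65536):
    """Return (frequencies in Hz, magnitudes in dB) from 0 Hz up to sample_rate / 2.

    Uses the first nfft samples and zero-pads if fewer are available.
    """
    block = list(samples[:nfft]) + [0.0] * max(0, nfft - len(samples))
    bins = fft(block)[: nfft // 2 + 1]
    freqs = [k * sample_rate / nfft for k in range(nfft // 2 + 1)]
    # Dividing by nfft / 2 makes a unit-amplitude sine on a bin centre read 0 dB.
    mags = [20.0 * math.log10(max(abs(b) / (nfft / 2.0), 1e-12)) for b in bins]
    return freqs, mags

# Small check: a 1 kHz sine sampled at 8 kHz, analysed with a 1024-point FFT.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / 8000.0) for n in range(1024)]
freqs, mags = spectrum_db(tone, 8000.0, nfft=1024)
peak = max(range(len(mags)), key=lambda k: mags[k])
```

With integer sample values, this normalization puts a sine wave with an amplitude of one least significant bit at 0 dB.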

Random Noise

Results for all three tests using random noise appear in the figure below. In all plots shown here the least significant bit corresponds to 0 dB.





The upper limit of data at an 8 kHz sampling rate is 4 kHz, which is shown by the green line. Each method leaves some amount of signal energy above 4 kHz, which is improper as the original signal had no energy at these frequencies. On the other hand, the truncation error when reducing the sample values to 16-bit integers produces a noise floor which is about -40 dB. No method could ever become better than that, of course, if testing with wave files.

Looking at the stop band rejection, the "resample" and "polyphase" algorithms deliver a very low noise floor above 4 kHz, while linear interpolation produces a lot of noise above 4 kHz. Due to its conservative default settings, the cut-off of "resample" starts at 3 kHz. The "polyphase" algorithm was evidently improved in the newer versions of sox and preserves more of the signal energy just below 4 kHz. The "linear" algorithm is the worst, as it has decreased by 10 dB at 4 kHz and continues down to 20 dB at 12 kHz.

Each of these signals appears in more detail in the following figures.

Linear Interpolation and Random Noise

The following figure shows the performance of linear interpolation on random noise, upsampling from 8 kHz to 44.1 kHz.





This shape is very characteristic and can be mathematically derived, but we won't do so. The important things to note are that signal degradation starts almost immediately (the spectrum should be flat) and that a large "hump" of energy appears around 12 kHz, which will sound like a quiet high-pitched noise. As it is only 25 dB down from the maximum signal energy, this is significant.
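Although the derivation is skipped here, the result is easy to evaluate. The sketch below uses the standard textbook model, which is an assumption about how the plot arises and not something taken from the Sox source: linear interpolation behaves like a triangular filter whose magnitude response is the square of sinc(f / Forig). The model ignores the folding of images at the output rate. The function name is hypothetical.

```python
import math

def linear_interp_gain_db(freq_hz, input_rate_hz):
    """Gain in dB of an idealized linear interpolator: sinc(f / Fs_in) squared."""
    x = freq_hz / input_rate_hz
    if x == 0.0:
        return 0.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    if sinc == 0.0:
        return float("-inf")
    return 20.0 * math.log10(sinc * sinc)

# Gains for an 8 kHz input at a few frequencies of interest.
for f in (1000.0, 2500.0, 4000.0, 12000.0):
    print(f, round(linear_interp_gain_db(f, 8000.0), 1))
```

For an 8 kHz input the model gives roughly -2.9 dB at 2.5 kHz, -7.8 dB at 4 kHz and -26.9 dB at 12 kHz, which is broadly in line with the measurements reported in this analysis, including the hump around 12 kHz.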

Band-Limited Interpolation and Random Noise

The following figure shows the performance of band-limited interpolation on random noise, upsampling from 8 kHz to 44.1 kHz.






Band-limited interpolation is implemented by the "resample" effect, which is also the default mechanism of Sox. It used to have some bugs, which seem to have been removed: the result shows extremely low noise in the stop band (4 kHz - 22 kHz) and an acceptable low-pass cut-off starting at 3 kHz.
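To give a feel for the kind of filter behind band-limited interpolation, here is a minimal sketch of a Kaiser-windowed sinc low-pass filter. It is not the code of the "resample" effect; the tap count, the Kaiser beta and the 4 kHz cutoff at a 44.1 kHz rate are illustrative assumptions, as are all function names.

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser_sinc_taps(num_taps, cutoff, beta):
    """Low-pass FIR taps: ideal sinc response times a Kaiser window.

    cutoff is in cycles per sample (below 0.5); num_taps must be at least 2.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        r = t / mid
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalize to unity gain at 0 Hz

def gain_db(taps, freq):
    """Magnitude response in dB at a normalized frequency (cycles per sample)."""
    re = sum(h * math.cos(2.0 * math.pi * freq * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(2.0 * math.pi * freq * n) for n, h in enumerate(taps))
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-12))

# Illustrative design: 63 taps, 4 kHz cutoff at a 44.1 kHz rate, beta = 8.
taps = kaiser_sinc_taps(63, 4000.0 / 44100.0, 8.0)
passband = gain_db(taps, 1000.0 / 44100.0)
stopband = gain_db(taps, 10000.0 / 44100.0)
```

A larger beta buys more stop band rejection at the price of a wider transition band, which is one way a conservative default can end up with a cut-off that starts well below 4 kHz.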

Polyphase Filtering and Random Noise

The following figure shows the performance of polyphase filtering on random noise, upsampling from 8 kHz to 44.1 kHz.





Polyphase filtering is implemented by the "polyphase" special effect. The original spectrum is flat up to 3.7 kHz, and then is more than 80 dB down by 4 kHz. There is essentially no aliased energy: it is more than 80 dB down above 4 kHz.
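The idea behind polyphase filtering can be sketched generically: interpolate by an integer factor, low-pass filter, and decimate by another integer factor, while skipping all multiplications by inserted zeros and all output samples that would be thrown away. The code below is such a generic sketch, not the implementation of the "polyphase" effect; the function name and the tiny filter in the example are assumptions. For 8 kHz to 44.1 kHz the factors would be up = 441 and down = 80.

```python
def polyphase_resample(x, up, down, taps):
    """Resample x by the ratio up / down with FIR taps given at rate up * Fs_in.

    taps should have unity gain at 0 Hz; the result is scaled by up to make
    up for the zeros that plain interpolation would insert.
    """
    out_len = (len(x) * up) // down
    y = []
    for m in range(out_len):
        pos = m * down      # position in the virtual zero-stuffed stream
        phase = pos % up    # polyphase branch: first tap that hits a real sample
        i = pos // up       # newest input sample contributing to this output
        acc = 0.0
        k = phase
        while k < len(taps) and i >= 0:
            acc += taps[k] * x[i]
            k += up         # every up-th tap lines up with a real input sample
            i -= 1
        y.append(up * acc)
    return y

# A triangular 3-tap filter with up=2 is linear interpolation,
# delayed by one output sample.
doubled = polyphase_resample([0.0, 2.0, 4.0], 2, 1, [0.25, 0.5, 0.25])
# A single unit tap with down=2 simply keeps every second sample.
halved = polyphase_resample([1.0, 2.0, 3.0, 4.0], 1, 2, [1.0])
```

The quality of the result depends entirely on the low-pass filter passed in as taps; the polyphase structure only makes the computation cheaper.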

Sine Waves

An input signal consisting of sine waves spaced at 500 Hz (with an additional sine wave at 3.9 kHz) was also processed. The input spectrum, for comparison, appears below. For this spectrum, the original 64-bit double precision values were taken; otherwise the noise floor would have been much higher.





Results for all three tests appear in the figure below.





The data here is similar to that obtained from the random noise, but the presence of aliased signals is more apparent. The following subsections look at each mechanism in turn.

Linear Interpolation and Sine Waves

Linear interpolation introduces a fairly large amount of aliased signals at higher frequencies, as shown in the figure below.





The signal is down 3 dB at 2.5 kHz, and down 25 dB by 12 kHz, but strong signals every 500 Hz still appear in the band between 4 and 8 kHz.

Band-Limited Interpolation and Sine Waves

Band-Limited Interpolation again shows its good performance with this input signal, as seen below.





Essentially no signal energy occurred in the stop band and the low pass behaviour is as expected from the white noise test.

Polyphase Filtering and Sine Waves

Polyphase filtering again removes a minute fraction of the original signal's content, and prevents much aliasing, as seen below.





The signal energy at 3.5 kHz is unchanged, only at 3.9 kHz a little energy is removed, and the noise floor above 4 kHz is very low.


Downsampling From 44.1 kHz to 8 kHz

The opposite process of upsampling is downsampling, in which signal energy is removed. The experiment uses two mathematical signal sources: random noise, which has a "flat" spectrum (constant energy at all frequencies), and a sequence of sine waves spaced at 500 Hz, which allows a much better feel for the shape of filter distortions and also more readily shows the presence of aliasing artifacts (aliasing is the presence of energy from high frequency regions in low frequency regions due to the removal or insertion of samples).
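As a small illustration of what aliasing means when samples are removed, the sketch below computes where a tone ends up if the rate is reduced without any low-pass filtering at all. This is the worst case, not what any of the three Sox algorithms does, and the function name is hypothetical.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling at sample_rate_hz
    when nothing above sample_rate_hz / 2 has been filtered out first."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# Tones above 4 kHz fold back into the 0 - 4 kHz band of an 8 kHz signal.
for f in (3000.0, 5000.0, 9000.0, 21500.0):
    print(f, "->", alias_frequency(f, 8000.0))
```

A good converter has to suppress everything above Fnew / 2 before the samples are dropped, so that such folded components stay far below the wanted signal.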

The signal source is originally at 44.1 kHz, and written to a 16-bit, single channel .WAV file (PCM encoding). Sox is used to perform sample rate conversion to 8 kHz using each of the sample-rate conversion algorithms.

Each of the resulting down-sampled files is loaded into Matlab using a function that reads the binary format into memory, and then a 64k-point FFT is performed. The resulting spectrum is plotted in dB against the new sampling rate.

Random Noise

Random noise is difficult to down-sample, primarily because under ideal conditions it looks the same (flat across the spectrum). The presence of aliasing is also difficult to determine. Nevertheless, some results from it are meaningful.



Linear Interpolation and Random Noise

The linear interpolation of random noise actually tells us nothing because it results in a flat spectrum, which is what it's supposed to be. As we can't extract any information about aliasing from this, it's not terribly amusing.



Band-Limited Interpolation and Random Noise

Band-limited interpolation again shows the features already pointed out for the upsampling case.





There is about 80 dB of rejection at 4 kHz which starts at 3 kHz.

Polyphase Filtering and Random Noise

The polyphase filter is much improved compared to the last test: the spectrum is flat up to 3.7 kHz, with a sharp fall-off of about 80 dB.





Sine Waves

An input signal consisting of sine waves spaced at 500 Hz was also processed. The input spectrum, for comparison, appears below.





Results for all three tests appear in the figure below.





Some aliasing is detectable, which will be addressed with each method below.

Linear Interpolation and Sine Waves

The linear interpolator, as suspected from the random noise, introduces a lot of aliasing. Aliasing is noticeable in this context as sine wave peaks at a frequency other than 500 Hz (or reasonably close to 500 Hz, as the peak is likely to spread a bit).





Peaks are observed at intervals of 125 Hz at varying amplitudes, which are aliased from the original signal. If the linear interpolator does a good job, these are much lower in energy; since it's not doing so well, these are high in energy. Result: a noisy output.

Band-Limited Interpolation and Sine Waves

Band-limited interpolation again does well in this instance.




Almost no aliasing is evident. Residual peaks observed are more than 70 dB lower than the main signal energy.

Polyphase Filtering and Sine Waves

The polyphase filtering again shows excellent performance.





Summary

The polyphase filter clearly has the best interpolation performance, closely followed by the bandlimited interpolation, both of them leaving linear interpolation far behind. If your only concern is signal quality, the polyphase effect is the best choice.

For less critical quality demands, e.g. resampling between high sample rates (both the actual and the target sample rate are above, say, 32 kHz), or on a slow machine, the time needed to do the resampling might be of interest. On my machine (Athlon 800 MHz), downsampling 270 seconds of music at a 44100 Hz sample rate to 32000 Hz took as little as 2 seconds with linear interpolation, 26 seconds with bandlimited interpolation and 64 seconds with polyphase. So the impatient using slow machines (poor you ;-)) might consider bandlimited interpolation, as the difference in the results will be hardly audible.

If you don't care about sound quality or you have to clock your cpu by hand, use linear interpolation.


Written by K. Bradley on September 9, 1998.

This page relies mostly on the original page by Kevin Bradley, which was discontinued. I modified the parts concerning the improved resample and polyphase effects.

Andreas Wilde, 19. Dec. 2003

email: wilde@eas.iis.fhg.de