Spectrogram peak detection with scipy
Table of Contents
Figure 1: Spectrogram peak detection
Introduction
In this post I share a peak detection algorithm I came up with whilst
working on the Birdclef 2023 Kaggle competition. The dataset is
16,941 recordings of Eastern African bird species, which I converted
to spectrograms using a Fourier transform.
A spectrogram is a visual representation of the spectral content of a signal, typically represented as a graph of time versus frequency 1. It is commonly used in signal processing, acoustics, and speech analysis to highlight patterns and structures in a sound wave.
Peak detection involves identifying the points of maximum amplitude in a signal or spectrum. This is a common technique used in signal processing and analysis, particularly in music and audio applications where identifying peaks can help to identify characteristics such as beats and harmonics. The process usually involves finding the local maxima, filtering out noise and selecting the most prominent peaks based on criteria such as their amplitude and frequency.
The imports for this project are:
import numpy as np import matplotlib.pyplot as plt from statistics import NormalDist from sklearn.decomposition import PCA from librosa import load as load_audio from librosa import stft, amplitude_to_db from librosa.display import specshow from scipy.ndimage import label as label_features from scipy.ndimage import maximum_position as extract_region_maximums
Creating a spectrogram from an audio file
This is quite straight forward in Python, all we need is
librosa. The load function will return the samples as a numpy
array. Then, stft will do a short-time Fourier transform 2 and
amplitude_to_db will produce the spectrogram, which is also a numpy
array.
audio_file_path = "file path ..." desired_sample_rate = 32000 # load samples into ndarray samples, _actual_sample_rate = load_audio( path=audio_file_path, sr=desired_sample_rate ) # do transformations # fourier transform -> spectrogram x = samples x = stft(x) x = amplitude_to_db(abs(x)) # plot plt.figure(figsize=(14, 5)) specshow(x, sr=desired_sample_rate, x_axis="time", y_axis="hz") plt.colorbar()
The specshow helper function will visualise the spectrogram as a
heat map. The x axis is time, y is frequency, and z is intensity
in decibels. It is important that the sample rate argument sr is
correct, otherwise the time axis will not line-up with the original
recording. Also, notice that the y axis starts at the bottom left,
whereas the ndarray representation starts at the top left. The
specshow function is flipping the array so that the visualisation is
not upside down.
Figure 2: Spectrogram visualisation
I also experimented with principal component analysis (PCA), projecting the frequencies onto 10 principal components. Recall, PCA is a statistical technique used to reduce the complexity of high-dimensional datasets whilst retaining most of their variation 3. It transforms data into a new set of uncorrelated variables called principal components, making it easier to identify underlying patterns and structures.
Note: this is not part of the peak detection algorithm.
z = PCA(n_components=10).fit_transform(x)
Figure 3: Spectrogram visualisation after PCA
The above plot is clearly a lower resolution representation of the
previous - you can clearly see the 10 components in the y axis plus
the frequency spikes. However, all of the noise has been removed.
The following spectrograms are cropped so that the peaks are easier to
see. They are also plotted with pyplot's matshow rather than
specshow, therefore they are upside down and have a different colour
scheme. However, the data is the same.
fig, ax = plt.subplots(figsize=(16, 8)) ax.matshow(x1[:500, :600])
Figure 4: Spectrogram visualisation (cropped)
The peak detection algorithm
The goal of the algorithm is to identify areas of high intensity (a
peak) in the spectrogram. A peak as a high value and is surrounded by
low value neighbours. The algorithm has a single threshold parameter
which controls its sensitivity. In this example, I set the threshold
to 2.75.
peak = (time, frequency, intensity)
The steps are as follows:
- filter out zero values
- calculate mean and standard deviation of intensities
- assign a z score to each intensity in the spectrogram matrix
- create a label matrix from the frequency intensities that are above threshold parameter
- for each isolated region in the label matrix, extract the max intensity value from the original spectrogram
- create a list of peak values and their locations
The implementation:
spectrogram = x # number of standard deviations away from the mean threshold = 2.75 # remove zero values flattened = np.matrix.flatten(spectrogram) filtered = flattened[flattened > np.min(flattened)] # create a normal distribution from frequency intensities # then map a zscore onto each intensity value ndist = NormalDist(np.mean(filtered), np.std(filtered)) zscore = np.vectorize(lambda x: ndist.zscore(x)) zscore_matrix = zscore(spectrogram) # create label matrix from frequency intensities that are # above threshold mask_matrix = zscore_matrix > threshold labelled_matrix, num_regions = label_features(mask_matrix) label_indices = np.arange(num_regions) + 1 # for each isolated region in the mask, identify the maximum # value, then extract it position peak_positions = extract_region_maximums( zscore_matrix, labelled_matrix, label_indices) # finally, create list of peaks (time, frequency, intensity) peaks = [(x, y, spectrogram[y, x]) for y, x in peak_positions]
The following visualisation shows the previous spectrogram plus its peaks overlaid as a scatter plot.
fig, ax = plt.subplots(figsize=(16, 8)) ax.scatter( x=[p[0] for p in peaks], y=[p[1] for p in peaks], s=1.5, color="red" ) ax.matshow(x[:500, :600])
Figure 5: Spectrogram with peaks overlay
The points are not perfect - especially if there is a lot of
background noise - some threshold fine-tuning is necessary. However, I
have found 2.75 standard deviations to be good enough for most audio
files in my dataset.
Some more examples
Figure 6: Peak detection example 1
Figure 7: Peak detection example 2
Figure 8: Peak detection example 3