Escaping Praat scripting using Python
A walkthrough of how to use Praat from Python.
What is Praat?
Praat is a computer program for analyzing speech signals. Its GUI offers insightful visualizations of features such as pitch, formants, intensity, and the spectrogram. On top of this, it has a scripting language that lets you run an analysis over a whole pile of speech files. In short, it is a popular and useful research tool for speech analysis. However, I have not found Praat's scripting language easy to understand. In fact, I have been wary of even reading Praat scripts because the syntax looks alien to me. This is not meant to scare you: many people use it beautifully.
Parselmouth comes to the rescue
Then one day I came across parselmouth, and it has given me some peace! Quoting from its website:
Parselmouth is unique in its aim to provide a complete and Pythonic interface to the internal Praat code. While other projects either wrap Praat’s scripting language or reimplementing parts of Praat’s functionality in Python, Parselmouth directly accesses Praat’s C/C++ code (which means the algorithms and their output are exactly the same as in Praat) and provides efficient access to the program’s data, but also provides an interface that looks no different from any other Python library.
In this notebook I will provide some code snippets that analyze a speech file using parselmouth (and thus, Praat).
#collapse
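import numpy as np
import parselmouth
import matplotlib.pyplot as plt
from scipy.io import wavfile  # provides wavfile.read used below
# read the waveform, normalize its peak to 1, and build a time axis in seconds
fs, x = wavfile.read('./my_sounds/count.wav')
x = x/np.max(np.abs(x))
t = np.arange(0,len(x))/fs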
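# for PRAAT
sr = 16000
hop_dur = .01
win_dur = .025 # assumed value: win_dur was never defined here, so Praat's default window length is used
num_form = 3
max_form_freq = 4500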
# call parselmouth to load sound
snd = parselmouth.Sound('./my_sounds/count.wav')
# extract features
pitch = snd.to_pitch(time_step=hop_dur) # pitch track
harm = snd.to_harmonicity(time_step=hop_dur) # harmonic-to-noise ratio
form = snd.to_formant_burg(time_step=hop_dur, max_number_of_formants=num_form,
                           maximum_formant=max_form_freq,
                           window_length=win_dur, pre_emphasis_from=50.0) # formants
intensity = snd.to_intensity(minimum_pitch = 75.0, time_step=hop_dur,subtract_mean=False) # intensity
spectrogram = snd.to_spectrogram(window_length=0.04)
times = pitch.ts() # analysis window time instants
pitch_vals = []
harm_vals = []
form_1_vals = []
form_2_vals = []
form_3_vals = []
inten_vals = []
for dt in times:
    pitch_vals.append(pitch.get_value_at_time(dt))
    harm_vals.append(harm.get_value(dt))
    form_1_vals.append(form.get_value_at_time(1,dt))
    form_2_vals.append(form.get_value_at_time(2,dt))
    form_3_vals.append(form.get_value_at_time(3,dt))
    inten_vals.append(intensity.get_value(dt))
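The lists built above can contain NaN entries: as far as I can tell, parselmouth returns NaN for frames that Praat treats as unvoiced or undefined. Here is a minimal sketch, under that assumption, of how a pitch track could be summarized by its median over the voiced frames only. The helper name `voiced_median` and the example numbers are made up for illustration; they are not part of parselmouth.

```python
import math
import statistics


def voiced_median(values):
    """Median of the finite entries in `values`, or NaN if there are none."""
    # Assumption: unvoiced/undefined frames are stored as NaN (or None).
    voiced = [v for v in values if v is not None and math.isfinite(v)]
    return statistics.median(voiced) if voiced else math.nan


# Hypothetical pitch track in Hz; NaN marks unvoiced frames.
example_pitch = [math.nan, 138.0, 141.5, math.nan, 140.2, 139.1]
print(voiced_median(example_pitch))
```

With the real data, `voiced_median(pitch_vals)` would give a single number to quote as the speaker's typical pitch.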
Let's visualize some of the features.
fig = plt.figure(figsize=(12,8)) # plt.figure, so no extra full-size axes sits behind the 2x2 grid
ax = plt.subplot(2,2,1)
ax.plot(times,pitch_vals)
ax.set_xlabel('TIME [in s]')
ax.set_ylabel('PITCH FREQ [in Hz]')
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
plt.xlim(times[0],times[-1])
ax.grid(True)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax = plt.subplot(2,2,2)
ax.plot(times,inten_vals)
ax.set_xlabel('TIME [in s]')
ax.set_ylabel('INTENSITY [in dB]')
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
plt.xlim(times[0],times[-1])
ax.grid(True)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax = plt.subplot(2,2,3)
ax.plot(times,harm_vals)
ax.set_xlabel('TIME [in s]')
ax.set_ylabel('HARMONIC-TO-NOISE RATIO [in dB]')
plt.xlim(times[0],times[-1])
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
ax.grid(True)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.ylim(0,50)
ax = plt.subplot(2,2,4)
ax.plot(times,np.array(form_1_vals)/1e3)
ax.plot(times,np.array(form_2_vals)/1e3)
ax.plot(times,np.array(form_3_vals)/1e3)
ax.set_xlabel('TIME [in s]')
ax.set_ylabel('FORMANT FREQ [in kHz]')
plt.xlim(times[0],times[-1])
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
ax.grid(True)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig,ax = plt.subplots(1,1,figsize=(12,5))
dynamic_range = 70
X, Y = spectrogram.x_grid(), spectrogram.y_grid()
sg_db = 10 * np.log10(spectrogram.values)
im = ax.pcolormesh(X, Y, sg_db, vmin=sg_db.max() - dynamic_range, cmap='RdBu_r')
fig.colorbar(im, ax=ax)
plt.ylim([spectrogram.ymin, spectrogram.ymax])
plt.xlabel("TIME [in s]")
plt.ylabel("FREQUENCY [in Hz]")
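One thing to watch in the spectrogram code is that `np.log10` returns `-inf` wherever the power is exactly zero. Below is a small sketch of the same dB conversion with a floor, together with the colour limits implied by a 70 dB dynamic range. It uses only the standard library so that it runs on its own; with NumPy, `np.maximum` and `np.log10` would do the same job. The function names and the `floor` parameter are my own choices for illustration, not something Praat or parselmouth defines.

```python
import math


def power_to_db(power_rows, floor=1e-12):
    """Convert rows of power values to dB, clamping at `floor` to avoid log10(0)."""
    return [[10.0 * math.log10(max(p, floor)) for p in row] for row in power_rows]


def color_limits(db_rows, dynamic_range=70.0):
    """Return (vmin, vmax) so that only the top `dynamic_range` dB is displayed."""
    vmax = max(max(row) for row in db_rows)
    return vmax - dynamic_range, vmax


# Hypothetical 2x2 patch of spectrogram power values.
example_power = [[1.0, 0.0], [1e-3, 1e-9]]
example_db = power_to_db(example_power)
vmin, vmax = color_limits(example_db)
print(example_db, vmin, vmax)
```

Anything more than `dynamic_range` dB below the strongest bin is drawn in the lowest colour, which is what passing `vmin=sg_db.max() - dynamic_range` to `pcolormesh` does above.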
Let's also interpret these features.
- Pitch is a perceived acoustic feature that relates to the vibratory frequency of the vocal folds.
- Intensity is the acoustic correlate of the perceived loudness of speech.
- Harmonics-to-noise ratio (HNR) relates to the quality of the voice; for example, a hoarse voice has a low HNR.
- Formant frequencies quantify the resonance frequencies of the vocal tract while we speak.
You can read more about these features in the context of Praat at its website.
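To get a feel for the HNR numbers, it helps to remember that HNR is ten times the base-10 logarithm of the ratio between the energy in the periodic (harmonic) part of the signal and the energy in the noise part. The sketch below converts between an HNR in dB and the share of energy that is periodic. The function names are mine, chosen for illustration.

```python
import math


def hnr_from_fraction(fraction):
    """HNR in dB when `fraction` of the signal energy is periodic."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return 10.0 * math.log10(fraction / (1.0 - fraction))


def fraction_from_hnr(hnr_db):
    """Share of the signal energy that is periodic, for an HNR given in dB."""
    ratio = 10.0 ** (hnr_db / 10.0)  # harmonic energy divided by noise energy
    return ratio / (1.0 + ratio)


print(hnr_from_fraction(0.99))   # 99% periodic energy, roughly 20 dB
print(fraction_from_hnr(15.0))   # share of periodic energy at 15 dB
```

So an HNR of 0 dB means equal energy in the harmonics and in the noise, and every additional 10 dB multiplies the harmonic-to-noise energy ratio by ten.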
Coming to the plots above, we can see that the speaker's pitch mostly stays close to 140 Hz, so the speaker is likely male. The speaker is loud enough, and an HNR of around 15 dB indicates good-quality speech. All these observations are further supported by the narrowband spectrogram.