In this article, I will go over how to extract three common time-domain features of an audio file (amplitude envelope, root mean square energy and zero-crossing rate) using Librosa, a Python library for music and audio processing.
Audio processing has many applications in machine learning. With the rise of voice assistants such as Alexa, Siri and Google Home, we can expect further advances in this field.
What are Audio Features?
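Audio features can be divided into two categories: time-domain and frequency-domain. Time-domain features are computed directly from the waveform samples, while frequency-domain features are computed after transforming the signal, for example with a Fourier transform. In this article, we will focus on time-domain features.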
Amplitude Envelope (AE)
The amplitude envelope of an audio signal is the maximum amplitude value within each frame. Because it depends on a single sample per frame, it is sensitive to outliers, but it gives a general idea of how the loudness of the file changes over time.
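To make the definition concrete, here is a minimal NumPy sketch (not a librosa function). The frame length of 1024 samples, the hop of 512 samples, taking the absolute value so that negative peaks count, and the synthetic test tone are all assumptions chosen for illustration.

```python
import numpy as np


def amplitude_envelope(signal: np.ndarray, frame_length: int = 1024, hop_length: int = 512) -> np.ndarray:
    """Return the maximum absolute amplitude of each frame.

    Assumption: frames overlap by (frame_length - hop_length) samples, and
    trailing samples that do not fill a complete frame are ignored.
    """
    last_start = max(len(signal) - frame_length, 0)
    starts = range(0, last_start + 1, hop_length)
    return np.array([np.max(np.abs(signal[s:s + frame_length])) for s in starts])


# Hypothetical test signal: a 440 Hz tone whose amplitude ramps from 0 to 1 over one second.
sr = 22050
t = np.arange(sr) / sr
y = np.linspace(0.0, 1.0, sr) * np.sin(2 * np.pi * 440 * t)

ae = amplitude_envelope(y)
print(ae.shape)        # (42,)
print(ae[0] < ae[-1])  # True: the envelope follows the rising loudness
```

Because each value is a single sample's magnitude, one click in an otherwise quiet frame is enough to push that frame's envelope to the click's height.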
Root Mean Square Energy (RMSE)
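In this section, we will compute the root mean square (RMS) energy of an audio sample: the square root of the mean of the squared sample values in each frame. Because it is computed directly from the waveform samples, it is a time-domain feature. RMS energy is related to perceived loudness and, since it averages over every sample in the frame, it is less sensitive to outliers than the amplitude envelope.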
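For a frame of K samples s(k), the RMS energy is sqrt((1/K) · Σ s(k)²). The NumPy sketch below computes it frame by frame. librosa provides this measure as `librosa.feature.rms`, but with its default `center=True` it pads the signal, so its frame count and edge values can differ from this sketch. The frame sizes and test signals here are assumptions for illustration.

```python
import numpy as np


def rms_energy(signal: np.ndarray, frame_length: int = 1024, hop_length: int = 512) -> np.ndarray:
    """Return the root mean square of each complete frame."""
    last_start = max(len(signal) - frame_length, 0)
    starts = range(0, last_start + 1, hop_length)
    return np.array([np.sqrt(np.mean(signal[s:s + frame_length] ** 2)) for s in starts])


sr = 22050
t = np.arange(sr) / sr

# A full-scale sine has an RMS of about 1 / sqrt(2), roughly 0.707, in every frame.
tone = np.sin(2 * np.pi * 440 * t)
print(np.round(rms_energy(tone)[:3], 2))

# A single spike in silence: the frame's maximum would be 1.0, but its RMS is only 1/32.
clicks = np.zeros(4096)
clicks[100] = 1.0
print(rms_energy(clicks)[0])  # 0.03125
```

The spike example shows the outlier behaviour described above: averaging over 1024 samples shrinks a lone click by a factor of 32, whereas a per-frame maximum would report it at full height.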
Zero-Crossing Rate (ZCR)
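The zero-crossing rate measures how often an audio signal changes sign, from positive to negative or from negative to positive, and is usually computed per frame. ZCR can be used to distinguish voiced from unvoiced sounds: noise-like unvoiced sounds (such as the fricative "s") tend to cross zero much more often than periodic voiced sounds.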
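Here is a minimal NumPy sketch of a per-frame ZCR, normalized by frame length so that it is a fraction between 0 and 1 (librosa's `librosa.feature.zero_crossing_rate` also reports a per-frame fraction, though its default framing and padding differ). The 100 Hz sine and the white noise are hypothetical stand-ins for a voiced and an unvoiced frame; real speech is much less tidy.

```python
import numpy as np


def zero_crossing_rate(signal: np.ndarray, frame_length: int = 1024, hop_length: int = 512) -> np.ndarray:
    """Return, for each complete frame, the fraction of samples at which the sign flips."""
    last_start = max(len(signal) - frame_length, 0)
    rates = []
    for s in range(0, last_start + 1, hop_length):
        frame = signal[s:s + frame_length]
        negative = np.signbit(frame)  # True for negative samples; zero counts as positive
        crossings = np.count_nonzero(negative[1:] != negative[:-1])
        rates.append(crossings / len(frame))
    return np.array(rates)


sr = 22050
t = np.arange(sr) / sr
rng = np.random.default_rng(0)

voiced_like = np.sin(2 * np.pi * 100 * t)  # periodic: only a few sign changes per frame
unvoiced_like = rng.standard_normal(sr)    # noise: the sign flips about half the time

print(zero_crossing_rate(voiced_like).mean())
print(zero_crossing_rate(unvoiced_like).mean())
```

The tone should come out at roughly 0.01 per frame and the noise at roughly 0.5, which is the gap that lets ZCR separate voiced from unvoiced frames.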
For more information, refer to the librosa documentation.
In the next article, I will go over how to extract frequency-domain audio features.