Project Plan
Below, we detail our current plan for improving our filter and the accuracy of our language detection.
-
Filter the DFT coefficients around frequencies where English and Mandarin are distinctly different. By analyzing the DFT coefficient magnitudes across various frequency ranges, we can see that English has greater peak magnitudes than Mandarin around particular frequencies, and vice versa.
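As a rough illustration of the band comparison described above, the following is a minimal sketch in Python using only the standard library. The two signals are synthetic tones standing in for recordings, and the function names, sampling rate, and band edges are assumptions chosen for the example rather than values from our data.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Return |X[k]| for k = 0 .. N//2 using a direct O(N^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc))
    return mags

def band_peak(mags, fs, n, f_lo, f_hi):
    """Largest magnitude among bins whose frequency lies in [f_lo, f_hi] Hz."""
    in_band = [m for k, m in enumerate(mags) if f_lo <= k * fs / n <= f_hi]
    return max(in_band) if in_band else 0.0

# Synthetic stand-ins for two recordings (hypothetical, not real speech).
fs = 800   # sampling frequency in Hz (assumed)
n = 200    # number of samples
sig_a = [math.sin(2 * math.pi * 100 * i / fs) for i in range(n)]
sig_b = [math.sin(2 * math.pi * 300 * i / fs) for i in range(n)]

mags_a = dft_magnitudes(sig_a)
mags_b = dft_magnitudes(sig_b)

# Hypothetical bands of interest; real bands would come from our analysis.
for f_lo, f_hi in [(50, 150), (250, 350)]:
    peak_a = band_peak(mags_a, fs, n, f_lo, f_hi)
    peak_b = band_peak(mags_b, fs, n, f_lo, f_hi)
    louder = "A" if peak_a > peak_b else "B"
    print(f"{f_lo}-{f_hi} Hz: signal {louder} has the larger peak")
```

In practice an FFT routine would replace the direct DFT loop, which is written out here only to keep the sketch self-contained.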
-
Develop the algorithm to produce the Mel-Frequency Cepstral Coefficients (MFCCs) and extract additional features. We will then run the same analysis on the MFCCs that we run on the FFTs. There is a MATLAB function that computes the MFCCs of our data from the following inputs:
-
The speech signal (as a vector)
-
Sampling frequency (Hz)
-
Frame duration (ms)
-
Frame shift (ms)
-
Preemphasis coefficient
-
Frequency range (Hz) for filterbank analysis
-
The number of filterbank channels
-
The number of cepstral coefficients (including the 0th coefficient)
-
The liftering parameter
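To show how several of these inputs enter the computation, here is a minimal Python sketch of the front end of an MFCC pipeline: pre-emphasis, framing, and filterbank placement on the mel scale. It is not the MATLAB function mentioned above. The function names and parameter values are hypothetical, and the mel formula used is one common convention that the MATLAB function may or may not share. The DCT and liftering stages are omitted.

```python
import math

def preemphasize(signal, alpha):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through."""
    if not signal:
        return []
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]

def split_frames(signal, fs, frame_ms, shift_ms):
    """Cut the signal into overlapping frames; durations are given in ms."""
    frame_len = int(round(fs * frame_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, shift)]

def hz_to_mel(f):
    """One common mel-scale convention (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def filterbank_centers(f_lo, f_hi, channels):
    """Centre frequencies (Hz) spaced equally on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (channels + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(channels)]

# Hypothetical parameter values, chosen only for illustration.
fs = 16000                      # sampling frequency (Hz)
speech = [math.sin(2 * math.pi * 440 * i / fs) for i in range(fs)]
emphasized = preemphasize(speech, alpha=0.97)
frames = split_frames(emphasized, fs, frame_ms=25, shift_ms=10)
centers = filterbank_centers(300, 3700, channels=20)
print(len(frames), "frames of", len(frames[0]), "samples")
print("first filterbank centre (Hz):", round(centers[0], 1))
```

The frame duration and frame shift determine how many feature vectors one recording yields, while the frequency range and channel count determine where the filterbank samples the spectrum.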
-
-
Develop the algorithm to identify the frequencies at which articulated sounds differ between the two languages:
-
The English 'R' sound in our samples, as it has no exact counterpart in Mandarin
-
Voiced fricatives (sibilants) and affricates (pitch differences)
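One simple cue that could help locate sibilant-like frames is the zero-crossing rate, which tends to be high for high-frequency, noise-like sounds. The sketch below is an assumption on our part rather than the algorithm this plan commits to. It uses pure tones as stand-ins for speech frames, the threshold is a hypothetical value, and it does not address the 'R' sound.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def fricative_like_frames(frames, zcr_threshold):
    """Indices of frames whose zero-crossing rate exceeds the threshold."""
    return [i for i, f in enumerate(frames)
            if zero_crossing_rate(f) > zcr_threshold]

# Synthetic stand-ins (hypothetical): a low tone for a vowel-like frame
# and a high tone for a sibilant-like frame.
fs = 8000
vowel_like = [math.sin(2 * math.pi * 150 * i / fs) for i in range(400)]
sibilant_like = [math.sin(2 * math.pi * 3000 * i / fs) for i in range(400)]

flagged = fricative_like_frames([vowel_like, sibilant_like],
                                zcr_threshold=0.3)
print("frames flagged as fricative-like:", flagged)
```

Frames flagged this way could then be passed to the DFT or MFCC analysis to compare their spectra between the two languages.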
-
-
Implement a filter in an attempt to identify individual syllables. From this, we will be able to extract the following features:
-
The number of syllables per second
-
Peak amplitudes and frequently seen patterns
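A minimal sketch of one way such a filter might work: smooth the squared signal with a moving average to obtain an energy envelope, then count envelope peaks as syllable candidates. The window length, threshold, minimum peak spacing, and the synthetic burst signal are all assumptions made for illustration, and real speech would likely need more careful tuning.

```python
import math

def energy_envelope(signal, win):
    """Moving average of squared samples (a simple low-pass smoothing filter)."""
    sq = [x * x for x in signal]
    env = []
    acc = 0.0
    for i, v in enumerate(sq):
        acc += v
        if i >= win:
            acc -= sq[i - win]
        env.append(acc / min(i + 1, win))
    return env

def find_peaks(env, threshold, min_gap):
    """Indices of local maxima above threshold, at least min_gap samples apart."""
    peaks = []
    for i in range(1, len(env) - 1):
        is_local_max = env[i] >= env[i - 1] and env[i] > env[i + 1]
        if env[i] > threshold and is_local_max:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks

# Synthetic stand-in for speech (hypothetical): a 100 Hz tone gated into
# bursts by the positive half-cycles of a 4 Hz sine.
fs = 1000
duration_s = 2
signal = []
for i in range(fs * duration_s):
    t = i / fs
    amp = max(0.0, math.sin(2 * math.pi * 4 * t))
    signal.append(amp * math.sin(2 * math.pi * 100 * t))

env = energy_envelope(signal, win=50)        # 50 ms smoothing window (assumed)
peaks = find_peaks(env, threshold=0.5 * max(env), min_gap=100)
rate = len(peaks) / duration_s
print("syllable-like bursts per second:", rate)
print("peak envelope values:", [round(env[p], 3) for p in peaks])
```

The peak positions give the syllables-per-second estimate, and the envelope values at those positions give the peak amplitudes listed above.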
-