<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2015.36001</article-id><article-id pub-id-type="publisher-id">JCC-56677</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Këpuska</surname><given-names>Veton Z.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Elharati</surname><given-names>Hussien A.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Electrical &amp; Computer Engineering Department, Florida Institute of Technology, Melbourne, FL, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>vkepuska@fit.edu</email> (VZK); <email>helharati2013@my.fit</email> (HAE)</corresp></author-notes><pub-date pub-type="epub"><day>26</day><month>05</month><year>2015</year></pub-date><volume>03</volume><issue>06</issue><fpage>1</fpage><lpage>9</lpage><history><date date-type="received"><day>19</day><month>04</month><year>2015</year></date><date date-type="accepted"><day>23</day><month>05</month><year>2015</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. It is therefore necessary to search for more robust feature extraction methods that perform better in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms based on Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), Perceptual Linear Prediction (PLP), and RASTA-PLP in noisy conditions, using a multivariate Hidden Markov Model (HMM) classifier. The proposed system is evaluated on the TIDIGITS corpus of human speech, recorded from 208 different adult speakers, for both training and testing. The theoretical basis of the speech processing and classification procedures is presented, and recognition results are reported as word recognition rates.
 
</p></abstract><kwd-group><kwd>Speech Recognition</kwd><kwd>Noisy Conditions</kwd><kwd>Feature Extraction</kwd><kwd>Mel-Frequency Cepstral Coefficients</kwd><kwd>Linear Predictive Cepstral Coefficients</kwd><kwd>Perceptual Linear Prediction</kwd><kwd>RASTA-PLP</kwd><kwd>Isolated Speech</kwd><kwd>Hidden Markov Model</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Automatic speech recognition (ASR) enables a machine to recognize spoken words. As shown in the block diagram in <xref ref-type="fig" rid="fig1">Figure 1</xref>, an ASR system consists of two main parts. The first part, signal modeling, known as the front-end, extracts acoustic features from the input speech signal using a specific feature extraction algorithm. The second part, statistical modeling, known as the back-end, matches these features against reference models to generate the recognition result, using template-matching or classifier techniques [<xref ref-type="bibr" rid="scirp.56677-ref1">1</xref>] such as Hidden Markov Models (HMMs), Artificial Neural Networks (ANN), Dynamic Time Warping (DTW), or Vector Quantization (VQ). The front-end splits the input speech signal into short frames, typically 10 to 30 ms long, each of which reflects a number of useful physical characteristics of the signal; the same processing is repeated for every subsequent frame. Each new frame typically starts about 10 ms after the previous one, so consecutive frames overlap, and the resulting sequence of feature vectors is passed to the back-end. The back-end applies statistical modeling to compute likelihoods against the reference models of all trained words and selects the most likely word sequence. 
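</p><p>The front-end operations described above, together with the pre-emphasis, frame blocking, and windowing settings of Section 2, can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' MATLAB implementation: it assumes an 8 kHz sampling rate (200 samples per 25 ms frame), a pre-emphasis factor of 0.975, a 10 ms frame shift, and a Hamming window, and the function name frame_signal is hypothetical.</p><preformat preformat-type="code">
```python
import numpy as np


def frame_signal(x, fs=8000, frame_ms=25.0, shift_ms=10.0, alpha=0.975):
    """Pre-emphasize, block into overlapping frames, and Hamming-window a signal.

    Returns an array of shape (n_frames, frame_len). Trailing samples that do
    not fill a whole frame are dropped (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)

    # Pre-emphasis y[n] = x[n] - alpha * x[n-1]; the first sample is kept as is.
    y = np.append(x[0], x[1:] - alpha * x[:-1])

    frame_len = int(round(fs * frame_ms / 1000.0))  # 200 samples at 8 kHz
    shift = int(round(fs * shift_ms / 1000.0))      # 80 samples at 8 kHz
    if frame_len > len(y):
        raise ValueError("signal is shorter than one frame")

    # Start index of every complete frame, then a (n_frames, frame_len) index grid.
    n_frames = 1 + (len(y) - frame_len) // shift
    starts = shift * np.arange(n_frames)
    frames = y[starts[:, None] + np.arange(frame_len)[None, :]]

    # Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), n = 0, ..., N - 1.
    return frames * np.hamming(frame_len)


if __name__ == "__main__":
    one_second = np.random.default_rng(0).standard_normal(8000)
    print(frame_signal(one_second).shape)  # (98, 200)
```
</preformat><p>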
The performance of an ASR system based on acoustic models depends strongly on how well the conditions of the training and testing data match [<xref ref-type="bibr" rid="scirp.56677-ref2">2</xref>]; when noise corrupts the test speech but not the training speech, accuracy degrades. Lack of noise robustness therefore remains a largely unsolved problem in ASR research. The main challenges in designing a speech recognition system are the choice of signal modeling and statistical modeling, and the handling of noise. The focus of this study is to experimentally evaluate the effect of noise on conventional and hybrid feature extraction algorithms based on MFCC, LPCC, PLP, and RASTA-PLP, using a multivariate HMM classifier and the TIDIGITS speech corpus. This paper is organized as follows: Section 1 is the introduction; Section 2 describes speech pre-processing; Section 3 details the feature extraction techniques; Section 4 describes the Hidden Markov Model used as the statistical classifier; and Sections 5 and 6 present the results and the conclusions of the comparison of all eight feature extraction methods, respectively.</p><sec id="s1_1"><title>2. Speech Pre-Processing</title><p>Sampling, pre-emphasis, frame blocking, and windowing are the common steps used to prepare the input speech signal for feature extraction [<xref ref-type="bibr" rid="scirp.56677-ref3">3</xref>].</p></sec><sec id="s1_2"><title>2.1. Signal-to-Noise Ratio Estimation</title><p>The input speech signal was corrupted by adding realistic noise at SNRs ranging from 30 dB to 5 dB, as shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>, using the v_addnoise.m MATLAB function.</p></sec><sec id="s1_3"><title>2.2. Pre-Emphasis</title><p>A first-order high-pass FIR filter was used to flatten the speech spectrum, compensating for the attenuation of the high-frequency part of the speech signal [<xref ref-type="bibr" rid="scirp.56677-ref4">4</xref>]. The filter is described by Equation (1):</p><disp-formula id="scirp.56677-formula50"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x5.png"  xlink:type="simple"/></disp-formula><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> Speech recognition system</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x6.png"/></fig><p><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-1730211x7.png" xlink:type="simple"/></inline-formula>: input speech signal.</p><p><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-1730211x8.png" xlink:type="simple"/></inline-formula>: previous sample of the input speech signal.</p><p>A: pre-emphasis factor, chosen as 0.975.</p></sec><sec id="s1_4"><title>2.3. Frame Blocking and Windowing</title><p>To ensure smooth transitions of the estimated parameters from frame to frame, the pre-emphasized signal y[n] is blocked into frames of 200 samples (25 ms, which corresponds to a sampling rate of 8 kHz) with a 10 ms frame shift. A Hamming window, given in Equation (2) and shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>, is applied to each frame to minimize signal discontinuities at the beginning and end of the frame.</p><disp-formula id="scirp.56677-formula51"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x9.png"  xlink:type="simple"/></disp-formula><p>n: sample index within the frame, 0 ≤ n ≤ N − 1.</p><p>N: number of samples in each frame (the window length).</p></sec></sec><sec id="s2"><title>3. 
Speech Feature Extraction</title><p>Feature extraction converts the acoustic signal into a sequence of acoustic feature vectors that represent the input speech signal well. These features are then used to classify and predict new words. To capture the dynamics of speech, delta and delta-delta coefficients, which approximate the first and second time derivatives of the static features, can be appended to the feature parameters [<xref ref-type="bibr" rid="scirp.56677-ref4">4</xref>]. In this research, several conventional and hybrid feature extraction techniques were designed and tested in MATLAB, each generating 39 coefficients per frame (13 static coefficients with their deltas and delta-deltas for a single method, or three sets of 13 coefficients for a hybrid).</p><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> The word "seven" corrupted at different SNR values</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x10.png"/></fig><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> Hamming window</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x11.png"/></fig><sec id="s2_1"><title>3.1. Mel Frequency Cepstral Coefficients (MFCC)</title><p>MFCC is the most widely used method for extracting spectral features. MFCC analysis applies the Fast Fourier Transform (FFT) to each frame, converts the resulting power spectrum to a Mel-frequency spectrum, takes the logarithm of that spectrum, and computes its inverse Fourier transform [<xref ref-type="bibr" rid="scirp.56677-ref5">5</xref>], as shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p></sec><sec id="s2_2"><title>3.2. 
Linear Predictive Cepstral Coefficients (LPCC)</title><p>Linear prediction is one of the earliest speech analysis methods; it was designed for low-bit-rate coding and attempts to mimic human speech production by modeling each frame with an all-pole filter. The prediction coefficients are derived with the autocorrelation method [<xref ref-type="bibr" rid="scirp.56677-ref6">6</xref>], by far the most common choice, which correlates each frame of the windowed signal with itself using Equation (3); the LPCC are the cepstral coefficients obtained from these prediction coefficients, as shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>.</p><disp-formula id="scirp.56677-formula52"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x12.png"  xlink:type="simple"/></disp-formula><p><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-1730211x13.png" xlink:type="simple"/></inline-formula>: length of the window.</p><p><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-1730211x14.png" xlink:type="simple"/></inline-formula>: windowed segment.</p></sec><sec id="s2_3"><title>3.3. Perceptual Linear Prediction (PLP)</title><p>PLP modifies the short-term spectrum with several transformations that match properties of the human auditory system, and then fits an autoregressive all-pole model to the resulting auditory-like spectrum, as in linear prediction (LP) analysis of speech. 
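</p><p>LPCC and the final autoregressive stage of PLP both rest on linear prediction. The NumPy sketch below, which is not the authors' code, computes the short-time autocorrelation of one windowed frame (assumed to be the form of Equation (3)), solves for the predictor coefficients with the Levinson-Durbin recursion, and converts them to cepstral coefficients with the standard LPC-to-cepstrum recursion. The prediction order of 12, the 13 output coefficients, and all function names are illustrative assumptions.</p><preformat preformat-type="code">
```python
import numpy as np


def autocorr(frame, order):
    """Short-time autocorrelation R(k) = sum_n s(n) s(n + k), k = 0, ..., order."""
    s = np.asarray(frame, dtype=float)
    n = len(s)
    return np.array([np.dot(s[: n - k], s[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Predictor coefficients a[1..p] from autocorrelation r[0..p].

    Convention: s(n) is predicted as sum_k a[k] s(n - k), i.e. the all-pole
    model is G / (1 - sum_k a[k] z^-k). Returns (a, prediction error energy).
    """
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model (gain term c[0] omitted)."""
    p = len(a)
    a = np.concatenate(([0.0], a))  # shift to 1-based indexing a[1..p]
    c = np.zeros(n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m] if p >= m else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=13):
    """LPCC vector for one windowed frame (order and n_ceps are assumed values)."""
    r = autocorr(frame, order)
    if r[0] == 0.0:
        return np.zeros(n_ceps)  # silent frame: no spectral information
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```
</preformat><p>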
PLP features are obtained through spectral analysis, critical-band (frequency band) analysis, equal-loudness pre-emphasis, the intensity-loudness power law, and autoregressive modeling [<xref ref-type="bibr" rid="scirp.56677-ref7">7</xref>], as shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>.</p><fig id="fig4"  position="float"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> Mel Frequency Cepstral Coefficients (MFCC)</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x15.png"/></fig><fig id="fig5"  position="float"><label><xref ref-type="fig" rid="fig5">Figure 5</xref></label><caption><title> Linear Predictive Cepstral Coefficients (LPCC)</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x16.png"/></fig></sec><sec id="s2_4"><title>3.4. RASTA-PLP</title><p>RASTA-PLP adds a band-pass filter to each frequency sub-band of the conventional PLP algorithm to smooth out short-term noise variations and to remove any constant offset introduced by the speech channel. <xref ref-type="fig" rid="fig7">Figure 7</xref> shows the main steps of RASTA-PLP: computing the critical-band power spectrum as in PLP; transforming the spectral amplitudes through a compressing static nonlinear transformation; filtering the time trajectory of each transformed spectral component with the band-pass filter of Equation (4); transforming the filtered spectrum through an expanding static nonlinear transformation; simulating the power law of hearing; and finally computing an all-pole model of the spectrum, as in PLP [<xref ref-type="bibr" rid="scirp.56677-ref8">8</xref>].</p><disp-formula id="scirp.56677-formula53"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x17.png"  xlink:type="simple"/></disp-formula></sec><sec id="s2_5"><title>3.5. 
Hybrid Feature Extraction</title><p>New hybrid features are obtained by combining the feature extraction methods above: MFCC, LPCC, PLP, and RASTA-PLP. Each method was designed to generate 13 coefficients, as shown in <xref ref-type="fig" rid="fig8">Figure 8</xref>. In each experiment, three methods were selected and their coefficients were concatenated into a single 39-coefficient vector, as follows:</p><p>1) 13 MFCC + 13 LPCC + 13 PLP.</p><p>2) 13 MFCC + 13 LPCC + 13 RASTA-PLP.</p><p>3) 13 MFCC + 13 PLP + 13 RASTA-PLP.</p><p>4) 13 LPCC + 13 PLP + 13 RASTA-PLP.</p></sec></sec><sec id="s3"><title>4. Statistical Modeling</title><p>A statistical classifier is used to test the feature extraction algorithms above. The HMM classifier was selected because it can model the nonlinear time alignment of speech and estimate the model parameters [<xref ref-type="bibr" rid="scirp.56677-ref9">9</xref>]; it classifies the feature vectors and predicts unknown words through three processes: evaluation, learning, and decoding. 
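</p><p>The 39-coefficient observation vectors passed to the classifier can be assembled as described in Section 3: 13 static coefficients plus their delta and delta-delta terms for a single method, or three 13-coefficient streams concatenated frame by frame for a hybrid. In this NumPy sketch, the regression formula for the deltas and its window of two frames on each side are assumptions (the paper does not specify its delta computation), and the function names are hypothetical.</p><preformat preformat-type="code">
```python
import numpy as np


def deltas(feats, width=2):
    """Regression deltas of a (T, D) feature matrix.

    d_t = sum_{n=1..width} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..width} n^2),
    with the first and last frames repeated at the edges.
    """
    feats = np.asarray(feats, dtype=float)
    T = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feats)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + T]   # c_{t+n}
        behind = padded[width - n: width - n + T]  # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


def conventional_39(static13):
    """13 static coefficients + deltas + delta-deltas -> (T, 39)."""
    static13 = np.asarray(static13, dtype=float)
    d = deltas(static13)
    return np.hstack([static13, d, deltas(d)])


def hybrid_39(feat_a, feat_b, feat_c):
    """Concatenate three (T, 13) static streams frame by frame -> (T, 39)."""
    streams = [np.asarray(f, dtype=float) for f in (feat_a, feat_b, feat_c)]
    if any(s.shape != streams[0].shape for s in streams):
        raise ValueError("all feature streams must have the same (T, 13) shape")
    return np.hstack(streams)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc, lpcc, plp = (rng.standard_normal((50, 13)) for _ in range(3))  # stand-in features
    print(conventional_39(mfcc).shape, hybrid_39(mfcc, lpcc, plp).shape)  # (50, 39) (50, 39)
```
</preformat><p>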
An HMM is a finite-state machine characterized by a set of parameters: hidden states, observations, transition probabilities, emission probabilities, and initial state probabilities.</p><fig id="fig6"  position="float"><label><xref ref-type="fig" rid="fig6">Figure 6</xref></label><caption><title> Perceptual Linear Prediction (PLP)</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x18.png"/></fig><fig id="fig7"  position="float"><label><xref ref-type="fig" rid="fig7">Figure 7</xref></label><caption><title> RASTA-PLP</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x19.png"/></fig><fig id="fig8"  position="float"><label><xref ref-type="fig" rid="fig8">Figure 8</xref></label><caption><title> Hybrid feature extraction algorithm</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x20.png"/></fig><sec id="s3_1"><title>4.1. Evaluation</title><p>The probability of the observation sequence given the model was computed with the forward-backward dynamic-programming algorithm, which accounts for every state sequence that could have produced the observations, using Equation (5), as shown in <xref ref-type="fig" rid="fig9">Figure 9</xref>.</p><disp-formula id="scirp.56677-formula54"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x21.png"  xlink:type="simple"/></disp-formula></sec><sec id="s3_2"><title>4.2. Learning</title><p>In this step, all the model parameters λ (the means, variances, transition probability matrix, and Gaussian mixtures) were re-estimated using the Baum-Welch algorithm, as shown in <xref ref-type="fig" rid="fig10">Figure 10</xref>. 
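</p><p>The evaluation step of Section 4.1 and Baum-Welch re-estimation both rest on the forward (α) and backward (β) probabilities. The following NumPy sketch assumes that the emission likelihoods b_j(o_t) have already been computed for every frame and state (for example, from each state's Gaussian mixture) and uses per-frame scaling to avoid numerical underflow. For brevity it re-estimates only the initial and transition probabilities and omits the updates of the means, variances, and mixture weights; the function names are hypothetical and this is not the authors' MATLAB code.</p><preformat preformat-type="code">
```python
import numpy as np


def forward_backward(pi, A, B):
    """Scaled forward-backward pass for one observation sequence.

    pi: (S,) initial state probabilities
    A:  (S, S) transitions, A[i, j] = P(state j at t + 1 | state i at t)
    B:  (T, S) emission likelihoods, B[t, j] = b_j(o_t), assumed precomputed
    Returns log P(O | lambda), scaled alpha and beta, and the scale factors.
    """
    T, S = B.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    scale = np.zeros(T)

    # Forward: alpha_t(j) = [sum_i alpha_{t-1}(i) A[i, j]] b_j(o_t), normalized per frame
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward: beta_t(i) = sum_j A[i, j] b_j(o_{t+1}) beta_{t+1}(j), same scale factors
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]

    # log P(O | lambda) is the sum of the log scale factors
    return np.log(scale).sum(), alpha, beta, scale


def reestimate_pi_A(pi, A, B):
    """One Baum-Welch update of pi and A with the emission model held fixed."""
    loglik, alpha, beta, scale = forward_backward(pi, A, B)
    gamma = alpha * beta  # state posteriors P(q_t = i | O, lambda); rows sum to 1

    # Expected i -> j transition counts, summed over t = 0, ..., T - 2
    xi_sum = np.zeros_like(A)
    for t in range(B.shape[0] - 1):
        xi_sum += alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :] / scale[t + 1]

    # Normalize each row; keep the old row for a state that is never occupied
    row = xi_sum.sum(axis=1, keepdims=True)
    new_A = np.where(row > 0, xi_sum / np.where(row > 0, row, 1.0), A)
    return loglik, gamma[0], new_A
```
</preformat><p>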
Baum-Welch learns the parameters that best describe the training observation sequences, so that similar observation sequences can be recognized in the future [<xref ref-type="bibr" rid="scirp.56677-ref9">9</xref>]. The training model is given by Equation (6).</p><disp-formula id="scirp.56677-formula55"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x22.png"  xlink:type="simple"/></disp-formula></sec><sec id="s3_3"><title>4.3. Decoding</title><p>To find the state sequence most likely to have produced an observation sequence, the Viterbi algorithm was used to find the optimal scoring path through the state trellis, as shown in <xref ref-type="fig" rid="fig11">Figure 11</xref>. The maximum path probability is defined in Equation (7), and the optimal state sequence is selected using Equation (8).</p><disp-formula id="scirp.56677-formula56"><label>(7)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x23.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.56677-formula57"><label>(8)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/1-1730211x24.png"  xlink:type="simple"/></disp-formula></sec></sec><sec id="s4"><title>5. Results</title><p>The proposed speech recognition system, which uses the conventional and new hybrid MFCC, LPCC, PLP, and RASTA-PLP features, was trained and tested in clean [<xref ref-type="bibr" rid="scirp.56677-ref10">10</xref>] and noisy conditions with a multivariate Hidden Markov Model (HMM) classifier to find the maximum word recognition rate. A number of experiments were carried out under different conditions on small-vocabulary isolated words from the TIDIGITS corpus. 
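</p><p>In an isolated-word experiment like this one, each test utterance is scored against every word model and assigned to the best-scoring word, and the word recognition rate is the percentage of test words recognized correctly. The NumPy sketch below assumes the word decision uses the Viterbi score of Section 4.3 (a forward-probability score would be an equally common choice) and works in the log domain to avoid underflow; the dictionary-based interface and all function names are hypothetical.</p><preformat preformat-type="code">
```python
import numpy as np


def viterbi(log_pi, log_A, log_B):
    """Best state path and its log score, computed in the log domain.

    log_pi: (S,), log_A: (S, S), log_B: (T, S) with log_B[t, j] = log b_j(o_t).
    Disallowed transitions (e.g. in a left-to-right model) can be set to -inf.
    """
    T, S = log_B.shape
    delta = np.zeros((T, S))           # best log score of a path ending in state j at t
    psi = np.zeros((T, S), dtype=int)  # back-pointers to the best predecessor
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        cand = delta[t - 1][:, None] + log_A   # cand[i, j]: enter j from i
        psi[t] = np.argmax(cand, axis=0)
        delta[t] = cand[psi[t], np.arange(S)] + log_B[t]

    # Backtrack from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return float(delta[-1, path[-1]]), path


def recognize(models, log_B_by_word):
    """Return the word whose model gives the highest Viterbi score.

    models: {word: (log_pi, log_A)}; log_B_by_word: {word: (T, S) array} holding
    the test utterance's emission log-likelihoods under that word's model.
    """
    scores = {w: viterbi(lp, la, log_B_by_word[w])[0] for w, (lp, la) in models.items()}
    return max(scores, key=scores.get), scores


def word_recognition_rate(predicted, reference):
    """Percentage of test words recognized correctly."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)
```
</preformat><p>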
The data consist of 2072 training files and 2486 testing files covering eleven words (zero to nine and "oh"), recorded from 208 adult male and female speakers. For a fair comparison, all experiments used the same pre-emphasis factor (0.975), a 25 ms Hamming window, and a 10 ms frame shift. A 256-point Fast Fourier Transform (FFT) was applied to each 200-sample frame, zero-padded to 256 points, to transform it from the time domain to the frequency domain. The recognition rates obtained in the decoding process are listed in <xref ref-type="table" rid="table1">Table 1</xref>. The word models were trained with 6, 8, 10, and 12 states, with 2 to 8 multidimensional Gaussian mixture components per state. The chart in <xref ref-type="fig" rid="fig12">Figure 12</xref> summarizes the recognition rate obtained with each feature extraction method.</p><fig id="fig9"  position="float"><label><xref ref-type="fig" rid="fig9">Figure 9</xref></label><caption><title> Forward α and Backward β probabilities in each state</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x25.png"/></fig><fig id="fig10"  position="float"><label><xref ref-type="fig" rid="fig10">Figure 10</xref></label><caption><title> Eight-dimensional Gaussian distribution</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x26.png"/></fig><fig id="fig11"  position="float"><label><xref ref-type="fig" rid="fig11">Figure 11</xref></label><caption><title> Viterbi trellis computation for an 8-state HMM</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x27.png"/></fig></sec><sec id="s5"><title>6. 
Conclusions</title><p>The objective of this research was to evaluate the performance of four feature extraction techniques, MFCC, LPCC, PLP, and RASTA-PLP, and of their hybrid combinations, by implementing an isolated-word recognizer based on a discrete-observation multivariate HMM in MATLAB.</p><fig id="fig12"  position="float"><label><xref ref-type="fig" rid="fig12">Figure 12</xref></label><caption><title> Recognition rate of conventional and hybrid feature extractions</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-1730211x28.png"/></fig><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Recognition rate of different feature extraction methods</title></caption><table><thead><tr><th align="center" valign="middle" rowspan="3">Feature Extraction Methods</th><th align="center" valign="middle" colspan="5">Word Accuracy [%]</th></tr><tr><th align="center" valign="middle" rowspan="2">Clean Speech</th><th align="center" valign="middle" colspan="4">SNR [dB]</th></tr><tr><th align="center" valign="middle">30</th><th align="center" valign="middle">20</th><th align="center" valign="middle">10</th><th align="center" valign="middle">5</th></tr></thead><tbody><tr><td align="center" valign="middle">MFCC + ∆ + ∆∆</td><td align="center" valign="middle">98.95</td><td align="center" valign="middle">98.45</td><td align="center" valign="middle">97.92</td><td align="center" valign="middle">94.87</td><td align="center" valign="middle">90.37</td></tr><tr><td align="center" valign="middle">LPCC + ∆ + ∆∆</td><td align="center" valign="middle">99.95</td><td align="center" valign="middle">98.59</td><td align="center" valign="middle">97.63</td><td align="center" valign="middle">93.27</td><td align="center" valign="middle">82.73</td></tr><tr><td align="center" valign="middle">PLP + ∆ + ∆∆</td><td align="center" valign="middle">99.95</td><td align="center" valign="middle">98.50</td><td align="center" valign="middle">98.35</td><td align="center" valign="middle">93.42</td><td align="center" valign="middle">93.52</td></tr><tr><td align="center" valign="middle">RASTA-PLP + ∆ + ∆∆</td><td align="center" valign="middle">99.75</td><td align="center" valign="middle">98.46</td><td align="center" valign="middle">95.93</td><td align="center" valign="middle">91.92</td><td align="center" valign="middle">88.24</td></tr><tr><td align="center" valign="middle">LPCC + PLP + RASTA</td><td align="center" valign="middle">98.93</td><td align="center" valign="middle">96.32</td><td align="center" valign="middle">94.10</td><td align="center" valign="middle">90.52</td><td align="center" valign="middle">82.73</td></tr><tr><td align="center" valign="middle">MFCC + LPCC + PLP</td><td align="center" valign="middle">98.79</td><td align="center" valign="middle">97.92</td><td align="center" valign="middle">94.05</td><td align="center" valign="middle">86.31</td><td align="center" valign="middle">89.41</td></tr><tr><td align="center" valign="middle">MFCC + LPCC + RASTA</td><td align="center" valign="middle">99.12</td><td align="center" valign="middle">95.50</td><td align="center" valign="middle">94.00</td><td align="center" valign="middle">92.45</td><td align="center" valign="middle">76.69</td></tr><tr><td align="center" valign="middle">MFCC + PLP + RASTA</td><td align="center" valign="middle">98.93</td><td align="center" valign="middle">94.53</td><td align="center" valign="middle">94.05</td><td align="center" valign="middle">90.32</td><td align="center" valign="middle">76.25</td></tr></tbody></table></table-wrap><p>In clean speech, as shown in <xref ref-type="table" rid="table1">Table 1</xref> and <xref ref-type="fig" rid="fig12">Figure 12</xref>, the features extracted with the individual LPCC and PLP algorithms give the best recognition rate. 
Both reach 99.95% using 12 states and 4 Gaussian mixtures. Among the hybrid features, the combination of MFCC, LPCC, and RASTA gives the highest clean-speech rate, 99.12%, with the same numbers of states and Gaussian mixtures. It is followed by the combination of LPCC, PLP, and RASTA at 98.93% using 10 states and 3 Gaussian mixtures (the combination of MFCC, PLP, and RASTA also reaches 98.93%), and by the combination of MFCC, LPCC, and PLP at 98.79% using 10 states and 3 Gaussian mixtures, the lowest clean-speech rate in <xref ref-type="table" rid="table1">Table 1</xref>. The individual MFCC features reach 98.95% using 12 states and 4 Gaussian mixtures. When realistic noise is added to the input speech signal at an SNR of 30 dB, the individual LPCC method provides the best recognition rate, 98.59%. At 20 dB, PLP provides the best recognition rate, 98.35%. At 10 dB and 5 dB, the individual MFCC method provides the best recognition rates, 94.87% and 90.37%, respectively.</p></sec><sec id="s6"><title>Cite this paper</title><p>Veton Z. K&#235;puska, Hussien A. Elharati (2015) Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions. Journal of Computer and Communications, 3, 1-9. doi: 10.4236/jcc.2015.36001</p></sec></body><back><ref-list><title>References</title><ref id="scirp.56677-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Kepuska, V. and Klein, T. (2009) A Novel Wake-Up-Word Speech Recognition System, Wake-Up-Word Recognition Task, Technology and Evaluation. Nonlinear Analysis: Theory, Methods &amp; Applications, 71, e2772-e2789. http://dx.doi.org/10.1016/j.na.2009.06.089</mixed-citation></ref><ref id="scirp.56677-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Veisi, H. and Sameti, H. (2013) Speech Enhancement Using Hidden Markov Models in Mel-Frequency Domain. 
Speech Communication, 55, 205-220. http://dx.doi.org/10.1016/j.specom.2012.08.005</mixed-citation></ref><ref id="scirp.56677-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, Q. and Alwan, A. (2000) On the Use of Variable Frame Rate Analysis in Speech Recognition. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, 3, 1783-1786.</mixed-citation></ref><ref id="scirp.56677-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Rabiner, L. R. and Juang, B.-H. (1993) Fundamentals of Speech Recognition. Vol. 14, PTR Prentice Hall, Englewood Cliffs.</mixed-citation></ref><ref id="scirp.56677-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Chetouani, M., Gas, B. and Zarader, J. (2002) Discriminative Training for Neural Predictive Coding Applied to Speech Features Extraction. Proceedings of the 2002 International Joint Conference on Neural Networks, 1, 852-857. http://dx.doi.org/10.1109/ijcnn.2002.1005585</mixed-citation></ref><ref id="scirp.56677-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Dave, N. (2013) Feature Extraction Methods LPC, PLP and MFCC in Speech Recognition. International Journal for Advance Research in Engineering and Technology, 1.</mixed-citation></ref><ref id="scirp.56677-ref7"><label>7</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hermansky</surname><given-names>H.</given-names></name> (<year>1990</year>) <article-title>Perceptual Linear Predictive (PLP) Analysis of Speech</article-title>. <source>The Journal of the Acoustical Society of America</source>, <volume>87</volume>, <fpage>1738</fpage>-<lpage>1752</lpage>.</mixed-citation></ref><ref id="scirp.56677-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Hermansky, H., Morgan, N., Bayya, A. 
and Kohn, P. (1991) The Challenge of Inverse-E: The RASTA-PLP Method. 1991 Conference Record of the 25th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, 4-6 November 1991, 800-804. http://dx.doi.org/10.1109/acssc.1991.186557</mixed-citation></ref><ref id="scirp.56677-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Dugad, R. and Desai, U. (1996) A Tutorial on Hidden Markov Models. Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology, Bombay, Powai, Mumbai, 400 076, India.</mixed-citation></ref><ref id="scirp.56677-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Kepuska, V.Z. and Elharati, H.A. (2015) Performance Evaluation of Conventional and Hybrid Feature Extractions Using Multivariate HMM Classifier. International Journal of Engineering Research and Applications (IJERA), 5, 96-101.</mixed-citation></ref></ref-list></back></article>