<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JSEA</journal-id><journal-title-group><journal-title>Journal of Software Engineering and Applications</journal-title></journal-title-group><issn pub-type="epub">1945-3116</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jsea.2021.1412037</article-id><article-id pub-id-type="publisher-id">JSEA-114352</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  A Noise Suppression Method for Speech Signal by Jointly Using Bayesian Estimation and Fuzzy Theory
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ikuta</surname><given-names>Akira</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Orimoto</surname><given-names>Hisako</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hasegawa</surname><given-names>Kouji</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>Western Region Industrial Research Center, Hiroshima Prefectural Technology Research Institute, Kure, Japan</addr-line></aff><aff id="aff1"><addr-line>Department of Management Information Systems, Prefectural University of Hiroshima, Hiroshima, Japan</addr-line></aff><pub-date pub-type="epub"><day>17</day><month>12</month><year>2021</year></pub-date><volume>14</volume><issue>12</issue><fpage>631</fpage><lpage>645</lpage><history><date date-type="received"><day>25</day><month>November</month><year>2021</year></date><date date-type="rev-recd"><day>28</day><month>December</month><year>2021</year></date><date date-type="accepted"><day>31</day><month>December</month><year>2021</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Speech recognition systems have been applied to inspection and maintenance operations in industrial factories and to recording and reporting routines at construction sites, where handwriting is difficult. In such environments, countermeasures against surrounding noise are indispensable. In this study, a new method for removing noise from actual speech signals is proposed, using Bayesian estimation with the aid of bone-conducted speech and fuzzy theory. More specifically, by introducing Bayes’ theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of noise removal algorithm is theoretically derived. In the proposed noise suppression method, the bone-conducted speech signal, whose high-frequency components are reduced, is regarded as fuzzy observation data, and a stochastic model for the bone-conducted speech is derived by applying the probability measure of fuzzy events. The proposed method was applied to speech signals measured in a real environment with low SNR, and it gave better results than an algorithm based on the observation of air-conducted speech alone.
 
</p></abstract><kwd-group><kwd>Air- and Bone-Conducted Speeches</kwd><kwd>Noise Suppression</kwd><kwd>Bayesian Estimation</kwd><kwd>Fuzzy Data</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Speech recognition systems have been applied in various fields, for example to inspection and maintenance operations in industrial factories and at construction sites, where handwriting is difficult. For speech recognition in such environments, methods for suppressing the surrounding noise are indispensable.</p><p>Previously reported methods for noise reduction in speech recognition can be classified into two categories. One is based on a single microphone [<xref ref-type="bibr" rid="scirp.114352-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref2">2</xref>], and the other uses a microphone array [<xref ref-type="bibr" rid="scirp.114352-ref3">3</xref>]. Since the latter requires a priori information on the number of noise sources, and needs more microphones than noise sources when multiple noise sources are present, it demands large-scale systems. Therefore, the former, based on a single microphone, is more advantageous than the latter [<xref ref-type="bibr" rid="scirp.114352-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref5">5</xref>]. For noise suppression of speech signals based on a single microphone, many algorithms applying the Kalman filter have been proposed [<xref ref-type="bibr" rid="scirp.114352-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref9">9</xref>]. However, the Kalman filter is originally based on the assumption of Gaussian white noise [<xref ref-type="bibr" rid="scirp.114352-ref10">10</xref>]. 
Actual noise, in contrast, shows complex fluctuation forms with non-Gaussian and non-white properties.</p><p>From the above viewpoint, in our previously reported study, a noise suppression algorithm for actual speech signals that does not require the assumption of Gaussian white noise was proposed [<xref ref-type="bibr" rid="scirp.114352-ref11">11</xref>]. The method can be applied to actual complex situations where both the noise statistics and the fluctuation forms of the speech signal are unknown. By applying the algorithm to real speech signals with several kinds of noise, its effectiveness was experimentally confirmed in comparison with the Kalman filter.</p><p>Furthermore, signal processing methods for removing noise from actual speech signals have been proposed that jointly use the measured data of bone- and air-conducted speech signals [<xref ref-type="bibr" rid="scirp.114352-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref13">13</xref>]. However, the algorithms of these previous methods adopted a simple additive model of the original speech signal and the surrounding noise for the air-conducted speech observation. Moreover, the derived algorithms were applied only to signals mixed with noise on a computer, and not to signals measured in a real noisy environment.</p><p>In this study, a new noise suppression method for speech signals is proposed by using Bayes’ theorem and employing a posterior distribution based on the air-conducted speech observation contaminated by surrounding noise. In the proposed algorithm, in order to improve the estimation accuracy of the speech signal, an expansion expression of the conditional probability density function, reflecting all linear and non-linear correlation information between the original speech signal and the air-conducted speech observation, is adopted as the model of the speech observation. 
Then, a probability distribution with parameters estimated from the bone-conducted speech is adopted as the prior distribution. Furthermore, the algorithm proposed in this study is applied to signals measured in a real environment in the presence of noise.</p><p>Although the bone-conducted speech signal is a kind of solid-propagated sound that is less affected by the surrounding noise, its high-frequency components are reduced through the propagation process [<xref ref-type="bibr" rid="scirp.114352-ref14">14</xref>]. By regarding the bone-conducted speech signal with reduced high-frequency components as fuzzy data and applying the probability measure of fuzzy events [<xref ref-type="bibr" rid="scirp.114352-ref15">15</xref>], a new simplified noise suppression method is derived that reflects both the air- and bone-conducted speech signals.</p><p>The effectiveness of the proposed method is confirmed by applying it to bone- and air-conducted speech measured in a real environment in the presence of surrounding noise.</p></sec><sec id="s2"><title>2. Theoretical Consideration</title><sec id="s2_1"><title>2.1. Stochastic Model for Air- and Bone-Conducted Speech Signals by Introducing Fuzzy Theory</title><p>In an actual environment with surrounding noise, let $x_k$, $y_k$ and $z_k$ be the original speech signal and the observations of the air- and bone-conducted speech signals, respectively, at a discrete time $k$. The observation $y_k$ is contaminated by the surrounding noise $v_k$. In our previous studies, a simple additive model was considered for the air-conducted speech observation $y_k$ [<xref ref-type="bibr" rid="scirp.114352-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref13">13</xref>]. 
In this study, in order to improve the estimation accuracy of the speech signal $x_k$, an expansion expression of the conditional probability density function $P(y_k \mid x_k)$ [<xref ref-type="bibr" rid="scirp.114352-ref11">11</xref>], reflecting all linear and non-linear correlation information between $x_k$ and $y_k$, is adopted as the model of the air-conducted speech observation.</p><p>$$P(y_k \mid x_k) = \frac{P(x_k, y_k)}{P(x_k)} = P(y_k) \sum_{r=0}^{\infty} \sum_{s=0}^{\infty} A_{rs}\, \theta_r^{(1)}(x_k)\, \theta_s^{(2)}(y_k) \tag{1}$$</p><p>with</p><p>$$A_{rs} \equiv \left\langle \theta_r^{(1)}(x_k)\, \theta_s^{(2)}(y_k) \right\rangle, \tag{2}$$</p><p>where $\langle\ \rangle$ denotes the averaging operation on variables.</p><p>As the probability density functions of $x_k$ and $y_k$, which show non-Gaussian distributions, the following statistical orthonormal expansion series expressions are adopted.</p><p>$$P(x_k) = N(x_k; \mu_x, \sigma_x^2) \sum_{i=0}^{\infty} B_i \frac{1}{\sqrt{i!}} H_i\!\left(\frac{x_k - \mu_x}{\sigma_x}\right), \tag{3}$$</p><p>$$P(y_k) = N(y_k; \mu_y, \sigma_y^2) \sum_{i=0}^{\infty} C_i \frac{1}{\sqrt{i!}} H_i\!\left(\frac{y_k - \mu_y}{\sigma_y}\right) \tag{4}$$</p><p>with</p><p>$$\mu_x \equiv \langle x_k \rangle, \quad \sigma_x^2 \equiv \langle (x_k - \mu_x)^2 \rangle,$$</p><p>$$B_i \equiv \left\langle \frac{1}{\sqrt{i!}} H_i\!\left(\frac{x_k - \mu_x}{\sigma_x}\right) \right\rangle, \quad B_0 = 1, \quad B_1 = B_2 = 0,$$</p><p>$$C_i \equiv \left\langle \frac{1}{\sqrt{i!}} H_i\!\left(\frac{y_k - \mu_y}{\sigma_y}\right) \right\rangle, \quad C_0 = 1, \quad C_1 = C_2 = 0,$$</p><p>$$N(x; \mu, \sigma^2) \equiv \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}, \tag{5}$$</p><p>where $H_i(\ )$ is the Hermite polynomial of $i$th order. The functions $\theta_r^{(1)}(x_k)$ and $\theta_s^{(2)}(y_k)$ are orthonormal polynomials with weighting functions $P(x_k)$ and $P(y_k)$, respectively. These orthonormal polynomials can be decomposed into linearly independent series as</p><p>$$\theta_r^{(1)}(x_k) = \sum_{i=0}^{r} \lambda_{ri}^{(1)} \frac{1}{\sqrt{i!}} H_i\!\left(\frac{x_k - \mu_x}{\sigma_x}\right), \tag{6}$$</p><p>$$\theta_s^{(2)}(y_k) = \sum_{i=0}^{s} \lambda_{si}^{(2)} \frac{1}{\sqrt{i!}} H_i\!\left(\frac{y_k - \mu_y}{\sigma_y}\right). \tag{7}$$</p><p>The coefficients $\lambda_{ri}^{(1)}$ and $\lambda_{si}^{(2)}$ are calculated beforehand by using Schmidt’s orthogonalization algorithm [<xref ref-type="bibr" rid="scirp.114352-ref16">16</xref>]. 
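As a minimal illustration of this orthogonalization step (a sketch under stated assumptions, not the authors' implementation), the following Python fragment applies the Gram-Schmidt procedure to the normalized Hermite polynomials appearing in Equations (6) and (7). The averaging operation is replaced here by a sample mean over a recorded data sequence, and the function names normalized_hermite and schmidt_coefficients are hypothetical names introduced only for this example. <preformat preformat-type="code">
```python
import math


def normalized_hermite(u, order):
    """Return [phi_0(u), ..., phi_order(u)] with phi_i = He_i(u) / sqrt(i!)."""
    he = [1.0, u]                                  # He_0 and He_1
    for n in range(1, order):
        he.append(u * he[n] - n * he[n - 1])       # He_{n+1} = u He_n - n He_{n-1}
    return [he[i] / math.sqrt(math.factorial(i)) for i in range(order + 1)]


def schmidt_coefficients(samples, order):
    """Lower-triangular lam with theta_r = sum_i lam[r][i] * phi_i, orthonormal
    under the sample average (an assumed empirical stand-in for the weight P(x))."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / n)
    phi = [normalized_hermite((x - mu) / sigma, order) for x in samples]
    size = order + 1
    # Gram matrix: sample averages of phi_i * phi_j
    gram = [[sum(p[i] * p[j] for p in phi) / n for j in range(size)]
            for i in range(size)]

    def inner(a, b):
        # Inner product of two coefficient vectors under the Gram matrix
        return sum(a[i] * gram[i][j] * b[j]
                   for i in range(size) for j in range(size))

    lam = []
    for r in range(size):
        v = [1.0 if i == r else 0.0 for i in range(size)]
        for prev in lam:                           # remove components along earlier theta_q
            c = inner(prev, v)
            v = [vi - c * pi for vi, pi in zip(v, prev)]
        norm = math.sqrt(inner(v, v))              # normalise to unit average power
        lam.append([vi / norm for vi in v])
    return lam, gram
```
</preformat> Under these assumptions, row r of the returned table plays the role of the coefficients of the r-th orthonormal polynomial for whichever signal the samples are taken from. Because the samples are standardized by their own mean and standard deviation, the first two polynomials reduce to a constant and the standardized variable itself, which agrees with the conditions on the first- and second-order expansion coefficients stated with Equation (5). 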
The expansion coefficients $A_{rs}$ with orders $r \le R$, $s \le S$ can be obtained from the correlation between the original speech signal $x_k$ and the noisy observation of air-conducted speech $y_k$. Since the original speech signal is unknown in the presence of noise, these coefficients have to be estimated on the basis of the observation $y_k$. Regarding the expansion coefficients $A_{rs}$ as an unknown parameter vector $\mathbf{a}$:</p><p>$$\mathbf{a} \equiv (a_{11}, \cdots, a_{R1}, a_{12}, \cdots, a_{R2}, \cdots, a_{1S}, \cdots, a_{RS}),$$</p><p>$$a_{rs} \equiv A_{rs}, \quad (r = 1, 2, \cdots, R;\ s = 1, 2, \cdots, S), \tag{8}$$</p><p>the following simple dynamical model is introduced for the simultaneous estimation of the parameters with the specific signal $x_k$:</p><p>$$\mathbf{a}_{k+1} = \mathbf{a}_k. \tag{9}$$</p><p>Next, in order to express the relationship between the original speech signal and the bone-conducted speech, the bone-conducted speech is regarded as fuzzy data, and the conditional probability distribution function $P(x_k \mid z_k)$ is obtained by applying the probability measure of fuzzy events [<xref ref-type="bibr" rid="scirp.114352-ref15">15</xref>] to Equation (1), as follows.</p><p>$$P(x_k \mid z_k) = \frac{P(x_k, z_k)}{P(z_k)} = \frac{\int m_{\bar{y}_k}(y_k)\, P(x_k, y_k)\, dy_k}{\int m_{\bar{y}_k}(y_k)\, P(y_k)\, dy_k} \left( \equiv \frac{N(x_k, z_k)}{D(z_k)} \right) \tag{10}$$</p><p>where $m_{\bar{y}_k}(y_k)$ is a membership function of the bone-conducted speech $z_k$, for which a Gaussian-type function is adopted:</p><p>$$m_{\bar{y}_k}(y_k) = \exp\left\{ -\alpha (y_k - \bar{y}_k)^2 \right\}, \quad (\bar{y}_k \equiv a + b z_k), \tag{11}$$</p><p>where $a$ and $b$ are constants and $\alpha\ (&gt; 0)$ is a parameter. Accordingly, by considering $P(x_k, y_k)$ in Equation (1), $P(y_k)$ in Equation (4), and the membership function in Equation (11), the numerator of Equation (10) can be expressed as follows:</p><p>$$N(x_k, z_k) = P(x_k) \frac{e^{K_3}}{\sqrt{2 K_1 \sigma_y^2}} \int \frac{1}{\sqrt{\pi / K_1}} \exp\left\{ -\frac{(y_k - K_2)^2}{1 / K_1} \right\} \cdot \sum_{i=0}^{\infty} C_i \frac{1}{\sqrt{i!}} H_i\!\left(\frac{y_k - \mu_y}{\sigma_y}\right) \sum_{r=0}^{\infty} \sum_{s=0}^{\infty} A_{rs}\, \theta_r^{(1)}(x_k)\, \theta_s^{(2)}(y_k)\, dy_k \tag{12}$$</p><p>with</p><p>$$K_1 \equiv \frac{2\alpha\sigma_y^2 + 1}{2\sigma_y^2}, \quad K_2 \equiv \frac{2\alpha\sigma_y^2 \bar{y}_k + \mu_y}{2\alpha\sigma_y^2 + 1},$$</p><p>$$K_3 \equiv K_1 \left( K_2^2 - \frac{2\alpha\sigma_y^2 \bar{y}_k^2 + \mu_y^2}{2\alpha\sigma_y^2 + 1} \right). \tag{13}$$</p><p>After considering the following equality for the Hermite polynomial:</p><p>$$H_i\!\left(\frac{y_k - \mu_y}{\sigma_y}\right) = \sum_{j=0}^{i} d_{ij}\, H_j\!\left(\frac{y_k - K_2}{\sqrt{1/(2K_1)}}\right), \tag{14}$$</p><p>where $d_{ij}$ are expansion coefficients reflecting the bone-conducted speech signal, and using the orthonormal condition:</p><p>$$\int N\!\left(y_k; K_2, \frac{1}{2K_1}\right) H_j\!\left(\frac{y_k - K_2}{\sqrt{1/(2K_1)}}\right) H_{j'}\!\left(\frac{y_k - K_2}{\sqrt{1/(2K_1)}}\right) dy_k = j! \cdot \delta_{jj'}, \tag{15}$$</p><p>the integral in Equation (12) can be calculated. Thus, the following expression is derived:</p><p>$$N(x_k, z_k) = P(x_k) \frac{e^{K_3}}{\sqrt{2 K_1 \sigma_y^2}} \sum_{i=0}^{\infty} \frac{1}{\sqrt{i!}} C_i \sum_{r=0}^{R} \sum_{s=0}^{S} F_{si}(z_k)\, a_{rs,k}\, \theta_r^{(1)}(x_k), \tag{16}$$</p><p>$$F_{si}(z_k) \equiv \sum_{t=0}^{s} \sum_{j=0}^{\min\{i,t\}} \lambda_{st}^{(2)} \frac{1}{\sqrt{t!}}\, d_{ij}\, d_{tj}\, j!. \tag{17}$$</p><p>Furthermore, through a similar calculation process, the denominator of Equation (10) can be derived as follows:</p><p>$$D(z_k) = \frac{e^{K_3}}{\sqrt{2 K_1 \sigma_y^2}}\, G(z_k), \quad G(z_k) \equiv \sum_{i=0}^{\infty} \frac{1}{\sqrt{i!}} C_i\, d_{i0}. \tag{18}$$</p><p>Therefore, by substituting Equations (16) and (18) into Equation (10), the conditional probability distribution function $P(x_k \mid z_k)$ can be expressed explicitly.</p></sec><sec id="s2_2"><title>2.2. Derivation of Noise Suppression Algorithm Based on Bayesian Estimation</title><p>To derive an estimation algorithm for the speech signal $x_k$, Bayes’ theorem for the conditional probability distribution [<xref ref-type="bibr" rid="scirp.114352-ref17">17</xref>] is first considered. 
Since the parameter $\mathbf{a}$ is also unknown, the conditional joint probability distribution of $x_k$ and $\mathbf{a}_k$ is expressed as</p><p>$$P(x_k, \mathbf{a}_k \mid Y_k) = \frac{P(x_k, \mathbf{a}_k, y_k \mid Y_{k-1})}{P(y_k \mid Y_{k-1})}, \tag{19}$$</p><p>where $Y_k\ (\equiv \{y_1, y_2, \cdots, y_k\})$ is the set of air-conducted speech data up to time $k$. By expanding the conditional joint probability distribution $P(x_k, \mathbf{a}_k, y_k \mid Y_{k-1})$ in a statistical orthogonal expansion series on the basis of the well-known Gaussian distribution and calculating the conditional expectation, the estimates of the means of $x_k$ and $a_{rs,k}$ can be derived as follows:</p><p>$$\hat{x}_k \equiv \langle x_k \mid Y_k \rangle = \frac{\displaystyle\sum_{n=0}^{\infty} \left\{ B_{00n} E_{00}^{10} + B_{10n} E_{10}^{10} \right\} \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right)}{\displaystyle\sum_{n=0}^{\infty} B_{00n} \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right)} \tag{20}$$</p><p>$$\hat{a}_{rs,k} \equiv \langle a_{rs,k} \mid Y_k \rangle = \frac{\displaystyle\sum_{n=0}^{\infty} \left\{ B_{00n} E_{00}^{01} + B_{01n} E_{01}^{01} \right\} \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right)}{\displaystyle\sum_{n=0}^{\infty} B_{00n} \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right)} \tag{21}$$</p><p>with</p><p>$$E_{00}^{10} = x_k^* \ (\equiv \langle x_k \mid Y_{k-1} \rangle), \quad E_{10}^{10} = \sqrt{\Gamma_{x_k}}, \quad \Gamma_{x_k} \equiv \langle (x_k - x_k^*)^2 \mid Y_{k-1} \rangle,$$</p><p>$$E_{00}^{01} = a_{rs,k}^* \ (\equiv \langle a_{rs,k} \mid Y_{k-1} \rangle), \quad E_{01}^{01} = \sqrt{\Gamma_{a_{rs,k}}}, \quad \Gamma_{a_{rs,k}} \equiv \langle (a_{rs,k} - a_{rs,k}^*)^2 \mid Y_{k-1} \rangle,$$</p><p>$$y_k^* \equiv \langle y_k \mid Y_{k-1} \rangle, \quad \Omega_k \equiv \langle (y_k - y_k^*)^2 \mid Y_{k-1} \rangle,$$</p><p>$$B_{lmn} \equiv \left\langle \frac{1}{\sqrt{l!}} H_l\!\left(\frac{x_k - x_k^*}{\sqrt{\Gamma_{x_k}}}\right) \prod_{r=0}^{R} \prod_{s=0}^{S} \frac{1}{\sqrt{m_{rs}!}} H_{m_{rs}}\!\left(\frac{a_{rs,k} - a_{rs,k}^*}{\sqrt{\Gamma_{a_{rs,k}}}}\right) \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right) \,\middle|\, Y_{k-1} \right\rangle. \tag{22}$$</p><p>Furthermore, the estimate of the variance of $a_{rs,k}$ is derived as follows:</p><p>$$P_{a_{rs,k}} \equiv \langle (a_{rs,k} - \hat{a}_{rs,k})^2 \mid Y_k \rangle = \frac{\displaystyle\sum_{n=0}^{\infty} \left\{ B_{00n} E_{00}^{02} + B_{01n} E_{01}^{02} + B_{02n} E_{02}^{02} \right\} \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right)}{\displaystyle\sum_{n=0}^{\infty} B_{00n} \frac{1}{\sqrt{n!}} H_n\!\left(\frac{y_k - y_k^*}{\sqrt{\Omega_k}}\right)} \tag{23}$$</p><p>with</p><p>$$E_{00}^{02} = \Gamma_{a_{rs,k}} + (a_{rs,k}^* - \hat{a}_{rs,k})^2, \quad E_{01}^{02} = 2\sqrt{\Gamma_{a_{rs,k}}}\,(a_{rs,k}^* - \hat{a}_{rs,k}), \quad E_{02}^{02} = \sqrt{2}\,\Gamma_{a_{rs,k}}. \tag{24}$$</p><p>Using Equation (1) and the orthonormal property of $\theta_s^{(2)}(y_k)$, the variables $y_k^*$ and $\Omega_k$ in Equations (20), (21) and (23) can be calculated as follows:</p><p>$$y_k^* = \left\langle \int y_k\, P(y_k \mid x_k)\, dy_k \,\middle|\, Y_{k-1} \right\rangle = \left\langle \sum_{r=0}^{\infty} \sum_{s=0}^{1} e_{1s} A_{rs}\, \theta_r^{(1)}(x_k) \,\middle|\, Y_{k-1} \right\rangle = \sum_{r=0}^{R} \sum_{s=0}^{1} e_{1s}\, a_{rs,k}^* \left\langle \theta_r^{(1)}(x_k) \mid Y_{k-1} \right\rangle \tag{25}$$</p><p>$$\Omega_k = \left\langle \int (y_k - y_k^*)^2 P(y_k \mid x_k)\, dy_k \,\middle|\, Y_{k-1} \right\rangle = \sum_{r=0}^{R} \sum_{s=0}^{2} e_{2s}\, a_{rs,k}^* \left\langle \theta_r^{(1)}(x_k) \mid Y_{k-1} \right\rangle \tag{26}$$</p><p>with</p><p>$$e_{10} = \mu_y, \quad e_{11} = \sigma_y,$$</p><p>$$e_{20} = f_{20} - \left( \frac{f_{21}}{\lambda_{11}^{(2)}} - \frac{f_{22}}{\lambda_{11}^{(2)} \lambda_{22}^{(2)}} \lambda_{21}^{(2)} \right) \lambda_{10}^{(2)} - \frac{f_{22}}{\lambda_{22}^{(2)}} \lambda_{20}^{(2)},$$</p><p>$$e_{21} = \frac{f_{21}}{\lambda_{11}^{(2)}} - \frac{f_{22}}{\lambda_{11}^{(2)} \lambda_{22}^{(2)}} \lambda_{21}^{(2)}, \quad e_{22} = \frac{f_{22}}{\lambda_{22}^{(2)}},$$</p><p>$$f_{20} = (\mu_y - y_k^*)^2 + \sigma_y^2, \quad f_{21} = 2\sigma_y (\mu_y - y_k^*), \quad f_{22} = \sqrt{2}\,\sigma_y^2. \tag{27}$$</p><p>Furthermore, by considering Equations (10), (16) and (18) and the orthonormal property of $\theta_r^{(1)}(x_k)$, the variables $x_k^*$ and $\Gamma_{x_k}$ in Equation (22) and the conditional expectation in Equations (25) and (26) can be calculated as follows:</p><p>$$x_k^* = \left\langle \int x_k\, P(x_k \mid z_k)\, dx_k \,\middle|\, Y_{k-1} \right\rangle = \sum_{i=0}^{\infty} \frac{1}{\sqrt{i!}} C_i \sum_{r=0}^{1} \sum_{s=0}^{S} h_{1r}\, F_{si}(z_k)\, a_{rs,k}^* \Big/ G(z_k) \tag{28}$$</p><p>$$\Gamma_{x_k} = \left\langle \int (x_k - x_k^*)^2 P(x_k \mid z_k)\, dx_k \,\middle|\, Y_{k-1} \right\rangle = \sum_{i=0}^{\infty} \frac{1}{\sqrt{i!}} C_i \sum_{r=0}^{2} \sum_{s=0}^{S} h_{2r}\, F_{si}(z_k)\, a_{rs,k}^* \Big/ G(z_k) \tag{29}$$</p><p>$$\left\langle \theta_r^{(1)}(x_k) \mid Y_{k-1} \right\rangle = \left\langle \int \theta_r^{(1)}(x_k)\, P(x_k \mid z_k)\, dx_k \,\middle|\, Y_{k-1} \right\rangle = \sum_{i=0}^{\infty} \frac{1}{\sqrt{i!}} C_i \sum_{s=0}^{S} F_{si}(z_k)\, a_{rs,k}^* \Big/ G(z_k) \tag{30}$$</p><p>with</p><p>$$h_{10} = \mu_x, \quad h_{11} = \sigma_x,$$</p><p>$$h_{20} = p_{20} - \left( \frac{p_{21}}{\lambda_{11}^{(1)}} - \frac{p_{22}}{\lambda_{11}^{(1)} \lambda_{22}^{(1)}} \lambda_{21}^{(1)} \right) \lambda_{10}^{(1)} - \frac{p_{22}}{\lambda_{22}^{(1)}} \lambda_{20}^{(1)},$$</p><p>$$h_{21} = \frac{p_{21}}{\lambda_{11}^{(1)}} - \frac{p_{22}}{\lambda_{11}^{(1)} \lambda_{22}^{(1)}} \lambda_{21}^{(1)}, \quad h_{22} = \frac{p_{22}}{\lambda_{22}^{(1)}},$$</p><p>$$p_{20} = (\mu_x - x_k^*)^2 + \sigma_x^2, \quad p_{21} = 2\sigma_x (\mu_x - x_k^*), \quad p_{22} = \sqrt{2}\,\sigma_x^2. \tag{31}$$</p><p>Since Equations (28), (29) and (30) can be evaluated by measuring the bone-conducted speech $z_k$, no time transition model of $x_k$ is necessary. Therefore, the computation time of the proposed algorithm can be reduced compared with the previous one [<xref ref-type="bibr" rid="scirp.114352-ref12">12</xref>]. Furthermore, by considering Equation (9), the two parameters $a_{rs,k}^*$ and $\Gamma_{a_{rs,k}}$ in Equation (22) are given by the estimates of $a_{rs,k}$ at the discrete time $k-1$, as follows:</p><p>$$a_{rs,k}^* = \hat{a}_{rs,k-1}, \quad \Gamma_{a_{rs,k}} = P_{a_{rs,k-1}}. \tag{32}$$</p><p>Finally, considering Equations (1), (9) and (10), the expansion coefficients $B_{lmn}$ in the estimation algorithm of Equations (20), (21) and (23) are given by the measurement of the bone-conducted speech $z_k$ and the estimates of the parameter $a_{rs,k}$ at the discrete time $k-1$, through a calculation process similar to that of Equations (25)-(30). Therefore, recursive estimation of the speech signal $x_k$ can be achieved.</p></sec></sec><sec id="s3"><title>3. Application to Speech Signal in Real Environment</title><p>In order to confirm the practical usefulness of the proposed noise suppression algorithm, it was applied to speech signals in a real noise environment. Whereas in the previous studies [<xref ref-type="bibr" rid="scirp.114352-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.114352-ref13">13</xref>] the noisy air-conducted speech was created on a computer by mixing noise with the original air-conducted speech signal measured in a noise-free environment, the algorithm proposed in this study was applied to signals measured in a real environment in the presence of actual noise. 
For female and male speech signals digitized with a sampling frequency of 10 kHz and 16-bit quantization, we estimated the speech signal based on the observation corrupted by additive noise.</p><p>More specifically, air-conducted speech was measured in a real environment in the presence of white noise generated by a noise generator and of actual machine noise. The bone-conducted speech was measured simultaneously with the air-conducted speech by use of an acceleration sensor. By setting the amplitude of the noise roughly at two levels, the proposed algorithm was applied to extremely difficult situations with low SNR (the ratio of the noise-free air-conducted speech signal to the noise, defined by $\mathrm{SNR} = 10 \log_{10} \left( \sum x_k^2 / \sum v_k^2 \right)$) of approximately −3 dB and −5 dB.</p><p>Using the observed bone-conducted speech and the noisy observation of the air-conducted speech, the constants $a$ and $b$ are first calculated by introducing the linear regression model in Equation (11) and applying the least squares method to this model. Secondly, the parameter $\alpha$ of the membership function is obtained by calculating the standard deviation $\sigma$ of $y_k$ around $\bar{y}_k$, as $\alpha = 2\sigma$, after assuming a Gaussian distribution for the deviation.</p><p>The observed signals of the air-conducted female speech contaminated by the white noise and the machine noise are shown in <xref ref-type="fig" rid="fig1">Figure 1</xref> and <xref ref-type="fig" rid="fig2">Figure 2</xref>. 
Furthermore, for the male speech signal, the noisy air-conducted speech observations are shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> and <xref ref-type="fig" rid="fig4">Figure 4</xref>, respectively.</p><p>The results estimated by using the algorithm based on Equations (20)-(24) are shown in <xref ref-type="fig" rid="fig5">Figure 5</xref> and <xref ref-type="fig" rid="fig6">Figure 6</xref> for the female speech signal and in <xref ref-type="fig" rid="fig7">Figure 7</xref> and <xref ref-type="fig" rid="fig8">Figure 8</xref> for the male speech signal. For comparison, the results for the female and male speech signals estimated by using the algorithm based only on the observation of air-conducted speech are shown in Figures 9-12.</p><p>By comparing Figures 5-8 with Figures 9-12, it is obvious that the proposed method can suppress the effects of white noise and real machine noise better than the method based on the observation of air-conducted speech alone.</p><p>The air-conducted female and male speech signals spoken by the same speakers in a different situation without any noise are shown in <xref ref-type="fig" rid="fig13">Figure 13</xref> and <xref ref-type="fig" rid="fig14">Figure 14</xref> as references. By comparing these speech signals measured in noise-free circumstances with the results estimated by the proposed method and with the results of the algorithm based only on the observation of the air-conducted signal, the effectiveness of the proposed method is obvious. Furthermore, the computation time of the proposed method was reduced by 55.2% compared with the algorithm based only on the air-conducted observation, because the proposed method does not need to calculate recursively the estimate of the variance of $x_k$ based on the air-conducted speech $y_k$.</p></sec><sec id="s4"><title>4. 
Conclusions</title><p>In this paper, by regarding the bone-conducted speech signal with reduced high-frequency components as fuzzy data and applying the probability measure of fuzzy events, a new noise suppression method has been derived on the basis of Bayes’ theorem as the fundamental principle of estimation. Furthermore, the proposed algorithm has been applied to real speech signals contaminated by noise, measured in an actual environment with low SNR. As a result, experiments have shown that better estimation results may be obtained by the proposed algorithm than by the method based only on air-conducted observations.</p><p>The proposed approach is quite different from the traditional standard techniques. However, it is still at an early stage of development, and a number of practical problems remain to be investigated in the future. These include: 1) application to a diverse range of speech signals in actual noise environments, 2) extension to cases with multiple noise sources, and 3) finding an optimal number of expansion terms for the expansion-based probability expressions adopted.</p></sec><sec id="s5"><title>Acknowledgements</title><p>The authors are grateful to Ms. Yui Maeda of the Prefectural University of Hiroshima for her help during this study. This work was supported in part by the Grant-in-Aid for Scientific Research No. 19K04428 from the Ministry of Education, Culture, Sports, Science and Technology-Japan.</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Ikuta, A., Orimoto, H. and Hasegawa, K. (2021) A Noise Suppression Method for Speech Signal by Jointly Using Bayesian Estimation and Fuzzy Theory. Journal of Software Engineering and Applications, 14, 631-645. 
https://doi.org/10.4236/jsea.2021.1412037</p></sec></body><back><ref-list><title>References</title><ref id="scirp.114352-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Yamashita, K. and Shimamura, T. (2005) Nonstationary Noise Estimation Using Low-Frequency Regions for Spectral Subtraction. IEEE Signal Processing Letters, 12, 465-468. https://doi.org/10.1109/LSP.2005.847864</mixed-citation></ref><ref id="scirp.114352-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Plapous, C., Marro, C. and Scalart, P. (2006) Improved Signal-to-Noise Ratio Estimation for Speech Enhancement. IEEE Transactions on Speech and Audio Processing, 14, 2098-2108. https://doi.org/10.1109/TASL.2006.872621</mixed-citation></ref><ref id="scirp.114352-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">McCowan, I.A. and Bourlard, H. (2003) Microphone Array Post-Filter Based on Noise Field Coherence. IEEE Transactions on Speech and Audio Processing, 11, 709-716. https://doi.org/10.1109/TSA.2003.818212</mixed-citation></ref><ref id="scirp.114352-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Kawamura, A., Fujii, K., Itoh, Y. and Fukui, Y. (2002) A Noise Reduction Method Based on Linear Prediction Analysis. IEICE Transactions on Fundamentals, J85-A, 415-423. https://doi.org/10.1109/ICASSP.2002.1004860</mixed-citation></ref><ref id="scirp.114352-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Kawamura, A., Fujii, K. and Itoh, Y. (2005) A Noise Reduction Method Based on Linear Prediction with Variable Step-Size. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E88-A, 855-861. https://doi.org/10.1093/ietfec/e88-a.4.855</mixed-citation></ref><ref id="scirp.114352-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Kim, W. and Ko, H. 
(2001) Noise Variance Estimation for Kalman Filtering of Noisy Speech. IEICE Transactions on Information and Systems, E84-D, 155-160.</mixed-citation></ref><ref id="scirp.114352-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Li, H., Wang, X., Dai, B. and Lu, W. (2007) A Kalman Smoothing Algorithm for Speech Enhancement Based on the Properties of Vocal Tract Varying Slowly. Proceedings of Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, Qingdao, 30 July-1 Aug. 2007, 832-836.</mixed-citation></ref><ref id="scirp.114352-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Tanabe, N., Furukawa, T. and Tsuji, S. (2008) Robust Noise Suppression Algorithm with the Kalman Filter Theory for White and Colored Disturbance. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E91-A, 818-829. https://doi.org/10.1093/ietfec/e91-a.3.818</mixed-citation></ref><ref id="scirp.114352-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Jia, H., Zhang, X. and Jin, C. (2009) A Modified Speech Enhancement Algorithm Based on the Subspace. Proceedings of 2009 Second International Symposium on Knowledge Acquisition and Modeling, Wuhan, 30 November-1 December 2009, 344-347. https://doi.org/10.1109/KAM.2009.19</mixed-citation></ref><ref id="scirp.114352-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Candy, J.V. (2009) Bayesian Signal Processing: Classical, Modern, and Particle Filtering Methods. John Wiley &amp; Sons Ltd., Hoboken. https://doi.org/10.1002/9780470430583</mixed-citation></ref><ref id="scirp.114352-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Ikuta, A. and Orimoto, H. (2011) Adaptive Noise Suppression Algorithm for Speech Signal Based on Stochastic System Theory. 
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E94-A, 1618-1627. https://doi.org/10.1587/transfun.E94.A.1618</mixed-citation></ref><ref id="scirp.114352-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Ikuta, A., Orimoto, H. and Gallagher, G. (2018) Noise Suppression Method by Jointly Using Bone- and Air-Conducted Speech Signals. Noise Control Engineering Journal, 66, 472-488. https://doi.org/10.3397/1/376640</mixed-citation></ref><ref id="scirp.114352-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Orimoto, H., Ikuta, A. and Hasegawa, K. (2021) Speech Signal Detection Based on Bayesian Estimation by Observing Air-Conducted Speech under Existence of Surrounding Noise with the Aid of Bone-Conducted Speech. Intelligent Information Management, 13, 199-213. https://doi.org/10.4236/iim.2021.134011</mixed-citation></ref><ref id="scirp.114352-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Shin, H.S., Kang, H.G. and Fingscheidt, T. (2012) Survey of Speech Enhancement Supported by a Bone Conduction Microphone. Proceedings of 10th ITG Conference on Speech Communication, Braunschweig, 26-28 September 2012, 47-50.</mixed-citation></ref><ref id="scirp.114352-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Ikuta, A. and Orimoto, H. (2014) Fuzzy Signal Processing of Sound and Electromagnetic Environment by Introducing Probability Measure of Fuzzy Events. Proceedings of International Conference on Fuzzy Computation Theory and Applications, Rome, 22-24 October 2014, 5-13.</mixed-citation></ref><ref id="scirp.114352-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Orimoto, H. and Ikuta, A. 
(2012) Prediction of Response Probability Distribution by Considering Additive Property of Energy and Evaluation in Decibel Scale for Sound Environment System with Unknown Structure. Transactions of the Society of Instrument and Control Engineers, 48, 830-836. https://doi.org/10.9746/sicetr.48.830</mixed-citation></ref><ref id="scirp.114352-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Orimoto, H. and Ikuta, A. (2019) State Estimation for Sound Environment System with Nonlinear Observation Characteristics by Introducing Wide-Sense Particle Filter. Intelligent Information Management, 11, 87-101. https://doi.org/10.4236/iim.2019.116008</mixed-citation></ref></ref-list></back></article>