<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2021.99012</article-id><article-id pub-id-type="publisher-id">JCC-112523</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  On a Feature Extraction and Classification Study for PPG Signal Analysis
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Qian</surname><given-names>Wu</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>School of Information Engineering, Minzu University of China, Beijing, China</addr-line></aff><pub-date pub-type="epub"><day>08</day><month>09</month><year>2021</year></pub-date><volume>09</volume><issue>09</issue><fpage>153</fpage><lpage>160</lpage><history><date date-type="received"><day>28</day><month>July</month><year>2021</year></date><date date-type="rev-recd"><day>27</day><month>September</month><year>2021</year></date><date date-type="accepted"><day>30</day><month>September</month><year>2021</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  
    Photoplethysmography (PPG) is a low-cost, non-invasive optical technology for detecting volumetric changes in blood circulation at the surface of the skin. While the medical significance of the components of the PPG pulse wave is not yet fully understood, it is widely agreed that they carry valuable pathophysiological information related to the cardiovascular system. Going beyond the frequency and time domain features of the pulse wave and its first and second derivatives, which are commonly used in related work, we applied an improved K-MEANS algorithm for feature extraction based on selected time domain parameters: K1 (systolic area), K2 (diastolic area) and K (entire pulse wave area). The characteristic waveforms extracted under the same light intensity achieved an average confidence level of 90% or higher. The stationary wavelet transform was adopted to further analyze the characteristic waveform by calculating the wavelet entropy. We then trained a Probabilistic Neural Network (PNN) model using the wavelet entropy and other time domain characteristic parameters. The trained PNN model classified characteristic waveforms as healthy or severe arterial stenosis with 60% and 80% accuracy, respectively. 
  
 
</p></abstract><kwd-group><kwd>PPG Pulse Wave</kwd><kwd>K-Means</kwd><kwd>Stationary Wavelet Transform</kwd><kwd>Wavelet Entropy</kwd><kwd>Probabilistic Neural Network (PNN)</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Photoplethysmography (PPG) is an electro-optic technology that generates a cardiovascular pulse wave by measuring volumetric changes in blood circulation at the surface of the skin [<xref ref-type="bibr" rid="scirp.112523-ref1">1</xref>]. PPG is used both clinically and by individuals in a wide variety of scenarios, from professional diagnostics to community and home health monitoring. Numerous studies [<xref ref-type="bibr" rid="scirp.112523-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.112523-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.112523-ref4">4</xref>] have examined how to extract valuable information from the PPG pulse wave beyond simple heart rate counting and pulse oximetry estimation. It is believed that the second derivative of the pulse wave contains essential health-related information; hence pulse wave analysis could be of significant value in evaluating cardiovascular diseases, facilitating early detection and recognition of illnesses, and supporting continuous health monitoring.</p><p>However, due to the electro-optic nature of PPG, many factors can affect PPG signal detection [<xref ref-type="bibr" rid="scirp.112523-ref5">5</xref>]. For example, sensor displacement caused by body movement and variation in the applied pressure change the magnitude of the received signal. In practice, PPG measurement usually collects redundant data so that noise can be averaged out for better signal quality. This, however, makes the PPG pulse wave harder for human readers to interpret. A peculiarity in a given pulse wave may arise simply from sampling disturbed by sensor displacement, yet it still distracts human readers. 
It is therefore of practical use to extract a feature waveform from large volumes of PPG pulse wave data in order to improve the productivity of human readers.</p><p>In the first part of this paper, we propose a clustering algorithm to extract the characteristics of PPG pulse waves using three time domain feature parameters: K1, K2 and K, where K1 represents the systolic area, K2 the diastolic area, and K the entire pulse wave area. An improved K-MEANS algorithm is adopted to extract the feature waveforms from sets of pulse waves recorded under the same light intensity. We present the algorithm and the average confidence level achieved, which exceeds 90%.</p><p>We then calculate the wavelet entropy of the characteristic waveform using the stationary wavelet transform. A Probabilistic Neural Network (PNN) model is trained with the wavelet entropy and six additional time domain characteristic parameters as input. The trained model is tested on its ability to classify waveforms as healthy or severe arterial stenosis.</p></sec><sec id="s2"><title>2. Feature Extraction</title><sec id="s2_1"><title>2.1. Time Domain Feature Parameters</title><p>We adopt three time domain feature parameters: K1, K2 and K, where K1 represents the systolic area, K2 the diastolic area, and K the entire pulse wave area. In a medical sense, the pulse wave area K characterizes microcirculation in general but does not reflect the relationship between other feature points and areas of the pulse wave. We therefore divide the pulse wave area into two parts: the systolic area K1 and the diastolic area K2.</p><p>We calculate K1, K2 and K with reference to <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>K is the ratio of the area S<sub>ABCDE</sub> to the 
area of rectangle AHFE, as given below</p><p>$$K = \frac{S_{ABCDE}}{S_{AHFE}}, \quad S_{ABCDE} = \int_{x_A}^{x_E} G(t)\,dt \tag{2.1-1}$$</p><p>where x<sub>A</sub> denotes the start of the pulse wave segment, x<sub>E</sub> denotes the end of the segment, and G(t) is the pulse wave as a function of time.</p><p>Consequently, K1 and K2 can be calculated as follows.</p><p>$$K_1 = \frac{S_{ABC}}{S_{AHGI}}, \quad K_2 = \frac{S_{CDE}}{S_{IGFE}}, \quad S_{ABC} = \int_{x_A}^{x_C} G(t)\,dt, \quad S_{CDE} = \int_{x_C}^{x_E} G(t)\,dt, \quad K = K_1 + K_2 \tag{2.1-2}$$</p><p>where S<sub>ABC</sub> is the area from x<sub>A</sub> to the dicrotic notch x<sub>C</sub>, and S<sub>CDE</sub> is the area from the dicrotic notch to x<sub>E</sub>.</p></sec><sec id="s2_2"><title>2.2. Improved K-MEANS</title><p>It is well understood that dirty data degrades K-MEANS clustering results, and PPG measurement is prone to noise from many sources. We propose an improved K-MEANS algorithm that updates the sample centers and thresholds after each round of clustering in order to achieve more accurate clustering. The improved algorithm is less sensitive to noise and dirty data, at the expense of more computation. Since our data set is small, the extra cost is acceptable.</p><p><xref ref-type="fig" rid="fig2">Figure 2</xref>(a) depicts the improved K-MEANS results, whereas <xref ref-type="fig" rid="fig2">Figure 2</xref>(b) depicts the standard K-MEANS results.</p><p>The confidence level of the improved K-MEANS algorithm is much higher than that of the standard K-MEANS algorithm.</p></sec></sec><sec id="s3"><title>3. Stationary Wavelet Transform</title><p>The wavelet transform [<xref ref-type="bibr" rid="scirp.112523-ref6">6</xref>] combines the time and frequency domains to describe localized variations of power. Wavelets provide a multi-resolution analysis of the pulse wave and hence yield more insight for feature extraction. 
We adopted the stationary wavelet transform, also known as the binary or non-decimated wavelet transform, which omits the downsampling step, so the output of each transformation has the same length as the original signal and most of the valuable information is preserved (<xref ref-type="fig" rid="fig3">Figure 3</xref>).</p><sec id="s3_1"><title>3.1. Wavelet Entropy</title><p>We calculate the wavelet entropy as follows.</p><p>$$W_E = -\sum_{j} P_j \ln(P_j)$$</p><p>$$P_j = \frac{E_j}{E_{tot}}$$</p><p>$$E_j = \sum_{k} |C_j(k)|^2 \tag{3.1-1}$$</p><p>$$E_{tot} = \sum_{j} E_j$$</p><p>where j indexes the layers of the signal decomposition (j = 1, 2, 3, ..., 5); k indexes the samples of the signal (k = 1, 2, 3, ..., 512); C<sub>j</sub>(k) is the k-th wavelet coefficient at layer j; W<sub>E</sub> denotes the wavelet entropy; E<sub>j</sub> denotes the total energy at layer j; E<sub>tot</sub> denotes the total energy over all layers; and P<sub>j</sub> is the fraction of the total energy contained in layer j.</p></sec><sec id="s3_2"><title>3.2. Wavelet Entropy Indication of PPG Pulse Wave</title><p>• Data Preparation</p><p>PPG measurement can be affected by many factors, including the pathophysiological and environmental conditions at the time of the test, among which age and blood pressure are key factors. We selected 23 healthy participants (coronary artery normal or mild stenosis) and 23 unhealthy participants (severe coronary artery stenosis), all aged 50 to 70.</p><p>• Test Results</p><p>The mean, variance and standard deviation of the wavelet entropy for healthy and unhealthy participants are listed in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>The mean wavelet entropy of healthy participants (0.3639) is lower than that of unhealthy participants (0.4048), which suggests that the PPG pulse wave of healthy people is more stable than that of unhealthy people.</p></sec></sec><sec id="s4"><title>4. Classification</title><sec id="s4_1"><title>4.1. 
Probabilistic Neural Networks (PNN)</title><p>A Probabilistic Neural Network (PNN) [<xref ref-type="bibr" rid="scirp.112523-ref7">7</xref>] is a simple network that can be implemented with linear algebra operations and is well suited to classification. As depicted in <xref ref-type="fig" rid="fig4">Figure 4</xref>, it has five layers: the input layer, normalization layer, hidden layer, summation layer, and output layer.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> The mean, variance and standard deviation of wavelet entropy</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Wavelet Entropy</th><th align="center" valign="middle" >MEAN</th><th align="center" valign="middle" >VARIANCE</th><th align="center" valign="middle" >STANDARD DEVIATION</th></tr></thead><tr><td align="center" valign="middle" >Healthy Participants</td><td align="center" valign="middle" >0.3639</td><td align="center" valign="middle" >0.0318</td><td align="center" valign="middle" >0.1785</td></tr><tr><td align="center" valign="middle" >Unhealthy Participants</td><td align="center" valign="middle" >0.4048</td><td align="center" valign="middle" >0.0251</td><td align="center" valign="middle" >0.1584</td></tr></tbody></table></table-wrap></sec><sec id="s4_2"><title>4.2. 
PNN Inputs</title><p>The PNN takes seven inputs:</p><p>Six time domain parameters: T, P<sub>ab</sub>, SI, K1, K, RI.</p><p>One frequency domain parameter: wavelet entropy.</p><p>where</p><p>• T is the period of the pulse wave; P<sub>ab</sub> = T<sub>ab</sub>/T, where T<sub>ab</sub> is the difference between Xa and Xb.</p><p>• SI is height (m)/T<sub>bd</sub>, where T<sub>bd</sub> is the difference between Xb and Xd.</p><p>• RI is H<sub>d</sub>/H<sub>b</sub>, where H<sub>d</sub> is the vertical difference between Yd and Ya, and H<sub>b</sub> is the vertical difference between Yb and Ya.</p><p>• K1 and K are defined in Equations (2.1-1) and (2.1-2).</p><p>The parameters of healthy people are listed in <xref ref-type="table" rid="table2">Table 2</xref>.</p><p>The parameters of unhealthy people are listed in <xref ref-type="table" rid="table3">Table 3</xref>.</p><p>The time domain parameters listed above differ between healthy and unhealthy people to varying degrees; hence it is difficult to derive valuable information from any one of them alone. As a result, we use all these time domain parameters together with the wavelet entropy as inputs to the PNN for classification of the PPG pulse wave.</p></sec><sec id="s4_3"><title>4.3. PPG Classification with PNN</title><p>Our test uses 13 samples per group as input for training and 10 samples per group for classification. The classification results are listed in <xref ref-type="table" rid="table4">Table 4</xref>.</p><p>The classification accuracy is 60% for healthy people and 80% for unhealthy ones. 
The reason for this is that there is a clear standard for defining the unhealthy condition (stenosis) but not the healthy one.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Input parameters for healthy people</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >MEAN</th><th align="center" valign="middle" >STANDARD DEVIATION</th><th align="center" valign="middle" >VARIANCE</th></tr></thead><tr><td align="center" valign="middle" >T</td><td align="center" valign="middle" >0.7920</td><td align="center" valign="middle" >0.1192</td><td align="center" valign="middle" >0.0142</td></tr><tr><td align="center" valign="middle" >P<sub>ab</sub></td><td align="center" valign="middle" >0.2112</td><td align="center" valign="middle" >0.0516</td><td align="center" valign="middle" >0.0027</td></tr><tr><td align="center" valign="middle" >SI</td><td align="center" valign="middle" >7.5665</td><td align="center" valign="middle" >1.6493</td><td align="center" valign="middle" >2.7203</td></tr><tr><td align="center" valign="middle" >K1</td><td align="center" valign="middle" >0.6871</td><td align="center" valign="middle" >0.0349</td><td align="center" valign="middle" >0.0012</td></tr><tr><td align="center" valign="middle" >K</td><td align="center" valign="middle" >0.4015</td><td align="center" valign="middle" >0.0681</td><td align="center" valign="middle" >0.0046</td></tr><tr><td align="center" valign="middle" >RI</td><td align="center" valign="middle" >0.4746</td><td align="center" valign="middle" >0.1277</td><td align="center" valign="middle" >0.0163</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Input parameters for unhealthy people</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >MEAN</th><th align="center" 
valign="middle" >STANDARD DEVIATION</th><th align="center" valign="middle" >VARIANCE</th></tr></thead><tr><td align="center" valign="middle" >T</td><td align="center" valign="middle" >0.8070</td><td align="center" valign="middle" >0.1382</td><td align="center" valign="middle" >0.0191</td></tr><tr><td align="center" valign="middle" >P<sub>ab</sub></td><td align="center" valign="middle" >0.2231</td><td align="center" valign="middle" >0.0716</td><td align="center" valign="middle" >0.0051</td></tr><tr><td align="center" valign="middle" >SI</td><td align="center" valign="middle" >6.9935</td><td align="center" valign="middle" >1.4744</td><td align="center" valign="middle" >2.1738</td></tr><tr><td align="center" valign="middle" >K1</td><td align="center" valign="middle" >0.6643</td><td align="center" valign="middle" >0.0897</td><td align="center" valign="middle" >0.0080</td></tr><tr><td align="center" valign="middle" >K</td><td align="center" valign="middle" >0.3947</td><td align="center" valign="middle" >0.0746</td><td align="center" valign="middle" >0.0056</td></tr><tr><td align="center" valign="middle" >RI</td><td align="center" valign="middle" >0.4480</td><td align="center" valign="middle" >0.0867</td><td align="center" valign="middle" >0.0075</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Classification results</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Normal (%)</th><th align="center" valign="middle" >Stenosis (%)</th><th align="center" valign="middle" >Result</th><th align="center" valign="middle" ></th><th align="center" valign="middle" >Normal (%)</th><th align="center" valign="middle" >Stenosis (%)</th><th align="center" valign="middle" >Result</th></tr></thead><tr><td align="center" valign="middle" >N1</td><td align="center" valign="middle" >39.64</td><td align="center" valign="middle" 
>60.36</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >P1</td><td align="center" valign="middle" >27.38</td><td align="center" valign="middle" >72.62</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N2</td><td align="center" valign="middle" >60.13</td><td align="center" valign="middle" >39.87</td><td align="center" valign="middle" >+</td><td align="center" valign="middle" >P2</td><td align="center" valign="middle" >30.43</td><td align="center" valign="middle" >69.57</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N3</td><td align="center" valign="middle" >76.66</td><td align="center" valign="middle" >23.34</td><td align="center" valign="middle" >+</td><td align="center" valign="middle" >P3</td><td align="center" valign="middle" >27.83</td><td align="center" valign="middle" >72.17</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N4</td><td align="center" valign="middle" >76.94</td><td align="center" valign="middle" >23.06</td><td align="center" valign="middle" >+</td><td align="center" valign="middle" >P4</td><td align="center" valign="middle" >35.72</td><td align="center" valign="middle" >64.28</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N5</td><td align="center" valign="middle" >15.08</td><td align="center" valign="middle" >84.92</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >P5</td><td align="center" valign="middle" >52.20</td><td align="center" valign="middle" >47.80</td><td align="center" valign="middle" >+</td></tr><tr><td align="center" valign="middle" >N6</td><td align="center" valign="middle" >40.34</td><td align="center" valign="middle" >59.66</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >P6</td><td align="center" valign="middle" >21.70</td><td align="center" valign="middle" 
>78.30</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N7</td><td align="center" valign="middle" >77.93</td><td align="center" valign="middle" >22.07</td><td align="center" valign="middle" >+</td><td align="center" valign="middle" >P7</td><td align="center" valign="middle" >0.60</td><td align="center" valign="middle" >99.40</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N8</td><td align="center" valign="middle" >63.27</td><td align="center" valign="middle" >36.73</td><td align="center" valign="middle" >+</td><td align="center" valign="middle" >P8</td><td align="center" valign="middle" >91.88</td><td align="center" valign="middle" >8.12</td><td align="center" valign="middle" >+</td></tr><tr><td align="center" valign="middle" >N9</td><td align="center" valign="middle" >75.76</td><td align="center" valign="middle" >24.24</td><td align="center" valign="middle" >+</td><td align="center" valign="middle" >P9</td><td align="center" valign="middle" >46.71</td><td align="center" valign="middle" >53.29</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >N10</td><td align="center" valign="middle" >39.31</td><td align="center" valign="middle" >60.69</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >P10</td><td align="center" valign="middle" >42.67</td><td align="center" valign="middle" >57.33</td><td align="center" valign="middle" >-</td></tr></tbody></table></table-wrap><p>Here, N denotes healthy people and P denotes unhealthy people; “+” denotes a normal coronary artery; “-” denotes severe coronary artery stenosis; “_” denotes misclassification.</p></sec></sec><sec id="s5"><title>5. Conclusion</title><p>The feature extraction and classification methodology for PPG signals, which combines the improved K-MEANS algorithm, the stationary wavelet transform and PNN modelling, is easy to implement and effective to use. 
Together, the time domain parameters and the frequency domain wavelet entropy form an appropriate input set for PNN modelling and yield acceptable classification results. We see this work as a starting point for further study of the pathophysiological significance of the PPG pulse wave.</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The author declares no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Wu, Q. (2021) On a Feature Extraction and Classification Study for PPG Signal Analysis. Journal of Computer and Communications, 9, 153-160. https://doi.org/10.4236/jcc.2021.99012</p></sec></body><back><ref-list><title>References</title><ref id="scirp.112523-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Castaneda, D., Esparza, A., Ghamari, M., Soltanpur, C. and Nazeran, H. (2018) A Review on Wearable Photoplethysmography Sensors and Their Potential Future Applications in Health Care. Int J Biosens Bioelectron, 4, 195-202.  
https://doi.org/10.15406/ijbsbe.2018.04.00125</mixed-citation></ref><ref id="scirp.112523-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Cohn, J.N., Finkelstein, S.M., McVeigh, G.E., et al. (1995) Noninvasive Pulse Wave Analysis for the Early Detection of Vascular Disease. Hypertension, 26, 503-508.  
https://doi.org/10.1161/01.HYP.26.3.503</mixed-citation></ref><ref id="scirp.112523-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">O’Rourke, M., Pauca, A. and Jiang, X.-J. (2001) Pulse Wave Analysis. Br J Clin Pharmacol, 51, 507-522. https://doi.org/10.1046/j.0306-5251.2001.01400.x</mixed-citation></ref><ref id="scirp.112523-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, G., Kong, X. and Liao, S. (2008) Pulse Wave Analysis for Cardiovascular Information Monitoring in Patients with Chronic Heart Failure: Effects of COQ10 Treatment. Montreal: Bio-Engineering 2008.  
</mixed-citation></ref><ref id="scirp.112523-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Bolanos, M., Nazeran, H., Haltiwanger, E., et al. (2006) Comparison of Heart Rate Variability Signal Features Derived from Electrocardiography and Photoplethysmography in Healthy Individuals. Engineering in Medicine and Biology Society, 1, 4289-4294. https://doi.org/10.1109/IEMBS.2006.4398399</mixed-citation></ref><ref id="scirp.112523-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Weng, H. and Lau, K.-M. (1994) Wavelets, Period Doubling, and Time-Frequency Localization with Application to Organization of Convection over the Tropical Western Pacific. J. Atmos. Sci., 51, 2523-2541.  
https://doi.org/10.1175/1520-0469(1994)051&lt;2523:WPDATL&gt;2.0.CO;2</mixed-citation></ref><ref id="scirp.112523-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Mohebali, B., Tahmassebi, A., Meyer-Baese, A. and Gandomi, A.H. (2020) Probabilistic Neural Networks: A Brief Overview of Theory, Implementation, and Application. Elsevier, 347-367. https://doi.org/10.1016/B978-0-12-816514-0.00014-X</mixed-citation></ref></ref-list></back></article>