<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2022.108002</article-id><article-id pub-id-type="publisher-id">JCC-119143</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Framework Development Using Data Mining Techniques to Predict Mortality Risk during Pandemic
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chakraborty</surname><given-names>Debjany</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Anwar</surname><given-names>Md Musfique</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Computer Science and Engineering Department, Jahangirnagar University, Dhaka, Bangladesh</addr-line></aff><pub-date pub-type="epub"><day>11</day><month>08</month><year>2022</year></pub-date><volume>10</volume><issue>08</issue><fpage>18</fpage><lpage>25</lpage><history><date date-type="received"><day>23</day><month>January</month><year>2022</year></date><date date-type="rev-recd"><day>9</day><month>August</month><year>2022</year></date><date date-type="accepted"><day>12</day><month>August</month><year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  The coronavirus, which causes the respiratory infection COVID-19, was first detected in late 2019. It then spread quickly across the globe in the first months of 2020, reaching more than 15 million confirmed cases by the second half of July. The global impact of the novel coronavirus (COVID-19) calls for accurate forecasting of the spread of confirmed cases as well as continued analysis of the number of deaths and recoveries. Forecasting requires a huge amount of data. At the same time, forecasts are highly influenced by the reliability of the data, vested interests, and which variables are being predicted. Human behavior also plays an important role in efficiently controlling the spread of the novel coronavirus. This paper introduces a sustainable approach for predicting mortality risk during the pandemic to support medical decision making and raise public health awareness. It describes the range of symptoms experienced by COVID-19 patients and ways of predicting patient mortality rate based on those symptoms.
 
</p></abstract><kwd-group><kwd>Sequential Forward Feature Selection</kwd><kwd>Symptom Categorization</kwd><kwd>Decision Tree</kwd><kwd>Attribute Selection Measure</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>The global impact of the novel coronavirus (COVID-19) calls for accurate forecasting of the spread of confirmed cases as well as continued analysis of the number of deaths and recoveries. Forecasting requires a huge amount of data. At the same time, forecasts are highly influenced by the reliability of the data, vested interests, and which variables are being predicted. This paper introduces a sustainable approach for predicting mortality risk during a pandemic to support medical decision making and raise public health awareness. It describes the range of symptoms experienced by COVID-19 patients and ways of predicting patient mortality rate based on those symptoms.</p><p>In this study, we propose a data-driven predictive algorithm based on data mining to determine the health risk and predict the mortality risk of patients with COVID-19. The algorithm predicts mortality risk based on patients’ physiological conditions, symptoms, chronic medical history, duration of illness and demographic information. This model will help hospitals and medical facilities to:</p><p>&#183; decide who needs attention first,</p><p>&#183; decide who has higher priority for hospitalization,</p><p>&#183; triage patients when the system is overwhelmed by overcrowding,</p><p>&#183; eliminate delays in providing the necessary care,</p><p>&#183; take immediate decisions by observing the most alarming symptoms.</p></sec><sec id="s2"><title>2. Background Study</title><p>Sajana et al. [<xref ref-type="bibr" rid="scirp.119143-ref1">1</xref>] applied non-invasive machine learning techniques to help doctors rank the risk level of dengue patients. 
They conducted a comparison study of the Simple Classification and Regression Tree (CART), Multi-layer Perceptron (MLP) and C4.5 algorithms, and reported that the Simple CART algorithm achieved 100% accuracy in classifying patients as affected or unaffected. They also surveyed papers by different authors and tabulated a comparison of them.</p><p>Krishna et al. [<xref ref-type="bibr" rid="scirp.119143-ref2">2</xref>] focused on a data mining technique aimed at creating a prediction model, using a decision tree, for predicting the chances of disease occurrence in an area; the model also identifies significant parameters that can help in building the model. They took data from both rural and urban areas, classified the dataset through the decision tree construction method, and showed that out of 344 dengue cases 48.93% were from tribal areas. There were very limited cases from the Hill, Rural and Urban areas of East Godavari District, i.e., dengue cases were reported mainly in Tribal and Hill areas.</p><p>Mahdavi et al. [<xref ref-type="bibr" rid="scirp.119143-ref3">3</xref>] aimed to develop and compare prognosis prediction machine learning models based on invasive laboratory and non-invasive clinical and demographic data from patients’ day of admission. Wanyan et al. proposed a novel framework that utilizes relational learning based on a heterogeneous graph model (HGM) for predicting mortality at different time windows in COVID-19 patients within the intensive care unit (ICU) [<xref ref-type="bibr" rid="scirp.119143-ref4">4</xref>]. Friedman et al. [<xref ref-type="bibr" rid="scirp.119143-ref6">6</xref>] introduced a publicly available evaluation framework for assessing the predictive validity of COVID-19 mortality forecasts and tracking model performance.</p></sec><sec id="s3"><title>3. 
Proposed Methodology</title><p>We developed the framework through the step-by-step process shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><sec id="s3_1"><title>3.1. Data Preprocessing</title><p>We first apply preprocessing steps to remove noisy information from the dataset [<xref ref-type="bibr" rid="scirp.119143-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.119143-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.119143-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.119143-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.119143-ref11">11</xref>].</p><p>&#183; We used a dataset of more than 117,00 laboratory-confirmed COVID-19 patients from 76 countries around the world, including both male and female patients, with an average age of 56.6 years [<xref ref-type="bibr" rid="scirp.119143-ref8">8</xref>].</p><p>&#183; The original dataset contained 32 data elements for each patient, including demographic and physiological data.</p><p>&#183; At the data cleaning stage, we removed uninformative and redundant data elements such as data source, admin id, and admin name.</p><p>&#183; Data imputation techniques were used to handle missing values.</p></sec><sec id="s3_2"><title>3.2. Sequential Forward Feature Selection (SFS)</title><p>The primary purpose of feature selection is to find the most informative features and eliminate redundant data to reduce the dimensionality and complexity of the model.</p><p>In the SFS variant, features are added one at a time to an initially empty set until adding further features no longer improves the criterion. Starting from the empty set, the algorithm sequentially adds the feature x<sup>+</sup> that maximizes J(Y<sub>k</sub> + x<sup>+</sup>) when combined with the features Y<sub>k</sub> that have already been selected.</p></sec><sec id="s3_3"><title>3.3. Symptom Categorization of COVID-19 Patients</title><p>The novel coronavirus can cause a range of symptoms, from mild illness to pneumonia. 
Symptoms of the disease include fever, cough, sore throat and headaches. In severe cases, difficulty in breathing and death can occur. We classify the symptoms into three categories: mild, severe and critical. The symptoms of each category are listed below:</p><p>Mild Case: Fever, Dry Cough</p><p>Severe Case: Fever, Dry Cough, Diarrhea</p><p>Critical Case: Fever, Dry Cough, Diarrhea, Pneumonia, Shortness of Breath, Respiratory Failure (<xref ref-type="fig" rid="fig2">Figure 2</xref>).</p></sec><sec id="s3_4"><title>3.4. Decision Tree Construction</title><p>We follow the decision tree construction method. A decision tree is a popular classifier that is simple and easy to implement. It requires no domain knowledge or parameter setting and can handle high-dimensional data. The results obtained from decision trees are easy to read and interpret. The drill-through feature for accessing detailed patient profiles is available only in decision trees.</p><p>&#183; Select the best attribute using Attribute Selection Measures (ASM) to split the records.</p><p>&#183; Make that attribute a decision node and break the dataset into smaller subsets.</p><p>&#183; Build the tree by repeating this process recursively for each child until one of the following conditions is met:</p><p>- All the tuples belong to the same attribute value.</p><p>- There are no more remaining attributes.</p><p>- There are no more instances.</p><p>A decision tree is a tree in which each node represents a feature (attribute), each link (branch) represents a decision (rule) and each leaf represents an outcome (a categorical or continuous value). 
The idea is to build such a tree over the entire dataset so that every leaf yields a single outcome (or the error at every leaf is minimized) (<xref ref-type="fig" rid="fig3">Figure 3</xref>).</p><p>Conditions for stopping partitioning:</p><p>&#183; All samples for a given node belong to the same class.</p><p>&#183; There are no remaining attributes for further partitioning; in this case, majority voting is used to classify the leaf.</p><p>&#183; There are no samples remaining.</p></sec><sec id="s3_5"><title>3.5. Attribute Selection Measure (ASM)</title><p>The attribute selection measure ranks each feature (or attribute) by how well it explains the given dataset. The attribute with the best score is chosen as the splitting attribute. For a continuous-valued attribute, the split points must also be defined. Here,</p><disp-formula id="scirp.119143-formula149"><label>(1)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/2-1731745x5.png?20220811165510991"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.119143-formula150"><label>(2)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/2-1731745x6.png?20220811165510991"  xlink:type="simple"/></disp-formula><p>where Info(D) is the average amount of information needed to identify the class label of a tuple in D. The term <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/2-1731745x7.png" xlink:type="simple"/></inline-formula> is the weight of the j-th partition. Info<sub>A</sub>(D) is the expected information required to classify a tuple from D based on the partitioning by A. The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N.</p><p>Next, we build a schematic decision tree as shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p><p>Here, Mchronic denotes multi-chronic disease: patients with a history of more than one chronic disease. 
Chronic denotes patients who have one chronic disease.</p></sec></sec><sec id="s4"><title>4. Experimental Results</title><p>We divided the original dataset<sup>1</sup> into a 60:40 split (train:test). Validation in the auto model is a multi-hold-out-set validation. The model was trained on 60% of the data, and the remaining 40% test data was divided into subsets. Once trained, the model was used to make predictions on each subset independently, and the performance over these subsets was averaged (Figures 5-7).</p><p><xref ref-type="table" rid="table1">Table 1</xref> presents the performance evaluation of the proposed model on real-time data. This performance is based on the following criteria:</p><p>&#183; The performance is calculated on a 40% hold-out set that was not used for any of the model optimizations.</p><p>&#183; This hold-out set is then used as input for a multi-hold-out-set validation in which we calculate the performance for 7 disjoint subsets.</p><p>&#183; The highest and the lowest performances are removed, and the average of the remaining 5 performances is reported here.</p><p>&#183; Although this validation is not as thorough as a full cross-validation, it strikes a good balance between runtime and model validation quality.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Performance evaluation</title></caption><table><thead><tr><th align="center" valign="middle" >Death category</th><th align="center" valign="middle" >Proposed System</th><th align="center" valign="middle" >Original case</th></tr></thead><tbody><tr><td align="center" valign="middle" >Overall</td><td align="center" valign="middle" >24.05%</td><td align="center" valign="middle" >15%</td></tr><tr><td align="center" valign="middle" >Male</td><td align="center" valign="middle" >9.05%</td><td align="center" valign="middle" 
>4.07%</td></tr></tbody></table></table-wrap></sec><sec id="s5"><title>5. Conclusions</title><p>This work proposes a framework to predict mortality risk during a pandemic in order to support sound medical decisions and raise public health awareness. Our observation is that data mining excels at categorizing data, especially once it has been exposed to large amounts of data on the subject. This makes data mining promising for diagnostics: medical imaging analysis, patient medical records, genetics and more can all be combined to improve diagnostic outcomes and medical decision support.</p><p>Future work is to design and develop an industry-level hospital management ERP solution that incorporates our framework into traditional hospital management system software to minimize cost.</p></sec><sec id="s6"><title>Acknowledgements</title><p>We are thankful to the Department of Computer Science and Engineering, Jahangirnagar University.</p></sec><sec id="s7"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s8"><title>Cite this paper</title><p>Chakraborty, D. and Anwar, M.M. (2022) Framework Development Using Data Mining Techniques to Predict Mortality Risk during Pandemic. Journal of Computer and Communications, 10, 18-25. https://doi.org/10.4236/jcc.2022.108002</p></sec><sec id="s9"><title>NOTES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.119143-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Sajana, T., Navya, M., Gayathri, Y. and Reshma, N. (2018) Classification of Dengue Using Machine Learning Techniques. International Journal of Engineering Technology, 7, 212-218. 
https://doi.org/10.14419/ijet.v7i2.32.15570</mixed-citation></ref><ref id="scirp.119143-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Varma, M., Krishna, S. and Rao, N.K. (2015) Dengue Data Analysis Using Decision Tree Model. International Conference on Emerging Trends in Science Technology Engineering and Management, 9th &amp; 10th, October 2015, 404-414.</mixed-citation></ref><ref id="scirp.119143-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Mahdavi, M., Choubdar, H., Zabeh, E., Rieder, M., Safavi-Naeini, S., et al. (2021) A Machine Learning Based Exploration of COVID-19 Mortality Risk. PLOS ONE, 16, e0252384. https://doi.org/10.1371/journal.pone.0252384</mixed-citation></ref><ref id="scirp.119143-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Wanyan, T., Vaid, A., De Freitas, J.K., et al. (2020) Relational Learning Improves Prediction of Mortality in COVID-19 in the Intensive Care Unit. IEEE Transactions on Big Data, 7, 38-44. https://doi.org/10.1109/TBDATA.2020.3048644</mixed-citation></ref><ref id="scirp.119143-ref5"><label>5</label><mixed-citation publication-type="book" xlink:type="simple">Anwar, M.M., Liu, C. and Li, J. (2018) Uncovering Attribute-Driven Active Intimate Communities. In: Wang, J., Cong, G., Chen, J. and Qi, J., Eds., Databases Theory and Applications. ADC 2018, Springer, Cham, 109-122. https://doi.org/10.1007/978-3-319-92013-9_9</mixed-citation></ref><ref id="scirp.119143-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Friedman, J., Liu, P., Troeger, C.E., et al. (2021) Predictive Performance of International COVID-19 Mortality Forecasting Models. Nature Communications, 12, Article No. 2609. https://doi.org/10.1038/s41467-021-22457-w</mixed-citation></ref><ref id="scirp.119143-ref7"><label>7</label><mixed-citation publication-type="book" xlink:type="simple">Anwar, M.M., Liu, C. and Li, J. 
(2017) Discovering and Tracking Active Online Social Groups. In: Bouguettaya, A., et al., Eds., Web Information Systems Engineering. WISE 2017. Springer, Cham, 54-69. https://doi.org/10.1007/978-3-319-68783-4_5</mixed-citation></ref><ref id="scirp.119143-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Stephany, F., Stoehr, N., et al. (2020) The CoRisk-Index: A Data-Mining Approach to Identify Industry-Specific Risk Assessments Related to COVID-19 in Real-Time. SSRN Electronic Journal, 18 p. https://dx.doi.org/10.2139/ssrn.3607228</mixed-citation></ref><ref id="scirp.119143-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Ahmed, M.S., Aurpa, T.T. and Anwar, M.M. (2021) Detecting Sentiment Dynamics and Clusters of Twitter Users for Trending Topics in COVID-19 Pandemic. PLOS ONE, 16, e0253300. https://doi.org/10.1371/journal.pone.0253300</mixed-citation></ref><ref id="scirp.119143-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Das Badhan, C., and Anwar, M.B., et al. (2021) Attribute Driven Temporal Active Online Community Search. IEEE Access, 9, 93976-93989. https://doi.org/10.1109/ACCESS.2021.3093368</mixed-citation></ref><ref id="scirp.119143-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Anwar, M.M., Liu, C. and Li, J. (2018) Discovering and Tracking Query Oriented Active Online Social Groups in Dynamic Information Network. World Wide Web, 22, 1819-1854. https://doi.org/10.1007/s11280-018-0627-5</mixed-citation></ref></ref-list></back></article>