<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2019.73006</article-id><article-id pub-id-type="publisher-id">JCC-91494</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  A Clustering Approach for Customer Billing Prediction in Mall: A Machine Learning Mechanism
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sriramakrishnan</surname><given-names>Chandrasekaran</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Abhishek</surname><given-names>Kumar</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Manager, KTech, Regional Delivery Center, KPMG LLP, Montvale, NJ, USA</addr-line></aff><aff id="aff2"><addr-line>Computer Science Engineering Department, ACERC, Ajmer, India</addr-line></aff><pub-date pub-type="epub"><day>04</day><month>03</month><year>2019</year></pub-date><volume>07</volume><issue>03</issue><fpage>55</fpage><lpage>66</lpage><history><date date-type="received"><day>9,</day>	<month>January</month>	<year>2019</year></date><date date-type="rev-recd"><day>25,</day>	<month>March</month>	<year>2019</year>	</date><date date-type="accepted"><day>28,</day>	<month>March</month>	<year>2019</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Machine learning is widely applied across science and technology, and especially in the medical field. In this article, we apply machine learning to mall customers, focusing on their income and how much they are likely to spend on purchases in a mall. The dataset describes each customer by Customer ID, gender, age, income, and spending score, where the spending score rates the purchasing of goods in the mall. We apply clustering mechanisms to this public mall-customer dataset and create clusters related to customer purchases. We implement machine learning models to predict whether a visiting customer will purchase any product. This kind of work requires many inputs, such as the features mentioned in the paper, and handling those features requires a model with machine learning capability. We perform K-Means clustering and Hierarchical clustering, and finally we use a confusion matrix to identify which of the two algorithms achieves the higher accuracy. Here, machine learning mechanisms predict the category of the customer, that is, whether they will buy a product or not, based on the independent variables. This work presents a simple machine learning prediction model with which we can predict the category of a customer through clustering. Before clustering, we do not know to which group a customer belongs; after clustering, we can identify the category to which that data point belongs. In this article, we describe the process of determining customer-related information using machine learning clustering mechanisms.
 
</p></abstract><kwd-group><kwd>Clustering</kwd><kwd> Machine Learning</kwd><kwd> Category</kwd><kwd> Technology</kwd><kwd> Hierarchical</kwd><kwd> K-Means</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Machine learning mechanisms are widely used in a large number of applications in science and technology, and they can also be applied to employee-related or student-related information. We often need to predict an outcome from the information we have and from past experience. In this article, we concentrate on predicting whether a customer will purchase any goods from the mall, based on attributes such as gender and salary. We have multiple independent variables and a single dependent variable to predict: whether the customer will generate a bill for a product.</p><p>We implement two algorithms on our data: hierarchical clustering and K-Means clustering. The two mechanisms operate differently, but they serve a common purpose. The accuracy of each algorithm determines which one is the more suitable choice for designing and implementing the prediction model. Clustering mechanisms fall into two kinds: hard clustering and soft clustering.</p><p>1) Soft Clustering:</p><p>Instead of forcing every data point into exactly one cluster, soft clustering estimates how likely the current data point is to fit into each of the existing clusters.</p><p>2) Hard Clustering:</p><p>In this scenario, each data point either belongs to a given cluster completely or does not belong to it at all [<xref ref-type="bibr" rid="scirp.91494-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref3">3</xref>]. 
For example, if we have ten different clusters, we need to identify the one cluster to which the data point belongs.</p><p>Several types of clustering models have been identified, and they are described below. We need to understand them because this work uses two kinds of clustering [<xref ref-type="bibr" rid="scirp.91494-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref6">6</xref>] to identify whether the customer will purchase any product.</p><p>Connectivity models are the first type. They connect data points based on a category or characteristic that the points have in common. For instance, even if one data point lies far away, a new data point that shares its characteristics will be connected to it in the data space.</p><p>Centroid models are the second type. They measure similarity by how close a data point is to the centroid of a cluster. If the distance from the centroid to the data point is small, the two are strongly connected, and the data point belongs to the cluster of that centroid [<xref ref-type="bibr" rid="scirp.91494-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref9">9</xref>].</p><p>Distribution models are the next type. They deal with the probability that the data points in a cluster belong to the same distribution. The clusters form according to that probability, and the distributions may be Gaussian or of any other type.</p><p>The last model type is the density type. 
It deals with searching the data space for regions where the data points are dense [<xref ref-type="bibr" rid="scirp.91494-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref9">9</xref>].</p><p>Our model identification differs from previous research in one notable way: the models we implement are plotted using only the required features instead of all the existing features. The most important things we implemented are the feature extraction mechanism and the identification of the most prominent features in the model. We use two kinds of clustering algorithms, and plotting with these clustering mechanisms is the main focus of the implementation and explanation. We focused on implementing the models with those clustering mechanisms; the models show the optimal model and the features needed to improve it in any future extension.</p><p><xref ref-type="fig" rid="fig1">Figure 1</xref> [<xref ref-type="bibr" rid="scirp.91494-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref12">12</xref>] describes the structure of the different clustering models.</p><p>Here we implement the same mechanisms using two primary clustering mechanisms. <xref ref-type="fig" rid="fig2">Figure 2</xref> shows the distribution model of clustering, <xref ref-type="fig" rid="fig3">Figure 3</xref> projects the centroid model of clustering, and <xref ref-type="fig" rid="fig4">Figure 4</xref> shows the density model of clustering. The two primary mechanisms are K-Means (<xref ref-type="fig" rid="fig5">Figure 5</xref>) and hierarchical clustering. 
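</p><p>The centroid model described earlier can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the code used in this work: the function names, the sample (annual income, spending score) points, and the starting centroids are all hypothetical. It assigns each point to its nearest centroid, recomputes each centroid as the mean of its points, and stops when no assignment changes.</p><preformat>
```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.hypot(point[0] - c[0], point[1] - c[1]) for c in centroids]
    return distances.index(min(distances))


def mean_point(points):
    """Return the coordinate-wise mean of a non-empty list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = list(centroids)  # work on a copy of the starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so there is nothing left to improve
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_point(members)
    return labels, centroids


# Hypothetical (annual income, spending score) pairs, not taken from the dataset.
customers = [(15, 20), (18, 25), (20, 15), (80, 85), (85, 90), (90, 80)]
# Hypothetical starting centroids for k = 2.
labels, centroids = k_means(customers, [(15, 20), (90, 80)])
print(labels)
print(centroids)
```
</preformat><p>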
In K-Means clustering [<xref ref-type="bibr" rid="scirp.91494-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref14">14</xref>], we run the method with k = 2 as the default value and identify the cluster to which each data point belongs. In hierarchical clustering, we form the dendrograms for the groups, from which we can identify the category [<xref ref-type="bibr" rid="scirp.91494-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref17">17</xref>]. A sample dendrogram and cluster in the 2D plane follow.</p><p>The latter part of this article explains the K-Means and hierarchical clustering (<xref ref-type="fig" rid="fig6">Figure 6</xref>) mechanisms for the approach described in the abstract, which is predicting whether the customer will generate a bill in the mall based on age and salary as the main independent variables. The following sections describe the mechanisms and the flow of the process, then present sample results and plots, and finally conclude with the future scope of the work [<xref ref-type="bibr" rid="scirp.91494-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref17">17</xref>].</p></sec><sec id="s2"><title>2. Mechanisms</title><sec id="s2_1"><title>2.1. K-Means</title><p>K-Means is an iterative algorithm that moves toward a local optimum with each iteration. The number of iterations may differ depending on the value of K considered. In this process, we use K = 2. The steps are as follows [<xref ref-type="bibr" rid="scirp.91494-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref19">19</xref>].</p><p>Initially, we need to specify the number of clusters K in the 2D space. 
In this case, we take K as 2.</p><p>In the image above, K is 2 and there are five different data points in the 2D plane [<xref ref-type="bibr" rid="scirp.91494-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref19">19</xref>].</p><p>We need to assign each data point to one of the available clusters. Here we consider two clusters [<xref ref-type="bibr" rid="scirp.91494-ref19">19</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.91494-ref21">21</xref>], shown in red and white in <xref ref-type="fig" rid="fig7">Figure 7</xref>.</p><p>Next, we compute the centroid of the data points in each cluster. The centroids are marked with cross symbols: a red cross for the red cluster and a white cross for the white cluster (<xref ref-type="fig" rid="fig8">Figure 8</xref>).</p><p>1) Verify whether each data point is closest to the centroid of its own category. If a data point is far from the centroid of its category, re-assign it to the cluster with the nearest centroid. This is shown in Figure 9.</p><p>If we observe <xref ref-type="fig" rid="fig9">Figure 9</xref>, we can see an increase in white-category data points and a decrease in red-category data points. This happens when a centroid is far from data points of its own category.</p><p>2) Re-compute the centroids from the data points now assigned to each cluster, if necessary, as a new iteration. The centroid re-computing procedure is as follows [<xref ref-type="bibr" rid="scirp.91494-ref21">21</xref>].</p><p>Repeat the previous two steps until no further improvement is identified in the clusters (<xref ref-type="fig" rid="fig10">Figure 10</xref>).</p></sec><sec id="s2_2"><title>2.2. 
Hierarchical Clustering</title><p>As the name suggests, the clusters form a hierarchy based on the data points in the 2D plane. We build dendrograms over the data points through several iterations, as in the K-Means algorithm. Initially, each cluster consists of the single data point assigned to it; it then merges with the nearest data point in the space to form a group. With every iteration, the cluster and its centroid change substantially [<xref ref-type="bibr" rid="scirp.91494-ref21">21</xref>].</p><p>A dendrogram of the clusters is formed over the iterations. The best choice for the number of groups is 4, and the red lines in <xref ref-type="fig" rid="fig11">Figure 11</xref> mark the maximum vertical distance [<xref ref-type="bibr" rid="scirp.91494-ref22">22</xref>].</p></sec></sec><sec id="s3"><title>3. Process Flow</title><p>We maintain separate process flows for the K-Means and hierarchical clustering mechanisms. They are described in this article with sample code (<xref ref-type="fig" rid="fig12">Figure 12</xref>, <xref ref-type="fig" rid="fig13">Figure 13</xref>).</p><p>1) K-Means</p><p>The process consists of the following steps:</p><p>a) Import the libraries</p><p>b) Import the related dataset in CSV or JSON format</p><p>c) Perform feature scaling</p><p>d) Split the dataset into test and train sets</p><p>e) Use the elbow method to identify the optimal number of clusters</p><p>f) Fit K-Means to the dataset</p><p>g) Visualize the clusters</p><p>2) Hierarchical Clustering</p><p>The process consists of the following steps:</p><p>a) Import the libraries</p><p>b) Import the related dataset</p><p>c) Perform feature scaling</p><p>d) Split the dataset</p><p>e) Use the dendrogram to find the optimal number of clusters</p><p>f) Fit hierarchical clustering to the dataset</p><p>g) Visualize the clusters</p></sec><sec id="s4"><title>4. 
Results</title><p>The following are the results of the two mechanisms used in this work: first K-Means, then hierarchical clustering. As discussed earlier, K-Means clustering here checks whether the model designed with the selected features obtains the desired result. We use the elbow method to choose the number of clusters for this implementation, which predicts whether the customer will generate a bill in the mall based on his or her features.</p><p>1) K-Means</p><p>There are two sample plots: one identifies the number of possible clusters, and the other visualizes the resulting groups. The resulting clusters are shown in <xref ref-type="fig" rid="fig14">Figure 14</xref> and <xref ref-type="fig" rid="fig15">Figure 15</xref>.</p><p>2) Hierarchical Clustering</p><p>The following are the outputs we obtained for hierarchical clustering.</p><p>The first is a sample dendrogram (<xref ref-type="fig" rid="fig16">Figure 16</xref>), and the second (<xref ref-type="fig" rid="fig17">Figure 17</xref>) shows the clusters of customers based on annual income.</p></sec><sec id="s5"><title>5. Conclusion</title><p>We conclude the article with the sample outputs of K-Means clustering and hierarchical clustering. In a few scenarios, we need to perform backward elimination to identify the best features so that the model achieves the best accuracy. From the results obtained, we identified K-Means as the best-fit model for the addressed problem. The main reason for the higher accuracy of K-Means is that its centroids are recomputed repeatedly as the assignment of nodes changes. From the point of view of researchers, the goal is to identify whether there is a chance that a customer will go on to purchase an item in the mall. But the model here requires only a simple step such as feature extraction. Too many features make the model inaccurate and suboptimal. 
To find the optimal model, we use a confusion matrix to identify the difference between the predicted result obtained and the actual result expected. The future scope of this research is to identify more optimal features that improve the model and give better customer billing prediction.</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Chandrasekaran, S. and Kumar, A. (2019) A Clustering Approach for Customer Billing Prediction in Mall: A Machine Learning Mechanism. Journal of Computer and Communications, 7, 55-66. https://doi.org/10.4236/jcc.2019.73006</p></sec></body><back><ref-list><title>References</title><ref id="scirp.91494-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">To, K.B. and Napolitano, L.M. (2012) Common Complications in the Critically Ill Patient. Surgical Clinics of North America, 92, 1519-1557.</mixed-citation></ref><ref id="scirp.91494-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Wollschlager, C.M. and Conrad, A.R. (1988) Common Complications in Critically Ill Patients. Disease-a-Month, 34, 225-293.</mixed-citation></ref><ref id="scirp.91494-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Desai, S.V., Law, T.J. and Needham, D.M. (2011) Long-Term Complications of Critical Care. Critical Care Medicine, 39, 371-379.</mixed-citation></ref><ref id="scirp.91494-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Perichappan, K.A., Sasubilli, S. and Khurshudyan, A.Z. (2018) Approximate Analytical Solution to Non-Linear Young-Laplace Equation with an Infinite Boundary Condition. 
2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 3-4 March 2018, 1-5. https://doi.org/10.1109/ICOMET.2018.8346349</mixed-citation></ref><ref id="scirp.91494-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Johnson, A.E.W., Ghassemi, M.M., Nemati, S., Niehaus, K.E., Clifton, D.A. and Clifford, G.D. (2016) Machine Learning and Decision Support in Critical Care. Proceedings of the IEEE, 104, 444-466. https://doi.org/10.1109/JPROC.2015.2501978</mixed-citation></ref><ref id="scirp.91494-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Badawi, O., et al. (2014) Making Big Data Useful for Health Care: A Summary of the Inaugural MIT Critical Data Conference. JMIR Medical Informatics, 2, e22. https://doi.org/10.2196/medinform.3447</mixed-citation></ref><ref id="scirp.91494-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Reddy, C.K. and Aggarwal, C.C. (2015) Healthcare Data Analytics. Vol. 36, CRC Press, Boca Raton, FL. https://doi.org/10.1201/b18588</mixed-citation></ref><ref id="scirp.91494-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Gotz, D., Stavropoulos, H., Sun, J. and Wang, F. (2012) ICDA: A Platform for Intelligent Care Delivery Analytics. AMIA Annual Symposium Proceedings, 2012, 264-273.</mixed-citation></ref><ref id="scirp.91494-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Perer, A. and Sun, J. (2012) MatrixFlow: Temporal Network Visual Analytics to Track Symptom Evolution during Disease Progression. AMIA Annual Symposium Proceedings, 2012, 716-725.</mixed-citation></ref><ref id="scirp.91494-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Mao, Y., Chen, W., Chen, Y., Lu, C., Kollef, M. and Bailey, T. 
(2012) An Integrated Data Mining Approach to Real-Time Clinical Monitoring and Deterioration Warning. 
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 12-16 August 2012, 1140-1148. https://doi.org/10.1145/2339530.2339709</mixed-citation></ref><ref id="scirp.91494-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Wiens, J., Horvitz, E. and Guttag, J.V. (2012) Patient Risk Stratification for Hospital-Associated C. Diff as a Time-Series Classification Task. Advances in Neural Information Processing Systems, 467-475.</mixed-citation></ref><ref id="scirp.91494-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Saria, S., Koller, D. and Penn, A. (2010) Learning Individual and Population Level Traits from Clinical Temporal Data. Neural Information Processing Systems (NIPS), Predictive Models Personalized Med. Workshop.</mixed-citation></ref><ref id="scirp.91494-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Dürichen, R., Pimentel, M.A.F., Clifton, L., Schweikard, A. and Clifton, D.A. (2015) Multitask Gaussian Processes for Multivariate Physiological Time-Series Analysis. IEEE Transactions on Biomedical Engineering, 62, 314-322.</mixed-citation></ref><ref id="scirp.91494-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Ghassemi, M., et al. (2015) A Multivariate Time Series Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data. AAAI Conference on Artificial Intelligence, 2015, 446-453.</mixed-citation></ref><ref id="scirp.91494-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Batal, I., Valizadegan, H., Cooper, G.F. and Hauskrecht, M. (2011) A Pattern Mining Approach for Classifying Multivariate Temporal Data. 
Proceedings of the IEEE International Conference on Bioinformatics (BIBM), 2011, 358-365.</mixed-citation></ref><ref id="scirp.91494-ref16"><label>16</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lasko</surname><given-names> T.A. </given-names></name>,<etal>et al</etal>. (<year>2014</year>)<article-title>Efficient Inference of Gaussian-Process-Modulated Renewal Processes with Application to Medical Event Data</article-title><source> Uncertainty in Artificial Intelligence</source><volume> 2014</volume>,<fpage> 469</fpage>-<lpage>476</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.91494-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Barajas, K.L.C. and Akella, R. (2015) Dynamically Modeling Patient’s Health State from Electronic Medical Records: A Time Series Approach. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 10-13 August 2015, 69-78.</mixed-citation></ref><ref id="scirp.91494-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Wang, X., Sontag, D. and Wang, F. (2014) Unsupervised Learning of Disease Progression Models. 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York, 24-27 August 2014, 85-94. https://doi.org/10.1145/2623330.2623754</mixed-citation></ref><ref id="scirp.91494-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Sasubilli, S., Perichappan, K.A.P., Kumar, P.S. and Kumar, A. (2018) An Approach towards Economical Hierarchic Search over Encrypted Cloud. Annals of Computer Science and Information Systems, 14, 125-129.</mixed-citation></ref><ref id="scirp.91494-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Zhou, J., Liu, J., Narayan, V.A. and Ye, J. (2012) Modeling Disease Progression via Fused Sparse Group Lasso. 
18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 12-16 August 2012, 1095-1103. https://doi.org/10.1145/2339530.2339702</mixed-citation></ref><ref id="scirp.91494-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Choi, E., Du, N., Chen, R., Song, L. and Sun, J. (2015) Constructing Disease Network and Temporal Progression Model via Context-Sensitive Hawkes Process. Proceedings of the IEEE International Conference on Data Mining Workshop, 14-17 November 2015, 721-726. https://doi.org/10.1109/ICDM.2015.144</mixed-citation></ref><ref id="scirp.91494-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Pivovarov, R., Perotte, A.J., Grave, E., Angiolillo, J., Wiggins, C.H. and Elhadad, N. (2015) Learning Probabilistic Phenotypes from Heterogeneous EHR Data. Journal of Biomedical Informatics, 58, 156-165.</mixed-citation></ref></ref-list></back></article>