<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">AM</journal-id><journal-title-group><journal-title>Applied Mathematics</journal-title></journal-title-group><issn pub-type="epub">2152-7385</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/am.2018.98065</article-id><article-id pub-id-type="publisher-id">AM-86931</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Methodology for Constructing a Short-Term Event Risk Score in Heart Failure Patients
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Duarte</surname><given-names>Kévin</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Monnez</surname><given-names>Jean-Marie</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Albuisson</surname><given-names>Eliane</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>CHRU Nancy, INSERM, Université de Lorraine, CIC, Plurithématique, Nancy, France</addr-line></aff><aff id="aff3"><addr-line>Institut Elie Cartan de Lorraine, Université de Lorraine, CNRS, Nancy, France</addr-line></aff><aff id="aff1"><addr-line>CNRS, INRIA, Institut Elie Cartan de Lorraine, Université de Lorraine, Nancy, France</addr-line></aff><pub-date pub-type="epub"><day>14</day><month>08</month><year>2018</year></pub-date><volume>09</volume><issue>08</issue><fpage>954</fpage><lpage>974</lpage><history><date date-type="received"><day>2</day><month>May</month><year>2018</year></date><date date-type="rev-recd"><day>26</day><month>August</month><year>2018</year></date><date date-type="accepted"><day>29</day><month>August</month><year>2018</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  
    We present a methodology for constructing a short-term event risk score in heart failure patients from an ensemble predictor. Individual predictors are built using bootstrap samples, two classification rules (logistic regression and linear discriminant analysis for mixed data, continuous or categorical) and random selection of explanatory variables. We define a measure of the importance of each variable in the score and a measure of event risk by an odds-ratio. Moreover, we establish a property of linear discriminant analysis for mixed data. This methodology is applied to EPHESUS trial patients on whom biological, clinical and medical history variables were measured. 
  
 
</p></abstract><kwd-group><kwd>Ensemble Predictor</kwd><kwd>Linear Discriminant Analysis</kwd><kwd>Logistic Regression</kwd><kwd>Mixed Data</kwd><kwd>Scoring</kwd><kwd>Supervised Classification</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>In this study, we focus on the problem of constructing a short-term event risk score in heart failure patients based on observations of biological, clinical and medical history variables.</p><p>Numerous event risk scores in heart failure patients have been proposed in recent years, but one aspect is particularly important to consider in the construction of a score and in the relevance of the results obtained: the choice of classification models, whose conditions of use may be restrictive. The classification models most commonly used in these studies are logistic regression and the Cox proportional hazards model. Examples include the Seattle Heart Failure Model (SHFM) risk score [<xref ref-type="bibr" rid="scirp.86931-ref1">1</xref>] and the Seattle Post Myocardial Infarction Model (SPIM) risk score [<xref ref-type="bibr" rid="scirp.86931-ref2">2</xref>], which predict survival in chronic and post-infarction heart failure patients, respectively:</p><p>• The SHFM risk score was derived in a cohort of 1153 patients with ejection fraction &lt; 30% and New York Heart Association (NYHA) class III to IV and validated in 5 other cohorts of patients with similar characteristics. The area under the ROC curve (AUC) at 1 year was 0.725 in resubstitution and ranged from 0.679 to 0.810 in the 5 validation cohorts.</p><p>• The SPIM risk score was derived in a cohort of 6632 patients from the Eplerenone Post-Acute Myocardial Infarction Heart Failure Efficacy and Survival Study (EPHESUS) trial [<xref ref-type="bibr" rid="scirp.86931-ref3">3</xref>] and validated on a cohort of 5477 patients. 
AUC at 1 year was 0.742 in derivation and 0.774 in validation.</p><p>These two risk scores were developed using the Cox proportional hazards model with characteristics available at baseline as explanatory variables. Overall, there are several limitations to using these risk scores. They were constructed using only data available at baseline. However, as many studies apply inclusion criteria based on clinical or biological parameters measured at baseline, some variables may be absent from the score because of these criteria. For example, patients were included in the EPHESUS trial only if their potassium level at baseline was less than 5 mmol/L. This is one reason why potassium is not present in the SPIM score, although it is an important parameter which, moreover, may evolve considerably over time. Concerning the model, the Cox proportional hazards model assumes proportional hazards, an important condition that does not always hold and is not always verified.</p><p>In this study, we use a new approach:</p><p>• we develop a methodology for constructing a short-term event (death or hospitalization) risk score, taking into account the most recent values of the parameters, and therefore the values closest to an event, in order to generate alerts and possibly modify drug prescription immediately; using EPHESUS trial data, we could only construct a score at 1 month in order not to have too few patients with an event in the learning sample; but with the same methodology, a score could be constructed for a shorter horizon;</p><p>• we use an ensemble predictor, which is more stable than a predictor built on a single learning sample, using bootstrap samples; this allows an internal validation of the score using AUC out-of-bag (OOB); moreover, we use two classification methods, logistic regression and linear discriminant analysis, and, in order to avoid overfitting, for each predictor we use a random selection of explanatory variables, after testing other methods of selection 
that did not give better results, the number of drawn variables being optimized after testing all possible choices;</p><p>• furthermore, our method of construction can be adapted to data streams: when patient data arrives continuously, the coefficients of variables in the score function can be updated online.</p><p>In the next section, we present how we defined the learning sample using the available data from EPHESUS trial and the list of explanatory variables used. In the third section, we state a property of linear discriminant analysis (LDA) for mixed data, continuous or categorical. In the fourth section, after presenting the methodology used to build a risk score and to reduce its variation scale from 0 to 100, we define a measure of the importance of variables or groups of correlated variables in the score and a measure of the event risk by an odds-ratio. In the fifth section, we describe the results obtained by applying our methodology to our data. The paper ends with a conclusion.</p></sec><sec id="s2"><title>2. Data</title><p>The database at our disposal was EPHESUS, a clinical trial that included 6632 patients with heart failure (HF) after acute myocardial infarction (MI) complicated by left ventricular systolic dysfunction (left ventricular ejection fraction &lt; 40%) [<xref ref-type="bibr" rid="scirp.86931-ref3">3</xref>] . All patients were randomly assigned to treatment with eplerenone 25 mg/day or placebo.</p><p>In this trial, each patient was regularly monitored, with visits at the inclusion in the study (baseline), 1 month after inclusion, 3 months later, then every 3 months until the end of follow-up. At each visit, biological, clinical parameters or medical history were observed. 
In addition, all adverse events (deaths, hospitalizations, diseases) that occurred during follow-up were collected.</p><p>To define the learning sample used to construct the short-term event risk score, we made the following working hypothesis: based on biological and clinical measurements or medical history of a patient at a fixed time, we seek to assess the risk that this patient has a short-term HF event. The individuals considered are (patient, month) pairs, hereafter called patient-months, without taking into account the link between several patient-months concerning the same patient. It was therefore assumed that the short-term future of a patient depends only on the patient's current measurements.</p><p>Firstly, we did a full review of the database in order to:</p><p>• identify the biological and clinical variables that were regularly measured at each visit,</p><p>• determine the medical history data that we could update from information collected during the follow-up.</p><p>We were thus able to define a set of 27 explanatory variables, listed in <xref ref-type="fig" rid="fig1">Figure 1</xref>. Estimated plasma volume derived from the Strauss formula (ePVS) was defined in [<xref ref-type="bibr" rid="scirp.86931-ref4">4</xref>]. Estimated glomerular filtration rate (eGFR) was assessed using three formulas [<xref ref-type="bibr" rid="scirp.86931-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref7">7</xref>]. The different types of hospitalization were defined in the supplementary material of [<xref ref-type="bibr" rid="scirp.86931-ref3">3</xref>].</p><p>Then, we defined the response variable as the occurrence of a composite short-term HF event (death or hospitalization for progression of HF). In order to have enough events, we defined the short term as being equal to 30 days. 
Patient-months with a follow-up of less than 30 days and no short-term HF event during this incomplete follow-up period were not taken into account.</p><p>In the end, there were 21,382 patient-months from 5937 different patients, of which 317 had a short-term HF event and 21,065 had none.</p></sec><sec id="s3"><title>3. Property of Linear Discriminant Analysis of Mixed Data</title><p>Denote by A' the transpose of a matrix A.</p><p>In the case of mixed data, categorical and continuous, a classical method to perform a discriminant analysis is:</p><p>1) perform a preliminary factorial analysis according to the nature of the data, such as multiple correspondence factorial analysis (MCFA) [<xref ref-type="bibr" rid="scirp.86931-ref8">8</xref>] for categorical data, multiple factorial analysis (MFA) [<xref ref-type="bibr" rid="scirp.86931-ref9">9</xref>] for groups of variables, mixed data factorial analysis (MDFA) [<xref ref-type="bibr" rid="scirp.86931-ref10">10</xref>], ...;</p><p>2) after defining a convenient distance, perform a discriminant analysis from the set of values of principal components, or factors.</p><p>See for example the DISQUAL (DIScrimination on QUALitative variables) method of Saporta [<xref ref-type="bibr" rid="scirp.86931-ref11">11</xref>], which performs MCFA, then LDA or quadratic discriminant analysis (QDA).</p><p>Denote as usual by T the total inertia matrix of a dataset partitioned into classes, and by W and B its intraclass and interclass inertia matrices, respectively.</p><p>We show hereafter that when performing LDA with the metric T<sup>−1</sup> or W<sup>−1</sup>, it is not necessary to perform a preliminary factorial analysis: LDA can be performed directly on the raw mixed data.</p><p>The metric W<sup>−1</sup> will be used in the following but can be replaced by T<sup>−1</sup>.</p><p>Let I = {1, 2, ⋯, n} be a set of n individuals, partitioned into q disjoint classes I<sub>1</sub>, ⋯, I<sub>q</sub>. 
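Denote n<sub>k</sub> = card(I<sub>k</sub>), p<sub>ki</sub> the weight of the i<sup>th</sup> individual of class I<sub>k</sub> (i = 1, ⋯, n<sub>k</sub>; k = 1, ⋯, q) and <inline-formula><tex-math>P_k = \sum_{i=1}^{n_k} p_{ki}</tex-math></inline-formula> the weight of I<sub>k</sub>, with <inline-formula><tex-math>\sum_{k=1}^{q} P_k = 1</tex-math></inline-formula>. On these individuals, p quantitative variables or indicators of modalities of categorical variables, denoted x<sup>1</sup>, ⋯, x<sup>p</sup>, are observed. Suppose that there exists no affine relation between these variables; in particular, for each categorical variable, one indicator is removed.</p><p>For j = 1, ⋯, p, denote x<sub>ki</sub><sup>j</sup> the value of x<sup>j</sup> for the i<sup>th</sup> individual of class I<sub>k</sub>. Denote x<sub>ki</sub> the vector (x<sub>ki</sub><sup>1</sup> ⋯ x<sub>ki</sub><sup>p</sup>)′ and g<sub>k</sub> the barycenter of the elements x<sub>ki</sub> for i ∈ I<sub>k</sub>:</p><p><disp-formula><tex-math>g_k = \frac{1}{P_k} \sum_{i \in I_k} p_{ki} \, x_{ki} .</tex-math></disp-formula> (1)</p><p>The intraclass inertia (p, p) matrix W is assumed invertible:</p><p><disp-formula><tex-math>W = \sum_{k=1}^{q} \sum_{i=1}^{n_k} p_{ki} \, (x_{ki} - g_k)(x_{ki} - g_k)' .</tex-math></disp-formula> (2)</p><p>A distance commonly used in LDA, denoted d<sub>W<sup>−1</sup></sub>(a, b), between two points a and b in ℝ<sup>p</sup> is such that:</p><p><disp-formula><tex-math>d_{W^{-1}}^{2}(a, b) = (a - b)' \, W^{-1} (a - b) .</tex-math></disp-formula> (3)</p><p>Suppose we want to classify an individual knowing the vector a of values of x<sup>1</sup>, ⋯, x<sup>p</sup>. The principle of LDA is to classify it in the class I<sub>k</sub> for which <inline-formula><tex-math>d_{W^{-1}}^{2}(a, g_k)</tex-math></inline-formula> is minimal.</p><p>Consider now new variables y<sup>1</sup>, ⋯, y<sup>m</sup>, affine combinations of x<sup>1</sup>, ⋯, x<sup>p</sup>, with m ≥ p, such that:</p><p><disp-formula><tex-math>y_{ki} = A \, x_{ki} + \beta ,</tex-math></disp-formula> (4)</p><p>with y<sub>ki</sub> = (y<sub>ki</sub><sup>1</sup> ⋯ y<sub>ki</sub><sup>m</sup>)′, A a (m, p) matrix of rank p and β a vector in ℝ<sup>m</sup>.</p><p>Denote h<sub>k</sub> the barycenter of the vectors y<sub>ki</sub> in ℝ<sup>m</sup> for i ∈ I<sub>k</sub>:</p><p><disp-formula><tex-math>h_k = \frac{1}{P_k} \sum_{i \in I_k} p_{ki} \, y_{ki} = \frac{1}{P_k} \sum_{i \in I_k} p_{ki} \, (A x_{ki} + \beta) = A g_k + \beta ,</tex-math></disp-formula> (5)</p><p><disp-formula><tex-math>y_{ki} - h_k = A \, (x_{ki} - g_k) .</tex-math></disp-formula> (6)</p><p>Let Z be the intraclass inertia (m, m) matrix of {y<sub>ki</sub>, i = 1, ⋯, n<sub>k</sub>; k = 1, ⋯, q}:</p><p><disp-formula><tex-math>Z = \sum_{k=1}^{q} \sum_{i \in I_k} p_{ki} \, (y_{ki} - h_k)(y_{ki} - h_k)' = A W A' .</tex-math></disp-formula> (7)</p><p>The rank of Z is equal to the rank of A, that is p ≤ m. For m &gt; p, the (m, m) matrix Z is not invertible. 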
In this case, the pseudoinverse (or Moore-Penrose inverse) of Z, denoted Z<sup>+</sup>, which is equal to the inverse of Z when m = p, is used to define the pseudodistance denoted d<sub>Z<sup>+</sup></sub> in ℝ<sup>m</sup>. The term pseudodistance is used because Z<sup>+</sup> is not positive definite. Recall the definition of a pseudoinverse and two theorems [<xref ref-type="bibr" rid="scirp.86931-ref12">12</xref>].</p><p>Definition Let A be a (k, l) matrix of rank r. The pseudoinverse of A is the unique (l, k) matrix A<sup>+</sup> such that:</p><p>1) A A<sup>+</sup> A = A,</p><p>2) A<sup>+</sup> A A<sup>+</sup> = A<sup>+</sup>,</p><p>3) (A A<sup>+</sup>)′ = A A<sup>+</sup>,</p><p>4) (A<sup>+</sup> A)′ = A<sup>+</sup> A.</p><p>Theorem 1 Maximal rank decomposition</p><p>Let A be a (k, l) matrix of rank r. Then there exist two matrices of full rank r, F of dimension (k, r) and G of dimension (r, l) (rg(F) = rg(G) = r), such that A = F G.</p><p>Theorem 2 Expression of A<sup>+</sup></p><p>Let A = F G be a full-rank decomposition of A. Then A<sup>+</sup> = G′ (F′ A G′)<sup>−1</sup> F′.</p><p>We now prove:</p><p>Proposition 1 <inline-formula><tex-math>d_{Z^{+}}^{2}(A a + \beta, A b + \beta) = d_{W^{-1}}^{2}(a, b)</tex-math></inline-formula>.</p><p>Proof. Z = (A W) A′, where A W and A′ are of full rank p. Applying Theorem 2 with F = A W and G = A′ yields:</p><p><disp-formula><tex-math>Z^{+} = A \left( (A W)' \, A W A' \, A \right)^{-1} (A W)'</tex-math></disp-formula> (8)</p><p><disp-formula><tex-math>= A \, (A' A)^{-1} (W A' A W)^{-1} (A W)'</tex-math></disp-formula> (9)</p><p><disp-formula><tex-math>= A \, (A' A)^{-1} W^{-1} (A' A)^{-1} A' .</tex-math></disp-formula> (10)</p><p>Hence <disp-formula><tex-math>A' Z^{+} A = W^{-1} .</tex-math></disp-formula> (11)</p><p>Note that, when m = p, A is invertible and Z<sup>+</sup> = (A W A′)<sup>−1</sup> = Z<sup>−1</sup>.</p><p>Therefore <disp-formula><tex-math>d_{Z^{+}}^{2}(A a + \beta, A b + \beta) = (A (a - b))' \, Z^{+} \, (A (a - b)) = (a - b)' \, W^{-1} (a - b) .</tex-math></disp-formula> □</p><p>Thus:</p><p>Proposition 2 Let A be a (m, p) matrix, m &gt; p, of rank p and, for k = 1, ⋯, q, i = 1, ⋯, n<sub>k</sub>, y<sub>ki</sub> = A x<sub>ki</sub> + β. 
The results of LDA of the dataset {x<sub>ki</sub>, k = 1, ⋯, q, i = 1, ⋯, n<sub>k</sub>} with the metric W<sup>−1</sup> on ℝ<sup>p</sup> are the same as those of LDA of the dataset {y<sub>ki</sub>, k = 1, ⋯, q, i = 1, ⋯, n<sub>k</sub>} with the pseudometric Z<sup>+</sup> = (A W A′)<sup>+</sup>.</p><p>Applications</p><p>Denote x<sub>i</sub><sup>j</sup> the value of the variable x<sup>j</sup> for individual i belonging to I, i = 1, ⋯, n, j = 1, ⋯, p, and x<sub>i</sub> = (x<sub>i</sub><sup>1</sup> ⋯ x<sub>i</sub><sup>p</sup>)′ the vector of values of (x<sup>1</sup>, ⋯, x<sup>p</sup>) for individual i. Denote p<sub>i</sub> the weight of individual i, such that <inline-formula><tex-math>\sum_{i=1}^{n} p_i = 1</tex-math></inline-formula>. To perform a factorial analysis of the dataset {x<sub>i</sub>, i = 1, ⋯, n}, the difference between two individuals i and i′ is measured by a distance d(i, i′) defined on ℝ<sup>p</sup> and associated with a metric M, such that</p><p><disp-formula><tex-math>d^{2}(i, i') = (x_i - x_{i'})' \, M \, (x_i - x_{i'}) .</tex-math></disp-formula> (12)</p><p>Denote X the (n, p) matrix whose element (i, j) is x<sub>i</sub><sup>j</sup>. Denote D the diagonal (n, n) matrix whose element (i, i) is p<sub>i</sub>.</p><p>Perform a factorial analysis of (X, M, D), for instance principal component analysis (PCA) for continuous variables, MCFA for categorical variables or MDFA for mixed data. Suppose X is of rank p. Denote u<sub>j</sub> = (u<sub>j</sub><sup>1</sup> ⋯ u<sub>j</sub><sup>p</sup>)′ a unit vector of the j<sup>th</sup> principal axis. Denote c<sub>j</sub> = X M u<sub>j</sub> = (c<sub>1</sub><sup>j</sup> ⋯ c<sub>n</sub><sup>j</sup>)′ the j<sup>th</sup> principal component. Denote U the (p, p) matrix (u<sub>1</sub> ⋯ u<sub>p</sub>) and C the (n, p) matrix (c<sub>1</sub> ⋯ c<sub>p</sub>) = X M U; as u<sub>1</sub>, ⋯, u<sub>p</sub> are M-orthonormal, U′ M U = I and</p><p><disp-formula><tex-math>C = X M U \iff X = C U' \iff \text{for } i = 1, \dots, n, \; x_i = U c_i</tex-math></disp-formula> (13)</p><p><disp-formula><tex-math>\iff \text{for } i = 1, \dots, n, \; c_i = U' M x_i .</tex-math></disp-formula> (14)</p><p>Using the metric given by the inverse of the intraclass inertia matrix, LDA from C is equivalent to LDA from X.</p><p>Suppose now that the variable x<sup>p+1</sup> = 1 − x<sup>p</sup> is introduced; when x<sup>p</sup> is the indicator of a modality of a binary variable, x<sup>p+1</sup> is the indicator of the other modality. 
Then:</p><p><disp-formula><tex-math>\begin{pmatrix} x_i^{1} \\ \vdots \\ x_i^{p} \\ x_i^{p+1} \end{pmatrix} = \begin{pmatrix} u_1^{1} &amp; \cdots &amp; u_p^{1} \\ \vdots &amp; \ddots &amp; \vdots \\ u_1^{p} &amp; \cdots &amp; u_p^{p} \\ -u_1^{p} &amp; \cdots &amp; -u_p^{p} \end{pmatrix} \begin{pmatrix} c_i^{1} \\ \vdots \\ c_i^{p} \end{pmatrix} + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}</tex-math></disp-formula> (15)</p><p>Denote X<sub>1</sub> the (n, p + 1) matrix whose element (i, j) is x<sub>i</sub><sup>j</sup>. LDA from C with the metric given by the inverse of the intraclass inertia matrix is equivalent to LDA from X<sub>1</sub> with the metric given by the pseudoinverse of the intraclass inertia matrix.</p><p>For instance:</p><p>1) If x<sup>1</sup>, ⋯, x<sup>p</sup> are continuous variables, LDA from X is equivalent to LDA from C obtained by PCA, such as normed PCA, or by generalized canonical correlation analysis (gCCA) [<xref ref-type="bibr" rid="scirp.86931-ref13">13</xref>] and MFA, which can be interpreted as PCA with specific metrics.</p><p>2) If x<sup>1</sup>, ⋯, x<sup>p</sup> are indicators of modalities of categorical variables, and if MCFA is performed to obtain C, LDA from C with the metric given by the inverse of the intraclass inertia matrix is equivalent to LDA from X with the metric given by the pseudoinverse of the intraclass inertia matrix.</p><p>3) Likewise, if x<sup>1</sup>, ⋯, x<sup>p</sup> are continuous variables or indicators of modalities of categorical variables, and if MDFA [<xref ref-type="bibr" rid="scirp.86931-ref10">10</xref>] is performed to obtain C, LDA from C with the metric given by the inverse of the intraclass inertia matrix is equivalent to LDA from X with the metric given by the pseudoinverse of the intraclass inertia matrix. In this case, other metrics can also be used, such as that of Friedman [<xref ref-type="bibr" rid="scirp.86931-ref14">14</xref>] or that of Gower [<xref ref-type="bibr" rid="scirp.86931-ref15">15</xref>].</p></sec><sec id="s4"><title>4. Methodology for Constructing a Score</title><sec id="s4_1"><title>4.1. 
Ensemble Methods</title><p>Consider the problem of predicting an outcome variable y, continuous (in the case of regression) or categorical (in the case of classification) from observable explanatory variables x 1 , ⋯ , x p , continuous or categorical.</p><p>The principle of an ensemble method [<xref ref-type="bibr" rid="scirp.86931-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref17">17</xref>] is to build a collection of N predictors and then aggregate the N predictions obtained using:</p><p>• in regression: the average of predictions y i ^ ;</p><p>• in classification: the rule of the majority vote or the average of the estimations of a posteriori class probabilities.</p><p>The ensemble predictor is expected to be better than each of the individual predictors. For this purpose [<xref ref-type="bibr" rid="scirp.86931-ref16">16</xref>] :</p><p>• each single predictor must be relatively good,</p><p>• single predictors must be sufficiently different from each other.</p><p>To build a set of predictors, we can:</p><p>• use different classifiers,</p><p>• and/or use different samples (e.g. by bootstrapping, boosting, randomizing outputs) [<xref ref-type="bibr" rid="scirp.86931-ref17">17</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref19">19</xref>] ,</p><p>• and/or use different methods of variables selection (e.g. ascending, stepwise, shrinkage, random) [<xref ref-type="bibr" rid="scirp.86931-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref21">21</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref22">22</xref>] [<xref ref-type="bibr" rid="scirp.86931-ref23">23</xref>] ,</p><p>• and/or in general, introduce randomness into the construction of predictors (e.g. 
in random forests [<xref ref-type="bibr" rid="scirp.86931-ref24">24</xref>] , randomly select a fixed number of variables at each node of a classification or regression tree).</p><p>In Random Generalized Linear Model (RGLM) [<xref ref-type="bibr" rid="scirp.86931-ref25">25</xref>] , at each iteration,</p><p>• a bootstrap sample is drawn,</p><p>• a fixed number of variables are randomly selected,</p><p>• the selected variables are rank-ordered according to their individual association with the outcome variable y and only the top ranking variables are retained,</p><p>• an ascending selection of variables is made using Akaike information criterion (AIC) [<xref ref-type="bibr" rid="scirp.86931-ref26">26</xref>] or Bayesian information criterion (BIC) [<xref ref-type="bibr" rid="scirp.86931-ref27">27</xref>] .</p><p>Tuff&#233;ry [<xref ref-type="bibr" rid="scirp.86931-ref28">28</xref>] wrote that logistic models built from bootstrap samples are too similar for their aggregation to really differ from the base model built on the entire sample. This is in agreement with an assertion by Genuer and Poggi [<xref ref-type="bibr" rid="scirp.86931-ref16">16</xref>] . However, Tuff&#233;ry suggests the use of a method called “random forest of logistic models” introducing an additional randomness: at each iteration,</p><p>• a bootstrap sample is drawn,</p><p>• variables are randomly selected,</p><p>• an ascending variables selection is performed using AIC [<xref ref-type="bibr" rid="scirp.86931-ref26">26</xref>] or BIC [<xref ref-type="bibr" rid="scirp.86931-ref27">27</xref>] criteria.</p><p>Note that this method is in fact a particular case of RGLM method.</p><p>Present now the method used in this study to check the stability of the predictor obtained on the entire learning sample.</p></sec><sec id="s4_2"><title>4.2. 
Method of Construction of an Ensemble Predictor</title><p>The steps of the method for constructing an ensemble predictor are presented in the form of a tree (<xref ref-type="fig" rid="fig2">Figure 2</xref>).</p><p>In the first step, n<sub>1</sub> classifiers are chosen.</p><p>In the second step, n<sub>2</sub> bootstrap samples are drawn; they are the same for each classifier.</p><p>In the third step, for each classifier and each bootstrap sample, n<sub>3</sub> modalities of random selection of variables are chosen, a modality being defined either by a number of randomly drawn variables or by a number of randomly drawn predefined groups of correlated variables, inside each of which one variable is randomly drawn.</p><p>In the fourth step, for each classifier, each bootstrap sample and each modality of random selection of variables, one method of selection of variables is chosen, either a stepwise or a shrinkage (LASSO, ridge or elastic net) method.</p><p>This yields a set of n<sub>1</sub> &#215; n<sub>2</sub> &#215; n<sub>3</sub> predictors, which are aggregated to obtain an ensemble predictor.</p></sec><sec id="s4_3"><title>4.3. Choices Made</title><p>To assess the accuracy of the ensemble predictor, the percentage of correctly classified observations is commonly used. But this criterion is not always convenient, especially in the present case of unbalanced classes. We decided to use AUC. AUC in resubstitution being usually too optimistic, we used AUC OOB [<xref ref-type="bibr" rid="scirp.86931-ref29">29</xref>]: for each patient, consider the set of predictors built on the bootstrap samples that do not contain this patient, i.e. for which this patient is “out-of-bag”, then aggregate the corresponding predictions to obtain an OOB prediction.</p><p>Two classifiers were used: logistic regression and LDA with the metric W<sup>−1</sup>. Other classifiers were tested but not retained because they gave poorer results, such as random forest-random input (RF-RI) [<xref ref-type="bibr" rid="scirp.86931-ref24">24</xref>] or QDA. 
The k-nearest neighbors method (k-NN) was not tested, because it is not suited to this study, in which the classes are very unbalanced and one class is very small.</p><p>1000 bootstrap samples were randomly drawn.</p><p>Three modalities of random selection were retained: firstly, a random draw of a fixed number of variables; secondly and thirdly, a random draw of a fixed number of predefined groups of correlated variables followed by a random draw of one variable inside each drawn group. The number of variables or of groups drawn was determined by optimization of AUC OOB.</p><p>The fourth step did not improve prediction accuracy and was not retained.</p></sec><sec id="s4_4"><title>4.4. Construction of an Ensemble Score</title><p>Denote n the total number of patient-months and p the number of variables. Denote x<sub>i</sub><sup>j</sup> the value of variable x<sup>j</sup> for patient-month i, i = 1, ⋯, n, j = 1, ⋯, p. Each patient-month i is represented by a vector x<sub>i</sub> = (x<sub>i</sub><sup>1</sup> ⋯ x<sub>i</sub><sup>p</sup>)′ in ℝ<sup>p</sup>.</p><sec id="s4_4_1"><title>4.4.1. Aggregation of Predictors</title><p>In the case of two classes Ω<sub>1</sub> and Ω<sub>0</sub>, whose barycenters are respectively denoted g<sub>1</sub> and g<sub>0</sub>, the Fisher linear discriminant function</p><p><disp-formula><tex-math>S_1(x) = \left( x - \frac{g_1 + g_0}{2} \right)' W^{-1} (g_1 - g_0) = \alpha_1' x + \beta_1</tex-math></disp-formula> (16)</p><p>can be used as a score function. For logistic regression, the following score function can be used:</p><p><disp-formula><tex-math>S_2(x) = \ln \frac{P(\Omega_1 \mid X = x)}{P(\Omega_0 \mid X = x)} = \alpha_2' x + \beta_2 .</tex-math></disp-formula> (17)</p><p>Recall that, in the case of a multinormal model with homoscedasticity (covariance matrices within classes are equal), when P(Ω<sub>1</sub>) = P(Ω<sub>0</sub>), the logistic model is equivalent to LDA [<xref ref-type="bibr" rid="scirp.86931-ref17">17</xref>]; indeed:</p><p><disp-formula><tex-math>S_2(x) = \ln \frac{P(\Omega_1 \mid X = x)}{P(\Omega_0 \mid X = x)} = \ln \frac{P(\Omega_1)}{P(\Omega_0)} + S_1(x) = S_1(x) .</tex-math></disp-formula> 
(18)</p><p>We therefore used the following method to aggregate the predictors obtained:</p><p>1) the score functions obtained by LDA are aggregated by averaging; denote now S<sub>1</sub> the averaged score;</p><p>2) likewise the score functions obtained by logistic regression are aggregated by averaging; denote S<sub>2</sub> the averaged score;</p><p>3) a combination of the two scores, λ S<sub>1</sub> + (1 − λ) S<sub>2</sub>, with 0 ≤ λ ≤ 1, is defined; a value of λ that maximizes AUC OOB is retained; denote S<sub>0</sub> the optimal score obtained by this method.</p><p>If s is an optimal cut-off, the ensemble classifier is defined by:</p><p>If S<sub>0</sub>(x) &gt; s, x is classified in Ω<sub>1</sub>; (19)</p><p>if not, x is classified in Ω<sub>0</sub>. (20)</p></sec><sec id="s4_4_2"><title>4.4.2. Definition of a Score from 0 to 100</title><p>The score function S<sub>0</sub>(x) was rescaled to range from 0 to 100 using the following method. Denote:</p><p><disp-formula><tex-math>S_0(x) = \alpha_0' x + \beta_0 = \sum_{j=1}^{p} \alpha_0^{j} x^{j} + \beta_0 .</tex-math></disp-formula> (21)</p><p>Denote for j = 1, ⋯, p:</p><p><disp-formula><tex-math>P_j = |\alpha_0^{j}| \left( \max_{1 \le i \le n} x_i^{j} - \min_{1 \le i \le n} x_i^{j} \right)</tex-math></disp-formula> (22)</p><p>and</p><p><disp-formula><tex-math>P = \sum_{j=1}^{p} P_j = \sum_{j=1}^{p} |\alpha_0^{j}| \left( \max_{1 \le i \le n} x_i^{j} - \min_{1 \le i \le n} x_i^{j} \right) .</tex-math></disp-formula> (23)</p><p>Let m<sub>j</sub> be the minimal value of the variable x<sup>j</sup> if α<sub>0</sub><sup>j</sup> &gt; 0, or its maximal value if α<sub>0</sub><sup>j</sup> &lt; 0.</p><p>Denote S(x) the “normalized” score function, with values from 0 to 100, defined by:</p><p><disp-formula><tex-math>S(x) = \frac{100}{P} \sum_{j=1}^{p} \alpha_0^{j} (x^{j} - m_j)</tex-math></disp-formula> (24)</p><p><disp-formula><tex-math>= 100 \, \frac{\sum_{j=1}^{p} \alpha_0^{j} (x^{j} - m_j)}{\sum_{k=1}^{p} |\alpha_0^{k}| \left( \max_{1 \le i \le n} x_i^{k} - \min_{1 \le i \le n} x_i^{k} \right)}</tex-math></disp-formula> (25)</p><p><disp-formula><tex-math>= \alpha' x + \beta, \quad \text{with} \quad \begin{pmatrix} \beta \\ \alpha^{1} \\ \vdots \\ \alpha^{p} \end{pmatrix} = \begin{pmatrix} -\frac{100}{P} \sum_{j=1}^{p} \alpha_0^{j} m_j \\ \frac{100 \, \alpha_0^{1}}{P} \\ \vdots \\ \frac{100 \, \alpha_0^{p}}{P} \end{pmatrix} .</tex-math></disp-formula> (26)</p></sec><sec id="s4_4_3"><title>4.4.3. Measure of Variables Importance</title><p>Explanatory variables are not expressed in the same unit. To assess their importance in the score, we used “standardized” coefficients, multiplying the coefficient of each variable in the score by its standard deviation. 
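</p><p>A brief sketch of why this product is the relevant quantity, using notation introduced here for illustration only: α<sup>j</sup> is the coefficient of the variable x<sup>j</sup> in the score S and β its constant term, while <inline-formula><tex-math>\bar{x}^{j}</tex-math></inline-formula> and σ<sub>j</sub> are assumed to be the mean and standard deviation of x<sup>j</sup> in the learning sample. Under these assumptions, the score can be rewritten in terms of the standardized variables as:</p><p><disp-formula><tex-math>S(x) = \sum_{j=1}^{p} \alpha^{j} x^{j} + \beta = \sum_{j=1}^{p} \left( \alpha^{j} \sigma_j \right) \frac{x^{j} - \bar{x}^{j}}{\sigma_j} + \left( \beta + \sum_{j=1}^{p} \alpha^{j} \bar{x}^{j} \right) .</tex-math></disp-formula></p><p>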
These coefficients are those associated with standardized variables and are directly comparable. For all variables, the absolute values of their standardized coefficients, from the greatest to the lowest, were plotted on a graph. The same type of plot was used for groups of correlated variables, whose importance is assessed by the sum of the absolute values of their standardized coefficients.</p></sec><sec id="s4_4_4"><title>4.4.4. Risk Measure by an Odds-Ratio</title><p>Define a risk measure associated with a score s by an odds-ratio OR<sub>1</sub>(s):</p><p><disp-formula><tex-math>OR_1(s) = \frac{P(Y = 1 \mid S &gt; s)}{P(Y = 0 \mid S &gt; s)} \cdot \frac{P(Y = 0)}{P(Y = 1)} = \frac{P(S &gt; s \mid Y = 1)}{P(S &gt; s \mid Y = 0)} = \frac{Se(s)}{1 - Sp(s)} .</tex-math></disp-formula> (27)</p><p>An estimate of OR<sub>1</sub>(s), also denoted OR<sub>1</sub>(s), is <inline-formula><tex-math>\frac{n_1}{n_0} \times \frac{N_0}{N_1}</tex-math></inline-formula> with n<sub>k</sub> = #({S &gt; s} ∩ {Y = k}) and N<sub>k</sub> = #{Y = k}, k = 0, 1.</p><p>Note that:</p><p>• OR<sub>1</sub>(s) decreases when Se(s) decreases and Sp(s) is constant. In practice, the decrease will be much smaller when there are many observations;</p><p>• OR<sub>1</sub>(s) is not defined when Sp(s) is equal to 1.</p><p>For these reasons, the following definition can also be used:</p><p><disp-formula><tex-math>OR_2(s) = \max_{t \le s \, : \, OR_1(t) &lt; \infty} OR_1(t) .</tex-math></disp-formula> (28)</p><p>Note that OR<sub>1</sub> is the slope y/x of the line joining the origin to the point (x, y) of the ROC curve. 
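</p><p>As a purely illustrative sketch of the estimate above, take the class sizes of the learning sample, N<sub>1</sub> = 317 and N<sub>0</sub> = 21,065, and assume a hypothetical cut-off s above which n<sub>1</sub> = 100 patient-months with event and n<sub>0</sub> = 1000 patient-months without event are found. These two counts are invented for the example and are not results of the study. Then:</p><p><disp-formula><tex-math>OR_1(s) = \frac{n_1}{n_0} \times \frac{N_0}{N_1} = \frac{100}{1000} \times \frac{21065}{317} \approx 6.65, \qquad Se(s) = \frac{100}{317} \approx 0.315, \qquad 1 - Sp(s) = \frac{1000}{21065} \approx 0.047 .</tex-math></disp-formula></p><p>Up to rounding, the same value is obtained as the ratio Se(s)/(1 − Sp(s)), that is, as the slope of the line joining the origin to the corresponding point of the ROC curve. 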
In the case of an “ideal” ROC curve, assumed continuous, lying above the diagonal line and without any vertical segment, this slope increases from the point (1, 1), corresponding to the minimal value of the score, to the point (0, 0), corresponding to its maximal value. A vertical segment (Se decreases while Sp is constant) occurs when the score of a patient with event lies between those of two patients without event; this is particularly visible with a small number of patients and also justifies the definition of OR<sub>2</sub>, whose curve fits that of OR<sub>1</sub>.</p><p>For very high score values, when n<sub>0</sub> or n<sub>1</sub> is too small, the estimate of OR<sub>1</sub> is no longer reliable. A reliability interval of the score could be defined, depending on the values of n<sub>0</sub> and n<sub>1</sub>.</p></sec></sec></sec><sec id="s5"><title>5. Results</title><sec id="s5_1"><title>5.1. Pre-Processing of Variables</title><sec id="s5_1_1"><title>5.1.1. Winsorization</title><p>To avoid problems related to the presence of outliers or extreme data, all continuous variables were winsorized using the 1<sup>st</sup> percentile and the 99<sup>th</sup> percentile of each variable as limit values [<xref ref-type="bibr" rid="scirp.86931-ref30">30</xref>]. We chose this solution because of the large imbalance of the classes (317 patient-months with event against 21,065 with no event, a ratio of about 1 to 66). Eliminating extreme data would have reduced the number of patient-months with event.</p></sec><sec id="s5_1_2"><title>5.1.2. Transformation of Variables</title><p>Among qualitative variables, two are ordinal: the NYHA class with 4 modalities and the number of myocardial infarctions (no. MI) with 5 modalities. In order to preserve the ordinal nature of these variables, we chose to use an ordinal encoding. For NYHA, we therefore associated 3 binary variables: NYHA ≥ 2, NYHA ≥ 3 and NYHA ≥ 4. In the same way, for the no. 
MI, we considered 4 binary variables: no. MI ≥ 2, no. MI ≥ 3, no. MI ≥ 4 and no. MI ≥ 5.</p><p>Continuous variables, on the other hand, were transformed for use in logistic regression. For each continuous variable, a linearity test was performed using restricted cubic splines with 3 knots [<xref ref-type="bibr" rid="scirp.86931-ref31">31</xref>]. A restricted cubic spline with 3 knots is composed of a linear component and a cubic component. The linearity test consists in testing, under the univariable logistic model, whether the coefficient associated with the cubic component is zero. To do this, we used the likelihood ratio test. The results of the linearity tests are given in <xref ref-type="table" rid="table1">Table 1</xref> (p-value 1).</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Linearity tests and transformation of continuous variables</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Variable</th><th align="center" valign="middle" >p-value 1</th><th align="center" valign="middle" >Transformation function f(x)</th><th align="center" valign="middle" >p-value 2</th></tr></thead><tr><td align="center" valign="middle" >Hemoglobin</td><td align="center" valign="middle" >0.090</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Hematocrit</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >x<sup>−2</sup></td><td align="center" valign="middle" >1.00</td></tr><tr><td align="center" valign="middle" >ePVS</td><td align="center" valign="middle" >0.69</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Creatinine</td><td align="center" valign="middle" >0.21</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >eGFR 
Cockroft-Gault</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >ln(x)</td><td align="center" valign="middle" >0.40</td></tr><tr><td align="center" valign="middle" >eGFR MDRD</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >x<sup>−0.5</sup></td><td align="center" valign="middle" >0.79</td></tr><tr><td align="center" valign="middle" >eGFR CKD-EPI</td><td align="center" valign="middle" >0.005</td><td align="center" valign="middle" >ln(x)</td><td align="center" valign="middle" >0.90</td></tr><tr><td align="center" valign="middle" >Sodium</td><td align="center" valign="middle" >0.056</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Potassium</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >(x − 4.6)<sup>2</sup></td><td align="center" valign="middle" >0.47</td></tr><tr><td align="center" valign="middle" >Heart rate</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >(x − 60)<sup>2</sup></td><td align="center" valign="middle" >0.91</td></tr><tr><td align="center" valign="middle" >Systolic BP</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >(x − 140)<sup>2</sup></td><td align="center" valign="middle" >0.34</td></tr><tr><td align="center" valign="middle" >Diastolic BP</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >(x − 84)<sup>2</sup></td><td align="center" valign="middle" >0.49</td></tr><tr><td align="center" valign="middle" >Mean BP</td><td align="center" valign="middle" >&lt;0.0001</td><td align="center" valign="middle" >(x − 102)<sup>2</sup></td><td align="center" valign="middle" >0.66</td></tr><tr><td align="center" valign="middle" >Weight</td><td align="center" valign="middle" >0.090</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td 
align="center" valign="middle" >BMI</td><td align="center" valign="middle" >0.060</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Age</td><td align="center" valign="middle" >0.64</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><p>At the 5% level, linearity was rejected for 9 of the 16 continuous variables. For each of these 9 variables, we plotted the relationship between the logit (natural logarithm of the ratio of the probability of event to the probability of non-event) and the variable. An example is given for potassium, for which we observe a quadratic relationship between the logit and the variable (<xref ref-type="fig" rid="fig3">Figure 3</xref>). In agreement with the observed relationship, we applied a simple monotonic or quadratic transformation function to each of the 9 variables. The transformation function applied to each variable is given in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>For hematocrit and the three eGFR variables, the relationship is clearly monotonic. We therefore considered simple monotonic transformation functions of the form f(x) = x<sup>a</sup> with a ∈ {−2, −1, −0.5, 0.5, 1, 2} or f(x) = ln(x), and retained for each variable the transformation for which the likelihood under the univariable logistic model was maximal (minimal p-value).</p><p>For the other variables for which linearity was rejected, namely potassium, the three blood pressure measures (systolic, diastolic and mean) and heart rate, the relationship between the logit and the variable was rather quadratic. We therefore applied a quadratic transformation function (X − k*)<sup>2</sup>, with k* an optimal value determined by maximizing the likelihood under the univariable logistic model. For comparison, we also used the criterion of maximal AUC to determine an optimal value. 
These results are presented in <xref ref-type="table" rid="table2">Table 2</xref>. Note that the optimal values determined by the two methods are the same for systolic BP, diastolic BP and heart rate, and are very close for potassium and mean BP.</p><p>Also note that the transformation applied to potassium makes it possible to take into account both hypokalemia and hyperkalemia, two different clinical situations pooled here that may increase the risk of death and/or hospitalization measured by the score.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Quadratic transformations</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Variable</th><th align="center" valign="middle" >“Raw” variable X</th><th align="center" valign="middle"  colspan="2"  >Criterion 1: maximal likelihood for (X − k*)<sup>2</sup></th><th align="center" valign="middle"  colspan="2"  >Criterion 2: maximal AUC for (X − k*)<sup>2</sup></th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >AUC</td><td align="center" valign="middle" >k*</td><td align="center" valign="middle" >AUC</td><td align="center" valign="middle" >k*</td><td align="center" valign="middle" >AUC</td></tr><tr><td align="center" valign="middle" >Systolic BP</td><td align="center" valign="middle" >0.5818</td><td align="center" valign="middle" >140</td><td align="center" valign="middle" >0.5995</td><td align="center" valign="middle" >140</td><td align="center" valign="middle" >0.5995</td></tr><tr><td align="center" valign="middle" >Diastolic BP</td><td align="center" valign="middle" >0.5834</td><td align="center" valign="middle" >84</td><td align="center" valign="middle" >0.5970</td><td align="center" valign="middle" >84</td><td align="center" valign="middle" >0.5970</td></tr><tr><td align="center" valign="middle" >Mean BP</td><td align="center" valign="middle" >0.5915</td><td align="center" valign="middle" >102</td><td 
align="center" valign="middle" >0.6091</td><td align="center" valign="middle" >101</td><td align="center" valign="middle" >0.6094</td></tr><tr><td align="center" valign="middle" >Potassium</td><td align="center" valign="middle" >0.5312</td><td align="center" valign="middle" >4.6</td><td align="center" valign="middle" >0.5665</td><td align="center" valign="middle" >4.7</td><td align="center" valign="middle" >0.5676</td></tr><tr><td align="center" valign="middle" >Heart rate</td><td align="center" valign="middle" >0.6473</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >0.6521</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >0.6521</td></tr></tbody></table></table-wrap><p>To check that the transformations were adequate, a linearity test was performed for each transformed variable according to the principle detailed above. None of the tests is significant at the 5% level (see <xref ref-type="table" rid="table1">Table 1</xref>, p-value 2).</p></sec></sec><sec id="s5_2"><title>5.2. Ensemble Score</title><sec id="s5_2_1"><title>5.2.1. Ensemble Score by Logistic Regression</title><p>As a first step, we applied our methodology with the following parameters:</p><p>• use of a single classification rule, logistic regression (n<sub>1</sub> = 1),</p><p>• draw of 1000 bootstrap samples (n<sub>2</sub> = 1000),</p><p>• random selection of variables according to a single modality (n<sub>3</sub> = 1).</p><p>Three modalities for the random selection of variables were defined:</p><p>• 1<sup>st</sup> modality: random draw of m variables among 32,</p><p>• 2<sup>nd</sup> modality: random draw of m groups among 18, then one variable from each drawn group,</p><p>• 3<sup>rd</sup> modality: random draw of m groups among 24, then one variable from each drawn group.</p><p>The groups of variables considered for each modality are presented in <xref ref-type="table" rid="table3">Table 3</xref>. 
For modalities 2 and 3, we formed groups of variables based on the correlations between variables. For the second modality, for example, we gathered hemoglobin, hematocrit and ePVS in the same group because of their high correlations. For the third modality, the same groups were used, except for the two variables linked to hospitalization for HF, the four variables linked to the no. MI and the three variables related to the NYHA class, for which each binary variable was considered as a single group.</p><p>For each modality, an ensemble score was built for every possible value of m, and the one that gave the maximal AUC OOB was selected. <xref ref-type="table" rid="table4">Table 4</xref> reports the results obtained for each modality with the optimal m. The best result was obtained for the third modality, with an AUC OOB equal to 0.8634.</p><p>The ensemble score by logistic regression, denoted S<sub>2</sub>(x), obtained by averaging the three ensemble scores that we constructed, gave slightly better results, with an AUC OOB of 0.8649.</p></sec><sec id="s5_2_2"><title>5.2.2. 
Ensemble Score by LDA for Mixed Data</title><p>The same methodology was used by simply replacing the classification rule (logistic regression) by LDA for mixed data and keeping the same other settings.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Composition of groups of variables</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Variables</th><th align="center" valign="middle" >Modality 1</th><th align="center" valign="middle" >Modality 2</th><th align="center" valign="middle" >Modality 3</th></tr></thead><tr><td align="center" valign="middle" >Systolic BP</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="3"  >Blood pressure</td><td align="center" valign="middle"  rowspan="3"  >Blood pressure</td></tr><tr><td align="center" valign="middle" >Diastolic BP</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Mean BP</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Heart rate</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Weight</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="2"  >Obesity</td><td align="center" valign="middle"  rowspan="2"  >Obesity</td></tr><tr><td align="center" valign="middle" >BMI</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >NYHA ≥ 2</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="3"  >NYHA</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >NYHA ≥ 3</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >NYHA ≥ 4</td><td align="center" valign="middle" >-</td><td 
align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Age</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Gender</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Caucasian</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Hemoglobin</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="3"  >Hematology</td><td align="center" valign="middle"  rowspan="3"  >Hematology</td></tr><tr><td align="center" valign="middle" >Hematocrit</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >ePVS</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Creatinine</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="4"  >Renal function</td><td align="center" valign="middle"  rowspan="4"  >Renal function</td></tr><tr><td align="center" valign="middle" >eGFR Cockroft-Gault</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >eGFR MDRD</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >eGFR CKD-EPI</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Potassium</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Sodium</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" 
>Hypertension</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Diabetes</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Hosp. for HF</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="2"  >Previous hosp. for HF</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Hosp. for HF the previous month</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Hosp. for CV cause the previous month</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Hosp. for other CV cause the previous month</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >Hosp. for non CV cause the previous month</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >No. MI ≥ 2</td><td align="center" valign="middle" >-</td><td align="center" valign="middle"  rowspan="4"  >No. MI</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >No. MI ≥ 3</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >No. MI ≥ 4</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr><tr><td align="center" valign="middle" >No. 
MI ≥ 5</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >-</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title>Results obtained by logistic regression</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Parameters</th><th align="center" valign="middle" >AUC in resubstitution</th><th align="center" valign="middle" >AUC OOB</th></tr></thead><tr><td align="center" valign="middle" >Modality 1, m = 19</td><td align="center" valign="middle" >0.8716</td><td align="center" valign="middle" >0.8616</td></tr><tr><td align="center" valign="middle" >Modality 2, m = 14</td><td align="center" valign="middle" >0.8688</td><td align="center" valign="middle" >0.8611</td></tr><tr><td align="center" valign="middle" >Modality 3, m = 8</td><td align="center" valign="middle" >0.8691</td><td align="center" valign="middle" >0.8634</td></tr><tr><td align="center" valign="middle" >Ensemble score</td><td align="center" valign="middle" >0.8728</td><td align="center" valign="middle" >0.8649</td></tr></tbody></table></table-wrap><p>Again, for each modality, we searched for the optimal value of the parameter m. 
The results obtained are presented in <xref ref-type="table" rid="table5">Table 5</xref>.</p><p>As for logistic regression, the best results were obtained for the third modality, with an AUC OOB equal to 0.8638.</p><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title>Results obtained by LDA for mixed data</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Parameters</th><th align="center" valign="middle" >AUC in resubstitution</th><th align="center" valign="middle" >AUC OOB</th></tr></thead><tr><td align="center" valign="middle" >Modality 1, m = 12</td><td align="center" valign="middle" >0.8679</td><td align="center" valign="middle" >0.8614</td></tr><tr><td align="center" valign="middle" >Modality 2, m = 5</td><td align="center" valign="middle" >0.8673</td><td align="center" valign="middle" >0.8631</td></tr><tr><td align="center" valign="middle" >Modality 3, m = 7</td><td align="center" valign="middle" >0.8690</td><td align="center" valign="middle" >0.8638</td></tr><tr><td align="center" valign="middle" >Ensemble score</td><td align="center" valign="middle" >0.8707</td><td align="center" valign="middle" >0.8654</td></tr></tbody></table></table-wrap><p>The ensemble score by LDA, denoted S<sub>1</sub>(x), yielded better results than any single modality, with an AUC OOB equal to 0.8654.</p></sec><sec id="s5_2_3"><title>5.2.3. 
Ensemble Score Obtained by Synthesis of Logistic Regression and LDA</title><p>The final ensemble score, denoted S<sub>0</sub>(x), obtained by synthesis of the two ensemble scores S<sub>1</sub>(x) and S<sub>2</sub>(x) presented previously, provided the best results, with an AUC equal to 0.8733 in resubstitution and 0.8667 in OOB.</p><p>This ensemble score corresponds to the one obtained by applying our methodology with the following parameters:</p><p>• two classification rules are used, logistic regression and LDA for mixed data (n<sub>1</sub> = 2),</p><p>• 1000 bootstrap samples are drawn (n<sub>2</sub> = 1000),</p><p>• m variables are randomly selected according to three modalities (n<sub>3</sub> = 3).</p><p>The score function S<sub>0</sub>(x) was rescaled to vary from 0 to 100 according to the procedure described previously. We denote this “normalized” score S(x).</p><p>In <xref ref-type="table" rid="table6">Table 6</xref>, we present the “raw” and “standardized” coefficients associated with each of the variables in the score function S<sub>0</sub>(x) and in the “normalized” score function S(x).</p></sec><sec id="s5_2_4"><title>5.2.4. Importance of Variables in the Score</title><p>To give a global view of the importance of the variables in the “normalized” score, we plotted the absolute value of the standardized coefficient associated with each variable, from the largest value to the smallest (see <xref ref-type="fig" rid="fig4">Figure 4</xref>). Note that the most important variables are heart rate, NYHA class ≥ 3 and history of hospitalization for HF in the previous month. On the other hand, variables such as weight, no. 
MI ≥ 5 or BMI play only a small part in the presence of the others.</p><p>The same type of graph was made to represent the importance of the groups of variables of modality 2, defined by the sum of the absolute values of the “standardized” coefficients associated with the variables of the group, from the largest sum to the smallest (see <xref ref-type="fig" rid="fig4">Figure 4</xref>). Note that the two most influential groups are “NYHA” (NYHA ≥ 2, NYHA ≥ 3 and NYHA ≥ 4) and “History of hospitalization for HF” (hospitalization for HF in the previous month and hospitalization for HF during life). Three important groups follow: “Hematology” (ePVS, hemoglobin, hematocrit), “Heart rate” and “Renal function” (creatinine and the three eGFR formulas). The least important groups of variables are “Obesity” (weight, BMI) and “Gender”.</p><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title>Ensemble score</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Variables</th><th align="center" valign="middle"  colspan="2"  >Ensemble score S<sub>0</sub>(x)</th><th align="center" valign="middle"  colspan="2"  >“Normalized” ensemble score S(x)</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Coefficient</td><td align="center" valign="middle" >Standardized coefficient</td><td align="center" valign="middle" >Coefficient</td><td align="center" valign="middle" >Standardized coefficient</td></tr><tr><td align="center" valign="middle" >Constant</td><td align="center" valign="middle" >−0.210</td><td align="center" valign="middle" >−0.210</td><td align="center" valign="middle" >44.60</td><td align="center" valign="middle" >44.60</td></tr><tr><td align="center" valign="middle" >Hemoglobin</td><td align="center" valign="middle" >−0.0580</td><td align="center" valign="middle" >−0.0871</td><td align="center" valign="middle" >−0.478</td><td align="center" valign="middle" 
>−0.717</td></tr><tr><td align="center" valign="middle" >Hematocrit<sup>−2</sup></td><td align="center" valign="middle" >314.00</td><td align="center" valign="middle" >0.0442</td><td align="center" valign="middle" >2590.00</td><td align="center" valign="middle" >0.364</td></tr><tr><td align="center" valign="middle" >ePVS</td><td align="center" valign="middle" >0.131</td><td align="center" valign="middle" >0.107</td><td align="center" valign="middle" >1.07</td><td align="center" valign="middle" >0.877</td></tr><tr><td align="center" valign="middle" >Creatinine</td><td align="center" valign="middle" >0.00349</td><td align="center" valign="middle" >0.0964</td><td align="center" valign="middle" >0.0287</td><td align="center" valign="middle" >0.794</td></tr><tr><td align="center" valign="middle" >Ln (eGFR Cockroft-Gault)</td><td align="center" valign="middle" >−0.0940</td><td align="center" valign="middle" >−0.0396</td><td align="center" valign="middle" >−0.774</td><td align="center" valign="middle" >−0.326</td></tr><tr><td align="center" valign="middle" >eGFR MDRD<sup>−0.5</sup></td><td align="center" valign="middle" >−0.892</td><td align="center" valign="middle" >−0.0183</td><td align="center" valign="middle" >−7.34</td><td align="center" valign="middle" >−0.151</td></tr><tr><td align="center" valign="middle" >Ln(eGFR CKD-EPI)</td><td align="center" valign="middle" >−0.175</td><td align="center" valign="middle" >−0.0590</td><td align="center" valign="middle" >−1.44</td><td align="center" valign="middle" >−0.486</td></tr><tr><td align="center" valign="middle" >Sodium</td><td align="center" valign="middle" >−0.0232</td><td align="center" valign="middle" >−0.0861</td><td align="center" valign="middle" >−0.191</td><td align="center" valign="middle" >−0.709</td></tr><tr><td align="center" valign="middle" >(Potassium-4.6)<sup>2</sup></td><td align="center" valign="middle" >0.301</td><td align="center" valign="middle" >0.0889</td><td align="center" valign="middle" 
>2.48</td><td align="center" valign="middle" >0.732</td></tr><tr><td align="center" valign="middle" >(Heart rate-60)<sup>2</sup></td><td align="center" valign="middle" >0.000696</td><td align="center" valign="middle" >0.221</td><td align="center" valign="middle" >0.00572</td><td align="center" valign="middle" >1.82</td></tr><tr><td align="center" valign="middle" >(Systolic BP-140)<sup>2</sup></td><td align="center" valign="middle" >0.000125</td><td align="center" valign="middle" >0.0729</td><td align="center" valign="middle" >0.00103</td><td align="center" valign="middle" >0.600</td></tr><tr><td align="center" valign="middle" >(Diastolic BP-84)<sup>2</sup></td><td align="center" valign="middle" >0.0000985</td><td align="center" valign="middle" >0.0220</td><td align="center" valign="middle" >0.000810</td><td align="center" valign="middle" >0.181</td></tr><tr><td align="center" valign="middle" >(Mean BP-102)<sup>2</sup></td><td align="center" valign="middle" >0.000201</td><td align="center" valign="middle" >0.0545</td><td align="center" valign="middle" >0.00165</td><td align="center" valign="middle" >0.448</td></tr><tr><td align="center" valign="middle" >Weight</td><td align="center" valign="middle" >0.0000258</td><td align="center" valign="middle" >0.000374</td><td align="center" valign="middle" >0.000212</td><td align="center" valign="middle" >0.00308</td></tr><tr><td align="center" valign="middle" >BMI</td><td align="center" valign="middle" >0.00196</td><td align="center" valign="middle" >0.00844</td><td align="center" valign="middle" >0.0161</td><td align="center" valign="middle" >0.0695</td></tr><tr><td align="center" valign="middle" >Age</td><td align="center" valign="middle" >0.00449</td><td align="center" valign="middle" >0.0506</td><td align="center" valign="middle" >0.0370</td><td align="center" valign="middle" >0.416</td></tr><tr><td align="center" valign="middle" >Caucasian</td><td align="center" valign="middle" >−0.162</td><td align="center" 
valign="middle" >−0.0455</td><td align="center" valign="middle" >−1.33</td><td align="center" valign="middle" >−0.374</td></tr><tr><td align="center" valign="middle" >Male</td><td align="center" valign="middle" >0.0434</td><td align="center" valign="middle" >0.0195</td><td align="center" valign="middle" >0.357</td><td align="center" valign="middle" >0.161</td></tr><tr><td align="center" valign="middle" >Hypertension</td><td align="center" valign="middle" >0.136</td><td align="center" valign="middle" >0.0665</td><td align="center" valign="middle" >1.12</td><td align="center" valign="middle" >0.547</td></tr><tr><td align="center" valign="middle" >Diabetes</td><td align="center" valign="middle" >0.0904</td><td align="center" valign="middle" >0.0422</td><td align="center" valign="middle" >0.744</td><td align="center" valign="middle" >0.347</td></tr><tr><td align="center" valign="middle" >Hosp. for HF</td><td align="center" valign="middle" >0.549</td><td align="center" valign="middle" >0.175</td><td align="center" valign="middle" >4.52</td><td align="center" valign="middle" >1.44</td></tr><tr><td align="center" valign="middle" >Hosp. for HF the previous month</td><td align="center" valign="middle" >1.53</td><td align="center" valign="middle" >0.185</td><td align="center" valign="middle" >12.60</td><td align="center" valign="middle" >1.52</td></tr><tr><td align="center" valign="middle" >Hosp. for CV cause the previous month</td><td align="center" valign="middle" >0.403</td><td align="center" valign="middle" >0.168</td><td align="center" valign="middle" >3.31</td><td align="center" valign="middle" >1.38</td></tr><tr><td align="center" valign="middle" >Hosp. for non-CV cause the previous month</td><td align="center" valign="middle" >0.361</td><td align="center" valign="middle" >0.0486</td><td align="center" valign="middle" >2.97</td><td align="center" valign="middle" >0.400</td></tr><tr><td align="center" valign="middle" >Hosp. 
for other CV cause the previous month</td><td align="center" valign="middle" >0.104</td><td align="center" valign="middle" >0.0205</td><td align="center" valign="middle" >0.852</td><td align="center" valign="middle" >0.169</td></tr><tr><td align="center" valign="middle" >No. MI ≥ 2</td><td align="center" valign="middle" >0.0840</td><td align="center" valign="middle" >0.0377</td><td align="center" valign="middle" >0.692</td><td align="center" valign="middle" >0.310</td></tr><tr><td align="center" valign="middle" >No. MI ≥ 3</td><td align="center" valign="middle" >0.118</td><td align="center" valign="middle" >0.0323</td><td align="center" valign="middle" >0.973</td><td align="center" valign="middle" >0.266</td></tr><tr><td align="center" valign="middle" >No. MI ≥ 4</td><td align="center" valign="middle" >0.242</td><td align="center" valign="middle" >0.0342</td><td align="center" valign="middle" >1.99</td><td align="center" valign="middle" >0.281</td></tr><tr><td align="center" valign="middle" >No. 
MI ≥ 5</td><td align="center" valign="middle" >0.0443</td><td align="center" valign="middle" >0.00370</td><td align="center" valign="middle" >0.365</td><td align="center" valign="middle" >0.0304</td></tr><tr><td align="center" valign="middle" >NYHA ≥ 2</td><td align="center" valign="middle" >0.309</td><td align="center" valign="middle" >0.150</td><td align="center" valign="middle" >2.54</td><td align="center" valign="middle" >1.23</td></tr><tr><td align="center" valign="middle" >NYHA ≥ 3</td><td align="center" valign="middle" >0.612</td><td align="center" valign="middle" >0.194</td><td align="center" valign="middle" >5.04</td><td align="center" valign="middle" >1.60</td></tr><tr><td align="center" valign="middle" >NYHA ≥ 4</td><td align="center" valign="middle" >1.65</td><td align="center" valign="middle" >0.142</td><td align="center" valign="middle" >13.60</td><td align="center" valign="middle" >1.16</td></tr></tbody></table></table-wrap></sec><sec id="s5_2_5"><title>5.2.5. Risk Measure by an Odds-Ratio</title><p>We report the variation of n<sub>0</sub>, n<sub>1</sub>, Se(s), 1 − Sp(s), OR<sub>1</sub>(s) and OR<sub>2</sub>(s) as a function of the score s (<xref ref-type="table" rid="table7">Table 7</xref>). For score values s &gt; 49.1933, n<sub>1</sub> is less than or equal to 30. Thus, beyond this threshold value of 49.1933, the estimate of OR<sub>1</sub> is no longer very reliable. 
We therefore defined [0; 49.1933] as the reliability interval of the OR<sub>1</sub> and OR<sub>2</sub> functions.</p><table-wrap id="table7" ><label><xref ref-type="table" rid="table7">Table 7</xref></label><caption><title>Variation of n<sub>0</sub>, n<sub>1</sub>, Se(s), 1 − Sp(s), OR<sub>1</sub>(s) and OR<sub>2</sub>(s) according to the values of the score s</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >s</th><th align="center" valign="middle" >n<sub>0</sub></th><th align="center" valign="middle" >n<sub>1</sub></th><th align="center" valign="middle" >Se(s)</th><th align="center" valign="middle" >1 − Sp(s)</th><th align="center" valign="middle" >OR<sub>1</sub>(s)</th><th align="center" valign="middle" >OR<sub>2</sub>(s)</th></tr></thead><tr><td align="center" valign="middle" >s* = 23.7094</td><td align="center" valign="middle" >4527</td><td align="center" valign="middle" >250</td><td align="center" valign="middle" >0.7918</td><td align="center" valign="middle" >0.2149</td><td align="center" valign="middle" >3.6844</td><td align="center" valign="middle" >3.6844</td></tr><tr><td align="center" valign="middle" >11.8489</td><td align="center" valign="middle" >19683</td><td align="center" valign="middle" >317</td><td align="center" valign="middle" >1.0000</td><td align="center" valign="middle" >0.9344</td><td align="center" valign="middle" >1.0702</td><td align="center" valign="middle" >1.0702</td></tr><tr><td align="center" valign="middle" >13.7320</td><td align="center" valign="middle" >17684</td><td align="center" valign="middle" >316</td><td align="center" valign="middle" >0.9968</td><td align="center" valign="middle" >0.8395</td><td align="center" valign="middle" >1.1874</td><td align="center" valign="middle" >1.1874</td></tr><tr><td align="center" valign="middle" >15.1105</td><td align="center" valign="middle" >15684</td><td align="center" valign="middle" >316</td><td align="center" valign="middle" >0.9968</td><td align="center" valign="middle" >0.7446</td><td align="center" valign="middle" 
>1.3388</td><td align="center" valign="middle" >1.3388</td></tr><tr><td align="center" valign="middle" >16.3630</td><td align="center" valign="middle" >13686</td><td align="center" valign="middle" >314</td><td align="center" valign="middle" >0.9905</td><td align="center" valign="middle" >0.6498</td><td align="center" valign="middle" >1.5245</td><td align="center" valign="middle" >1.5245</td></tr><tr><td align="center" valign="middle" >17.6044</td><td align="center" valign="middle" >11689</td><td align="center" valign="middle" >311</td><td align="center" valign="middle" >0.9811</td><td align="center" valign="middle" >0.5549</td><td align="center" valign="middle" >1.7679</td><td align="center" valign="middle" >1.7679</td></tr><tr><td align="center" valign="middle" >18.9050</td><td align="center" valign="middle" >9697</td><td align="center" valign="middle" >303</td><td align="center" valign="middle" >0.9558</td><td align="center" valign="middle" >0.4604</td><td align="center" valign="middle" >2.0762</td><td align="center" valign="middle" >2.0762</td></tr><tr><td align="center" valign="middle" >20.4525</td><td align="center" valign="middle" >7709</td><td align="center" valign="middle" >291</td><td align="center" valign="middle" >0.9180</td><td align="center" valign="middle" >0.3660</td><td align="center" valign="middle" >2.5081</td><td align="center" valign="middle" >2.5081</td></tr><tr><td align="center" valign="middle" >22.3007</td><td align="center" valign="middle" >5729</td><td align="center" valign="middle" >271</td><td align="center" valign="middle" >0.8549</td><td align="center" valign="middle" >0.2720</td><td align="center" valign="middle" >3.1428</td><td align="center" valign="middle" >3.1428</td></tr><tr><td align="center" valign="middle" >24.7670</td><td align="center" valign="middle" >3766</td><td align="center" valign="middle" >234</td><td align="center" valign="middle" >0.7382</td><td align="center" valign="middle" >0.1788</td><td align="center" 
valign="middle" >4.1278</td><td align="center" valign="middle" >4.1278</td></tr><tr><td align="center" valign="middle" >28.8573</td><td align="center" valign="middle" >1822</td><td align="center" valign="middle" >178</td><td align="center" valign="middle" >0.5615</td><td align="center" valign="middle" >0.0865</td><td align="center" valign="middle" >6.4884</td><td align="center" valign="middle" >6.4884</td></tr><tr><td align="center" valign="middle" >33.2656</td><td align="center" valign="middle" >872</td><td align="center" valign="middle" >128</td><td align="center" valign="middle" >0.4038</td><td align="center" valign="middle" >0.0414</td><td align="center" valign="middle" >9.7431</td><td align="center" valign="middle" >9.8363</td></tr><tr><td align="center" valign="middle" >38.2403</td><td align="center" valign="middle" >414</td><td align="center" valign="middle" >86</td><td align="center" valign="middle" >0.2713</td><td align="center" valign="middle" >0.0197</td><td align="center" valign="middle" >13.7706</td><td align="center" valign="middle" >13.7706</td></tr><tr><td align="center" valign="middle" >49.1933</td><td align="center" valign="middle" >70</td><td align="center" valign="middle" >30</td><td align="center" valign="middle" >0.0978</td><td align="center" valign="middle" >0.0033</td><td align="center" valign="middle" >29.4283</td><td align="center" valign="middle" >31.5217</td></tr><tr><td align="center" valign="middle" >55.1424</td><td align="center" valign="middle" >28</td><td align="center" valign="middle" >22</td><td align="center" valign="middle" >0.0694</td><td align="center" valign="middle" >0.0014</td><td align="center" valign="middle" >50.4112</td><td align="center" valign="middle" >50.4112</td></tr><tr><td align="center" valign="middle" >58.0352</td><td align="center" valign="middle" >14</td><td align="center" valign="middle" >16</td><td align="center" valign="middle" >0.0505</td><td align="center" valign="middle" >0.0007</td><td align="center" 
valign="middle" >70.8812</td><td align="center" valign="middle" >74.7575</td></tr></tbody></table></table-wrap><p>We plotted the odds-ratios OR<sub>1</sub> and OR<sub>2</sub> over this reliability interval (<xref ref-type="fig" rid="fig5">Figure 5</xref>). Reading the graph, for patients with a score greater than 40, for example, the odds P(Y = 1 | S &gt; 40) / P(Y = 0 | S &gt; 40) are about 15 times higher than the baseline odds P(Y = 1) / P(Y = 0).</p></sec></sec></sec><sec id="s6"><title>6. Conclusions and Perspectives</title><p>In this article, we presented a new methodology for constructing a short-term event risk score in heart failure patients, based on an ensemble predictor built using two classification rules (logistic regression and LDA for mixed data), 1000 bootstrap samples and three modalities of random selection of variables. This score was normalized on a scale from 0 to 100. The out-of-bag AUC (AUC OOB) is equal to 0.8667. Note that potassium, an important variable that does not appear in other scores (such as the SPIM risk score), is taken into account in this score.</p><p>Moreover, we defined a measure of the importance of each variable and each group of variables in the score, as well as an event risk measure based on an odds-ratio.</p><p>Owing to the nature of the available data (obtained from the EPHESUS study), we had to define the short term as 30 days in order to have enough patients with an HF event. Data collected at shorter intervals would be preferable, since measurements taken as close as possible to an event could improve the quality of the score. When such data become available, it will be interesting to apply the same methodology to construct a new score.</p><p>Furthermore, we proved a property of linear discriminant analysis for mixed data.</p><p>Finally, this methodology can be adapted to the case of a data stream. Suppose that new data for heart failure patients arrives continuously. 
Data can be allocated to bootstrap samples using the Poisson bootstrap [<xref ref-type="bibr" rid="scirp.86931-ref32">32</xref>]. The coefficients of each variable in each predictor based on logistic regression or binary linear discriminant analysis can be updated online using a stochastic gradient algorithm. Such algorithms are presented in [<xref ref-type="bibr" rid="scirp.86931-ref33">33</xref>] for binary LDA and [<xref ref-type="bibr" rid="scirp.86931-ref34">34</xref>] for logistic regression; they use online standardized data in order to avoid numerical explosion in the presence of extreme values. The ensemble score obtained by averaging can thus be updated online. To the best of our knowledge, this is the first time that this problem has been studied in this context.</p></sec><sec id="s7"><title>Acknowledgements</title><p>Results incorporated in this article received funding from the Investments for the Future program under grant agreement No ANR-15-RHU-0004.</p></sec><sec id="s8"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s9"><title>Cite this paper</title><p>Duarte, K., Monnez, J.-M. and Albuisson, E. (2018) Methodology for Constructing a Short-Term Event Risk Score in Heart Failure Patients. Applied Mathematics, 9, 954-974. https://doi.org/10.4236/am.2018.98065</p></sec></body><back><ref-list><title>References</title><ref id="scirp.86931-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Levy, W.C., Mozaffarian, D., Linker, D.T., et al. (2006) The Seattle Heart Failure Model: Prediction of Survival in Heart Failure. Circulation, 113, 1424-1433.  
https://doi.org/10.1161/CIRCULATIONAHA.105.584102</mixed-citation></ref><ref id="scirp.86931-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Ketchum, E.S., Dickstein, K., Kjekshus, J., et al. (2014) The Seattle Post Myocardial Infarction Model (SPIM): Prediction of Mortality after Acute Myocardial Infarction with Left Ventricular Dysfunction. European Heart Journal: Acute Cardiovascular Care, 3, 46-55. https://doi.org/10.1177/2048872613502283</mixed-citation></ref><ref id="scirp.86931-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Pitt, B., Remme, W., Zannad, F., et al. (2003) Eplerenone, a Selective Aldosterone Blocker, in Patients with Left Ventricular Dysfunction after Myocardial Infarction. New England Journal of Medicine, 348, 1309-1321.  
https://doi.org/10.1056/NEJMoa030207</mixed-citation></ref><ref id="scirp.86931-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Duarte, K., Monnez, J.M., Albuisson, E., Pitt, B., Zannad, F. and Rossignol, P. (2015) Prognostic Value of Estimated Plasma Volume in Heart Failure. JACC: Heart Failure, 3, 886-893. https://doi.org/10.1016/j.jchf.2015.06.014</mixed-citation></ref><ref id="scirp.86931-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Cockcroft, D.W. and Gault, H. (1976) Prediction of Creatinine Clearance from Serum Creatinine. Nephron, 16, 31-41. https://doi.org/10.1159/000180580</mixed-citation></ref><ref id="scirp.86931-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Levey, A.S., Coresh, J., Balk, E., et al. (2003) National Kidney Foundation Practice Guidelines for Chronic Kidney Disease: Evaluation, Classification, and Stratification. Annals of Internal Medicine, 139, 137-147.  
https://doi.org/10.7326/0003-4819-139-2-200307150-00013</mixed-citation></ref><ref id="scirp.86931-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Levey, A.S., Stevens, L.A., Schmid, C.H., et al. (2009) A New Equation to Estimate Glomerular Filtration Rate. Annals of Internal Medicine, 150, 604-612.  
https://doi.org/10.7326/0003-4819-150-9-200905050-00006</mixed-citation></ref><ref id="scirp.86931-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Lebart, L., Morineau, A. and Warwick, K. (1984) Multivariate Descriptive Statistical Analysis: Correspondence Analysis and Related Techniques for Large Matrices. Wiley, New York.</mixed-citation></ref><ref id="scirp.86931-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Escofier, B. and Pagès, J. (1990) Multiple Factor Analysis. Computational Statistics and Data Analysis, 18, 121-140. https://doi.org/10.1016/0167-9473(94)90135-X</mixed-citation></ref><ref id="scirp.86931-ref10"><label>10</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pagès</surname>, <given-names>J.</given-names></name>, <etal>et al</etal>. (<year>2004</year>) <article-title>Analyse Factorielle de Données Mixtes</article-title>. <source>Revue de Statistique Appliquée</source>, <volume>52</volume>, <fpage>93</fpage>-<lpage>111</lpage>.</mixed-citation></ref><ref id="scirp.86931-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Saporta, G. (1977) Une Méthode et un Programme d’Analyse Discriminante sur Variables Qualitatives. Analyse des Données et Informatique, Inria, 201-210.</mixed-citation></ref><ref id="scirp.86931-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Rotella, F. and Borne, P. (1995) Théorie et Pratique du Calcul Matriciel. Editions Technip.</mixed-citation></ref><ref id="scirp.86931-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Carroll, J.D. (1968) A Generalization of Canonical Correlation Analysis to Three or More Sets of Variables. 
Proceedings of the 76th Annual Convention of the American Psychological Association, Washington DC, 227-228.</mixed-citation></ref><ref id="scirp.86931-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Friedman, J.H. and Meulman, J.J. (2004) Clustering Objects on Subsets of Attributes (with Discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 815-849. https://doi.org/10.1111/j.1467-9868.2004.02059.x</mixed-citation></ref><ref id="scirp.86931-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Gower, J.C. (1971) A General Coefficient of Similarity and Some of its Properties. Biometrics, 27, 857-871. https://doi.org/10.2307/2528823</mixed-citation></ref><ref id="scirp.86931-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Genuer, R. and Poggi, J.M. (2017) Arbres CART et Forêts Aléatoires, Importance et Sélection de Variables. https://arxiv.org/pdf/1610.08203v2.pdf</mixed-citation></ref><ref id="scirp.86931-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning. Springer, New York. https://doi.org/10.1007/978-0-387-84858-7</mixed-citation></ref><ref id="scirp.86931-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Efron, B. and Tibshirani, R.J. (1994) An Introduction to the Bootstrap. CRC Press, Boca Raton.</mixed-citation></ref><ref id="scirp.86931-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Breiman, L. (1996) Bagging Predictors. Machine Learning, 24, 123-140.  
https://doi.org/10.1007/BF00058655</mixed-citation></ref><ref id="scirp.86931-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">In Lee, K. and Koval, J.J. (1997) Determination of the Best Significance Level in Forward Stepwise Logistic Regression. Communications in Statistics-Simulation and Computation, 26, 559-575. https://doi.org/10.1080/03610919708813397</mixed-citation></ref><ref id="scirp.86931-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Wang, Q., Koval, J.J., Mills, C.A. and Lee, K.I.D. (2007) Determination of the Selection Statistics and Best Significance Level in Backward Stepwise Logistic Regression. Communications in Statistics-Simulation and Computation, 37, 62-72.  
https://doi.org/10.1080/03610910701723625</mixed-citation></ref><ref id="scirp.86931-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Bendel, R.B. and Afifi, A.A. (1977) Comparison of Stopping Rules in Forward “Stepwise” Regression. Journal of the American Statistical Association, 72, 46-53.</mixed-citation></ref><ref id="scirp.86931-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58, 267-288.  
http://www.jstor.org/stable/2346178</mixed-citation></ref><ref id="scirp.86931-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Breiman, L. (2001) Random Forests. Machine Learning, 45, 5-32.  
https://doi.org/10.1023/A:1010933404324</mixed-citation></ref><ref id="scirp.86931-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Song, L., Langfelder, P. and Horvath, S. (2013) Random Generalized Linear Model: A Highly Accurate and Interpretable Ensemble Predictor. BMC Bioinformatics, 14, 5. https://doi.org/10.1186/1471-2105-14-5</mixed-citation></ref><ref id="scirp.86931-ref26"><label>26</label><mixed-citation publication-type="book" xlink:type="simple">Akaike, H. (1998) Information Theory and an Extension of the Maximum Likelihood Principle. In: Parzen, E., Tanabe, K. and Kitagawa, G., Eds., Selected Papers of Hirotugu Akaike, Springer Series in Statistics (Perspectives in Statistics), Springer, New York, 199-213.</mixed-citation></ref><ref id="scirp.86931-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Schwarz, G. (1978) Estimating the Dimension of a Model. The Annals of Statistics, 6, 461-464. https://doi.org/10.1214/aos/1176344136</mixed-citation></ref><ref id="scirp.86931-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Tufféry, S. (2015) Modélisation Prédictive et Apprentissage Statistique avec R. Editions Technip.</mixed-citation></ref><ref id="scirp.86931-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Breiman, L. (1996) Out-of-Bag Estimation.  
https://www.stat.berkeley.edu/~breiman/OOBestimation.pdf</mixed-citation></ref><ref id="scirp.86931-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Dixon, W.J. (1960) Simplified Estimation from Censored Normal Samples. The Annals of Mathematical Statistics, 31, 385-391.  
https://doi.org/10.1214/aoms/1177705900</mixed-citation></ref><ref id="scirp.86931-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Royston, P. and Sauerbrei, W. (2007) Multivariable Modeling with Cubic Regression Splines: A Principled Approach. Stata Journal, 7, 45-70.</mixed-citation></ref><ref id="scirp.86931-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Oza, N.C. and Russell, S. (2001) Online Bagging and Boosting. Proceedings of Eighth International Workshop on Artificial Intelligence and Statistics, Key West, 4-7 January 2001, 105-112.</mixed-citation></ref><ref id="scirp.86931-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">Duarte, K., Monnez, J.M. and Albuisson, E. (2018) Sequential Linear Regression with Online Standardized Data. PLoS ONE, 13, e0191186.  
https://doi.org/10.1371/journal.pone.0191186</mixed-citation></ref><ref id="scirp.86931-ref34"><label>34</label><mixed-citation publication-type="other" xlink:type="simple">Monnez, J.M. (2018) Online Logistic Regression Process with Online Standardized Data.</mixed-citation></ref></ref-list></back></article>