<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJS</journal-id><journal-title-group><journal-title>Open Journal of Statistics</journal-title></journal-title-group><issn pub-type="epub">2161-718X</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojs.2016.61009</article-id><article-id pub-id-type="publisher-id">OJS-63649</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  An Alternative Approach to AIC and Mallow’s Cp Statistic-Based Relative Influence Measures (RIMS) in Regression Variable Selection
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>meh</surname><given-names>Edith Uzoma</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Obulezi</surname><given-names>Okechukwu Jeremiah</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Statistics, Nnamdi Azikiwe University, Awka, Nigeria</addr-line></aff><aff id="aff2"><addr-line>Department of Statistics, Abia State Polytechnic, Aba, Nigeria</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>eu.umeh@unizik.ng(MEU)</email>;</corresp></author-notes><pub-date pub-type="epub"><day>03</day><month>02</month><year>2016</year></pub-date><volume>06</volume><issue>01</issue><fpage>70</fpage><lpage>75</lpage><history><date date-type="received"><day>2</day>	<month>January</month>	<year>2016</year></date><date date-type="rev-recd"><day>accepted</day>	<month>20</month>	<year>February</year>	</date><date date-type="accepted"><day>23</day>	<month>February</month>	<year>2016</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Outlier detection is an important data-screening step. The relative influence measure (RIM) is an outlier-detection mechanism that quantifies the contribution of individual data points to a regression model. This work develops a BIC-based RIM that simultaneously detects influential data points and selects optimal predictor variables. It adds to the existing literature by providing an alternative to the AIC- and Mallow’s Cp statistic-based RIMs, and by proposing conditions for no influence, some influence, and a perfectly single outlying data point in a data set. The method is implemented in R by an algorithm that iterates over all data points, deleting one at a time while computing BICs and selecting optimal predictors alongside the RIMs. In analyses of evaporation data comparing the proposed method with the two existing methods, the data cases identified as highly influential by the existing methods are also identified by the proposed method. The three methods perform equally well; hence the relevance of the BIC-based RIM should not be overlooked.
 
</p></abstract><kwd-group><kwd>Relative Influence Measure (RIM)</kwd><kwd> BIC</kwd><kwd> AIC</kwd><kwd> Mallow’s Cp Statistic</kwd><kwd> Cook’s Distance</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Model selection (variable selection) in regression has received considerable attention in the recent literature. A large number of predictors is usually introduced at the initial stage of modeling to attenuate possible modeling biases [<xref ref-type="bibr" rid="scirp.63649-ref1">1</xref>]. As noted by [<xref ref-type="bibr" rid="scirp.63649-ref2">2</xref>], inference under models with too few parameters (variables) can be biased, while models with too many parameters (variables) may give poor precision or poor identification of effects. Hence, variable selection seeks a balance between under- and over-fitted models.</p><p>An influential observation is a special case of an outlier. In the simplest sense, outlying or extreme values are observations that are well separated from the remainder of the data. Outliers result from either (1) errors of measurement or (2) intrinsic variability (mean shift, inflation of variances or others) and appear as (i) a change in the direction of the response (Y) variable, (ii) a deviation in the space of the explanatory variables (points deviating in the X-direction are called leverage points) or (iii) a change in both directions (that of the explanatory variable(s) and that of the response variable). These outlying observations may involve large residuals and often have dramatic effects on the fitted least squares regression function. The influence of an individual case (data point) in a regression model can be adverse, causing a significant upward or downward shift in the values of the model parameters and in turn reducing the predictive power of the model. 
Only a few papers dealing with the influence of individual data cases in regression explicitly take an initial variable selection step into account. This problem is handled by [<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>]-[<xref ref-type="bibr" rid="scirp.63649-ref6">6</xref>].</p><p>One objective of regression variable selection is to reduce the predictors to some optimal subset of the available regressors [<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>]. Several approaches to variable selection exist in the literature, including stepwise deletion and subset selection. Stepwise deletion covers regression models in which the choice of predictive variables is carried out by an automatic procedure. Usually, this takes the form of a sequence of F-tests or t-tests, but other criteria are possible, such as adjusted R-square, AIC, BIC, Mallow’s C<sub>p</sub> statistic, PRESS or false discovery rate [<xref ref-type="bibr" rid="scirp.63649-ref7">7</xref>].</p><p>[<xref ref-type="bibr" rid="scirp.63649-ref8">8</xref>] proposed the coefficient of determination ratio (CDR), which is based on the coefficient of determination (R<sup>2</sup>) of the linear regression model. [<xref ref-type="bibr" rid="scirp.63649-ref9">9</xref>] developed an outlier detection and robust selection method that combines robust least angle regression with least trimmed squares regression on jack-knife subsets. Once the detected outliers are removed, standard least angle regression is applied to the cleaned data to robustly sequence the predictor variables in order of importance. [<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>] proposed a method called the Relative Influence Measure using Mallow’s <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x6.png" xlink:type="simple"/></inline-formula> and AIC statistics. 
These methods are dimensionally consistent, computationally efficient and able to identify influential cases, but they fail to be asymptotically consistent. [<xref ref-type="bibr" rid="scirp.63649-ref10">10</xref>], in comparing the BIC and AIC, stated that the AIC is not consistent. That is, as the number of observations n grows very large, the probability that AIC recovers a true low-dimensional model does not approach unity [<xref ref-type="bibr" rid="scirp.63649-ref11">11</xref>]. [<xref ref-type="bibr" rid="scirp.63649-ref12">12</xref>] supported the same argument, noting that the BIC has the advantage of being asymptotically consistent: as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x7.png" xlink:type="simple"/></inline-formula>, BIC will select the correct model.</p><p>Hence, the specific objective of this paper is to propose a relative influence measure that indicates whether the fit of the selected model improves or deteriorates owing to the presence of an observation (case) and that retains asymptotic consistency, thereby not violating the sampling properties of the model parameters.</p></sec><sec id="s2"><title>2. Existing Methods</title><sec id="s2_1"><title>2.1. Cook’s Distance and the Influence Measure</title><p>Let V be the set of indices corresponding to the predictor variables selected from the full data set and let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x8.png" xlink:type="simple"/></inline-formula> be the prediction vector based on the selected variables and calculated from the full data set. Also let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x9.png" xlink:type="simple"/></inline-formula> be the prediction vector based on the variables corresponding to V, but calculated from the full data set without case i. 
[<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>] noted that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x10.png" xlink:type="simple"/></inline-formula> contains a prediction for case i, although this case is not used in calculating <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x11.png" xlink:type="simple"/></inline-formula>. The conditional Cook’s distance for the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x12.png" xlink:type="simple"/></inline-formula> case is</p><disp-formula id="scirp.63649-formula1770"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x13.png"  xlink:type="simple"/></disp-formula><p>approximately scaled. Here, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x14.png" xlink:type="simple"/></inline-formula> denotes the Euclidean norm. As pointed out by [<xref ref-type="bibr" rid="scirp.63649-ref12">12</xref>], repeating the variable selection using the data without case i yields a subset <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x15.png" xlink:type="simple"/></inline-formula> of indices, with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x16.png" xlink:type="simple"/></inline-formula> possibly different from V. Hence, the unconditional Cook’s distance is</p><disp-formula id="scirp.63649-formula1771"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x17.png"  xlink:type="simple"/></disp-formula><p>approximately scaled. Since the unconditional version explicitly takes the selection effect into account, [<xref ref-type="bibr" rid="scirp.63649-ref13">13</xref>] argued that it is preferable. 
As explained in the literature, a measure, say M, calculated from the complete data set can also be calculated from the reduced data set as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x18.png" xlink:type="simple"/></inline-formula>, and the influence of case i can then be quantified in terms of a function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x19.png" xlink:type="simple"/></inline-formula> of M and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x20.png" xlink:type="simple"/></inline-formula>. The <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x21.png" xlink:type="simple"/></inline-formula> has to be based on the difference in the value of the selection criterion before and after omitting case i. This difference <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x22.png" xlink:type="simple"/></inline-formula> may then be divided by M in order to calculate the relative change in the selection criterion. 
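</p><p>The relative-change idea described above can be sketched in Python as follows. This is only an illustrative sketch and not the authors' implementation: the function names, the toy data and the sign convention (full-data value minus reduced-data value, divided by the full-data value) are assumptions made for the example, and the selection criterion is passed in as an arbitrary function.</p>
<preformat><![CDATA[
```python
from typing import Callable, List, Sequence


def relative_influence(
    data: Sequence,
    criterion: Callable[[Sequence], float],
) -> List[float]:
    """Relative change in a criterion when each case is omitted in turn.

    Assumed form (hypothetical, for illustration): (M - M_without_i) / M,
    where M is the criterion evaluated on the full data set.
    """
    full_value = criterion(data)
    if full_value == 0:
        raise ValueError("criterion is zero on the full data; ratio undefined")
    influences = []
    for i in range(len(data)):
        # Reduced data set: every case except case i.
        reduced = [row for j, row in enumerate(data) if j != i]
        influences.append((full_value - criterion(reduced)) / full_value)
    return influences


def sum_of_squared_deviations(values: Sequence[float]) -> float:
    """Toy stand-in for a selection criterion (not Cp, AIC or BIC)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


toy_cases = [1.0, 2.0, 3.0, 10.0]
print(relative_influence(toy_cases, sum_of_squared_deviations))
```
]]></preformat>
<p>In this toy run the last case, which lies far from the others, produces the largest relative change. In practice the criterion function would refit the model, and repeat the variable selection, on each reduced data set.</p><p>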
As proposed by [<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>], the influence measure for the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x23.png" xlink:type="simple"/></inline-formula> case when the Cook’s distance is used becomes</p><disp-formula id="scirp.63649-formula1772"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x24.png"  xlink:type="simple"/></disp-formula></sec><sec id="s2_2"><title>2.2. Mallow’s C<sub>p</sub> Estimate and the Influence Measure</title><p>Let Y be an <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x25.png" xlink:type="simple"/></inline-formula> vector of responses in a linear regression with a corresponding <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x26.png" xlink:type="simple"/></inline-formula> design matrix X of explanatory variables. 
A traditional model is</p><disp-formula id="scirp.63649-formula1773"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x27.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x28.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x29.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x30.png" xlink:type="simple"/></inline-formula> are unknown parameters. 
Usually, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x31.png" xlink:type="simple"/></inline-formula> is a <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x32.png" xlink:type="simple"/></inline-formula> vector of parameters and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x33.png" xlink:type="simple"/></inline-formula>. 
<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x34.png" xlink:type="simple"/></inline-formula> often contains redundant or unimportant variables. Let RSS be the usual residual sum of squares from the OLS fit of (4); then <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x35.png" xlink:type="simple"/></inline-formula> is the commonly used unbiased estimator of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x36.png" xlink:type="simple"/></inline-formula>. 
Consider the subset V of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x37.png" xlink:type="simple"/></inline-formula> and let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x38.png" xlink:type="simple"/></inline-formula> be the residual sum of squares from the least squares fit using only the regressors corresponding to the indices in V together with an intercept. The <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x39.png" xlink:type="simple"/></inline-formula> statistic for the corresponding model is</p><disp-formula id="scirp.63649-formula1774"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x40.png"  xlink:type="simple"/></disp-formula><p>where v is the number of indices in V. 
Variable selection based on (5) entails calculating <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x41.png" xlink:type="simple"/></inline-formula> for each subset of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x42.png" xlink:type="simple"/></inline-formula> and selecting the variables corresponding to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x43.png" xlink:type="simple"/></inline-formula>, the subset minimizing (5). This approach is based on the fact that, for a given V, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x44.png" xlink:type="simple"/></inline-formula> is an estimate of the expected squared error if a (multiple) linear regression function based on the variables corresponding to V is used to predict <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x45.png" xlink:type="simple"/></inline-formula>, a new (future) observation of the response random vector Y. Therefore, choosing <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x46.png" xlink:type="simple"/></inline-formula> to minimize (5) is equivalent to selecting the variables which minimize the estimated expected prediction error. 
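</p><p>The subset search described above can be illustrated with a short Python sketch. It assumes the textbook form of Mallow’s statistic, RSS(V) divided by the full-model error-variance estimate, minus n, plus 2(v + 1); the exact expression in (5) is given as an image and may be scaled differently. The residual sums of squares, variable names and function names below are hypothetical values chosen only for the example.</p>
<preformat><![CDATA[
```python
from typing import Dict, Tuple


def mallows_cp(rss_subset: float, sigma2_full: float, n: int, v: int) -> float:
    """Textbook Mallows' Cp for a subset with v regressors plus an intercept.

    Assumed form: RSS(V) / sigma2_full - n + 2 * (v + 1).
    """
    return rss_subset / sigma2_full - n + 2 * (v + 1)


def select_by_cp(
    rss_by_subset: Dict[Tuple[str, ...], float],
    sigma2_full: float,
    n: int,
):
    """Return the subset with the smallest Cp together with all Cp values."""
    scores = {
        subset: mallows_cp(rss, sigma2_full, n, len(subset))
        for subset, rss in rss_by_subset.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical residual sums of squares for n = 20 cases and two regressors.
n_cases = 20
rss_table = {("x1",): 60.0, ("x2",): 90.0, ("x1", "x2"): 34.0}
# Unbiased error-variance estimate from the full model: RSS / (n - 3),
# where 3 counts the two slopes and the intercept.
sigma2_hat = rss_table[("x1", "x2")] / (n_cases - 3)
best_subset, cp_values = select_by_cp(rss_table, sigma2_hat, n_cases)
print(best_subset, cp_values)
```
]]></preformat>
<p>With these made-up numbers the full model attains the smallest value, and its Cp equals its number of estimated coefficients, as expected when the error variance is estimated from the full model.</p><p>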
As proposed by [<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>], the influence measure for the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x47.png" xlink:type="simple"/></inline-formula> case when the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x48.png" xlink:type="simple"/></inline-formula> criterion is used becomes</p><disp-formula id="scirp.63649-formula1775"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x49.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x50.png" xlink:type="simple"/></inline-formula> is calculated as in (5) but with the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x51.png" xlink:type="simple"/></inline-formula> case omitted. In calculating <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x52.png" xlink:type="simple"/></inline-formula>, the estimator for the error variance <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x53.png" xlink:type="simple"/></inline-formula> is obtained from the full data set.</p></sec><sec id="s2_3"><title>2.3. The AIC Estimate and the Influence Measure</title><p>The AIC is based on the maximized log-likelihood function of the model under consideration. 
Suppose <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x54.png" xlink:type="simple"/></inline-formula>. Ignoring constant terms, the maximized log-likelihood for the model corresponding to a subset V is given by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x55.png" xlink:type="simple"/></inline-formula>. This is a non-decreasing function of the number of selected regressors.</p><p>[<xref ref-type="bibr" rid="scirp.63649-ref13">13</xref>] therefore included a penalty term, viz. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x56.png" xlink:type="simple"/></inline-formula>, which equals the number of parameters which have to be estimated. Multiplying the resulting expression by −2 yields</p><disp-formula id="scirp.63649-formula1776"><label>(7)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x57.png"  xlink:type="simple"/></disp-formula><p>See [<xref ref-type="bibr" rid="scirp.63649-ref14">14</xref>] for details. It is known that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x58.png" xlink:type="simple"/></inline-formula> does not perform well when the number of parameters to be estimated is large compared to the sample size (typically cases where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x59.png" xlink:type="simple"/></inline-formula>). 
In such a case, a modified version of (7) should be used</p><disp-formula id="scirp.63649-formula1777"><label>(8)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x60.png"  xlink:type="simple"/></disp-formula><p>Variable selection based on (7) or (8) entails calculating the criterion for each subset V of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x61.png" xlink:type="simple"/></inline-formula> and selecting the variables corresponding to the minimizing subset. This is equivalent to selecting the variables which maximize a penalized version of the maximum log-likelihood. As proposed by [<xref ref-type="bibr" rid="scirp.63649-ref3">3</xref>], the influence measure for the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x62.png" xlink:type="simple"/></inline-formula> case when the AIC criterion is used becomes</p><disp-formula id="scirp.63649-formula1778"><label>(9)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x63.png"  xlink:type="simple"/></disp-formula><p>The value of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x64.png" xlink:type="simple"/></inline-formula> in (9) is obtained by using either (7) or (8) but with the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x65.png" xlink:type="simple"/></inline-formula> case omitted.</p></sec><sec id="s2_4"><title>2.4. 
The Proposed BIC-Based Relative Influence Measure</title><p>A popular alternative to AIC, proposed by [<xref ref-type="bibr" rid="scirp.63649-ref15">15</xref>], is the Bayesian Information Criterion (BIC) [<xref ref-type="bibr" rid="scirp.63649-ref16">16</xref>]-[<xref ref-type="bibr" rid="scirp.63649-ref18">18</xref>].</p><p>The BIC is formally defined as</p><disp-formula id="scirp.63649-formula1779"><label>(10)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x66.png"  xlink:type="simple"/></disp-formula><p><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x67.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x68.png" xlink:type="simple"/></inline-formula> are the parameter values that maximize the likelihood function. The BIC is an asymptotic result derived under the assumption that the data distribution is in the exponential family. 
That is, the integral of the likelihood function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x69.png" xlink:type="simple"/></inline-formula> times the prior probability distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x70.png" xlink:type="simple"/></inline-formula> over the parameters <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x71.png" xlink:type="simple"/></inline-formula> of the model M for fixed observed data y is approximated as</p><disp-formula id="scirp.63649-formula1780"><label>(11)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x72.png"  xlink:type="simple"/></disp-formula><p>Under the assumption that the model errors or disturbances are independent and identically distributed according to a normal distribution, and under the boundary condition that the derivative of the log-likelihood with respect to the true variance is zero, this becomes (up to an additive constant, which depends only on n and not on the model)</p><disp-formula id="scirp.63649-formula1781"><label>(12)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x73.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x74.png" xlink:type="simple"/></inline-formula> is the error variance. The error variance in this case is defined as</p><disp-formula id="scirp.63649-formula1782"><label>(13)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x75.png"  xlink:type="simple"/></disp-formula><p>which is a biased estimator for the true variance. 
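</p><p>As a numerical illustration of the Gaussian case, the following Python sketch computes the criterion from a vector of residuals. It assumes the standard form n ln(error variance) + k ln(n), with the error variance taken as the residual sum of squares divided by n; the symbol k for the number of estimated parameters, the function name and the residual values are assumptions introduced only for this example.</p>
<preformat><![CDATA[
```python
import math
from typing import Sequence


def gaussian_bic(residuals: Sequence[float], num_params: int) -> float:
    """BIC up to an additive constant, assuming i.i.d. normal errors.

    Assumed standard form: n * ln(sigma2_hat) + num_params * ln(n),
    with sigma2_hat = RSS / n (the biased maximum-likelihood estimate).
    """
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    if rss == 0:
        raise ValueError("perfect fit: the logarithm of zero is undefined")
    sigma2_hat = rss / n
    return n * math.log(sigma2_hat) + num_params * math.log(n)


# Hypothetical residuals from two competing fits to the same six cases.
small_model = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # 2 estimated parameters
large_model = [0.9, -1.0, 1.0, -0.9, 1.0, -1.0]  # 4 estimated parameters
print(gaussian_bic(small_model, 2), gaussian_bic(large_model, 4))
```
]]></preformat>
<p>In this made-up comparison the larger model reduces the residuals only slightly, so its extra parameters are penalized and the smaller model attains the lower value.</p><p>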
In terms of the residual sum of squares, the BIC is defined as</p><disp-formula id="scirp.63649-formula1783"><label>(14)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x76.png"  xlink:type="simple"/></disp-formula><p>The BIC is an increasing function of the error variance <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x77.png" xlink:type="simple"/></inline-formula> and an increasing function of v. That is, unexplained variation in the dependent variable and the number of explanatory variables both increase the value of the BIC. Hence, a lower BIC implies fewer explanatory variables, a better fit, or both.</p><p>Based on (14), the proposed influence measure for the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x78.png" xlink:type="simple"/></inline-formula> case when the BIC is used becomes</p><disp-formula id="scirp.63649-formula1784"><label>(15)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x79.png"  xlink:type="simple"/></disp-formula><p>Equation (15) can take the form</p><disp-formula id="scirp.63649-formula1785"><label>(16)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x80.png"  xlink:type="simple"/></disp-formula><p>Suppose <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x81.png" xlink:type="simple"/></inline-formula>; then, by invoking the trichotomy law of real numbers, (16) can be rewritten as</p><disp-formula id="scirp.63649-formula1786"><label>(17)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/9-1240643x82.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.63649-formula1787"><graphic  xlink:href="http://html.scirp.org/file/9-1240643x83.png"  xlink:type="simple"/></disp-formula><p>The values of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x84.png" xlink:type="simple"/></inline-formula> in 
(15), (16) and (17) are obtained by using (14) but with the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x84.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x85.png" xlink:type="simple"/></inline-formula> case omitted. Steel and Uys (2007) claimed that an influence measure can be calculated for any selection criterion that combines some goodness-of-fit measure with a penalty function (such a penalty function usually includes the number of predictors of the selected model as one of its components) [<xref ref-type="bibr" rid="scirp.63649-ref19">19</xref>]. A close look at (14) shows that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x84.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x85.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/9-1240643x86.png" xlink:type="simple"/></inline-formula> is a much heavier penalty term than those in (5) and (8), and hence it gives a good model fit of the data set.</p></sec></sec><sec id="s3"><title>3. Results</title><p>The results in <xref ref-type="table" rid="table1">Table 1</xref> show that the method detected cases 33 and 41 as having high influence on the model, since their RIMs are larger than those of the other cases, in agreement with the AIC and Mallow’s C<sub>p</sub> statistic-based RIMs. The method proposed here for simultaneously detecting influential data points and selecting variables detects outliers one at a time. However, further study could aim to detect multiple influential data points at once while selecting optimal predictor variables.</p><p>The problems of masking and swamping were not covered in this study. 
Masking occurs when one outlier is not detected because of the presence of others; swamping occurs when a non-outlier is wrongly identified owing to the effect of some hidden outliers. Therefore, further studies can be carried out to detect influential outliers and simultaneously select optimal predictor variables while incorporating the solutions to problems of masking and swamping.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>BIC-based RIM for the Evaporation data contained in [<xref ref-type="bibr" rid="scirp.63649-ref20">20</xref>]</title></caption><table><thead><tr><th align="center" valign="middle" >Case Omitted</th><th align="center" valign="middle" >Variables Selected</th><th align="center" valign="middle" >Influence Measure (BIC)</th><th align="center" valign="middle" >Remark</th></tr></thead><tbody><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.01797914</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.03826104</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >14</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.0179898</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >15</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.01751532</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >31</td><td align="center" valign="middle" >1, 3, 4, 8, 9</td><td align="center" valign="middle" >0.03268965</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >32</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" 
>0.02016968</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >33</td><td align="center" valign="middle" >6, 9, 10</td><td align="center" valign="middle" >0.05645203</td><td align="center" valign="middle" ><sup>***</sup>High influence measure, comparable to the C<sub>p</sub>- and AIC-based results of Steel &amp; Uys (2007)</td></tr><tr><td align="center" valign="middle" >34</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.01791702</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >40</td><td align="center" valign="middle" >6, 9, 10</td><td align="center" valign="middle" >0.03512789</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >41</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.06042516</td><td align="center" valign="middle" ><sup>***</sup>High influence measure, comparable to the C<sub>p</sub>- and AIC-based results of Steel &amp; Uys (2007)</td></tr><tr><td align="center" valign="middle" >42</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.02053766</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >45</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.01754306</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >46</td><td align="center" valign="middle" >1, 3, 6, 9</td><td align="center" valign="middle" >0.01905877</td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><p>Because this study did not set out to run a test of convergence, through which the computational cost of the three methods could be compared, it omitted the re-sampling step carried out by Steel and Uys (2007). 
Steel and Uys (2007), for their part, did not run any test of convergence after bootstrapping; rather, they calculated the estimated average prediction error to substantiate their results. Hence, their additional re-sampling step repeats the results they had already achieved with their methods and is not necessary in this study. One can further implement these existing methods by adding a test of convergence after re-sampling.</p></sec><sec id="s4"><title>4. Conclusion</title><p>Two things are unique about this paper, namely a new approach to detecting influential outliers and the conditions for interpreting the result. The latter is achieved by invoking the trichotomy law of real numbers. The proposed method penalizes models heavily as the sample size becomes very large and hence has a greater likelihood of choosing a better model while detecting influential data cases one at a time.</p></sec><sec id="s5"><title>Cite this paper</title><p>Umeh, Edith Uzoma and Obulezi, Okechukwu Jeremiah (2016) An Alternative Approach to AIC and Mallow’s Cp Statistic-Based Relative Influence Measures (RIMS) in Regression Variable Selection. Open Journal of Statistics, 6, 70-75. doi: 10.4236/ojs.2016.61009</p></sec></body><back><ref-list><title>References</title><ref id="scirp.63649-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Fan, J. and Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association, 96, 1348-1360. http://dx.doi.org/10.1198/016214501753382273</mixed-citation></ref><ref id="scirp.63649-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Burnham, K.P. and Anderson, D.R. (2004) Kullback-Leibler Information as a Basis for Strong Inference in Ecological Studies. Wildlife Research, 28, 111-119. 
http://dx.doi.org/10.1071/WR99107</mixed-citation></ref><ref id="scirp.63649-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Steel, S.J. and Uys, D.W. (2007) Variable Selection in Multiple Linear Regression: The Influence of Individual Cases. ORiON, 23, 123-136. http://dx.doi.org/10.5784/23-2-52</mixed-citation></ref><ref id="scirp.63649-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Cook, R.D. (1977) Detection of Influential Observations in Linear Regression. Technometrics, 19, 15-18. http://dx.doi.org/10.1080/00401706.1977.10489493</mixed-citation></ref><ref id="scirp.63649-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Cook, R.D. (1986) Assessment of Local Influence. Journal of the Royal Statistical Society, Series B, 48, 133-169.</mixed-citation></ref><ref id="scirp.63649-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Belsley, D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Wiley, New York. http://dx.doi.org/10.1002/0471725153</mixed-citation></ref><ref id="scirp.63649-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Tibshirani, R.J. (1997) The LASSO Method for Variable Selection in the Cox Model. Statistics in Medicine, 16, 385-395. http://dx.doi.org/10.1002/(SICI)1097-0258(19970228)16:4&lt;385::AID-SIM380&gt;3.0.CO;2-3</mixed-citation></ref><ref id="scirp.63649-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Zakaria, A., Howard, N.K. and Nkansah, B.K. (2014) On the Detection of Influential Outliers in Linear Regression Analysis. American Journal of Theoretical and Applied Statistics, 3, 100-106. http://dx.doi.org/10.11648/j.ajtas.20140304.14</mixed-citation></ref><ref id="scirp.63649-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Shahriari, S., Faria, S., Gonçalves, A.M. and Van Aelst, S. 
(2014) Outlier Detection and Robust Variable Selection for Least Angle Regression. Computational Science and Its Applications-ICCSA, Vol. 8581, Springer-Verlag, New York, 512-522.</mixed-citation></ref><ref id="scirp.63649-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Wagenmakers, E.J. and Farrell, S. (2004) AIC Model Selection Using Akaike Weights. Psychonomic Bulletin and Review, 11, 192-196. http://dx.doi.org/10.3758/BF03206482</mixed-citation></ref><ref id="scirp.63649-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Bozdogan, H. (1987) Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions. Psychometrika, 52, 345-370. http://dx.doi.org/10.1007/BF02294361</mixed-citation></ref><ref id="scirp.63649-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Guetta, D. (2010) High Dimensional Variable Selection. www.columbia.edu/.../part IIIEssay.pdf</mixed-citation></ref><ref id="scirp.63649-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Leger, C. and Altman, N. (1993) Assessing Influence in Variable Selection Problems. Journal of the American Statistical Association, 88, 547-556.</mixed-citation></ref><ref id="scirp.63649-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. The 2nd International Symposium on Information Theory, Budapest, 267-281.</mixed-citation></ref><ref id="scirp.63649-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Schwarz, G. (1978) Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464. http://dx.doi.org/10.1214/aos/1176344136</mixed-citation></ref><ref id="scirp.63649-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Burnham, K.P. 
and Anderson, D.R. (2002) Model Selection and Multi-Model Inference. Springer, New York.</mixed-citation></ref><ref id="scirp.63649-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York. http://dx.doi.org/10.1007/978-0-387-21606-5</mixed-citation></ref><ref id="scirp.63649-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Kass, R.E. and Raftery, A.E. (1995) Bayes Factors. Journal of the American Statistical Association, 90, 773-795. http://dx.doi.org/10.1080/01621459.1995.10476572</mixed-citation></ref><ref id="scirp.63649-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Kundu, D. and Murali, G. (1996) Model Selection in Linear Regression. Computational Statistics and Data Analysis, 22, 461-469. http://dx.doi.org/10.1016/0167-9473(96)00008-4</mixed-citation></ref><ref id="scirp.63649-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Freund, R.J. (1979) Multicollinearity etc.: Some “New” Examples. Proceedings of the Statistical Computing Section, American Statistical Association, USA, 111-112.</mixed-citation></ref></ref-list></back></article>