<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJS</journal-id><journal-title-group><journal-title>Open Journal of Statistics</journal-title></journal-title-group><issn pub-type="epub">2161-718X</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojs.2022.121001</article-id><article-id pub-id-type="publisher-id">OJS-115008</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Quasi-Binomial Regression Model for the Analysis of Data with Extra-Binomial Variation
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Mohamed</surname><given-names>M. Shoukri</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Maha</surname><given-names>M. Aleid</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Epidemiology and Biostatistics, Schulich School of Medicine and Dentistry, University of Western Ontario, London Ontario, Canada</addr-line></aff><aff id="aff2"><addr-line>Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, Riyadh, KSA</addr-line></aff><pub-date pub-type="epub"><day>28</day><month>01</month><year>2022</year></pub-date><volume>12</volume><issue>01</issue><fpage>1</fpage><lpage>14</lpage><history><date date-type="received"><day>17,</day>	<month>December</month>	<year>2021</year></date><date date-type="rev-recd"><day>26,</day>	<month>January</month>	<year>2022</year>	</date><date date-type="accepted"><day>29,</day>	<month>January</month>	<year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  <b>Objectives</b>: To develop inference procedures for the quasi-binomial distribution and its regression model. <b>Methods</b>: Score testing and maximum likelihood estimation of the regression parameters. <b>Data</b>: Several examples based on published data are included. <b>Results</b>: A quasi-binomial model is used to model binary response data that exhibit extra-binomial variation. A partial score test of the binomial hypothesis against the quasi-binomial alternative is developed and illustrated on three data sets. The extended logit transformation of the binomial parameter is introduced, and the large-sample dispersion matrix of the estimated parameters is derived. The Nonlinear Mixed Procedure (NLMIXED) in SAS is shown to be well suited to fitting the nonlinear regression model.
 
</p></abstract><kwd-group><kwd>Quasi-Binomial Distribution</kwd><kwd> Extra Binomial Variations</kwd><kwd> Score Test</kwd><kwd> Quasi-Binomial Regression Model</kwd><kwd> COVID-19 Case Fatality Data</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>In many biological and toxicological experiments, the variable of interest is in the form of counts resulting from binary responses. In such experiments the data may sometimes exhibit greater heterogeneity (variation) than the binomial model. It has long been presumed that an inherent characteristic of data from these types of studies is the tendency for individual experimental units to respond more alike than individuals from other groups, which is commonly known as the “group effect”. When the experimental units are animals which are treated with varying doses of compounds, such group effect is also known as “litter effect”. The litters in each group contain varying numbers of live fetuses and some of these have a specific abnormality. To explain the extra-variation caused by the “litter effect”, several generalized statistical models have been proposed in the literature. Altham [<xref ref-type="bibr" rid="scirp.115008-ref1">1</xref>] proposed that the analysis of such experiments be based on two-parameter generalizations of the binomial model which allows for the presence of dependent responses within groups and gave two models. Kupper and Haseman [<xref ref-type="bibr" rid="scirp.115008-ref2">2</xref>] suggested a correlated binomial model which is identical to Altham’s additive generalization of the binomial model. Williams [<xref ref-type="bibr" rid="scirp.115008-ref3">3</xref>] proposed that the analysis of toxicological studies be based on the beta-binomial model, which is another generalization of the binomial model. 
However, Altham [<xref ref-type="bibr" rid="scirp.115008-ref1">1</xref>] indicated that the beta-binomial model allows only positive association between the subjects of a group, whereas the correlated binomial and the multiplicative generalization of the binomial model allow negative as well as positive associations. A much wider class of models, known as generalized linear mixed models (GLMMs) [<xref ref-type="bibr" rid="scirp.115008-ref4">4</xref>], has been developed and is used extensively in many applications to deal with the overdispersion found in count and binary data.</p><p>In this paper we show that the quasi-binomial distribution (QBD) of Consul [<xref ref-type="bibr" rid="scirp.115008-ref5">5</xref>], reviewed by Shenton [<xref ref-type="bibr" rid="scirp.115008-ref6">6</xref>], can be used as an alternative model for the analysis of overdispersed dichotomous data. The QBD model has two parameters, \(p\) and \(\phi\). We call \(p\) the binomial parameter and \(\phi\) the dispersion parameter. When \(\phi = 0\), the QBD reduces to the binomial distribution. Since the binomial hypothesis is the focus of our investigation, it is natural to derive a statistic for testing the null hypothesis \(\phi = 0\).</p><p>The paper is structured as follows. In Section 2 we derive the \(C(\alpha)\) binomial score test of significance [<xref ref-type="bibr" rid="scirp.115008-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.115008-ref8">8</xref>], which is asymptotically optimal against a QBD alternative, and in Section 3 we apply the test to real data. In Section 4 we develop a QBD regression model to account for possible extraneous sources of variation. In Section 5 the methods are applied to COVID-19 mortality data.</p><p>The flowchart in the Appendix outlines the steps of model development and application.</p></sec><sec id="s2"><title>2. 
Quasi-Binomial Distribution and \(C(\alpha)\) Binomial Score Test of Significance</title><p>A discrete random variable \(Y\) is said to have a QBD if and only if its probability function, as given in [<xref ref-type="bibr" rid="scirp.115008-ref6">6</xref>], is</p><p>\[ \Pr(Y = y) = p(y) = \binom{m}{y}\, p\,(p + y\phi)^{y-1}(1 - p - y\phi)^{m-y}, \tag{1} \]</p><p>for \(y = 0, 1, 2, \ldots, m\), and zero otherwise, where \(0 &lt; p &lt; 1\) and \(-p/m &lt; \phi &lt; (1 - p)/m\). It reduces to the binomial distribution when \(\phi = 0\). The random variable \(Y\) represents the number of successes in \(m\) trials such that the probability of the first success is \(p\) and the probability of success in each of the other trials is \(p + y\phi\). Thus the probability of success increases or decreases according to whether \(\phi\) is positive or negative, and the change is directly proportional to the number of successes \(y\). All the moments of the QBD are finite, and the parameter \(\phi\) has a very substantial effect on the model. The variance of the QBD is larger or smaller than the variance of the binomial model according to whether \(\phi &gt; 0\) or \(\phi &lt; 0\). Consul [<xref ref-type="bibr" rid="scirp.115008-ref9">9</xref>] provided a detailed study of the characteristics of the QBD and gave numerous properties and moment-based estimators of the model parameters. The mean \(\mu\) of the QBD model (1) is given by</p><p>\[ \mu = mp\left[1 + \sum_{j=1}^{m-1} \phi^{j}\,(m-1)^{(j)}\right], \tag{2} \]</p><p>where \((m-1)^{(j)} = (m-1)(m-2)\cdots(m-j)\) denotes the descending factorial.</p><p>We shall formulate a \(C(\alpha)\) test of the binomial model against the QBD alternative. This is done by testing the null hypothesis \(H_0: \phi = 0\) against its negation in the presence of the nuisance parameter \(p\). Moran [<xref ref-type="bibr" rid="scirp.115008-ref8">8</xref>] showed that for such problems the \(C(\alpha)\) tests, suggested by Neyman [<xref ref-type="bibr" rid="scirp.115008-ref7">7</xref>], are asymptotically equivalent to tests using the maximum likelihood estimates.</p><p>Let \(Y_1, Y_2, \ldots, Y_n\) be \(n\) independent random variables, where each \(Y_i\) is distributed as a QBD with parameters \((m_i, p, \phi)\). 
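</p><p>As a quick numerical check of the probability function (1) and the mean (2), the following Python sketch evaluates both for one illustrative parameter setting. The function names <monospace>qbd_pmf</monospace> and <monospace>qbd_mean</monospace> and the values of \(m\), \(p\) and \(\phi\) are assumptions made for this illustration only, and the descending factorial is read as \((m-1)(m-2)\cdots(m-j)\).</p><preformat>
```python
from math import comb, prod


def qbd_pmf(y, m, p, phi):
    """Quasi-binomial probability Pr(Y = y) from equation (1)."""
    return comb(m, y) * p * (p + y * phi) ** (y - 1) * (1 - p - y * phi) ** (m - y)


def qbd_mean(m, p, phi):
    """Mean of the QBD from equation (2), using descending factorials."""
    total = 1.0
    for j in range(1, m):
        # (m-1)^(j) = (m-1)(m-2)...(m-j)
        falling = prod(m - i for i in range(1, j + 1))
        total += phi ** j * falling
    return m * p * total


# Illustrative values (not from the paper); phi lies inside (-p/m, (1-p)/m).
m, p, phi = 5, 0.3, 0.05
probs = [qbd_pmf(y, m, p, phi) for y in range(m + 1)]
direct_mean = sum(y * pr for y, pr in enumerate(probs))
print(sum(probs), direct_mean, qbd_mean(m, p, phi))
```
</preformat><p>The probabilities should sum to one, and the mean computed directly from them should agree with the closed form (2). Setting <monospace>phi = 0</monospace> recovers the ordinary binomial probabilities.</p><p>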
The likelihood function \(L\) is given by</p><p>\[ L = \prod_{i=1}^{n}\left[\binom{m_i}{y_i}\, p\,(p + y_i\phi)^{y_i-1}(1 - p - y_i\phi)^{m_i-y_i}\right] \tag{3} \]</p><p>and its logarithm is</p><p>\[ l = \text{constant} + n\ln p + \sum_{i=1}^{n}(y_i - 1)\ln(p + y_i\phi) + \sum_{i=1}^{n}(m_i - y_i)\ln(1 - p - y_i\phi). \tag{4} \]</p><p>To derive the \(C(\alpha)\) test statistic for \(H_0: \phi = 0\), the first and second partial derivatives of the log-likelihood function \(l\), evaluated at \(\phi = 0\), are needed.</p><p>All summations run from \(i = 1\) to \(n\) unless stated otherwise. Differentiating the right-hand side of (4) with respect to the model parameters and setting \(\phi = 0\), we get</p><p>\[ \left. \begin{gathered} \left.\frac{\partial l}{\partial \phi}\right|_{\phi=0} = T_1(p) = p^{-1}\sum y_i(y_i - 1) - q^{-1}\sum y_i(m_i - y_i) \\ \left.\frac{\partial l}{\partial p}\right|_{\phi=0} = T_2(p) = p^{-1}\sum y_i - q^{-1}\sum (m_i - y_i) \end{gathered} \right\} \tag{5} \]</p><p>where \(q = 1 - p\).</p><p>Setting the second equation in (5) to zero and solving for \(p\) yields</p><p>\[ \hat{p} = \sum y_i \Big/ \sum m_i \tag{6} \]</p><p>as the maximum likelihood estimator of \(p\) under \(H_0: \phi = 0\).</p><p>The second partial derivatives are given in (7), (8) and (9):</p><p>\[ \frac{\partial^2 l}{\partial \phi^2} = -\sum \frac{y_i^2(y_i - 1)}{(p + y_i\phi)^2} - \sum \frac{y_i^2(m_i - y_i)}{(1 - p - y_i\phi)^2}, \tag{7} \]</p><p>\[ \frac{\partial^2 l}{\partial \phi\,\partial p} = -\sum \frac{y_i(y_i - 1)}{(p + y_i\phi)^2} - \sum \frac{y_i(m_i - y_i)}{(1 - p - y_i\phi)^2}, \tag{8} \]</p><p>\[ \frac{\partial^2 l}{\partial p^2} = -n p^{-2} - \sum \frac{y_i - 1}{(p + y_i\phi)^2} - \sum \frac{m_i - y_i}{(1 - p - y_i\phi)^2}. \tag{9} \]</p><p>Setting \(\phi = 0\), the above three equations become, in order,</p><p>\[ T_{11}(p) = -p^{-2}\sum y_i^2(y_i - 1) - q^{-2}\sum y_i^2(m_i - y_i), \tag{10} \]</p><p>\[ T_{12}(p) = -p^{-2}\sum y_i(y_i - 1) - q^{-2}\sum y_i(m_i - y_i), \tag{11} \]</p><p>and</p><p>\[ T_{22}(p) = -n p^{-2} - p^{-2}\sum (y_i - 1) - q^{-2}\sum (m_i - y_i). \tag{12} \]</p><p>Under the null hypothesis \(H_0: \phi = 0\), the \(Y_i\) are independent binomial variates. 
Using the expected values of \(Y_i\), \(Y_i^2\) and \(Y_i^3\) for binomial variates, one can easily see that \(E[T_1(p)] = 0\).</p><p>Denoting \(-E[T_{11}(p)] = A_{11}(p)\), \(-E[T_{12}(p)] = A_{12}(p)\) and \(-E[T_{22}(p)] = A_{22}(p)\), we can then show that</p><p>\[ A_{11}(p) = (2 - 3p)\,q^{-1}\sum m_i(m_i - 1) + p\,q^{-1}\sum m_i^2(m_i - 1), \tag{13} \]</p><p>\[ A_{12}(p) = q^{-1}\sum m_i(m_i - 1), \tag{14} \]</p><p>and</p><p>\[ A_{22}(p) = (pq)^{-1}\sum m_i. \tag{15} \]</p><p>Equations (13), (14) and (15) are the elements of Fisher’s information matrix when the null hypothesis \(H_0: \phi = 0\) is true.</p><p>To test the hypothesis \(H_0: \phi = 0\), one can use the statistic \(T_1(p)\) according to Neyman’s methodology [<xref ref-type="bibr" rid="scirp.115008-ref7">7</xref>]. Since \(p\) is unknown, we follow Moran’s suggestion [<xref ref-type="bibr" rid="scirp.115008-ref8">8</xref>] and use the statistic \(T_1(\tilde{p})\), where \(\tilde{p}\) is any root-\(n\) consistent estimator of \(p\). The maximum likelihood estimator \(\hat{p}\) given in (6) is the simplest such estimator. Substituting \(\hat{p}\) into \(T_1(p)\) in (5) and simplifying, we get</p><p>\[ T_1(\hat{p}) = (\hat{p}\hat{q})^{-1}\sum (y_i - m_i\hat{p})^2 + \hat{q}^{-1}\sum m_i(y_i - m_i\hat{p}) - \sum m_i. \tag{16} \]</p><p>Note that when \(m_1 = m_2 = \cdots = m_n = m\), the expression for \(T_1(\hat{p})\) reduces to</p><p>\[ (\hat{p}\hat{q})^{-1}\sum (y_i - m\hat{p})^2 - mn, \]</p><p>which resembles Fisher’s variance test statistic. From Cox and Hinkley [<xref ref-type="bibr" rid="scirp.115008-ref10">10</xref>],</p><p>\[ \mathrm{Var}[T_1(\hat{p})] = A_{11}(p) - A_{12}^2(p)/A_{22}(p). \tag{17} \]</p><p>Substituting \(\hat{p}\) for \(p\) in (17) gives the test statistic, under \(H_0: \phi = 0\), as</p><p>\[ M^2 = [T_1(\hat{p})]^2 \Big/ \widehat{\mathrm{Var}}[T_1(\hat{p})]. \tag{18} \]</p><p>The statistic \(M^2\) in (18) has an asymptotic (as \(n \to \infty\)) chi-square distribution with one degree of freedom. 
Accordingly, the above statistic provides a \(C(\alpha)\) binomial score test that is asymptotically optimal against the quasi-binomial alternative.</p></sec><sec id="s3"><title>3. Examples</title><p>We shall now consider two examples. In the first example, given below, the data are consistent with a binomial distribution and the test statistic \(M^2\) does not reject the binomial hypothesis. In the second example, the COVID-19 data analysed in Section 5, \(M^2\) indicates that the data are not binomially distributed.</p><p>Example 1. Paul [<xref ref-type="bibr" rid="scirp.115008-ref11">11</xref>] discussed a teratological experiment in which pregnant Dutch rabbits were treated with varying doses of a compound. Each litter (group) consisted of the live fetuses of one rabbit, so litter sizes varied. The number of fetuses in each litter with skeletal or visceral abnormalities was then recorded. For illustration, we consider the high-dose group, consisting of \(n = 17\) litters, which gave the following observations:</p><p>\(m_i\): 9, 10, 7, 5, 4, 6, 3, 8, 5, 4, 4, 5, 3, 8, 6, 8, 6</p><p>\(y_i\): 1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 4, 1, 1, 4, 2, 3, 1</p><p>Since \(\sum m_i = 101\) and \(\sum y_i = 23\), \(\hat{p} = 23/101 = 0.228\).</p><p>To test the null hypothesis \(H_0\): the data are binomially distributed, i.e. \(\phi = 0\), against \(H_1\): the data are quasi-binomially distributed, i.e. \(\phi \neq 0\), we evaluate (13) to (15) and substitute the results into (17) and (18), using \(\sum m_i(m_i - 1) = 570\) and \(\sum m_i^2(m_i - 1) = 4206\):</p><p>\[ A_{12}(\hat{p}) = \frac{570}{0.772} = 738.342, \qquad A_{22}(\hat{p}) = \frac{101}{(0.228)(0.772)} = 573.811, \]</p><p>\[ A_{11}(\hat{p}) = \frac{2 - 3(0.228)}{0.772}(570) + \frac{0.228}{0.772}(4206) = 2213.84, \]</p><p>and</p><p>\[ \widehat{\mathrm{Var}}(T_1(\hat{p})) = 2213.84 - \frac{(738.342)^2}{573.811} = 1263.79. \]</p><p>Thus, from (18), with \(T_1(\hat{p}) = 42.781\) obtained from (16),</p><p>\[ M^2 = \frac{(42.781)^2}{1263.79} = 1.448. \]</p><p>Since \(\Pr(M^2 \ge 1.448) = \Pr(\chi^2_1 \ge 1.448) = 0.22\), the null hypothesis cannot be rejected. 
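</p><p>The arithmetic of Example 1 can be retraced with a short Python sketch. It is a minimal illustration of equations (5), (6), (13) to (15), (17) and (18), not the authors’ own code. The function name <monospace>c_alpha_test</monospace> is hypothetical, and the chi-square tail probability is obtained from the standard identity \(\Pr(\chi^2_1 \ge x) = \mathrm{erfc}(\sqrt{x/2})\).</p><preformat>
```python
from math import erfc, sqrt


def c_alpha_test(m, y):
    """C(alpha) score test of H0: phi = 0, following equations (6) and (13)-(18)."""
    p = sum(y) / sum(m)                      # equation (6)
    q = 1.0 - p
    # T1 evaluated at p-hat, taken from the first line of equation (5)
    t1 = (sum(yi * (yi - 1) for yi in y) / p
          - sum(yi * (mi - yi) for mi, yi in zip(m, y)) / q)
    s1 = sum(mi * (mi - 1) for mi in m)
    s2 = sum(mi ** 2 * (mi - 1) for mi in m)
    a11 = (2 - 3 * p) * s1 / q + p * s2 / q  # equation (13)
    a12 = s1 / q                             # equation (14)
    a22 = sum(m) / (p * q)                   # equation (15)
    var_t1 = a11 - a12 ** 2 / a22            # equation (17)
    m2 = t1 ** 2 / var_t1                    # equation (18)
    p_value = erfc(sqrt(m2 / 2))             # upper tail of chi-square with 1 df
    return p, t1, var_t1, m2, p_value


# High-dose rabbit litters of Example 1
litter_sizes = [9, 10, 7, 5, 4, 6, 3, 8, 5, 4, 4, 5, 3, 8, 6, 8, 6]
abnormal = [1, 0, 1, 0, 1, 0, 1, 1, 2, 0, 4, 1, 1, 4, 2, 3, 1]
p_hat, t1, var_t1, m2, p_value = c_alpha_test(litter_sizes, abnormal)
print(round(p_hat, 3), round(m2, 3), round(p_value, 3))
```
</preformat><p>With unrounded \(\hat{p}\) the sketch gives \(M^2\) of about 1.47, slightly above the 1.448 obtained from the rounded intermediate values, presumably because of that rounding. The conclusion is unchanged.</p><p>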
Thus, we conclude that the data are consistent with a binomial distribution with \(\hat{p} = 0.228\).</p></sec><sec id="s4"><title>4. Quasi-Binomial Regression Model</title><p>It is well known that the logistic-linear model is a basis for analyzing regression data, or data from designed experiments, when the response variable is measured on the binary scale. The purpose of this section is to extend the QBD so that a finite number of concomitant variables can be included, which may account for most of the sources of the extra-binomial variation.</p><p>Suppose that the \(i\)th response \(Y_i\) (\(1 \le i \le n\)) has the QBD given by (1). Let \(x_{i1}, x_{i2}, \ldots, x_{ik}\) be the values of \(k\) explanatory variables associated with the response \(y_i\), where the \(n \times k\) matrix \((x_{ij})\) is of rank \(k\). We now employ the customary logistic transformation of the binomial parameter \(p\):</p><p>\[ p_i = e^{\theta_i}(1 + e^{\theta_i})^{-1}, \]</p><p>where</p><p>\[ \theta_i = \ln[p_i(1 - p_i)^{-1}] = \sum_{j=1}^{k} x_{ij}\beta_j \tag{19} \]</p><p>and \(\beta_1, \beta_2, \ldots, \beta_k\) on the right-hand side of (19) are the regression coefficients, which are to be estimated along with the parameter \(\phi\).</p><p>The likelihood function is given by</p><p>\[ L = \prod_{i=1}^{n}\left[\binom{m_i}{y_i}\left(\frac{e^{\theta_i}}{1 + e^{\theta_i}}\right)\left(\frac{e^{\theta_i}}{1 + e^{\theta_i}} + y_i\phi\right)^{y_i-1}\left(\frac{1}{1 + e^{\theta_i}} - y_i\phi\right)^{m_i-y_i}\right]. \tag{20} \]</p><p>Taking the logarithm of (20), we get the log-likelihood function</p><p>\[ l = \sum \ln[e^{\theta_i}(1 + e^{\theta_i})^{-1}] + \sum (y_i - 1)\ln[e^{\theta_i}(1 + e^{\theta_i})^{-1} + y_i\phi] + \sum (m_i - y_i)\ln[(1 + e^{\theta_i})^{-1} - y_i\phi] + \text{constant}, \tag{21} \]</p><p>where the summations run from \(i = 1\) to \(n\) and \(\theta_i\) is defined in (19).</p><p>Differentiating \(l\) in (21) partially with respect to \(\beta_r\), \(r = 1, 2, \ldots, k\), and \(\phi\), we obtain the following system of \(k + 1\) ML equations:</p><p>\[ \dot{l}_r = \frac{\partial l}{\partial \beta_r} = \sum x_{ir} - \sum e^{\theta_i}(1 + e^{\theta_i})^{-1}x_{ir} + \sum \frac{(y_i - 1)\,e^{\theta_i}(1 + e^{\theta_i})^{-2}}{e^{\theta_i}(1 + e^{\theta_i})^{-1} + y_i\phi}\,x_{ir} - \sum \frac{(m_i - y_i)\,e^{\theta_i}(1 + e^{\theta_i})^{-2}}{(1 + e^{\theta_i})^{-1} - y_i\phi}\,x_{ir} = 0, \quad r = 1, 2, \ldots, k \tag{22} \]</p><p>and</p><p>\[ \dot{l}_\phi = \frac{\partial l}{\partial \phi} = \sum \frac{y_i(y_i - 1)}{e^{\theta_i}(1 + e^{\theta_i})^{-1} + y_i\phi} - \sum \frac{y_i(m_i - y_i)}{(1 + e^{\theta_i})^{-1} - y_i\phi} = 0. \tag{23} \]</p><p>The second partial derivatives are given by (where \(q_i = 1 - p_i\))</p><p>\[ \dot{l}_{\phi\phi} = \frac{\partial^2 l}{\partial \phi^2} = -\sum \frac{y_i^2(y_i - 1)}{(p_i + y_i\phi)^2} - \sum \frac{y_i^2(m_i - y_i)}{(1 - p_i - y_i\phi)^2}, \]</p><p>\[ \dot{l}_{\phi r} = \frac{\partial^2 l}{\partial \phi\,\partial \beta_r} = -\sum \frac{y_i(y_i - 1)\,p_i q_i}{(p_i + y_i\phi)^2}\,x_{ir} - \sum \frac{y_i(m_i - y_i)\,p_i q_i}{(1 - p_i - y_i\phi)^2}\,x_{ir}, \]</p><p>and</p><p>\[ \dot{l}_{rs} = \frac{\partial^2 l}{\partial \beta_r\,\partial \beta_s} = -\sum p_i q_i x_{ir}x_{is} + \sum \frac{y_i - 1}{p_i + y_i\phi}\,p_i q_i(1 - 2p_i)\,x_{ir}x_{is} - \sum \frac{y_i - 1}{(p_i + y_i\phi)^2}\,p_i^2 q_i^2\,x_{ir}x_{is} - \sum \frac{(m_i - y_i)\,p_i^2 q_i^2}{(1 - p_i - y_i\phi)^2}\,x_{ir}x_{is} \]</p><p>for \(r, s = 1, 2, \ldots, k\).</p><p>The expectations of the negatives of the above second partial derivatives give the elements of Fisher’s information matrix. For these we use some results from [<xref ref-type="bibr" rid="scirp.115008-ref9">9</xref>] on inverse moments of the QBD. Thus</p><p>\[ I_{\phi\phi} = \sum E\left[\frac{Y_i^2(Y_i - 1)}{(p_i + Y_i\phi)^2}\right] + \sum E\left[\frac{Y_i^2(m_i - Y_i)}{(1 - p_i - Y_i\phi)^2}\right] = \sum_{i=1}^{n}\frac{m_i(m_i - 1)\,p_i\,[2q_i + (m_i - 1)p_i]}{[q_i - (m_i - 1)\phi]\,(p_i + 2\phi)}, \tag{24} \]</p><p>\[ I_{\phi r} = \sum E\left[\frac{Y_i(Y_i - 1)}{(p_i + Y_i\phi)^2}\right]p_i q_i x_{ir} + \sum E\left[\frac{Y_i(m_i - Y_i)}{(1 - p_i - Y_i\phi)^2}\right]p_i q_i x_{ir} = \sum_{i=1}^{n}\frac{m_i(m_i - 1)\,(1 - (m_i - 3)\phi)\,p_i^2 q_i\,x_{ir}}{(p_i + 2\phi)(1 - p_i - m_i\phi) + \phi}, \quad r = 1, 2, \ldots, k, \tag{25} \]</p><p>\[ I_{rs} = \sum_{i=1}^{n}\left[\frac{1 - (m_i - 1)p_i}{p_i - 2\phi} + \frac{(1 + \phi - m_i\phi)\,p_i}{q_i - m_i\phi + \phi}\right] m_i p_i q_i^2\,x_{ir}x_{is}, \tag{26} \]</p><p>where \(r, s = 1, 2, 3, \ldots, k\).</p><p>Equations (24), (25) and (26) are the elements of Fisher’s information matrix. 
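</p><p>As a numerical companion to the log-likelihood (21) and the ML equation (23), the sketch below codes both in plain Python. The function names <monospace>qb_loglik</monospace> and <monospace>score_phi</monospace>, the four litters and the trial parameter values are hypothetical choices made for illustration. The sketch is not the NLMIXED program used later in the paper.</p><preformat>
```python
from math import comb, exp, log


def qb_loglik(beta, phi, X, m, y):
    """Log-likelihood (21), including the binomial-coefficient constant."""
    total = 0.0
    for xi, mi, yi in zip(X, m, y):
        theta = sum(b * x for b, x in zip(beta, xi))   # linear predictor, equation (19)
        p = exp(theta) / (1.0 + exp(theta))            # logistic transformation
        total += (log(comb(mi, yi)) + log(p)
                  + (yi - 1) * log(p + yi * phi)
                  + (mi - yi) * log(1.0 - p - yi * phi))
    return total


def score_phi(beta, phi, X, m, y):
    """Left-hand side of the ML equation (23)."""
    total = 0.0
    for xi, mi, yi in zip(X, m, y):
        theta = sum(b * x for b, x in zip(beta, xi))
        p = exp(theta) / (1.0 + exp(theta))
        total += yi * (yi - 1) / (p + yi * phi) - yi * (mi - yi) / (1.0 - p - yi * phi)
    return total


# Hypothetical toy data: intercept plus a 0/1 group indicator, four litters.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
m = [10, 9, 10, 7]
y = [1, 0, 5, 7]
beta = [-1.0, 0.5]      # trial values, not fitted estimates
print(qb_loglik(beta, 0.01, X, m, y), score_phi(beta, 0.01, X, m, y))
```
</preformat><p>At <monospace>phi = 0</monospace> the first function reduces to the ordinary binomial logistic log-likelihood. A general-purpose optimizer could maximize it over \((\beta, \phi)\), subject to \(p_i + y_i\phi &gt; 0\) and \(1 - p_i - y_i\phi &gt; 0\) for every observation.</p><p>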
From [<xref ref-type="bibr" rid="scirp.115008-ref12">12</xref>], and based on the large-sample theory of maximum likelihood estimation, we can establish the asymptotic normality of \(\hat{\Lambda} = (\hat{\beta}, \hat{\phi})\); that is,</p><p>\[ \sqrt{n}\,(\hat{\Lambda} - \Lambda) \to N_{k+1}(0, \Sigma) \]</p><p>in law. The large-sample variance-covariance matrix is given by</p><p>\[ \Sigma = n\begin{bmatrix} I_{rs} &amp; I_{r\phi} \\ I_{\phi r} &amp; I_{\phi\phi} \end{bmatrix}^{-1}. \]</p><p>In testing hypotheses about parameters in a logit model, one generally uses large-sample tests. The choice is between the likelihood ratio test, which requires fitting the model under both the null and alternative hypotheses, and other consistent tests that are asymptotically equivalent to it under the null hypothesis [<xref ref-type="bibr" rid="scirp.115008-ref8">8</xref>]. To test the null hypothesis \(H_0: \phi = 0\) versus \(H_1: \phi \neq 0\), we use the Wald statistic</p><p>\[ W = \frac{\hat{\phi}^2}{AV_0(\hat{\phi})}, \tag{27} \]</p><p>where \(AV_0(\hat{\phi})\) is the asymptotic variance of \(\hat{\phi}\), evaluated under the null hypothesis \(H_0\). Under \(H_0\), the statistic \(W\) has the same asymptotic (large-sample) \(\chi^2_1\) distribution as the likelihood ratio statistic. Equivalently, \(H_0: \phi = 0\) is rejected whenever</p><p>\[ \hat{\phi} \Big/ \sqrt{\widehat{AV}_0(\hat{\phi})} &gt; Z_{1-\alpha}, \]</p><p>where \(Z_{1-\alpha}\) is the standard normal deviate for the \(\alpha\) level of significance, and \(\widehat{AV}_0(\hat{\phi})\) denotes the large-sample variance of \(\hat{\phi}\) under \(H_0\) after all other parameters are replaced by their maximum likelihood estimates.</p></sec><sec id="s5"><title>5. Applications of the QBD Regression</title><p>1) Clinical trial results</p><p>One group of 16 pregnant female rats was fed a control diet during pregnancy and lactation, and a second group of 16 pregnant female rats was given a diet treated with a chemical. 
Weil [<xref ref-type="bibr" rid="scirp.115008-ref13">13</xref>] published clinical trial data on the number \(m\) of pups alive at 4 days and the number \(y\) of pups that died by the end of the 21-day lactation period for each litter. The fractions \(y_i/m_i\) for the two groups are given below:</p><p>Control: 0/13, 0/12, 0/9, 0/9, 0/8, 0/8, 1/13, 1/12, 1/10, 1/10, 1/9, 2/13, 1/5, 2/7, 3/10, 3/10.</p><p>Treated: 0/12, 0/11, 0/10, 0/9, 1/11, 1/10, 1/10, 1/9, 1/9, 1/5, 2/9, 3/7, 5/10, 3/6, 7/10, 7/7.</p><p>We apply the quasi-binomial regression model to the above data, with 16 litters in each group, and take</p><p>\[ \theta_i = \sum_{j=1}^{2} x_{ij}\beta_j, \quad i = 1, 2, \ldots, 32, \]</p><p>where \(x_{i1} = 1\), and \(x_{i2} = 0\) when the litter is in the control group and \(x_{i2} = 1\) when it is in the treatment group.</p><p>The maximum likelihood estimates of \((\beta_1, \beta_2, \phi)\) were obtained by simultaneously solving the system of equations \(\dot{l}_r = 0\) and \(\dot{l}_\phi = 0\), given in (22) and (23), with the NLMIXED procedure in SAS (version 9.4). The ML estimates are</p><p>\[ \hat{\beta}_1 = -2.5135\ (0.3435), \quad \hat{\beta}_2 = 0.6595\ (0.4307) \]</p><p>and</p><p>\[ \hat{\phi} = 0.0517\ (0.0114). \]</p><p>The numbers in brackets are the large-sample standard errors. Both \(\hat{\beta}_1\) and \(\hat{\phi}\) are highly significant (p-value &lt; 0.001).</p><p>2) Example 2: Multiple regression (risk factors associated with COVID-19 case fatality)</p><p>The novel coronavirus disease (COVID-19) pandemic affected every country in the world and imposed tremendous strains on economies and health care systems.</p><p>During 2019-2020, over 5000 research papers were published, with the fundamental aim of understanding the mechanism of spread of the virus and the main risk factors for the associated mortality. Many of these reports suggested that the coronavirus was associated with more serious chronic diseases and mortality regardless of country and age. 
Other reports suggested that those with underlying comorbidities, including obesity, type 2 diabetes, and heart and kidney diseases, are at high risk of infection and death. Therefore, there is a need to understand how common comorbidities and other factors are associated with the risk of death due to COVID-19 infection. Our investigation aims to explore this relationship. Specifically, we examine how the aggregate number of deaths among the total number of reported COVID-19 cases relates to these factors.</p><p>The WHO website [<xref ref-type="bibr" rid="scirp.115008-ref14">14</xref>] provided a detailed account of the number of COVID-19 cases by country, which we accessed on December 2, 2020. We included in the study the cumulative number of COVID-19 cases and the associated death counts by country as of that date. We excluded countries with fewer than 10,000 cumulative cases. We denote the number of cases per country by \(m\) and the corresponding number of deaths by \(y\). The database contains 112 countries, which we divided into regions according to the classification given in [<xref ref-type="bibr" rid="scirp.115008-ref15">15</xref>]. 
The most frequently cited risk factors are:</p><p>1) X<sub>1</sub> = log(percentage of obese persons in a country, as reported in 2018) [<xref ref-type="bibr" rid="scirp.115008-ref17">17</xref>].</p><p>2) X<sub>2</sub> = log(population density) [<xref ref-type="bibr" rid="scirp.115008-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.115008-ref19">19</xref>] [<xref ref-type="bibr" rid="scirp.115008-ref20">20</xref>].</p><p>3) X<sub>3</sub> = log(number of people with colorectal cancer in a country, as reported in 2017) [<xref ref-type="bibr" rid="scirp.115008-ref21">21</xref>].</p><p>4) X<sub>4</sub> = log(chronic kidney disease case fatality in a country, as reported in 2017) [<xref ref-type="bibr" rid="scirp.115008-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.115008-ref16">16</xref>].</p><p>Note that we used the logarithm of each factor to stabilize the variance. The data are summarized in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>The histogram of y is given in <xref ref-type="fig" rid="fig1">Figure 1</xref>, which shows the severe skewness of the distribution.</p><p>Figures 2-5 are the box plots of the risk factors. The plots show that the risk factors are similarly distributed across regions, except for X<sub>3</sub>.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Summary statistics of the COVID-19 cases (m), deaths among cases (y), and the four covariates</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >N</th><th align="center" valign="middle" >Minimum</th><th align="center" valign="middle" >Maximum</th><th align="center" valign="middle" >Mean</th><th align="center" valign="middle" >Std. 
Deviation</th></tr></thead><tr><td align="center" valign="middle" >m</td><td align="center" valign="middle" >113</td><td align="center" valign="middle" >10,129</td><td align="center" valign="middle" >13,385,755</td><td align="center" valign="middle" >555,864.71</td><td align="center" valign="middle" >1,657,855.674</td></tr><tr><td align="center" valign="middle" >y</td><td align="center" valign="middle" >113</td><td align="center" valign="middle" >29</td><td align="center" valign="middle" >266,043</td><td align="center" valign="middle" >12,972.13</td><td align="center" valign="middle" >34,784.047</td></tr><tr><td align="center" valign="middle" >LOG_CKD_CASE_FATALITY</td><td align="center" valign="middle" >113</td><td align="center" valign="middle" >3.44</td><td align="center" valign="middle" >6.50</td><td align="center" valign="middle" >5.0616</td><td align="center" valign="middle" >0.51962</td></tr><tr><td align="center" valign="middle" >LOG_COLOREC_CANCER</td><td align="center" valign="middle" >113</td><td align="center" valign="middle" >3.69</td><td align="center" valign="middle" >12.98</td><td align="center" valign="middle" >8.0393</td><td align="center" valign="middle" >1.69123</td></tr><tr><td align="center" valign="middle" >LOG_OBESITY</td><td align="center" valign="middle" >113</td><td align="center" valign="middle" >1.28</td><td align="center" valign="middle" >3.63</td><td align="center" valign="middle" >2.8430</td><td align="center" valign="middle" >0.61066</td></tr><tr><td align="center" valign="middle" >LOG_POPDENSITY</td><td align="center" valign="middle" >113</td><td align="center" valign="middle" >6.57</td><td align="center" valign="middle" >16.65</td><td align="center" valign="middle" >12.3378</td><td align="center" valign="middle" >1.91657</td></tr></tbody></table></table-wrap><p>We estimated the average case-fatality rate as</p><p>\[ \hat{p} = \sum y_i \Big/ \sum m_i = 0.023. \]</p><p>The other quantities needed for the score test are</p><p>\[ A_{11}(\hat{p}) = 8.49 \times 10^{19}, \quad A_{12}(\hat{p}) = 3.51 \times 10^{14}, \quad A_{22}(\hat{p}) = 2772 \times 10^{6}, \]</p><p>\[ T_1(\hat{p}) = -1.1 \times 10^{12}, \quad \widehat{\mathrm{Var}}[T_1(\hat{p})] = 4.05 \times 10^{19}. \]</p><p>Hence \(M^2 = [T_1(\hat{p})]^2 / \widehat{\mathrm{Var}}[T_1(\hat{p})] = 29932.22\), and we therefore reject the binomial hypothesis. We used the SAS NLMIXED procedure to fit the QB regression model. The results are shown in <xref ref-type="table" rid="table2">Table 2</xref>.</p><p>We note that the fitting algorithm also produces the variance-covariance matrix of the estimated regression parameters (not shown here).</p><p>The Nonlinear Mixed Model procedure (NLMIXED) is an iterative algorithm, and its convergence, which can be slow, depends heavily on the starting values.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Results of the quasi-binomial regression for the COVID-19 case fatality data</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Parameter</th><th align="center" valign="middle" >Estimate</th><th align="center" valign="middle" >Standard Error</th><th align="center" valign="middle" >Lower 95% confidence limit</th><th align="center" valign="middle" >Upper 95% confidence limit</th></tr></thead><tr><td align="center" valign="middle" >b<sub>0</sub></td><td align="center" valign="middle" >−6.3423</td><td align="center" valign="middle" >0.0122</td><td align="center" valign="middle" >−6.3662</td><td align="center" valign="middle" >−6.3183</td></tr><tr><td align="center" valign="middle" >b<sub>1</sub></td><td align="center" valign="middle" >0.4180</td><td align="center" valign="middle" >0.0020</td><td align="center" valign="middle" >0.4145</td><td align="center" valign="middle" >0.4222</td></tr><tr><td align="center" valign="middle" >b<sub>2</sub></td><td align="center" valign="middle" >−0.0960</td><td align="center" valign="middle" >0.0007</td><td align="center" valign="middle" >−0.0974</td><td align="center" valign="middle" >−0.0946</td></tr><tr><td align="center" valign="middle" 
>b<sub>3</sub></td><td align="center" valign="middle" >0.2063</td><td align="center" valign="middle" >0.0010</td><td align="center" valign="middle" >0.2039</td><td align="center" valign="middle" >0.2087</td></tr><tr><td align="center" valign="middle" >b<sub>4</sub></td><td align="center" valign="middle" >0.0560</td><td align="center" valign="middle" >0.0007</td><td align="center" valign="middle" >0.0547</td><td align="center" valign="middle" >0.0574</td></tr><tr><td align="center" valign="middle" >phi</td><td align="center" valign="middle" >0.0050</td><td align="center" valign="middle" >0.0020</td><td align="center" valign="middle" >0.001</td><td align="center" valign="middle" >0.0090</td></tr></tbody></table></table-wrap></sec><sec id="s6"><title>6. Discussion</title><p>For observed data sets that exhibit greater variation than expected under the hypothesized model, researchers often try to determine the sources of this phenomenon, which is known as over-dispersion. There are three broad categories of such sources: 1) genuine over-dispersion or under-dispersion, which may be accounted for by generalizations of the known distribution; 2) apparent over-dispersion due to outliers, which may be detected by residual analysis or by some other diagnostic method; and 3) a poor choice of explanatory variables. Therefore, it seems appropriate to apply a model that includes a dispersion parameter as well as a reasonable number of carefully chosen covariates. 
The fitting of the QBD regression model can be tricky, and one may adopt one of the algorithms described in [<xref ref-type="bibr" rid="scirp.115008-ref22">22</xref>] and [<xref ref-type="bibr" rid="scirp.115008-ref23">23</xref>].</p></sec><sec id="s7"><title>Acknowledgements</title><p>The authors thank anonymous reviewers for their constructive comments.</p></sec><sec id="s8"><title>Conflicts of Interest</title><p>None declared by both authors.</p></sec><sec id="s9"><title>Cite this paper</title><p>Shoukri, M.M. and Aleid, M.M. (2022) Quasi-Binomial Regression Model for the Analysis of Data with Extra-Binomial Variation. Open Journal of Statistics, 12, 1-14. https://doi.org/10.4236/ojs.2022.121001</p></sec><sec id="s10"><title>Appendix: Flow Chart for the Manuscripts</title><disp-formula id="scirp.115008-formula1"><graphic  xlink:href="//html.scirp.org/file/1-1241560x151.png?20220129091059689"  xlink:type="simple"/></disp-formula></sec></body><back><ref-list><title>References</title><ref id="scirp.115008-ref1"><label>1</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Altham</surname><given-names> P.M.E. </given-names></name>,<etal>et al</etal>. (<year>1978</year>)<article-title>Two Generalizations of the Binomial Distribution</article-title><source> Applied Statistics</source><volume> 27</volume>,<fpage> 162</fpage>-<lpage>167</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.115008-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Kupper, L.L. and Haseman, J.K. (1978) The Use of a Correlated Binomial Model for the Analysis of Certain Toxicological Experiments. Biometrics, 34, 67-76.</mixed-citation></ref><ref id="scirp.115008-ref3"><label>3</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Williams</surname><given-names> D.A. </given-names></name>,<etal>et al</etal>. 
(<year>1975</year>) <article-title>The Analysis of Binary Responses from Toxicological Experiments Involving Reproduction and Teratogenicity</article-title>. <source>Biometrics</source>, <volume>31</volume>, <fpage>949</fpage>-<lpage>952</lpage>.</mixed-citation></ref><ref id="scirp.115008-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Breslow, N.E. and Clayton, D.G. (1993) Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association, 88, 9-25.
https://doi.org/10.1080/01621459.1993.10594284</mixed-citation></ref><ref id="scirp.115008-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Consul, P.C. (1974) A Simple Urn Model Dependent upon Predetermined Strategy. Sankhyā: The Indian Journal of Statistics, Series B, 36, 391-399.</mixed-citation></ref><ref id="scirp.115008-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Shenton, L.R. (2006) Quasi Binomial Distribution. Wiley StatsRef: Statistics Reference Online.</mixed-citation></ref><ref id="scirp.115008-ref7"><label>7</label><mixed-citation publication-type="book" xlink:type="simple">Neyman, J. (1959) Optimal Asymptotic Tests of Composite Statistical Hypotheses. In: Grenander, U., Ed., Probability and Statistics, John Wiley &amp; Sons, New York, 13-34.</mixed-citation></ref><ref id="scirp.115008-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Moran, P.A. (1970) On Asymptotically Optimal Tests of Composite Hypotheses. Biometrika, 57, 47-55. https://doi.org/10.1093/biomet/57.1.47</mixed-citation></ref><ref id="scirp.115008-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Consul, P.C. (1990) On Some Properties and Applications of Quasi-Binomial Distribution. Communications in Statistics - Theory and Methods, 19, 477-504.
https://doi.org/10.1080/03610929008830214</mixed-citation></ref><ref id="scirp.115008-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Cox, D.R. and Hinkley, D.V. (1974) Theoretical Statistics. Chapman and Hall, London.</mixed-citation></ref><ref id="scirp.115008-ref11"><label>11</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Paul</surname><given-names>S.R.</given-names></name> (<year>1982</year>) <article-title>Analysis of Proportions of Affected Fetuses in Teratological Experiments</article-title>. <source>Biometrics</source>, <volume>38</volume>, <fpage>361</fpage>-<lpage>370</lpage>.</mixed-citation></ref><ref id="scirp.115008-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Rao, C.R. (1973) Linear Statistical Inference and Its Applications. John Wiley &amp; Sons Inc., New York.</mixed-citation></ref><ref id="scirp.115008-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Weil, C.S. (1970) Selection of the Valid Number of Sampling Units and a Consideration of Their Combination in Toxicological Studies Involving Reproduction, Teratogenesis or Carcinogenesis. Food and Cosmetics Toxicology, 8, 177-182.
https://doi.org/10.1016/S0015-6264(70)80337-6</mixed-citation></ref><ref id="scirp.115008-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/table</mixed-citation></ref><ref id="scirp.115008-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Chueh, T.-I., Zheng, C.-M., Hou, Y.-C. and Lu, K.-C. (2020) Novel Evidence of Acute Kidney Injury in COVID-19. Journal of Clinical Medicine, 9, 3547.
https://doi.org/10.3390/jcm9113547</mixed-citation></ref><ref id="scirp.115008-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">GBD Chronic Kidney Disease Collaboration (2020) Global, Regional, and National Burden of Chronic Kidney Disease, 1990-2017: A Systematic Analysis for the Global Burden of Disease Study 2017. The Lancet, 395, 709-733.  
https://doi.org/10.1016/s0140-6736(20)30045-3</mixed-citation></ref><ref id="scirp.115008-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Diabetes Prevalence (% of Population Ages 20 to 79)—Country Ranking. 
https://www.indexmundi.com/facts/indicators/SH.STA.DIAB.ZS/rankings</mixed-citation></ref><ref id="scirp.115008-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">https://openknowledge.worldbank.org/bitstream/handle/10986/32383/9781464814914.pdf</mixed-citation></ref><ref id="scirp.115008-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Rashed, E.A., Kodera, S., Gomez-Tames, J. and Hirata, A. (2020) Correlation between COVID-19 Morbidity and Mortality Rates in Japan and Local Population Density, Temperature, and Absolute Humidity. International Journal of Environmental Research and Public Health, 17, Article No. 5477.
https://doi.org/10.3390/ijerph17155477</mixed-citation></ref><ref id="scirp.115008-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Population Density and Population Counts. https://data.worldbank.org/</mixed-citation></ref><ref id="scirp.115008-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">GBD 2017 Colorectal Cancer Collaborators (2019) The Global, Regional, and National Burden of Colorectal Cancer and Its Attributable Risk Factors in 195 Countries and Territories, 1990-2017: A Systematic Review for the Global Burden of Disease Study 2017. The Lancet Gastroenterology and Hepatology, 4, 913-933.  
https://doi.org/10.1016/S2468-1253(19)30345-0</mixed-citation></ref><ref id="scirp.115008-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Boateng, E. and Abaye, D. (2019) A Review of the Logistic Regression Model with Emphasis on Medical Research. Journal of Data Analysis and Information Processing, 7, 190-207. https://doi.org/10.4236/jdaip.2019.74012</mixed-citation></ref><ref id="scirp.115008-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Deng, J. and Lu, Q.J. (2018) Fuzzy Regression Model Based on Fuzzy Distance Measure. Journal of Data Analysis and Information Processing, 6, 126-140.  
https://doi.org/10.4236/jdaip.2018.63008</mixed-citation></ref></ref-list></back></article>