Multivariate probit model

In statistics and econometrics, the multivariate probit model is a generalization of the probit model used to estimate several correlated binary outcomes jointly. For example, if it is believed that the decisions of sending at least one child to public school and that of voting in favor of a school budget are correlated (both decisions are binary), then the multivariate probit model would be appropriate for jointly predicting these two choices on an individual-specific basis. This approach was initially developed by Siddhartha Chib and Edward Greenberg.[1]


Example: bivariate probit


In the ordinary probit model, there is only one binary dependent variable $Y$, and so only one latent variable $Y^*$ is used. In contrast, in the bivariate probit model there are two binary dependent variables $Y_1$ and $Y_2$, so there are two latent variables $Y_1^*$ and $Y_2^*$.
It is assumed that each observed variable takes on the value 1 if and only if its underlying continuous latent variable takes on a positive value:


$$
Y_1 = \begin{cases} 1 & \text{if } Y_1^* > 0, \\ 0 & \text{otherwise,} \end{cases}
\qquad
Y_2 = \begin{cases} 1 & \text{if } Y_2^* > 0, \\ 0 & \text{otherwise,} \end{cases}
$$

with


$$
\begin{cases}
Y_1^* = X_1 \beta_1 + \varepsilon_1 \\
Y_2^* = X_2 \beta_2 + \varepsilon_2
\end{cases}
$$

and


$$
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}
\mid X
\sim \mathcal{N}\!\left(
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}
\right)
$$
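The data-generating process above is easy to simulate. The sketch below draws correlated latent errors and thresholds them at zero; the covariates, coefficient values, and sample size are illustrative assumptions, not part of the model definition:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
rho = 0.5

# Hypothetical design matrices: an intercept plus one covariate each.
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
beta1 = np.array([0.5, 1.0])
beta2 = np.array([-0.2, 0.8])

# Correlated standard-normal errors with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Latent indices, then the observed binary outcomes Y = 1{Y* > 0}.
y1_star = X1 @ beta1 + eps[:, 0]
y2_star = X2 @ beta2 + eps[:, 1]
y1 = (y1_star > 0).astype(int)
y2 = (y2_star > 0).astype(int)
```

Because the errors share $\rho > 0$, the two outcomes remain correlated even after conditioning on the covariates, which is exactly what the bivariate probit model captures and the univariate probit cannot.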

Fitting the bivariate probit model involves estimating the values of $\beta_1$, $\beta_2$, and $\rho$. To do so, the likelihood of the model has to be maximized. This likelihood is


$$
\begin{aligned}
L(\beta_1, \beta_2) = \Big( \prod &\, P(Y_1 = 1, Y_2 = 1 \mid \beta_1, \beta_2)^{Y_1 Y_2}
\, P(Y_1 = 0, Y_2 = 1 \mid \beta_1, \beta_2)^{(1 - Y_1) Y_2} \\
&\quad P(Y_1 = 1, Y_2 = 0 \mid \beta_1, \beta_2)^{Y_1 (1 - Y_2)}
\, P(Y_1 = 0, Y_2 = 0 \mid \beta_1, \beta_2)^{(1 - Y_1)(1 - Y_2)} \Big)
\end{aligned}
$$

Substituting the latent variables $Y_1^*$ and $Y_2^*$ in the probability functions and taking logs gives


$$
\begin{aligned}
\sum \Big( &\, Y_1 Y_2 \ln P(\varepsilon_1 > -X_1 \beta_1, \varepsilon_2 > -X_2 \beta_2) \\
&+ (1 - Y_1) Y_2 \ln P(\varepsilon_1 < -X_1 \beta_1, \varepsilon_2 > -X_2 \beta_2) \\
&+ Y_1 (1 - Y_2) \ln P(\varepsilon_1 > -X_1 \beta_1, \varepsilon_2 < -X_2 \beta_2) \\
&+ (1 - Y_1)(1 - Y_2) \ln P(\varepsilon_1 < -X_1 \beta_1, \varepsilon_2 < -X_2 \beta_2) \Big).
\end{aligned}
$$

After some rewriting, the log-likelihood function becomes:


$$
\begin{aligned}
\sum \Big( &\, Y_1 Y_2 \ln \Phi(X_1 \beta_1, X_2 \beta_2, \rho) \\
&+ (1 - Y_1) Y_2 \ln \Phi(-X_1 \beta_1, X_2 \beta_2, -\rho) \\
&+ Y_1 (1 - Y_2) \ln \Phi(X_1 \beta_1, -X_2 \beta_2, -\rho) \\
&+ (1 - Y_1)(1 - Y_2) \ln \Phi(-X_1 \beta_1, -X_2 \beta_2, \rho) \Big).
\end{aligned}
$$

Note that $\Phi$ is the cumulative distribution function of the bivariate standard normal distribution, and that $Y_1$ and $Y_2$ in the log-likelihood function are the observed variables, each equal to one or zero.
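The four cases of this log-likelihood can be collapsed into one term per observation, $\ln \Phi(q_1 X_1\beta_1,\, q_2 X_2\beta_2,\, q_1 q_2 \rho)$ with $q_k = 2Y_k - 1$, a standard device (see Greene's Econometric Analysis in Further reading). A minimal sketch using SciPy's bivariate normal CDF follows; the function name and parameter packing are illustrative choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_probit_loglik(params, X1, X2, y1, y2):
    """Log-likelihood of the bivariate probit model.

    params packs (beta1, beta2, rho); rho must lie in (-1, 1).
    """
    k1, k2 = X1.shape[1], X2.shape[1]
    beta1, beta2 = params[:k1], params[k1:k1 + k2]
    rho = params[-1]

    a = X1 @ beta1          # linear index X1*beta1
    b = X2 @ beta2          # linear index X2*beta2
    q1 = 2 * y1 - 1         # maps {0, 1} -> {-1, +1}
    q2 = 2 * y2 - 1

    # Each observation contributes ln Phi(q1*a, q2*b, q1*q2*rho),
    # reproducing the four cases of the log-likelihood above.
    ll = 0.0
    for ai, bi, s1, s2 in zip(q1 * a, q2 * b, q1, q2):
        r = s1 * s2 * rho
        cov = [[1.0, r], [r, 1.0]]
        p = multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([ai, bi])
        ll += np.log(p)
    return ll
```

Passing this function (negated) to a generic optimizer such as `scipy.optimize.minimize` yields the maximum-likelihood estimates; the per-observation loop is kept for clarity rather than speed.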



Multivariate probit


For the general case, $\mathbf{y}_i = (y_{i1}, \ldots, y_{iJ})$ for $i = 1, \ldots, N$, where $j = 1, \ldots, J$ indexes the choices and $i$ indexes the individuals or observations, the probability of observing the outcome vector $\mathbf{y}_i$ is


$$
\begin{aligned}
\Pr(\mathbf{y}_i \mid \mathbf{X}_i \beta, \Sigma) &= \int_{A_J} \cdots \int_{A_1} f_N(\mathbf{y}_i^* \mid \mathbf{X}_i \beta, \Sigma) \, dy_1^* \cdots dy_J^* \\
&= \int \mathbb{1}_{\mathbf{y}_i^* \in A} \, f_N(\mathbf{y}_i^* \mid \mathbf{X}_i \beta, \Sigma) \, d\mathbf{y}_i^*
\end{aligned}
$$

where $A = A_1 \times \cdots \times A_J$, $f_N$ is the $J$-dimensional normal density, and


$$
A_j = \begin{cases} (-\infty, 0] & y_{ij} = 0 \\ (0, \infty) & y_{ij} = 1 \end{cases}
$$

The log-likelihood function in this case would be

$$
\sum_{i=1}^{N} \log \Pr(\mathbf{y}_i \mid \mathbf{X}_i \beta, \Sigma)
$$


Except for $J \leq 2$, there is typically no closed-form solution to the integrals in the log-likelihood equation. Instead, simulation methods can be used to approximate the choice probabilities. Methods based on importance sampling include the GHK algorithm (Geweke, Hajivassiliou, McFadden and Keane),[2] AR (accept-reject), and Stern's method. There are also MCMC approaches to this problem, including CRB (Chib's method with Rao–Blackwellization), CRT (Chib, Ritter, Tanner), ARK (accept-reject kernel), and ASK (adaptive sampling kernel).[3] A variational approach that scales to large datasets is the Probit-LMM (Mandt, Wenzel, Nakajima et al.).[4]
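To sketch the importance-sampling idea, a GHK-style simulator draws the latent errors sequentially from truncated normals along the Cholesky factor of $\Sigma$ and averages the product of the truncation probabilities. The function below is an illustrative implementation written for this article, not code from the cited papers:

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(y, mu, Sigma, n_draws=1000, seed=0):
    """GHK-style simulator for Pr(y_i) in a multivariate probit model.

    y     : (J,) array of 0/1 outcomes
    mu    : (J,) array of means X_i beta for this observation
    Sigma : (J, J) error covariance (unit diagonal for probit)
    """
    rng = np.random.default_rng(seed)
    J = len(y)
    L = np.linalg.cholesky(Sigma)      # y* = mu + L @ eta, eta iid N(0,1)
    probs = np.ones(n_draws)
    eta = np.zeros((n_draws, J))

    for j in range(J):
        # Threshold on eta_j implied by the sign restriction on y*_j,
        # given the components already drawn.
        drift = mu[j] + eta[:, :j] @ L[j, :j]
        t = -drift / L[j, j]
        u = rng.uniform(size=n_draws)
        if y[j] == 1:                  # need eta_j > t
            p_j = 1.0 - norm.cdf(t)
            eta[:, j] = norm.ppf(norm.cdf(t) + u * p_j)
        else:                          # need eta_j <= t
            p_j = norm.cdf(t)
            eta[:, j] = norm.ppf(u * p_j)
        probs *= p_j                   # accumulate truncation weights
    return probs.mean()
```

Summing the logs of these simulated probabilities over observations gives a simulated log-likelihood that can be maximized numerically (maximum simulated likelihood).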



References




  1. ^ Chib, Siddhartha; Greenberg, Edward (June 1998). "Analysis of multivariate probit models". Biometrika. 85: 347–361.


  2. ^ Hajivassiliou, Vassilis (1994). "Classical estimation methods for LDV models using simulation". Handbook of Econometrics.


  3. ^ Jeliazkov, Ivan (2010). "MCMC perspectives on simulated likelihood estimation". Advances in Econometrics. 26.


  4. ^ Mandt, Stephan; Wenzel, Florian; Nakajima, Shinichi; Cunningham, John; Lippert, Christoph; Kloft, Marius (2017). "Sparse probit linear mixed model" (PDF). Machine Learning. 106 (9–10): 1–22. arXiv:1507.04777. doi:10.1007/s10994-017-5652-6.



Further reading


  • Greene, William H., Econometric Analysis, seventh edition, Prentice-Hall, 2012.
