Admissible decision rule




Type of "good" decision rule in Bayesian statistics
















In statistical decision theory, an admissible decision rule is a rule for making a decision such that there is no other rule that is always "better" than it[1] (or at least sometimes better and never worse), in the precise sense of "better" defined below. This concept is analogous to Pareto efficiency.




Contents






  • 1 Definition


  • 2 Bayes rules and generalized Bayes rules


    • 2.1 Bayes rules


    • 2.2 Generalized Bayes rules


    • 2.3 Admissibility of (generalized) Bayes rules




  • 3 Examples


  • 4 Notes


  • 5 References





Definition


Define sets $\Theta$, $\mathcal{X}$ and $\mathcal{A}$, where $\Theta$ are the states of nature, $\mathcal{X}$ the possible observations, and $\mathcal{A}$ the actions that may be taken. An observation $x \in \mathcal{X}$ is distributed as $F(x \mid \theta)$ and therefore provides evidence about the state of nature $\theta \in \Theta$. A decision rule is a function $\delta : \mathcal{X} \rightarrow \mathcal{A}$, where upon observing $x \in \mathcal{X}$, we choose to take action $\delta(x) \in \mathcal{A}$.


Also define a loss function $L : \Theta \times \mathcal{A} \rightarrow \mathbb{R}$, which specifies the loss we would incur by taking action $a \in \mathcal{A}$ when the true state of nature is $\theta \in \Theta$. Usually we will take this action after observing data $x \in \mathcal{X}$, so that the loss will be $L(\theta, \delta(x))$. (It is possible though unconventional to recast the following definitions in terms of a utility function, which is the negative of the loss.)


Define the risk function as the expectation


$$R(\theta, \delta) = \operatorname{E}_{F(x \mid \theta)}[L(\theta, \delta(x))].$$

Whether a decision rule $\delta$ has low risk depends on the true state of nature $\theta$. A decision rule $\delta^{*}$ dominates a decision rule $\delta$ if and only if $R(\theta, \delta^{*}) \leq R(\theta, \delta)$ for all $\theta$, and the inequality is strict for some $\theta$.
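The risk and dominance definitions can be made concrete with a small finite problem. The following is a minimal sketch, assuming a hypothetical toy problem with two states, two observations, two actions and 0-1 loss; the sampling probabilities in `F` and the rule names are illustrative assumptions, not taken from any reference.

```python
from fractions import Fraction

# Hypothetical toy problem: two states of nature and two observations.
STATES = (0, 1)
OBSERVATIONS = (0, 1)

# F[theta][x] = P(x | theta); these numbers are illustrative assumptions.
F = {
    0: {0: Fraction(4, 5), 1: Fraction(1, 5)},
    1: {0: Fraction(3, 10), 1: Fraction(7, 10)},
}

def loss(theta, action):
    """0-1 loss: 0 for the correct action, 1 otherwise."""
    return 0 if action == theta else 1

def risk(theta, rule):
    """Expected loss of the rule when the true state is theta."""
    return sum(F[theta][x] * loss(theta, rule[x]) for x in OBSERVATIONS)

def dominates(rule_a, rule_b):
    """True if rule_a is never worse than rule_b and strictly better somewhere."""
    pairs = [(risk(t, rule_a), risk(t, rule_b)) for t in STATES]
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)

follow_x = {0: 0, 1: 1}  # take the action equal to the observation
oppose_x = {0: 1, 1: 0}  # take the opposite action
always_0 = {0: 0, 1: 0}  # ignore the data

print([risk(t, follow_x) for t in STATES])
print(dominates(follow_x, oppose_x))
print(dominates(follow_x, always_0))
```

In this sketch `follow_x` dominates `oppose_x`, but it does not dominate `always_0`, because `always_0` has zero risk when the true state is 0.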


A decision rule is admissible (with respect to the loss function) if and only if no other rule dominates it; otherwise it is inadmissible. Thus an admissible decision rule is a maximal element with respect to the above partial order.
An inadmissible rule is not preferred (except for reasons of simplicity or computational efficiency), since by definition there is some other rule that will achieve equal or lower risk for all $\theta$. But just because a rule $\delta$ is admissible does not mean it is a good rule to use. Being admissible means there is no other single rule that is always as good or better – but other admissible rules might achieve lower risk for most $\theta$ that occur in practice. (The Bayes risk discussed below is a way of explicitly considering which $\theta$ occur in practice.)
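As an illustration, admissibility can be checked by brute force when the problem is finite. The sketch below assumes a hypothetical two-state, two-observation problem with 0-1 loss and compares only deterministic rules with one another; randomized rules are not considered, so it is a simplification of the general definition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy problem; all numbers are illustrative assumptions.
STATES = (0, 1)
OBSERVATIONS = (0, 1)
ACTIONS = (0, 1)
F = {  # F[theta][x] = P(x | theta)
    0: {0: Fraction(4, 5), 1: Fraction(1, 5)},
    1: {0: Fraction(3, 10), 1: Fraction(7, 10)},
}

def risk_vector(rule):
    """Risk of a rule at every state; rule[x] is the action taken on seeing x."""
    return tuple(
        sum(F[t][x] * (0 if rule[x] == t else 1) for x in OBSERVATIONS)
        for t in STATES
    )

def dominates(risk_a, risk_b):
    """Never worse at any state, and not identical (so strictly better somewhere)."""
    return all(a <= b for a, b in zip(risk_a, risk_b)) and risk_a != risk_b

# Enumerate every deterministic rule as a tuple (delta(0), delta(1)).
rules = list(product(ACTIONS, repeat=len(OBSERVATIONS)))
risks = {rule: risk_vector(rule) for rule in rules}

admissible = [
    rule for rule in rules
    if not any(dominates(risks[other], risks[rule]) for other in rules)
]
print(admissible)
```

Under these assumptions the rule `(0, 0)`, which ignores the data entirely, is admissible: nothing beats it when the true state is 0. This matches the point that admissibility alone does not make a rule a good one to use.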



Bayes rules and generalized Bayes rules




Bayes rules


Let $\pi(\theta)$ be a probability distribution on the states of nature. From a Bayesian point of view, we would regard it as a prior distribution. That is, it is our believed probability distribution on the states of nature, prior to observing data. For a frequentist, it is merely a function on $\Theta$ with no such special interpretation. The Bayes risk of the decision rule $\delta$ with respect to $\pi(\theta)$ is the expectation


$$r(\pi, \delta) = \operatorname{E}_{\pi(\theta)}[R(\theta, \delta)].$$

A decision rule $\delta$ that minimizes $r(\pi, \delta)$ is called a Bayes rule with respect to $\pi(\theta)$. There may be more than one such Bayes rule. If the Bayes risk is infinite for all $\delta$, then no Bayes rule is defined.
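For a finite problem, a Bayes rule can be found by computing the Bayes risk of every rule and taking a minimizer. The following sketch assumes a hypothetical two-state problem with 0-1 loss and searches only deterministic rules; the priors `uniform` and `skewed` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy problem; all numbers are illustrative assumptions.
STATES = (0, 1)
OBSERVATIONS = (0, 1)
ACTIONS = (0, 1)
F = {  # F[theta][x] = P(x | theta)
    0: {0: Fraction(4, 5), 1: Fraction(1, 5)},
    1: {0: Fraction(3, 10), 1: Fraction(7, 10)},
}

def risk(theta, rule):
    """Frequentist risk under 0-1 loss; rule[x] is the action on seeing x."""
    return sum(F[theta][x] * (0 if rule[x] == theta else 1) for x in OBSERVATIONS)

def bayes_risk(prior, rule):
    """Average the risk over the prior on the states of nature."""
    return sum(prior[t] * risk(t, rule) for t in STATES)

def bayes_rule(prior):
    """Return one deterministic rule with minimal Bayes risk (ties are possible)."""
    rules = product(ACTIONS, repeat=len(OBSERVATIONS))
    return min(rules, key=lambda rule: bayes_risk(prior, rule))

uniform = {0: Fraction(1, 2), 1: Fraction(1, 2)}
skewed = {0: Fraction(9, 10), 1: Fraction(1, 10)}

print(bayes_rule(uniform), bayes_risk(uniform, bayes_rule(uniform)))
print(bayes_rule(skewed), bayes_risk(skewed, bayes_rule(skewed)))
```

The two priors lead to different minimizers here: the uniform prior selects the rule that follows the observation, while the skewed prior selects the rule that always takes action 0.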



Generalized Bayes rules



In the Bayesian approach to decision theory, the observed $x$ is considered fixed. Whereas the frequentist approach (i.e., risk) averages over possible samples $x \in \mathcal{X}$, the Bayesian would fix the observed sample $x$ and average over hypotheses $\theta \in \Theta$. Thus, the Bayesian approach is to consider for our observed $x$ the expected loss


$$\rho(\pi, \delta \mid x) = \operatorname{E}_{\pi(\theta \mid x)}[L(\theta, \delta(x))].$$

where the expectation is over the posterior of $\theta$ given $x$ (obtained from $\pi(\theta)$ and $F(x \mid \theta)$ using Bayes' theorem).


Having made explicit the expected loss for each given $x$ separately, we can define a decision rule $\delta$ by specifying for each $x$ an action $\delta(x)$ that minimizes the expected loss. This is known as a generalized Bayes rule with respect to $\pi(\theta)$. There may be more than one generalized Bayes rule, since there may be multiple choices of $\delta(x)$ that achieve the same expected loss.
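The observation-by-observation construction can be sketched as follows, assuming a hypothetical two-state problem with 0-1 loss and a proper uniform prior; the function names and all numbers are illustrative assumptions. For each observed value the code forms the posterior by Bayes' theorem and picks an action with minimal posterior expected loss.

```python
from fractions import Fraction

# Hypothetical toy problem; all numbers are illustrative assumptions.
STATES = (0, 1)
OBSERVATIONS = (0, 1)
ACTIONS = (0, 1)
F = {  # F[theta][x] = P(x | theta)
    0: {0: Fraction(4, 5), 1: Fraction(1, 5)},
    1: {0: Fraction(3, 10), 1: Fraction(7, 10)},
}

def loss(theta, action):
    """0-1 loss: 0 for the correct action, 1 otherwise."""
    return 0 if action == theta else 1

def posterior(prior, x):
    """Posterior over states given x: prior times likelihood, normalized."""
    weights = {t: prior[t] * F[t][x] for t in STATES}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def posterior_expected_loss(prior, x, action):
    """Expected loss of an action, averaged over the posterior given x."""
    post = posterior(prior, x)
    return sum(post[t] * loss(t, action) for t in STATES)

def generalized_bayes_rule(prior):
    """For each x separately, choose an action minimizing the expected loss."""
    return {
        x: min(ACTIONS, key=lambda a: posterior_expected_loss(prior, x, a))
        for x in OBSERVATIONS
    }

uniform = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(posterior(uniform, 0))
print(generalized_bayes_rule(uniform))
```

With this proper prior the resulting rule follows the observation, which is also the rule that minimizes the Bayes risk in the same toy problem.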


At first, this may appear rather different from the Bayes rule approach of the previous section, not a generalization. However, notice that the Bayes risk already averages over $\Theta$ in Bayesian fashion, and the Bayes risk may be recovered as the expectation over $\mathcal{X}$ of the expected loss (where $x \sim \theta$ and $\theta \sim \pi$). Roughly speaking, $\delta$ minimizes this expectation of expected loss (i.e., is a Bayes rule) if and only if it minimizes the expected loss for each $x \in \mathcal{X}$ separately (i.e., is a generalized Bayes rule).
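The relationship can be written out as a short derivation. This is a sketch that assumes a proper prior and a loss for which the order of integration may be exchanged (for example, a nonnegative loss); the symbol $m$ for the marginal distribution of $x$ is introduced here only for illustration.

```latex
\begin{align*}
r(\pi, \delta)
  &= \int_{\Theta} R(\theta, \delta) \, d\pi(\theta) \\
  &= \int_{\Theta} \int_{\mathcal{X}} L(\theta, \delta(x)) \, dF(x \mid \theta) \, d\pi(\theta) \\
  &= \int_{\mathcal{X}} \left[ \int_{\Theta} L(\theta, \delta(x)) \, d\pi(\theta \mid x) \right] dm(x) \\
  &= \int_{\mathcal{X}} \rho(\pi, \delta \mid x) \, dm(x).
\end{align*}
```

Since the integrand in the last line can be minimized separately for each $x$, choosing a minimum-expected-loss action at every $x$ also minimizes the integral whenever the integral is finite.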


Then why is the notion of generalized Bayes rule an improvement? It is indeed equivalent to the notion of Bayes rule when a Bayes rule exists and all $x$ have positive probability. However, no Bayes rule exists if the Bayes risk is infinite (for all $\delta$). In this case it is still useful to define a generalized Bayes rule $\delta$, which at least chooses a minimum-expected-loss action $\delta(x)$ for those $x$ for which a finite-expected-loss action does exist. In addition, a generalized Bayes rule may be desirable because it must choose a minimum-expected-loss action $\delta(x)$ for every $x$, whereas a Bayes rule would be allowed to deviate from this policy on a set $X \subseteq \mathcal{X}$ of measure 0 without affecting the Bayes risk.


More important, it is sometimes convenient to use an improper prior $\pi(\theta)$. In this case, the Bayes risk is not even well-defined, nor is there any well-defined distribution over $x$. However, the posterior $\pi(\theta \mid x)$, and hence the expected loss, may be well-defined for each $x$, so that it is still possible to define a generalized Bayes rule.
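A standard worked case illustrates this. The setup below is an assumption chosen for illustration: a single observation from a normal distribution with unknown mean and unit variance, a flat improper prior on the mean, and squared-error loss.

```latex
\begin{align*}
x \mid \theta &\sim N(\theta, 1), \qquad \pi(\theta) \propto 1, \qquad
L(\theta, a) = (\theta - a)^2 \\
\pi(\theta \mid x) &\propto \exp\!\left( -\tfrac{1}{2} (x - \theta)^2 \right)
  \quad \Longrightarrow \quad \theta \mid x \sim N(x, 1) \\
\operatorname{E}_{\pi(\theta \mid x)}\!\left[ (\theta - a)^2 \right]
  &= 1 + (x - a)^2
  \quad \Longrightarrow \quad \delta(x) = x \\
R(\theta, \delta) &= \operatorname{E}_{F(x \mid \theta)}\!\left[ (\theta - x)^2 \right] = 1,
  \qquad \int_{-\infty}^{\infty} R(\theta, \delta) \, d\theta = \infty
\end{align*}
```

The posterior is a proper distribution for every $x$ even though the prior is not, so the minimizing action $\delta(x) = x$ is well-defined, while integrating the constant risk against the flat prior gives no finite Bayes risk.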



Admissibility of (generalized) Bayes rules


According to the complete class theorems, under mild conditions every admissible rule is a (generalized) Bayes rule (with respect to some prior $\pi(\theta)$, possibly an improper one, that favors distributions $\theta$ where that rule achieves low risk). Thus, in frequentist decision theory it is sufficient to consider only (generalized) Bayes rules.


Conversely, while Bayes rules with respect to proper priors are virtually always admissible, generalized Bayes rules corresponding to improper priors need not yield admissible procedures. Stein's example is one such famous situation.



Examples


The James–Stein estimator is a nonlinear estimator of the mean of Gaussian random vectors which can be shown to dominate, or outperform, the ordinary least squares technique with respect to a mean-square error loss function.[2] Thus least squares estimation is not an admissible estimation procedure in this context. Some others of the standard estimates associated with the normal distribution are also inadmissible: for example, the sample estimate of the variance when the population mean and variance are unknown.[3]
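The dominance claim can be checked empirically with a small Monte Carlo experiment. The sketch below assumes a single observation of a Gaussian vector with identity covariance in dimension 10, so that the least squares estimate of the mean is the observation itself; the true mean `theta`, the number of trials and the seed are arbitrary illustrative choices, and the shrinkage form used is the basic James-Stein estimator that shrinks toward the origin.

```python
import random

def james_stein(x):
    """Basic James-Stein estimate: (1 - (p - 2) / ||x||^2) * x, for p >= 3."""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm_sq
    return [shrink * v for v in x]

def squared_error(estimate, theta):
    """Sum of squared errors between an estimate and the true mean vector."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))

def simulate_risks(theta, trials=5000, seed=0):
    """Monte Carlo risk estimates for the raw observation and for James-Stein."""
    rng = random.Random(seed)
    total_plain = 0.0
    total_js = 0.0
    for _ in range(trials):
        x = [rng.gauss(t, 1.0) for t in theta]  # one draw of X ~ N(theta, I)
        total_plain += squared_error(x, theta)
        total_js += squared_error(james_stein(x), theta)
    return total_plain / trials, total_js / trials

theta = [1.0] * 10  # hypothetical true mean, dimension p = 10
plain_risk, js_risk = simulate_risks(theta)
print(plain_risk, js_risk)
```

The estimated risk of the raw observation should be close to the dimension, 10, and the James-Stein estimate should come out lower. A simulation at one value of `theta` only illustrates the effect; dominance for every mean vector is a theorem, not something a simulation establishes.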







Notes





  1. ^ Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9 (entry for admissible decision function)



  2. ^ Cox & Hinkley 1974, Section 11.8


  3. ^ Cox & Hinkley 1974, Exercise 11.7




References




  • Cox, D. R.; Hinkley, D. V. (1974). Theoretical Statistics. Wiley. ISBN 0-412-12420-3.


  • Berger, James O. (1980). Statistical Decision Theory and Bayesian Analysis (2nd ed.). Springer-Verlag. ISBN 0-387-96098-8.


  • DeGroot, Morris (2004) [1st. pub. 1970]. Optimal Statistical Decisions. Wiley Classics Library. ISBN 0-471-68029-X.


  • Robert, Christian P. (1994). The Bayesian Choice. Springer-Verlag. ISBN 3-540-94296-3.



