28  Fisher Information

Let \(U(X,\theta)\) be the score function. The Fisher information is the variance of the score function: \[ I_1(\theta) = \mathrm{Var}_\theta(U(X,\theta)) \]

The total Fisher information is the variance of the sample score function: \[ I_n(\theta) = \mathrm{Var}_\theta(U_n(\boldsymbol{X}_n,\theta)) \]

Note that, under conditions \(C_1:C_4\), we have \[ \begin{aligned} &E_\theta(U(X,\theta)) = 0,\ \forall \theta \in \Theta \\ &E_\theta(U_n(\boldsymbol{X}_n, \theta)) = 0,\ \forall \theta \in \Theta \\ \Rightarrow &\left\{\begin{array}{ll} I_1(\theta) = E_\theta(U(X,\theta)\cdot U(X,\theta)^T) \\ I_n(\theta) = E_\theta(U_n(\boldsymbol{X}_n,\theta)\cdot U_n(\boldsymbol{X}_n,\theta)^T)\\ \end{array}\right.\ \forall \theta \in \Theta \\ &\text{If } \Theta \subseteq \mathbb{R}, \\ \Rightarrow &\left\{\begin{array}{ll} I_1(\theta) = E_\theta(U(X,\theta)^2) \\ I_n(\theta) = E_\theta(U_n(\boldsymbol{X}_n,\theta)^2)\\ \end{array}\right.\ \forall \theta \in \Theta \\ \end{aligned} \]

28.1 Theorem (Fisher information under the conditions)

Under conditions \(C_1:C_4\), we have

\[ \left\{\begin{array}{ll} I_1(\theta) = - E_\theta\left(\frac{\partial}{\partial \theta^T} U(X,\theta)\right) \\ I_n(\theta) = - E_\theta\left(\frac{\partial}{\partial \theta^T} U_n(\boldsymbol{X}_n,\theta)\right) \end{array}\right.\ \forall \theta \in \Theta \]

28.1.1 Proof

Without loss of generality, consider only the continuous case with \(\Theta \subseteq \mathbb{R}\). For \(\Theta \subseteq \mathbb{R}^P\), it suffices to use the transpose of \(\theta\) in the derivatives and the transpose of \(U(X,\theta)\).

We know that \(U(X,\theta) = \frac{\partial}{\partial\theta}\ln f_\theta(X)\). Moreover, \[ \int_{\mathfrak{X}} f_\theta(x) dx = 1,\ \forall \theta \in \Theta \] where \(\mathfrak{X} = \{x : f_\theta(x) > 0\}\) does not depend on \(\theta\).

By \(C_3(b)\), differentiation and integration can be interchanged, so \[ \frac{\partial^2}{\partial \theta \partial \theta^T} \int_{\mathfrak{X}} f_\theta(x) dx = \int_{\mathfrak{X}} \frac{\partial^2}{\partial \theta \partial \theta^T} f_\theta(x) dx = 0, \forall \theta \in \Theta \tag{28.1}\]

Note that \[ \begin{aligned} I_1(\theta) &= E_\theta(U(X,\theta)^2) \\ &= \int_{\mathfrak{X}} U(x,\theta)^2 f_\theta(x)dx \\ &= \int_{\mathfrak{X}} \left[\frac{\partial \ln f_\theta(x)}{\partial \theta}\right]^2 f_\theta(x)dx,\ \forall \theta \in \Theta \end{aligned} \]

Observe that, by (28.1) and the product rule, \[ \begin{aligned} &\int_{\mathfrak{X}}\frac{\partial}{\partial\theta} \left[\frac{\partial f_\theta(x)}{\partial \theta} \frac{f_\theta(x)}{f_\theta(x)}\right]dx = 0\\ \Rightarrow &\int_{\mathfrak{X}}\frac{\partial}{\partial\theta} \left[\frac{\partial \ln f_\theta(x)}{\partial \theta} f_\theta(x)\right]dx = 0 \\ \Rightarrow &\int_{\mathfrak{X}} \left[\frac{\partial^2 \ln f_\theta(x)}{\partial \theta^2} f_\theta(x) + \frac{\partial\ln f_\theta(x)}{\partial\theta}\frac{\partial f_\theta(x)}{\partial\theta}\right]dx = 0\\ \Rightarrow &\int_{\mathfrak{X}} \frac{\partial^2 \ln f_\theta(x)}{\partial \theta^2} f_\theta(x)dx + \int_{\mathfrak{X}} \frac{\partial\ln f_\theta(x)}{\partial\theta}\frac{\partial f_\theta(x)}{\partial\theta}dx = 0\\ \Rightarrow &E_\theta\left( \frac{\partial^2 \ln f_\theta(X)}{\partial \theta^2}\right) + \int_{\mathfrak{X}} \left[\frac{\partial\ln f_\theta(x)}{\partial\theta}\right]^2 f_\theta(x)dx = 0\\ \Rightarrow &-E_\theta\left( \frac{\partial^2 \ln f_\theta(X)}{\partial \theta^2}\right) = E_\theta\left(\left[\frac{\partial\ln f_\theta(X)}{\partial\theta}\right]^2\right) \\ \Rightarrow &E_\theta(U(X,\theta)^2)= -E_\theta\left(\frac{\partial U(X,\theta)}{\partial \theta}\right) \\ \Rightarrow &I_1(\theta) = -E_\theta\left(\frac{\partial U(X,\theta)}{\partial \theta}\right),\ \forall \theta \in \Theta. \end{aligned} \]

Moreover, \[ I_n(\theta) = \mathrm{Var}_\theta(U_n(\boldsymbol{X}_n, \theta)) \stackrel{C_1:C_4}{=} E_\theta(U_n(\boldsymbol{X}_n,\theta)^2) \] We know that \(U_n(\boldsymbol{X}_n,\theta) = \sum\limits^n_{i=1} U(X_i,\theta)\). Therefore, \[ \begin{aligned} I_n(\theta) &= E_\theta\left[\left( \sum_{i=1}^n U(X_i,\theta)\right)^2\right]\\ &= E_\theta\left(\sum_i\sum_j U(X_i,\theta)U(X_j,\theta)\right) \\ &= E_\theta\left(\sum_{i=1}^n U(X_i,\theta)^2 + \sum_{i\neq j} U(X_i,\theta) U(X_j,\theta)\right) \\ &= \sum_{i=1}^n E_\theta (U(X_i,\theta)^2) + \sum_{i\neq j }E_\theta(U(X_i,\theta)U(X_j,\theta)), \forall \theta \in \Theta. \end{aligned} \]

Since \(X_1,\dots,X_n\) are i.i.d., \(U(X_1,\theta),\dots,U(X_n,\theta)\) are also i.i.d. Hence, for \(i \neq j\) the expectation of the product factorizes and \[ I_n(\theta) = \sum_{i=1}^n E_\theta(U(X_i,\theta)^2) + \sum_{i\neq j} E_\theta(\cancelto{0}{U(X_i,\theta)}) E_\theta(\cancelto{0}{U(X_j,\theta)}) \]

Because the \(U(X_i,\theta)\) are identically distributed, \(E_\theta(U(X_i,\theta)^2) = I_1(\theta)\) for every \(i\), and therefore \[ I_n(\theta) = nI_1(\theta) \]

Note also that \[ \begin{aligned} E_\theta\left(\frac{\partial}{\partial\theta}U_n(\boldsymbol{X}_n, \theta)\right) &= E_\theta\left(\frac{\partial}{\partial \theta} \sum_{i=1}^n U(X_i,\theta)\right) \\ &= E_\theta\left(\sum_{i=1}^n \frac{\partial}{\partial \theta} U(X_i,\theta)\right) \\ &= n E_\theta\left(\frac{\partial}{\partial \theta} U(X,\theta)\right),\ \forall \theta \in \Theta. \end{aligned} \]

Since \(I_1(\theta) = - E_\theta\left(\frac{\partial}{\partial \theta} U(X,\theta)\right)\), we have \[ \begin{aligned} &-E_\theta\left(\frac{\partial}{\partial \theta} U_n(\boldsymbol{X}_n,\theta)\right) = n I_1(\theta) \\ \Rightarrow &I_n(\theta) = nI_1(\theta) = -E_\theta\left(\frac{\partial}{\partial \theta} U_n(\boldsymbol{X}_n,\theta)\right), \forall \theta \in \Theta._\blacksquare \end{aligned} \]
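The identity in the theorem can be checked numerically on a concrete density. The sketch below is only an illustration under an assumption that is not in the text: it uses the Exponential density \(f_\theta(x) = \theta e^{-\theta x}\), \(x > 0\), whose score is \(1/\theta - x\), and approximates the integrals with a midpoint rule on a truncated range. The function names and the truncation point are hypothetical choices.

```python
import math

def density(x, theta):
    """Assumed example density: Exponential(theta), f(x) = theta * exp(-theta * x)."""
    return theta * math.exp(-theta * x)

def score(x, theta):
    """Score of the assumed density: d/dtheta ln f = 1/theta - x."""
    return 1.0 / theta - x

def score_derivative(x, theta):
    """Derivative of the score with respect to theta (constant in x here)."""
    return -1.0 / theta**2

def expectation(h, theta, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of E_theta[h(X, theta)] on (0, upper)."""
    dx = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += h(x, theta) * density(x, theta) * dx
    return total

theta = 1.5
mean_score = expectation(score, theta)                               # close to 0
second_moment = expectation(lambda x, t: score(x, t) ** 2, theta)   # E[U^2]
minus_mean_derivative = -expectation(score_derivative, theta)        # -E[dU/dtheta]

print(mean_score, second_moment, minus_mean_derivative, 1.0 / theta**2)
```

Up to integration error, the last three printed values should agree, which is what \(I_1(\theta) = E_\theta(U^2) = -E_\theta(\partial U/\partial\theta)\) predicts for this density.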

28.2 Example

Let \(\boldsymbol{X}_n\) be a random sample of \(X\sim f_\theta, \theta\in \mathbb{R}_+\), such that \[ f_\theta(x) = \left\{\begin{array}{ll} \theta x^{\theta-1},& x \in (0,1) \\ 0,& \text{otherwise.} \end{array}\right. \]

Find the sample score function and the total Fisher information.

28.2.1 Solution

Note that \[ L_{\boldsymbol{X}_n}(\theta) = \prod_{i=1}^n \left\{ \theta x_i^{\theta-1}\right\} = \theta^n \left(\prod_{i=1}^n x_i\right)^{\theta-1},\ \forall \theta \in \Theta. \]

The sample score function is \[ \begin{aligned} U_n(\boldsymbol{X}_n,\theta) &= \frac{\partial}{\partial \theta} \ln L_{\boldsymbol{X}_n} (\theta) \\ &= \frac{\partial}{\partial \theta} \left[n \ln \theta + (\theta-1) \sum \ln x_i\right] \\ &= \frac{n}{\theta} + \sum \ln x_i, \forall \theta \in \Theta. \end{aligned} \]

By the theorem, the total Fisher information is \[ I_n(\theta) = - E_\theta\left(\frac{\partial}{\partial \theta} U_n(\boldsymbol{X}_n,\theta)\right), \forall \theta \in \Theta. \]

Note that \[ \frac{\partial}{\partial \theta} U_n(\boldsymbol{X}_n,\theta) = - \frac{n}{\theta^2}, \forall \theta \in \Theta. \]

Therefore, \[ I_n(\theta) = -E_\theta\left(-\frac{n}{\theta^2}\right) = \frac{n}{\theta^2}, \forall \theta \in \Theta. \]
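A Monte Carlo check of this example is sketched below. It relies on the fact that \(F_\theta(x) = x^\theta\) on \((0,1)\), so \(X = V^{1/\theta}\) with \(V\) uniform has density \(\theta x^{\theta-1}\). The values of `theta`, `n`, the seed, and the number of replications are arbitrary illustrative choices. The simulated scores should have mean near \(0\) and variance near \(n/\theta^2\).

```python
import math
import random

def sample_score(theta, n, rng):
    """Draw one sample of size n from f(x) = theta * x**(theta - 1) on (0, 1)
    and return the sample score n / theta + sum(ln x_i)."""
    log_sum = 0.0
    for _ in range(n):
        v = 1.0 - rng.random()           # uniform on (0, 1], avoids log(0)
        log_sum += math.log(v) / theta   # ln(x) with x = v**(1 / theta)
    return n / theta + log_sum

rng = random.Random(12345)               # fixed seed for reproducibility
theta, n, replications = 2.0, 5, 100_000
scores = [sample_score(theta, n, rng) for _ in range(replications)]

score_mean = sum(scores) / replications
score_variance = sum((s - score_mean) ** 2 for s in scores) / replications
fisher_total = n / theta**2              # closed form derived above

print(score_mean, score_variance, fisher_total)
```

The agreement is only approximate, since both empirical quantities carry simulation error of order \(1/\sqrt{\text{replications}}\).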

28.3 Example (Binomial)

Let \(\boldsymbol{X}_n\) be a random sample of \(X \sim \mathrm{Bin}(3,\theta), \theta \in (0,1)\). Find the score function and the total Fisher information.

28.3.1 Solution

Note that \[ \begin{aligned} L_{\boldsymbol{X}_n}(\theta) &= \prod f_\theta(x_i) = \prod \left\{\binom{3}{x_i} \theta^{x_i} (1-\theta)^{3-x_i}\right\} \\ \Rightarrow \ln L_{\boldsymbol{X}_n}(\theta) &= \sum \ln \binom{3}{x_i} + \sum x_i \ln \theta + \sum (3-x_i)\ln(1-\theta) \\ \Rightarrow U_n(\boldsymbol{x}_n,\theta) &= \sum \frac{x_i}{\theta} + \sum \frac{(3-x_i)}{1-\theta} (-1) \\ &= \sum \frac{x_i}{\theta} - \sum \frac{(3-x_i)}{1-\theta},\ \forall \theta \in \Theta. \end{aligned} \]

By the theorem, the total Fisher information is \[ \begin{aligned} I_n(\theta) &= -E_\theta\left(\frac{\partial}{\partial\theta} U_n(\boldsymbol{X}_n,\theta)\right) \\ &= -E_\theta\left( - \sum \frac{X_i}{\theta^2} + \sum \frac{(3-X_i)}{(1-\theta)^2}(-1)\right) \\ &= E_\theta\left(\sum \frac{X_i}{\theta^2} + \sum \frac{(3-X_i)}{(1-\theta)^2}\right) \\ &= \sum \frac{E_\theta(X_i)}{\theta^2} + \sum \frac{3 - E_\theta(X_i)}{(1-\theta)^2} \\ &= \frac{3n\theta}{\theta^2} + \frac{n(3-3\theta)}{(1-\theta)^2} \\ &= \frac{ 3n}{\theta} + \frac{3n(1-\theta)}{(1-\theta)^2} \\ &= \frac{3n}{\theta} + \frac{3n}{(1-\theta)} \\ &= \frac{3n}{\theta(1-\theta)},\ \forall \theta \in \Theta. \end{aligned} \]

We can also compute it directly from the definition.

Observe that \[ \begin{aligned} E_\theta(U_n(\boldsymbol{X}_n,\theta)) &= E_\theta\left(\sum \frac{X_i}{\theta} - \sum \frac{3-X_i}{1-\theta}\right) \\ &= \sum \frac{E_\theta(X_i)}{\theta} - \sum \frac{3-E_\theta(X_i)}{1-\theta} \\ &= \frac{3n \theta}{\theta} - \frac{n (3- 3\theta)}{1-\theta} \\ &= 3n - 3n \frac{1-\theta}{1-\theta} = 0,\ \forall \theta \in \Theta. \end{aligned} \]

Hence, using independence for the cross terms, \[ \begin{aligned} I_n(\theta) &= \mathrm{Var}_\theta(U_n(\boldsymbol{X}_n,\theta)) \\ &= E_\theta(U_n(\boldsymbol{X}_n,\theta)^2) \\ &= E_\theta\left(\left[\sum_i \left(\frac{X_i}{\theta} - \frac{(3-X_i)}{1-\theta}\right)\right]^2\right) \\ &= E_\theta\left(\sum_i \sum_j \left(\frac{X_i}{\theta} - \frac{(3-X_i)}{1-\theta}\right)\left(\frac{X_j}{\theta} - \frac{(3-X_j)}{1-\theta}\right)\right) \\ &= \sum_i E_\theta\left(\left(\frac{X_i}{\theta} - \frac{(3-X_i)}{1-\theta}\right)^2\right) \\ &\quad + \underset{i \neq j}{\sum\sum}E_\theta\left(\cancelto{0}{\frac{X_i}{\theta} - \frac{(3-X_i)}{1-\theta}}\right) E_\theta\left(\cancelto{0}{\frac{X_j}{\theta} - \frac{(3-X_j)}{1-\theta}}\right)\\ &= n E_\theta\left[\left( \frac{X}{\theta} - \frac{3-X}{1-\theta}\right)^2\right] \\ &= n E_\theta\left[\left( \frac{X(1-\theta)-(3-X)\theta}{\theta(1-\theta)}\right)^2\right] \\ &= \frac{n E_\theta\left[(X - \theta X - 3\theta + \theta X)^2\right]}{\theta^2(1-\theta)^2} \\ &= \frac{n E_\theta\left[(X - 3\theta)^2\right]}{\theta^2(1-\theta)^2} \\ &= \frac{n \mathrm{Var}_\theta(X)}{\theta^2(1-\theta)^2} = \frac{3n\theta(1-\theta)}{\theta^2(1-\theta)^2}\\ &= \frac{3n}{\theta(1-\theta)},\ \forall \theta \in \Theta. \end{aligned} \]
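Because \(\mathrm{Bin}(3,\theta)\) has only four support points, both routes to the Fisher information can be evaluated exactly by summing over \(x \in \{0,1,2,3\}\). The sketch below does this for one observation and then multiplies by \(n\). The helper names and the values of `theta` and `n` are illustrative choices, not part of the example.

```python
from math import comb

def binomial_pmf(x, theta, size=3):
    """P(X = x) for X ~ Bin(size, theta)."""
    return comb(size, x) * theta**x * (1 - theta) ** (size - x)

def score(x, theta, size=3):
    """Score of a single Bin(size, theta) observation."""
    return x / theta - (size - x) / (1 - theta)

def score_derivative(x, theta, size=3):
    """Derivative of the single-observation score with respect to theta."""
    return -x / theta**2 - (size - x) / (1 - theta) ** 2

def fisher_information_both_ways(theta, size=3):
    """Return (E[U], E[U^2], -E[dU/dtheta]) by exact summation over the support."""
    support = range(size + 1)
    mean_score = sum(score(x, theta, size) * binomial_pmf(x, theta, size) for x in support)
    by_variance = sum(score(x, theta, size) ** 2 * binomial_pmf(x, theta, size) for x in support)
    by_derivative = -sum(score_derivative(x, theta, size) * binomial_pmf(x, theta, size) for x in support)
    return mean_score, by_variance, by_derivative

theta, n = 0.3, 10
mean_score, by_variance, by_derivative = fisher_information_both_ways(theta)
closed_form_total = 3 * n / (theta * (1 - theta))

print(mean_score, n * by_variance, n * by_derivative, closed_form_total)
```

Up to floating-point rounding, the score has mean zero and the two computations of \(I_n(\theta)\) coincide with \(3n/(\theta(1-\theta))\).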

28.4 Theorem (Information inequality, CRLB)

Let \(\boldsymbol{X}_n\) be a random sample of \(X\sim f_\theta, \theta \in \Theta \subseteq \mathbb{R}\), where \(f_\theta\) satisfies conditions \(C_1:C_4\). Let \(T(\boldsymbol{X}_n)\) be an unbiased estimator of \(g(\theta) = \theta\). Then, \[ \mathrm{Var}_\theta(T(\boldsymbol{X}_n)) \geq \frac{1}{n I_1(\theta)} = \frac{1}{I_n(\theta)}, \forall \theta \in \Theta, \] whenever \(E_\theta(T(\boldsymbol{X}_n)^2)<\infty, \forall \theta \in \Theta\), where \(I_1(\theta)\) is the Fisher information and \(I_n(\theta)\) is the total Fisher information.

Remark

The quantity \(\frac{1}{I_n(\theta)}\) is called the Cramér-Rao lower bound (CRLB).

28.4.1 Proof

Since \(T(\boldsymbol{X}_n)\) is unbiased for \(g(\theta) = \theta\), we have \[ E_\theta(T(\boldsymbol{X}_n)) = \theta, \forall \theta \in \Theta. \]

By the definition of expectation (in the continuous case), \[ E_\theta(T(\boldsymbol{X}_n)) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} T(x_1,\dots,x_n) f_\theta^{\boldsymbol{X}_n}(x_1,\dots,x_n) dx_1 \cdots dx_n \tag{28.2}\]

Remark

\[ E((X_1+X_2)^2) = \int^\infty_{-\infty} \int^\infty_{-\infty} (x_1 + x_2)^2 f^{X_1, X_2}_\theta(x_1,x_2) dx_1dx_2 \]

Differentiating (28.2) with respect to \(\theta\), \[ \begin{aligned} &\frac{\partial}{\partial \theta} \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} T(\boldsymbol{x}_n) f_\theta^{\boldsymbol{X}_n}(\boldsymbol{x}_n) d\boldsymbol{x}_n = 1 \\ &\stackrel{C_3}{\Rightarrow} \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} T(\boldsymbol{x}_n) \frac{\partial}{\partial \theta} f^{\boldsymbol{X}_n}_\theta(\boldsymbol{x}_n) d\boldsymbol{x}_n = 1 \\ &\Rightarrow \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} T(\boldsymbol{x}_n) \frac{\partial\ln f_\theta^{\boldsymbol{X}_n}(\boldsymbol{x}_n)}{\partial \theta} f_\theta^{\boldsymbol{X}_n}(\boldsymbol{x}_n) d\boldsymbol{x}_n= 1,\forall \theta \in \Theta. \\ &\Rightarrow E_\theta(T(\boldsymbol{X}_n) \cdot U_n(\boldsymbol{X}_n,\theta)) = 1, \forall \theta \in \Theta. \end{aligned} \]

Note that the squared correlation never exceeds one: \[ \begin{aligned} &\mathrm{Cor}_\theta(T(\boldsymbol{X}_n), U_n(\boldsymbol{X}_n,\theta))^2 \leq 1, \forall \theta \in \Theta \\ \iff &\frac{\mathrm{Cov}_\theta(T(\boldsymbol{X}_n), U_n(\boldsymbol{X}_n,\theta))^2}{\mathrm{Var}_\theta (T(\boldsymbol{X}_n))\mathrm{Var}_\theta(U_n(\boldsymbol{X}_n,\theta))} \leq 1, \forall \theta \in \Theta. \end{aligned} \tag{28.3}\]

Moreover, \[ \mathrm{Cov}_\theta(T(\boldsymbol{X}_n), U_n(\boldsymbol{X}_n,\theta)) = E_\theta(T(\boldsymbol{X}_n) \cdot U_n(\boldsymbol{X}_n,\theta)) - \cancelto{\theta}{E_\theta(T(\boldsymbol{X}_n))} \cdot \cancelto{0}{E_\theta(U_n(\boldsymbol{X}_n,\theta))} = 1, \forall \theta \in \Theta \]

Substituting into (28.3), we obtain

\[ \frac{1^2}{\mathrm{Var}_\theta (T(\boldsymbol{X}_n))\mathrm{Var}_\theta(U_n(\boldsymbol{X}_n,\theta))} \leq 1, \forall \theta \in \Theta. \]

By the definition of total Fisher information, \(\mathrm{Var}_\theta(U_n(\boldsymbol{X}_n,\theta)) = I_n(\theta)\), so \[ \frac{1}{I_n(\theta)} \leq \mathrm{Var}_\theta(T(\boldsymbol{X}_n)), \forall \theta \in \Theta. \]
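The two key facts in this proof, \(E_\theta(T \cdot U_n) = 1\) and \(\mathrm{Var}_\theta(T) \geq 1/I_n(\theta)\), can be verified exactly on a small discrete model. The sketch below reuses the \(\mathrm{Bin}(3,\theta)\) sample from Section 28.3 and takes \(T(\boldsymbol{X}_n) = \bar{X}_n/3\), an estimator chosen here only for illustration (it is unbiased for \(\theta\)). All expectations are computed by enumerating every possible sample, so no simulation is involved.

```python
from itertools import product
from math import comb

def pmf(x, theta):
    """P(X = x) for X ~ Bin(3, theta)."""
    return comb(3, x) * theta**x * (1 - theta) ** (3 - x)

def cramer_rao_check(theta, n):
    """Exact E[T], E[T * U_n], Var(T) and the bound 1 / I_n(theta)
    for T = (sample mean) / 3 under an i.i.d. Bin(3, theta) sample."""
    e_t = e_t2 = e_tu = 0.0
    for sample in product(range(4), repeat=n):
        prob = 1.0
        for x in sample:
            prob *= pmf(x, theta)
        t = sum(sample) / (3 * n)                                    # estimator
        u = sum(x / theta - (3 - x) / (1 - theta) for x in sample)  # sample score
        e_t += t * prob
        e_t2 += t * t * prob
        e_tu += t * u * prob
    variance = e_t2 - e_t**2
    bound = theta * (1 - theta) / (3 * n)    # 1 / I_n(theta) with I_n = 3n / (theta (1 - theta))
    return e_t, e_tu, variance, bound

theta, n = 0.4, 3
mean_t, cross_moment, variance_t, bound = cramer_rao_check(theta, n)
print(mean_t, cross_moment, variance_t, bound)
```

For this particular estimator the variance equals the bound, so the inequality holds with equality.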

28.5 Theorem (generalization)

Let \(\boldsymbol{X}_n\) be a random sample of \(X \sim f_\theta, \theta \in \Theta \subseteq \mathbb{R}\), where \(f_\theta\) satisfies conditions \(C_1:C_4\). Let \(T(\boldsymbol{X}_n)\) be an unbiased estimator of \(g(\theta) \in \mathbb{R}\), whose derivative with respect to \(\theta\) exists \(\forall \theta \in \Theta\). Then, \[ \mathrm{Var}_\theta(T(\boldsymbol{X}_n)) \geq \frac{g'(\theta)^2}{I_n(\theta)}, \forall \theta \in \Theta, \] whenever \(E_\theta(T(\boldsymbol{X}_n)^2) < \infty, \forall \theta \in \Theta\).
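As an illustration of the generalized bound (not a proof), consider the density \(f_\theta(x) = \theta x^{\theta-1}\) on \((0,1)\), for which \(I_n(\theta) = n/\theta^2\). The choices below are assumptions made for this sketch: \(g(\theta) = 1/\theta\) and \(T(\boldsymbol{X}_n) = -\frac{1}{n}\sum \ln X_i\). Since \(-\ln X\) is exponentially distributed with rate \(\theta\), \(T\) is unbiased for \(1/\theta\), and the bound is \(g'(\theta)^2 / I_n(\theta) = 1/(n\theta^2)\). The simulation compares the empirical variance of \(T\) with that value.

```python
import math
import random

def estimator(theta, n, rng):
    """Draw a sample of size n from f(x) = theta * x**(theta - 1) on (0, 1)
    and return T = -(1/n) * sum(ln x_i), an unbiased estimator of 1/theta."""
    log_sum = 0.0
    for _ in range(n):
        v = 1.0 - rng.random()           # uniform on (0, 1], avoids log(0)
        log_sum += math.log(v) / theta   # ln(x) with x = v**(1 / theta)
    return -log_sum / n

rng = random.Random(2024)                # fixed seed for reproducibility
theta, n, replications = 2.0, 5, 100_000
values = [estimator(theta, n, rng) for _ in range(replications)]

mean_t = sum(values) / replications
variance_t = sum((t - mean_t) ** 2 for t in values) / replications

g_prime = -1.0 / theta**2                # derivative of g(theta) = 1/theta
fisher_total = n / theta**2              # I_n(theta) for this density
bound = g_prime**2 / fisher_total        # equals 1 / (n * theta**2)

print(mean_t, 1.0 / theta, variance_t, bound)
```

Here the empirical variance should land close to the bound, because this estimator attains it. Other unbiased estimators of \(1/\theta\) would show a larger variance.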

28.5.1 Proof

Left as an exercise.