38 Multivariate Normal

We say that \(\boldsymbol{X} = \left(\begin{array}{c}X_1 \\ \vdots \\ X_d\end{array}\right)\) has the multivariate normal distribution \(N_d(\boldsymbol{\mu}, \Sigma)\) if, and only if, its probability density function is given by:

\[ f^{\boldsymbol{X}}(\boldsymbol{x}) = \frac{1}{(2\pi)^{\frac{d}{2}} \lvert \Sigma \rvert^{\frac{1}{2}}} \mathrm{Exp}\left\{-\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^T \Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\} \]

where \(\boldsymbol{\mu}=\left(\begin{array}{c} \mu_1 \\ \vdots \\ \mu_d\end{array}\right)\) and \(\Sigma = \left[\begin{array}{cccc} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1d} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \dots & \sigma_{dd} \end{array}\right]\) is a symmetric positive definite matrix, that is, \(\boldsymbol{y}^T \Sigma \boldsymbol{y} > 0, \forall \boldsymbol{y} \in \mathbb{R}^d \setminus \{\boldsymbol{0}\}\).
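As a quick numerical sanity check (a minimal sketch in Python; the values of \(\boldsymbol{\mu}\), \(\Sigma\) and \(\boldsymbol{x}\) below are illustrative), the density above can be evaluated directly and compared with `scipy.stats.multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (d = 2)
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.3, 0.2])

# Density written exactly as in the formula above
d = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
f_manual = np.exp(-0.5 * quad) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

# Reference implementation
f_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

print(f_manual, f_scipy)  # the two values agree up to floating-point error
```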

Notation

\[ \boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma) \]

Remark 1: Special case: the univariate normal

If \(d=1\), then \(\boldsymbol{X} = X_1\) and \(X_1 \sim N_1(\mu_1,\sigma_{11})\), the usual univariate normal with mean \(\mu_1\) and variance \(\sigma_{11}\).

Remark 2: Expectation and (co)variance

It can be shown that: \[ E(\boldsymbol{X}) = \left( \begin{array}{c} E(X_1) \\ \vdots \\ E(X_d) \end{array} \right) = \boldsymbol{\mu} \] and \[ \mathrm{Var}(\boldsymbol{X}) = E((\boldsymbol{X} - \boldsymbol{\mu})(\boldsymbol{X} - \boldsymbol{\mu})^T) = \Sigma, \] that is, \[ \mathrm{Cov}(X_i, X_j) = \sigma_{ij}, \quad i, j = 1, \dots, d. \]

Remark 3: Zero covariance implies independence for normals

If \[ \Sigma = \mathrm{diag}(\sigma_{11}, \dots, \sigma_{dd}) = \left[\begin{array}{cccc} \sigma_{11} & 0 & \dots & 0 \\ 0 & \sigma_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \sigma_{dd} \end{array}\right], \] then \(X_1, \dots, X_d\) are independent.

That is, if \(X_1, \dots, X_d\) are uncorrelated (zero covariance), then they are independent. For other multivariate distributions this implication need not hold.

Remark 4: The independence case

If \(\Sigma = \mathrm{diag}(\sigma_{11}, \dots, \sigma_{dd})\), then \[ \lvert \Sigma \rvert = \prod^d_{i=1} \sigma_{ii}, \qquad \Sigma^{-1} = \left[\begin{array}{cccc} \sigma_{11}^{-1} & 0 & \dots & 0 \\ 0 & \sigma_{22}^{-1} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \sigma_{dd}^{-1} \end{array}\right] \] and \[ \begin{aligned} (\boldsymbol{x} - \boldsymbol{\mu})^{T}\Sigma^{-1}(\boldsymbol{x} - \boldsymbol{\mu}) &= \left(\begin{array}{c} x_1 - \mu_1 \\ \vdots \\ x_d - \mu_d \end{array}\right)^T \left[\begin{array}{cccc} \sigma_{11}^{-1} & 0 & \dots & 0 \\ 0 & \sigma_{22}^{-1} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \sigma_{dd}^{-1} \end{array}\right] \left(\begin{array}{c} x_1 - \mu_1 \\ \vdots \\ x_d - \mu_d \end{array}\right) \\ &= \left( \frac{x_1 - \mu_1}{\sigma_{11}} \dots \frac{x_d - \mu_d}{\sigma_{dd}} \right) \left(\begin{array}{c} x_1 - \mu_1 \\ \vdots \\ x_d - \mu_d \end{array}\right) \\ &= \frac{(x_1 - \mu_1)^2}{\sigma_{11}} +\dots +\frac{(x_d - \mu_d)^2}{\sigma_{dd}}. \end{aligned} \] Therefore, the p.d.f. of \(\boldsymbol{X}\) when \(\Sigma = \mathrm{diag}(\sigma_{11}, \dots, \sigma_{dd})\) is \[ \begin{aligned} f^{\boldsymbol{X}}(\boldsymbol{x}) &= \frac{1}{(2\pi)^{\frac{d}{2}} \left(\prod^d_{i=1} \sigma_{ii}\right)^{\frac{1}{2}}} \mathrm{Exp}\left\{-\frac{1}{2}\sum^d_{i=1} \frac{(x_i - \mu_i)^2}{\sigma_{ii}}\right\} \\ &= \prod^d_{i=1} \left\{ \frac{1}{(2\pi)^{\frac{1}{2}} \sigma_{ii}^{\frac{1}{2}}} \mathrm{Exp}\left\{-\frac{1}{2} \frac{(x_i - \mu_i)^2 }{\sigma_{ii}}\right\} \right\} \\ &= \prod^d_{i=1} f^{(i)}(x_i), \end{aligned} \] where \(f^{(i)}(x_i)\) is the p.d.f. of \(N(\mu_i, \sigma_{ii})\).
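A small numerical illustration of the factorization above (a sketch assuming a diagonal \(\Sigma\); the values are arbitrary), comparing the joint density with the product of univariate normal densities:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([0.0, 2.0, -1.0])
variances = np.array([1.0, 0.5, 3.0])  # sigma_11, sigma_22, sigma_33
Sigma = np.diag(variances)
x = np.array([0.4, 1.5, -2.0])

joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
product = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(variances)))

print(joint, product)  # equal up to rounding: the components are independent
```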

Remark 5: Moment generating function

The moment generating function of \(\boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma)\) is given by \[ M_{\boldsymbol{X}}(\boldsymbol{t}) = \mathrm{Exp}\left\{ \boldsymbol{\mu}^T\boldsymbol{t} + \frac{1}{2} \boldsymbol{t}^T \Sigma \boldsymbol{t} \right\}, \quad \forall \boldsymbol{t} \in \mathbb{R}^d. \]
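One way to make the formula concrete (a Monte Carlo sketch; the parameters, the point \(\boldsymbol{t}\) and the sample size are arbitrary) is to compare the empirical mean of \(\mathrm{e}^{\boldsymbol{t}^T\boldsymbol{X}}\) with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])
t = np.array([0.4, 0.1])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
mgf_empirical = np.mean(np.exp(X @ t))            # Monte Carlo estimate of E[exp(t^T X)]
mgf_theory = np.exp(mu @ t + 0.5 * t @ Sigma @ t)

print(mgf_empirical, mgf_theory)  # close for moderate ||t||
```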

Remark 6: Affine transformations of multivariate normals

If \(\boldsymbol{Y} = \boldsymbol{a} + B\boldsymbol{X}\), where \(\boldsymbol{a} \in \mathbb{R}^m\) and \(B\) is an \(m\times d\) matrix, with \(m \leq d\), whose rows are linearly independent, then \[ \boldsymbol{Y} \sim N_m(\boldsymbol{a} + B \boldsymbol{\mu}, B \Sigma B^T). \] Proof via the m.g.f.: \[ \begin{aligned} M_{\boldsymbol{Y}}(\boldsymbol{t}) &= E(\mathrm{e}^{\boldsymbol{Y}^T\boldsymbol{t}}) \\ &= E(\mathrm{e}^{\boldsymbol{a}^T \boldsymbol{t} + \boldsymbol{X}^T B^T \boldsymbol{t}}) \\ &= \mathrm{e}^{\boldsymbol{a}^T\boldsymbol{t}}E(\mathrm{e}^{\boldsymbol{X}^T B^T \boldsymbol{t}}) \\ &= \mathrm{e}^{\boldsymbol{a}^T\boldsymbol{t}} M_{\boldsymbol{X}}(B^T \boldsymbol{t}) \\ &= \mathrm{e}^{\boldsymbol{a}^T\boldsymbol{t}} \mathrm{e}^{\boldsymbol{\mu}^T B^T\boldsymbol{t} + \frac{1}{2} \boldsymbol{t}^T B \Sigma B^T \boldsymbol{t}}\\ &= \mathrm{e}^{(\boldsymbol{a} + B\boldsymbol{\mu})^T\boldsymbol{t} + \frac{1}{2} \boldsymbol{t}^T B \Sigma B^T \boldsymbol{t}}, \end{aligned} \] which is the m.g.f. of \(N_m(\boldsymbol{a} + B\boldsymbol{\mu}, B \Sigma B^T)\).
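The result can also be checked by simulation (a sketch; \(\boldsymbol{a}\), \(B\), \(\boldsymbol{\mu}\) and \(\Sigma\) below are arbitrary): the sample mean and covariance of \(\boldsymbol{Y}\) should approach \(\boldsymbol{a} + B\boldsymbol{\mu}\) and \(B\Sigma B^T\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
a = np.array([10.0, -3.0])            # m = 2
B = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])      # rows are linearly independent

X = rng.multivariate_normal(mu, Sigma, size=100_000)  # rows are draws of X
Y = a + X @ B.T                                       # each row is a + B x

print(Y.mean(axis=0), a + B @ mu)                     # sample mean vs a + B mu
print(np.cov(Y, rowvar=False))                        # compare with B Sigma B^T below
print(B @ Sigma @ B.T)
```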

Additional remarks:

If we take \(B = \Sigma^{-\frac{1}{2}}\), then \(B B = \Sigma^{-1}\) and \(B \Sigma B^{T} = I\). Moreover, taking \(\boldsymbol{a} = - \Sigma^{-\frac{1}{2}}\boldsymbol{\mu}\), we have \[ \boldsymbol{Y} = \Sigma^{-\frac{1}{2}} \boldsymbol{X} - \Sigma^{-\frac{1}{2}}\boldsymbol{\mu} = \Sigma^{-\frac{1}{2}}(\boldsymbol{X} - \boldsymbol{\mu}) \sim N_d(\boldsymbol{0},I). \]

Since \(\Sigma\) is a symmetric positive definite square matrix, all of its eigenvalues are strictly positive. Hence the spectral decomposition \[ \Sigma = \Gamma \Lambda \Gamma^T \] holds, where \(\Gamma\) is the orthogonal matrix whose columns are the eigenvectors of \(\Sigma\) and \(\Lambda\) is the diagonal matrix of eigenvalues of \(\Sigma\).

It can be shown that \(\Sigma^{-1} = \Gamma \Lambda^{-1}\Gamma^T\) and \(\Sigma^{-\frac{1}{2}} = \Gamma \Lambda^{-\frac{1}{2}}\Gamma^T\), where \[ \Lambda^k = \begin{bmatrix} \lambda_1^k & 0 & \dots & 0 \\ 0 & \lambda_2^k & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \lambda_d^k\\ \end{bmatrix} \]

Note also that \[ \lvert \Sigma \rvert = \lvert \Gamma \Lambda \Gamma^T \rvert = \lvert \Lambda \Gamma^T \Gamma \rvert; \] since \(\Gamma^T \Gamma = I\), \[ \lvert \Sigma \rvert = \lvert \Lambda \rvert = \prod^d_{i=1} \lambda_i. \]

Furthermore, \[ \mathrm{tr}\{\Sigma\} = \mathrm{tr}\{\Gamma \Lambda \Gamma^T\} = \mathrm{tr}\{\Lambda\Gamma^T\Gamma\} = \mathrm{tr}\{\Lambda\} = \sum^d_{i=1} \lambda_i. \]
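These identities can be reproduced numerically (a sketch; the matrix below is arbitrary but symmetric p.d.): the eigendecomposition yields \(\Sigma^{-\frac{1}{2}}\), the determinant and trace equal the product and sum of the eigenvalues, and applying \(\Sigma^{-\frac{1}{2}}(\boldsymbol{X}-\boldsymbol{\mu})\) to a sample gives an approximately standard normal vector.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([2.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Spectral decomposition Sigma = Gamma Lambda Gamma^T
eigvals, Gamma = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = Gamma @ np.diag(eigvals ** -0.5) @ Gamma.T

print(np.linalg.det(Sigma), np.prod(eigvals))  # |Sigma| = product of eigenvalues
print(np.trace(Sigma), np.sum(eigvals))        # tr(Sigma) = sum of eigenvalues

# Whitening: Y = Sigma^{-1/2}(X - mu) should be approximately N_2(0, I)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = (X - mu) @ Sigma_inv_sqrt                  # Sigma^{-1/2} is symmetric
print(Y.mean(axis=0))                          # ~ (0, 0)
print(np.cov(Y, rowvar=False))                 # ~ identity
```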

Remark 7: Normal marginals

If \(\boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma)\), then \(X_i \sim N(\mu_i, \sigma_{ii}),\ i = 1, \dots, d\).

Proof:

Taking \(\boldsymbol{a} = \boldsymbol{0}\) and \(B_i = (0 \dots 1 \dots 0)\) (\(1\) in the \(i\)-th position), we have \[ \boldsymbol{a} + B_i \boldsymbol{X} = X_i \sim N_1(\mu_i, \sigma_{ii}) \] by Remark 6. Note that \[ B_i \Sigma B^T_i = \sigma_{ii}. \]

38.1 Bivariate normal (\(d=2\))

\[ \boldsymbol{X} = \left(\begin{array}{c} X_1 \\ X_2 \end{array}\right) \sim N_2 (\boldsymbol{\mu}, \Sigma), \] where \(\boldsymbol{\mu} =\left(\begin{array}{c} \mu_1 \\ \mu_2 \end{array}\right)\) and \(\Sigma = \left(\begin{array}{cc} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{array}\right)\) is symmetric positive definite.

\[ \begin{aligned} \lvert \Sigma \rvert &= \sigma_{11} \sigma_{22} - \sigma_{12}^2 \\ \Sigma^{-1} &= \frac{1}{\sigma_{11} \sigma_{22} - \sigma_{12}^2} \left(\begin{array}{cc} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{array}\right) \\ \Rightarrow f^{\boldsymbol{X}}(x_1, x_2) &= \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22} - \sigma_{12}^2}} \mathrm{Exp}\left\{ -\frac{1}{2} \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \underbracket{\left(\begin{array}{c} x_1 - \mu_1 \\ x_2 - \mu_2 \end{array}\right)^T \left(\begin{array}{cc} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{array}\right) \left(\begin{array}{c} x_1 - \mu_1 \\ x_2 - \mu_2 \end{array}\right)}_{*} \right\} \\ \\ * &= \left[(x_1 - \mu_1)\sigma_{22} - (x_2 - \mu_2)\sigma_{12},\ -(x_1-\mu_1)\sigma_{12}+(x_2 - \mu_2)\sigma_{11}\right] \left(\begin{array}{c} x_1 - \mu_1 \\ x_2 - \mu_2 \end{array}\right) \\ &= (x_1 - \mu_1)^2 \sigma_{22} - 2(x_1 - \mu_1)(x_2 - \mu_2)\sigma_{12} + (x_2 - \mu_2)^2 \sigma_{11} \\ \Rightarrow \frac{*}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} &= \frac{(x_1 - \mu_1)^2}{\frac{\sigma_{11}\sigma_{22} - \sigma_{12}^2}{\sigma_{22}}} - \frac{2(x_1 - \mu_1)(x_2 - \mu_2)}{\frac{\sigma_{11}\sigma_{22} - \sigma_{12}^2}{\sigma_{12}}} + \frac{(x_2 - \mu_2)^2}{\frac{\sigma_{11}\sigma_{22} - \sigma_{12}^2}{\sigma_{11}}} \end{aligned} \]

Let \(\rho = \frac{\sigma_{12}}{\sqrt{\sigma_{11} \sigma_{22}}}\). Then

\[ \begin{aligned} A.\ \frac{\sigma_{11}\sigma_{22} - \sigma_{12}^2}{\sigma_{22}} &= \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}} = \sigma_{11} - \sigma_{11}\frac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}} = \sigma_{11} - \sigma_{11} \rho^2 = \sigma_{11}(1-\rho^2)\\ \\ B.\ \frac{\sigma_{11} \sigma_{22} - \sigma_{12}^2}{\sigma_{11}} &= \sigma_{22}(1-\rho^2) \\ \\ C.\ \frac{\sigma_{11}\sigma_{22} - \sigma_{12}^2}{\sigma_{12}} &= \frac{\sigma_{11}\sigma_{22}}{\sigma_{12}} - \sigma_{12} = \sigma_{12} \frac{\sigma_{11}\sigma_{22}}{\sigma_{12}^2} - \sigma_{12} = \sigma_{12}(\rho^{-2} - 1) = \sigma_{12} \left(\frac{1 - \rho^2}{\rho^2}\right) \\ \\ \Rightarrow \frac{*}{\sigma_{11}\sigma_{22}-\sigma_{12}^2} &= \frac{(x_1-\mu_1)^2}{\sigma_{11}(1-\rho^2)} -2\rho^2\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_{12}(1-\rho^2)} + \frac{(x_2 - \mu_2)^2}{\sigma_{22}(1-\rho^2)} \end{aligned} \]

Moreover, \[ \sqrt{\sigma_{11}\sigma_{22} - \sigma_{12}^2} = \sqrt{\sigma_{11}\sigma_{22}-\rho^2\sigma_{11}\sigma_{22}} = \sqrt{\sigma_{11} \sigma_{22}} \sqrt{1-\rho^2}. \]

Thus, \[ \begin{aligned} &f^{\boldsymbol{X}}(x_1, x_2) \\&= \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}} \sqrt{1-\rho^2}} \mathrm{Exp} \left\{ -\frac{1}{2(1-\rho^2)} \left(\frac{(x_1-\mu_1)^2}{\sigma_{11}} -2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sqrt{\sigma_{11}\sigma_{22}}} + \frac{(x_2 - \mu_2)^2}{\sigma_{22}} \right) \right\}. \end{aligned} \]
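As a check of the algebra above (a sketch with arbitrary parameter values), the \(\rho\)-parametrized expression can be compared with the generic multivariate normal density:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = 1.0, -2.0
s11, s22, s12 = 2.0, 1.5, 0.9          # sigma_11, sigma_22, sigma_12
rho = s12 / np.sqrt(s11 * s22)
x1, x2 = 0.5, -1.0

# Bivariate density written with rho, exactly as above
q = ((x1 - mu1) ** 2 / s11
     - 2 * rho * (x1 - mu1) * (x2 - mu2) / np.sqrt(s11 * s22)
     + (x2 - mu2) ** 2 / s22)
f_rho = np.exp(-q / (2 * (1 - rho ** 2))) / (2 * np.pi * np.sqrt(s11 * s22) * np.sqrt(1 - rho ** 2))

# Generic N_2(mu, Sigma) density
f_generic = multivariate_normal(mean=[mu1, mu2],
                                cov=[[s11, s12], [s12, s22]]).pdf([x1, x2])
print(f_rho, f_generic)  # identical up to rounding
```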

38.2 Statistical model

We say that \(\boldsymbol{X} = \left(\begin{array}{c} X_1 \\ \vdots \\ X_d \end{array}\right)\) is a multivariate normal population random vector if \(\boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma)\), where the parameter vector is \[ \theta = \left[\begin{array}{c} \mu_1 \\ \vdots \\ \mu_d \\ \sigma_{11} \\ \vdots \\ \sigma_{d1} \\ \vdots \\ \sigma_{dd} \end{array}\right] = \left[\begin{array}{c} \boldsymbol{\mu} \\ \mathrm{vech}(\Sigma) \end{array}\right] = (\boldsymbol{\mu}^T, (\mathrm{vech}(\Sigma))^T)^T \]

where \(\mathrm{vech}(\Sigma)\) is the column vector obtained by stacking the columns of \(\Sigma\) after removing the elements above the diagonal: \[ \mathrm{vech} \left(\begin{array}{cc}\sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{array}\right) = \left(\begin{array}{c} \sigma_{11} \\ \sigma_{12} \\ \sigma_{22} \end{array}\right) \]
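A small helper showing how \(\mathrm{vech}\) stacks the lower-triangular part of a symmetric matrix column by column (a sketch; `vech` is not a NumPy built-in, the function below is illustrative):

```python
import numpy as np

def vech(M: np.ndarray) -> np.ndarray:
    """Stack the lower-triangular part of a symmetric matrix
    (including the diagonal), column by column, into a vector."""
    d = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(d)])

Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
print(vech(Sigma))  # [1.  0.3 2. ]
```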

38.3 Random sample from a multivariate normal

We say that \(\boldsymbol{X}_1,\dots, \boldsymbol{X}_n\) is a random sample from \(\boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma)\) if, and only if, \(\boldsymbol{X}_1,\dots,\boldsymbol{X}_n\) are independent and \(\boldsymbol{X}_i \sim N_d(\boldsymbol{\mu}, \Sigma),\ i = 1, \dots, n\).

Notation

\[ \boldsymbol{X}_n^* = (\boldsymbol{X}_1, \dots, \boldsymbol{X}_n)\ \text{is a random sample (r.s.) from}\ \boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma) \]

38.4 Likelihood function

Let \(\boldsymbol{X}_n^*\) be a r.s. from \(\boldsymbol{X} \sim N_d(\boldsymbol{\mu}, \Sigma)\), where \[ \begin{aligned} \theta &= (\boldsymbol{\mu}^T, \mathrm{vech}(\Sigma)^T)^T \in \Theta \\ \Theta &= \left\{(\boldsymbol{\mu}^T,\mathrm{vech}(\Sigma)^T)^T \in \mathbb{R}^d\times \mathbb{R}^{\frac{d(d+1)}{2}} : \Sigma\ \text{is positive definite (p.d.)}\right\}\subseteq \mathbb{R}^{d + \frac{d(d+1)}{2}}. \end{aligned} \]

The likelihood function is: \[ \begin{aligned} \mathcal{L}_{\boldsymbol{x}^*}(\theta) &= \prod^n_{i=1} f_\theta^{\boldsymbol{X}}(\boldsymbol{x}_i) = \prod^n_{i=1} \left\{ \frac{1}{(2\pi)^{\frac{d}{2}}\lvert\Sigma\rvert^{\frac{1}{2}}} \mathrm{Exp}\left\{ -\frac{1}{2}(\boldsymbol{x}_i - \boldsymbol{\mu})^T \Sigma^{-1}(\boldsymbol{x}_i - \boldsymbol{\mu}) \right\} \right\} \\ &= \frac{1}{(2\pi)^{\frac{nd}{2}}\lvert\Sigma\rvert^{\frac{n}{2}}} \mathrm{Exp}\left\{ -\frac{1}{2}\underbracket{\sum^n_{i=1}(\boldsymbol{x}_i - \boldsymbol{\mu})^T \Sigma^{-1}(\boldsymbol{x}_i - \boldsymbol{\mu})}_{\dagger} \right\} \end{aligned} \]

Define \(\bar{\boldsymbol{x}} = \frac{1}{n} \sum^n_{i=1} \boldsymbol{x}_i\); then \[ \begin{aligned} \dagger &= \sum^n_{i=1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}} + \bar{\boldsymbol{x}} - \boldsymbol{\mu})^T \Sigma^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}} + \bar{\boldsymbol{x}} - \boldsymbol{\mu}) \\ &= \sum^n_{i=1} ((\boldsymbol{x}_i - \bar{\boldsymbol{x}}) - (\boldsymbol{\mu} - \bar{\boldsymbol{x}}))^T \Sigma^{-1} ((\boldsymbol{x}_i - \bar{\boldsymbol{x}}) - (\boldsymbol{\mu} - \bar{\boldsymbol{x}})) \\ &= \sum^n_{i=1} \left\{ (\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}}) -\underbracket{(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})}_{A} \right .\\ &\left.-\underbracket{(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}})}_{B} +(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}) \right\} \end{aligned} \]

Since \(n\bar{\boldsymbol{x}} = \sum^n_{i=1} \boldsymbol{x}_i\), so that \(\sum^n_{i=1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}}) = \boldsymbol{0}\), we have

\[ \begin{cases} A = \sum^n_{i=1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}}) = (\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} \sum^n_{i=1}(\boldsymbol{x}_i - \bar{\boldsymbol{x}}) = 0 \\ B = \sum^n_{i=1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}) = \left(\sum^n_{i=1}(\boldsymbol{x}_i - \bar{\boldsymbol{x}})\right)^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}) = 0 \end{cases} \]

Therefore, \[ \begin{aligned} \dagger &= \sum^n_{i=1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}}) +n(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}) \\ &= \sum^n_{i=1} \mathrm{tr}\{(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})\} +n(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}) \\ &\stackrel{\text{tr prop.}}{=} \sum^n_{i=1} \mathrm{tr}\{\Sigma^{-1}(\boldsymbol{x}_i - \bar{\boldsymbol{x}})(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T\} +n(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}), \\ \end{aligned} \] where the last step uses that a scalar equals its trace and that \(\mathrm{tr}\{AB\} = \mathrm{tr}\{BA\}\).

Note that \(\sum^n_{i=1} \mathrm{tr}\{A Y_i\} = \mathrm{tr}\{A \sum^n_{i=1} Y_i\}\) for matrices \(Y_i\); using this property, \[ \begin{aligned} \dagger &= n \mathrm{tr} \{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} +n(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1} (\boldsymbol{\mu} - \bar{\boldsymbol{x}}), \\ \end{aligned} \] where \(S^2(\boldsymbol{x}_n^*) = \frac{1}{n} \sum^n_{i=1} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T\).

In this way, we can write the likelihood function as follows:

\[ \mathcal{L}_{\boldsymbol{x}^*}(\theta) = \frac{1}{(2\pi)^{\frac{nd}{2}}\lvert\Sigma\rvert^{\frac{n}{2}}} \mathrm{Exp}\left\{ -\frac{n}{2}\left( \mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} +(\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1}(\boldsymbol{\mu} - \bar{\boldsymbol{x}})\right) \right\} \]

By the Neyman–Fisher factorization criterion, \((\bar{\boldsymbol{X}}, S^2(\boldsymbol{X}_n^*))\) is a sufficient statistic for the multivariate normal model. Since \(S^2(\boldsymbol{X}^*_n)\) is a symmetric matrix, we may consider only \(\mathrm{vech}(S^2(\boldsymbol{X}_n^*))\).

To maximize the likelihood with respect to \(\boldsymbol{\mu}\), it suffices to minimize \[ (\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1}(\boldsymbol{\mu} - \bar{\boldsymbol{x}}). \] Since \(\Sigma\) is positive definite (p.d.), \(\Sigma^{-1}\) is also p.d. Hence, by definition, \(\boldsymbol{y}^T \Sigma^{-1} \boldsymbol{y} > 0\) for all \(\boldsymbol{y} \in \mathbb{R}^d \setminus \{\boldsymbol{0}\}\), and \((\boldsymbol{\mu} - \bar{\boldsymbol{x}})^T \Sigma^{-1}(\boldsymbol{\mu} - \bar{\boldsymbol{x}})\) attains its minimum (zero) exactly when \(\boldsymbol{\mu} = \bar{\boldsymbol{x}}\).

To find the \(\Sigma\) that maximizes the likelihood, we substitute \(\bar{\boldsymbol{x}}\) for \(\boldsymbol{\mu}\) and maximize over \(\Sigma\). That is, we must maximize \[ \frac{1}{(2\pi)^{\frac{nd}{2}}\lvert\Sigma\rvert^{\frac{n}{2}}} \mathrm{Exp}\left\{ -\frac{n}{2} \mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} \right\} \] with respect to \(\Sigma\). Taking \(\ln\), we obtain \[ -\frac{nd}{2} \ln(2\pi) - \frac{n}{2} \ln\lvert\Sigma\rvert - \frac{n}{2} \mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} = c - \frac{n}{2}\left( \ln\lvert\Sigma\rvert + \mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\}\right). \]

Note that, for any \(\lambda > 0\), \[ \lambda - \ln\lambda \geq 1. \]

Let \(\lambda_1, \dots, \lambda_d\) be the eigenvalues of a positive definite matrix \(M\). Then \[ \begin{aligned} \lambda_i &- \ln \lambda_i \geq 1, \quad \forall i = 1, \dots, d \\ \Rightarrow &\sum^d_{i=1} \lambda_i - \sum^d_{i=1} \ln \lambda_i \geq d \\ \iff& \sum^d_{i=1} \lambda_i - \ln\left(\prod^d_{i=1} \lambda_i\right) \geq d \\ \Rightarrow &\mathrm{tr}\{M\} - \ln\lvert M\rvert \geq d. \end{aligned} \] Take \(M = \Sigma^{-1} S^2(\boldsymbol{x}_n^*)\). Assuming \(S^2(\boldsymbol{x}_n^*)\) is positive definite, all eigenvalues of \(M\) are strictly positive (it is similar to the p.d. matrix \(\Sigma^{-\frac{1}{2}} S^2(\boldsymbol{x}_n^*) \Sigma^{-\frac{1}{2}}\)), so the inequality above applies and \[ \begin{aligned} &\mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} - \ln\lvert\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\rvert \geq d \\ \iff&\mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} - \ln(\lvert\Sigma^{-1}\rvert \lvert S^2(\boldsymbol{x}_n^*)\rvert) \geq d \\ \iff&\mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} + \ln\lvert\Sigma\rvert - \ln \lvert S^2(\boldsymbol{x}_n^*)\rvert \geq d \\ \iff&\mathrm{tr}\{\Sigma^{-1}S^2(\boldsymbol{x}_n^*)\} + \ln\lvert\Sigma\rvert \geq d + \ln \lvert S^2(\boldsymbol{x}_n^*)\rvert. \end{aligned} \]

Note that equality (the smallest possible value) is attained when \(\Sigma = S^2(\boldsymbol{x}_n^*)\), since \[ \mathrm{tr}\{S^2(\boldsymbol{x}_n^*)^{-1}S^2(\boldsymbol{x}_n^*)\} = \mathrm{tr}\{I_d\} = d \quad\text{and}\quad \ln\lvert S^2(\boldsymbol{x}_n^*)^{-1}S^2(\boldsymbol{x}_n^*)\rvert = \ln\lvert I_d\rvert = 0. \] Hence, \[ \begin{aligned} \hat{\mu}_{\mathrm{ML}}(\boldsymbol{X}^*_n) &= \bar{\boldsymbol{X}} = \frac{1}{n} \sum^n_{i=1} \boldsymbol{X}_i \\ \hat{\Sigma}_{\mathrm{ML}}(\boldsymbol{X}^*_n) &= S^2(\boldsymbol{X}_n^*) = \frac{1}{n} \sum^n_{i=1} (\boldsymbol{X}_i - \bar{\boldsymbol{X}})(\boldsymbol{X}_i - \bar{\boldsymbol{X}})^T \end{aligned} \] are the maximum likelihood estimators of \(\boldsymbol{\mu}\) and \(\Sigma\), respectively. By the invariance property, we can then estimate any quantity of interest \(g(\theta)\).
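In practice the ML estimates are just the sample mean vector and the sample covariance matrix with divisor \(n\); a sketch with simulated data (the true parameters below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([1.0, -0.5])
Sigma_true = np.array([[1.5, 0.4],
                       [0.4, 0.9]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)  # rows are X_1, ..., X_n
n = X.shape[0]

mu_hat = X.mean(axis=0)                # MLE of mu
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n  # MLE of Sigma (divides by n, not n - 1)

print(mu_hat)
print(Sigma_hat)
```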

38.4.1 Example

Let \(\boldsymbol{X}_n^*\) be a r.s. from \(\boldsymbol{X} \sim N_2(\boldsymbol{\mu}, \Sigma)\). Find the MLE of \(g(\theta)\), where

  1. \(g(\theta) = E_\theta(\boldsymbol{X})\)

  2. \(g(\theta) = \mathrm{Var}_\theta(\boldsymbol{X})\)

  3. \(g(\theta) = E_\theta(\boldsymbol{X})^T \mathrm{Var}_\theta(\boldsymbol{X})^{-1}E_\theta(\boldsymbol{X})\)

  4. \(g(\theta) = P_\theta(X_2 \geq 4 X_1), \boldsymbol{X} = (X_1, X_2)^T\)

  5. Using the observed data below, present the estimates of \(g(\theta)\) from the previous items:

38.4.1.1 Solution

We already know that the estimators are given by \[ \begin{aligned} \hat{\mu}_{\mathrm{ML}}(\boldsymbol{X}^*_n) &= \bar{\boldsymbol{X}} = \frac{1}{n} \sum^n_{i=1} \boldsymbol{X}_i \\ \hat{\Sigma}_{\mathrm{ML}}(\boldsymbol{X}^*_n) &= S^2(\boldsymbol{X}_n^*) = \frac{1}{n} \sum^n_{i=1} (\boldsymbol{X}_i - \bar{\boldsymbol{X}})(\boldsymbol{X}_i - \bar{\boldsymbol{X}})^T \\ &= \frac{1}{n} \sum^n_{i=1} \boldsymbol{X}_i\boldsymbol{X}_i^T - \bar{\boldsymbol{X}}\bar{\boldsymbol{X}}^T \end{aligned} \] and the estimates by \[ \begin{aligned} \hat{\mu}_{\mathrm{ML}}(\boldsymbol{x}^*_n) &= \bar{\boldsymbol{x}} = \frac{1}{n} \sum^n_{i=1} \boldsymbol{x}_i = \frac{1}{n}\begin{pmatrix} \sum x_{1i} \\ \sum x_{2i} \end{pmatrix}\\ \hat{\Sigma}_{\mathrm{ML}}(\boldsymbol{x}^*_n) &= S^2(\boldsymbol{x}_n^*) = \frac{1}{n} \sum^n_{i=1} \boldsymbol{x}_i\boldsymbol{x}_i^T - \bar{\boldsymbol{x}}\bar{\boldsymbol{x}}^T\\ &=\frac{1}{n} \sum^n_{i=1}\begin{bmatrix} (x_{1i} - \bar{x}_1)^2 & (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2) \\ (x_{2i} - \bar{x}_2)(x_{1i} - \bar{x}_1) & (x_{2i} - \bar{x}_2)^2 \end{bmatrix} \end{aligned} \]

  4. \(P_\theta(X_2 \geq 4X_1) = P_\theta(X_2 - 4X_1 \geq 0) = P_\theta(X_2 - 4X_1 > 0)\), since the distribution is continuous. Note that \[ X_2 - 4X_1 = B \boldsymbol{X}, \] where \(B = (-4, 1)\). Then, by Remark 6, \[ X_2 - 4X_1 \sim N(B\boldsymbol{\mu}, B\Sigma B^T). \] Expanding, \[ \begin{aligned} B\boldsymbol{\mu} &= -4\mu_1 + \mu_2 = \mu_2 - 4\mu_1 \\ B\Sigma B^T &= (-4,1) \begin{bmatrix}\sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} \begin{pmatrix}-4 \\ 1 \end{pmatrix} \\ &\stackrel{\sigma_{12} = \sigma_{21}}{=} 16\sigma_{11} + \sigma_{22} - 8 \sigma_{12} \end{aligned} \]

Thus, \[ \begin{aligned} g(\theta) &= P_\theta(X_2 - 4 X_1 > 0) \\ &= P_\theta\left(\frac{(X_2 - 4X_1) - (\mu_2 -4\mu_1)}{\sqrt{16\sigma_{11} + \sigma_{22} - 8 \sigma_{12}}} > -\frac{\mu_2 -4\mu_1}{\sqrt{16\sigma_{11} + \sigma_{22} - 8 \sigma_{12}}}\right) \\ &= P\left( Z > -\frac{\mu_2 -4\mu_1}{\sqrt{16\sigma_{11} + \sigma_{22} - 8 \sigma_{12}}}\right) \\ &= 1 - \Phi\left(-\frac{\mu_2 -4\mu_1}{\sqrt{16\sigma_{11} + \sigma_{22} - 8 \sigma_{12}}}\right). \end{aligned} \]
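By the invariance property, the plug-in ML estimate of this probability replaces \(\boldsymbol{\mu}\) and \(\Sigma\) by \(\bar{\boldsymbol{x}}\) and \(S^2(\boldsymbol{x}_n^*)\) in the expression above; a sketch of the computation (the estimates below are hypothetical, since no data set is reproduced here):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical ML estimates obtained from the data
mu_hat = np.array([3.0, 10.0])              # (mu_1, mu_2)
Sigma_hat = np.array([[4.0, 1.2],
                      [1.2, 9.0]])          # [[sigma_11, sigma_12], [sigma_12, sigma_22]]

B = np.array([-4.0, 1.0])                   # X_2 - 4 X_1 = B X
mean_lin = B @ mu_hat                       # mu_2 - 4 mu_1
var_lin = B @ Sigma_hat @ B                 # 16 sigma_11 + sigma_22 - 8 sigma_12

g_hat = 1 - norm.cdf(-mean_lin / np.sqrt(var_lin))  # plug-in estimate of P(X_2 - 4 X_1 > 0)
print(g_hat)
```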

38.5 Applications of the multivariate normal

38.5.1 Central limit theorem for the MLE with real-valued variables

Let \(\boldsymbol{X}_n\) be a r.s. from \(X\sim f_\theta\), \(\theta \in \Theta \subseteq \mathbb{R}^p\), \(p \in \mathbb{N}\). If the regularity conditions are satisfied, then \[ \sqrt{n} (\hat{\theta}_{\mathrm{ML}}(\boldsymbol{X}_n) - \theta) \stackrel{\mathcal{D}}{\rightarrow} N_p(\boldsymbol{0}, I_1(\theta)^{-1}), \] or, equivalently, \[ I_n(\theta)^{\frac{1}{2}}(\hat{\theta}_{\mathrm{ML}}(\boldsymbol{X}_n) - \theta) \stackrel{\mathcal{D}}{\rightarrow} N_p(\boldsymbol{0}, \boldsymbol{I}). \] Moreover, by Slutsky's theorem, \[ I_n(\hat{\theta}_{\mathrm{ML}}(\boldsymbol{X}_n))^{\frac{1}{2}}(\hat{\theta}_{\mathrm{ML}}(\boldsymbol{X}_n) - \theta) \stackrel{\mathcal{D}}{\rightarrow} N_p(\boldsymbol{0}, \boldsymbol{I}), \] where \(\boldsymbol{I}\) is the identity matrix.

Notation

\[ \hat{\theta}_{\mathrm{ML}}(\boldsymbol{X}_n) \stackrel{a}{\approx} N_p(\theta, I_n(\theta)^{-1}) \]

Note that, if \(\theta = (\theta_1, \theta_2)^T\), then \[ \hat{\theta}_{\mathrm{ML}} = \begin{pmatrix}\hat{\theta}_{\mathrm{ML}}^{(1)} \\ \hat{\theta}_{\mathrm{ML}}^{(2)} \end{pmatrix} \stackrel{a}{\approx} N_2\left[\begin{pmatrix}\theta_1 \\ \theta_2 \end{pmatrix}; I_n(\theta)^{-1}\right]; \] hence, \[ \begin{aligned} \hat{\theta}_{\mathrm{ML}}(\boldsymbol{X}_n)^{(i)} \stackrel{a}{\approx} N_1(\theta_i, V_i(\theta)), \quad i = 1,2, \end{aligned} \] where \(V_i(\theta)\) is the \((i,i)\) element of the matrix \(I_n(\theta)^{-1},\ i = 1, 2\).

Remark

Note that \(V_i(\theta)\) may depend on the whole vector \(\theta\). Consider the normal case: \[ \bar{X} \sim N_1\left(\mu,\frac{\sigma^2}{n}\right), \] where the variance of the estimator of \(\mu\) depends on the other component, \(\sigma^2\).

38.5.2 Central limit theorem for the method-of-moments estimator with real-valued variables

Let \(\boldsymbol{X}_n\) be a r.s. from \(X\sim f_\theta\), \(\theta \in \Theta \subseteq \mathbb{R}^p\), \(p \in \mathbb{N}\), such that \[ E_\theta(|X|^k) < \infty, \quad k = 1, \dots, 2p. \] Let \(h(\theta) = \begin{bmatrix}\frac{\partial E_\theta(X)}{\partial \theta^T} \\ \vdots \\ \frac{\partial E_\theta(X^p)}{\partial \theta^T} \end{bmatrix}\) and suppose that \(\det h(\theta) \neq 0\) and that each component of \(h(\theta)\) is a continuous function. Then, \[ \sqrt{n} (\hat{\theta}_{\mathrm{MM}}(\boldsymbol{X}_n) - \theta) \stackrel{\mathcal{D}}{\rightarrow} N_p(\boldsymbol{0}, V_\theta), \] where \[ \begin{aligned} V_\theta &= h(\theta)^{-1} \mathrm{Var}_\theta(\boldsymbol{Y})[h(\theta)^{-1}]^T \\ \boldsymbol{Y} &= \begin{bmatrix} X \\ X^2 \\ \vdots \\ X^p \end{bmatrix} \end{aligned} \]

Notation

\[ \hat{\theta}_{\mathrm{MM}}(\boldsymbol{X}_n) \stackrel{a}{\approx} N_p\left(\theta, \frac{h(\theta)^{-1} \mathrm{Var}_\theta(\boldsymbol{Y})[h(\theta)^{-1}]^T}{n}\right) \]

Generalization of the method of moments

The method of moments can be generalized as follows:

Suppose \(E_\theta(|X|^k) < \infty\) for \(k \in \{k_1, k_2, \dots, k_p\} \subseteq \mathbb{N}\) or, if \(X\) is non-negative, for \(k \in \{k_1, \dots, k_p\} \subseteq \mathbb{Q}\). The MM estimator is obtained by equating \[ E_\theta(X^k) = \frac{1}{n} \sum^n_{i=1} X_i^k, \quad k \in \{k_1, \dots, k_p\}. \] Let \[ h(\theta) = \begin{bmatrix}\frac{\partial E_\theta(X^{k_1})}{\partial \theta^T} \\ \vdots \\ \frac{\partial E_\theta(X^{k_p})}{\partial \theta^T} \end{bmatrix} \]

Suppose that

  1. \(E_\theta(|X|^{2k}) < \infty, \forall k \in \{k_1, \dots, k_p\}\)

  2. \(\det h(\theta) \neq 0\)

  3. Each component of \(h(\theta)\) is a continuous function.

Then, \[\sqrt{n} (\hat{\theta}_{\mathrm{MM}}(\boldsymbol{X}_n) - \theta) \stackrel{\mathcal{D}}{\rightarrow} N_p(\boldsymbol{0}, V_\theta),\] where \[ \begin{aligned} V_\theta &= h(\theta)^{-1} \mathrm{Var}_\theta(\boldsymbol{Y})[h(\theta)^{-1}]^T \\ \boldsymbol{Y} &= \begin{bmatrix} X^{k_1} \\ X^{k_2} \\ \vdots \\ X^{k_p} \end{bmatrix} \end{aligned} \]

38.5.3 Examples

38.5.3.1 Beta example

Let \(\boldsymbol{X}_n\) be a r.s. from \(X \sim f_\theta\), \(\theta = (\alpha, \beta) \in \Theta = \mathbb{R}^2_+\), such that \[ f_\theta(x) = \begin{cases} \frac{1}{B(\alpha, \beta)} x^{\alpha-1}(1-x)^{\beta -1}, & x \in (0,1) \\ 0, & \text{otherwise.} \end{cases} \]

Find the MLE and the method-of-moments estimator of \(\theta\) and their asymptotic distributions.

38.5.3.1.1 Solution

Note that \(B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}\). Then \[ \begin{aligned} \mathcal{L}_{\boldsymbol{x}_n}(\theta) &= \frac{1}{B(\alpha, \beta)^n} \left(\prod x_i\right)^{\alpha-1} \left(\prod(1-x_i)\right)^{\beta - 1} \\ \ell_{\boldsymbol{x}_n}(\theta) &= -n \ln \Gamma(\alpha) - n\ln\Gamma(\beta) + n \ln \Gamma(\alpha+\beta) \\ &+ (\alpha-1)\sum \ln x_i + (\beta -1)\sum \ln (1 - x_i), \\ \Rightarrow \frac{\partial \ell_{\boldsymbol{x}_n}}{\partial \alpha}(\theta)& = -n \psi_1(\alpha) + n \psi_1(\alpha + \beta) +\sum \ln x_i; \\ \Rightarrow \frac{\partial \ell_{\boldsymbol{x}_n}}{\partial \beta}(\theta)& = -n \psi_1(\beta) + n \psi_1(\alpha + \beta) +\sum \ln (1-x_i), \end{aligned} \] where \[ \psi_k(a) = \frac{\partial^k \ln \Gamma(a)}{\partial a^k} \] (so \(\psi_1\) is the digamma and \(\psi_2\) the trigamma function). Continuing, \[ \begin{aligned} \Rightarrow \frac{\partial^2 \ell_{\boldsymbol{x}_n}}{\partial \alpha^2}(\theta) = -n \psi_2(\alpha) + n \psi_2(\alpha + \beta); \\ \Rightarrow \frac{\partial^2 \ell_{\boldsymbol{x}_n}}{\partial \beta^2}(\theta) = -n \psi_2(\beta) + n \psi_2(\alpha + \beta); \\ \Rightarrow \frac{\partial^2 \ell_{\boldsymbol{x}_n}}{\partial \beta \partial \alpha}(\theta) = n \psi_2(\alpha + \beta). \\ \\ \Rightarrow \frac{\partial^2 \ell_{\boldsymbol{x}_n}}{\partial \theta \partial \theta^T}(\theta) = -n \begin{bmatrix} \psi_2(\alpha) - \psi_2(\alpha + \beta) & -\psi_2(\alpha+\beta) \\ -\psi_2(\alpha+\beta) & \psi_2(\beta) - \psi_2(\alpha+\beta) \end{bmatrix} \\ \Rightarrow I_n(\theta) = n \begin{bmatrix} \psi_2(\alpha) - \psi_2(\alpha + \beta) & -\psi_2(\alpha+\beta) \\ -\psi_2(\alpha+\beta) & \psi_2(\beta) - \psi_2(\alpha+\beta) \end{bmatrix}. \end{aligned} \]

The ML estimate can be obtained numerically via the Newton–Raphson algorithm; since the Hessian here does not depend on the data, the method coincides with Fisher scoring: \[ \hat{\theta}^{(j+1)} = \hat{\theta}^{(j)} + I_n(\hat{\theta}^{(j)})^{-1} U_n(\boldsymbol{x}_n, \hat{\theta}^{(j)}), \] where \[ \begin{aligned} U_n(\boldsymbol{x}_n, \theta) = \begin{bmatrix} -n \psi_1(\alpha) + n \psi_1(\alpha + \beta) +\sum \ln x_i \\ -n \psi_1(\beta) + n \psi_1(\alpha + \beta) +\sum \ln(1-x_i) \end{bmatrix} \end{aligned} \]
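A sketch of the iteration above using `scipy.special` (in the notation of this section, \(\psi_1\) is `digamma` and \(\psi_2\) is `polygamma(1, .)`); the data are simulated and the starting value is arbitrary:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta

rng = np.random.default_rng(4)
x = beta.rvs(2.0, 5.0, size=1000, random_state=rng)  # simulated sample (true alpha = 2, beta = 5)
n = len(x)
s1, s2 = np.sum(np.log(x)), np.sum(np.log1p(-x))     # sum of log x_i and of log(1 - x_i)

theta = np.array([1.0, 1.0])                          # starting value (alpha, beta)
for _ in range(100):
    a, b = theta
    # Score vector U_n(x, theta)
    U = np.array([-n * digamma(a) + n * digamma(a + b) + s1,
                  -n * digamma(b) + n * digamma(a + b) + s2])
    # Fisher information I_n(theta) (equal to minus the Hessian here)
    tri_ab = polygamma(1, a + b)
    I = n * np.array([[polygamma(1, a) - tri_ab, -tri_ab],
                      [-tri_ab, polygamma(1, b) - tri_ab]])
    step = np.linalg.solve(I, U)
    theta = theta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print(theta)  # ML estimate (alpha_hat, beta_hat), close to (2, 5) for a sample this large
```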

Moreover, \[ \sqrt{n}(\hat{\theta}_{\mathrm{ML}} - \theta) \stackrel{\mathcal{D}}{\rightarrow} N_2(\boldsymbol{0}, I_1(\theta)^{-1}), \] where \[ \begin{aligned} I_1(\theta)^{-1} &= \frac{1}{(\psi_2(\alpha) - \psi_2(\alpha+\beta))(\psi_2(\beta)-\psi_2(\alpha+\beta))-\psi_2(\alpha+\beta)^2} \\ &\cdot \begin{bmatrix} \psi_2(\beta) - \psi_2(\alpha + \beta) & \psi_2(\alpha+\beta) \\ \psi_2(\alpha+\beta) & \psi_2(\alpha) - \psi_2(\alpha+\beta) \end{bmatrix}. \end{aligned} \]