34 Método de Newton-Raphson

using RCall, PythonCall

Nem sempre existem soluções fechadas para estimadores de máxima verossimilhança ou obtidos pelo método de momentos. Por exemplo, \(U_n(\boldsymbol{X}_n, \theta) = 0\) não tem solução fechada para o EMV e \(E_\theta(X^k) \frac{1}{n} \sum X_i^k\) para o EMM.

Nos dois casos, estamos interessados em encontrar os valores de “\(\theta\)” que “zeram” uma função \(G(\theta)\),

\(G(\theta) = U_n(\boldsymbol{X}_n,\theta)\)

\[ \begin{aligned} 2.& \ \ \ \ \ \ \ \ \ \ \ \ G(\theta) = \left( \begin{array}{c} E_\theta(X) - \frac{1}{n} \sum X_i \\ \vdots \\ E_\theta(X^p) - \frac{1}{n} \sum X_i^p \end{array}\right) \end{aligned} \]

Consideraremos a seguir apenas o caso \(p=1\)

Seja \(G : \Theta \rightarrow \mathbb{R}\) uma função contínua e diferenciável, com derivadas contínuas, tal que \(G'(\theta) \neq 0,\forall \theta \in \Theta\). Temos como objetivo encontrar \(\hat{\theta} \in \Theta\) tal que \(G(\hat{\theta}) = 0\).

Inicia-se o processo com um valor qualquer \(\theta_0 \in \Theta\) e calculamos o valor seguinte por meio de: \[ \theta^{(j)} = \theta^{(j-1)} - \frac{G(\theta^{(j-1)})}{G'(\theta^{(j-1)})}, j = 1, 2, \dots \]

Continua-se até que \(\lvert \theta^{(j)} - \theta^{(j-1)}\rvert \leq \epsilon\) em que \(\epsilon\) é um valor de erro pequeno fixado.

34.1 Exemplo

Quando tentamos encontrar o EMV para \(\theta\) de \(X \sim f_\theta, \theta \in \Theta\) com \[ f_\theta(x) = \begin{cases} \frac{1}{\Gamma(\theta)} x^{\theta-1} \mathrm{e}^{-x}, & x > 0 \\ 0, & \mathrm{c.c.} \end{cases} \] não conseguimos finalizar a expressão. Vamos relembrar o escore da amostra igualado a zero:

\[ U_n(\boldsymbol{x}_n,\theta) = \sum \ln x_i -\frac{n}{\Gamma(\theta)}\Gamma'(\theta) + \sum \ln x_i = 0 \]

Considere os \(n=7\) valores observados, \(\boldsymbol{x}_7 = (3.1, 4.2, 5.7, 2.3, 7.7, 5.1, 3.5)\) \[ \Rightarrow U_n(\boldsymbol{x}_n,\theta) = \underbracket{\frac{7}{\Gamma(\theta)}\Gamma'(\theta)}_{\frac{\partial \ln}{\partial \theta} \Gamma(\theta)} +\sum \ln x_i = 0 \]

Utilizando \(\theta^{(0)} = 1\), \[ \begin{aligned} \theta^{(j)} &= \theta^{(j-1)} - \frac{U_n(\boldsymbol{x}_n,\theta^{(j-1)})}{U'_n(\boldsymbol{x}_n, \theta^{(j-1)})} \\ &= \theta^{(j-1)} - \frac{-7 \psi_1(\theta^{(j-1)})+ 7 \sum \ln x_i}{-7\psi_2(\theta^{(j-1)})} \\ &= \theta^{(j-1)} - \frac{-\psi_1(\theta^{(j-1)})+ \sum \ln x_i}{-\psi_2(\theta^{(j-1)})} \end{aligned} \]

em que \(\psi_j(\theta) = \frac{\partial^j \ln \Gamma(\theta)}{\partial \theta^j}\)

34.2 Algoritmos e outros métodos

Seja \(\boldsymbol{X}_n\) a.a. de \(X \sim \Gamma(\theta, 1)\) (exemplo anterior). Ainda considerando os mesmos valores amostrados, \(\boldsymbol{x}_7 = (3.1, 4.2, 5.7, 2.3, 7.7, 5.1, 3.5)\), utilize os métodos numéricos para encontrar a estimativa de máxima verossimilhança.

34.2.1 Algoritmo por Newton-Raphson

Considere um erro \(\epsilon = 10^{-5}\) e \(\hat{\theta}^{(0)} = 1.\) \[ \theta^{(j+1)} = \theta^{(j)} - \frac{U_n(\boldsymbol{x}_n,\theta^{(j)})}{U'_n(\boldsymbol{x}_n, \theta^{(j)})} \] em que \(U_n(\boldsymbol{x}_n,\theta) = -n\psi_1(\theta^{(j-1)})+ \sum \ln x_i\) e \(U_n'(\boldsymbol{x}_n,\theta) = -n\psi_2(\theta^{(j-1)})\) com \(\psi_j(\theta) = \frac{\partial^j \ln \Gamma(\theta)}{\partial \theta^j}\) e \(n=7\).

Logo, \[ \theta^{(j+1)} = \theta^{(j)} - \frac{U_n(\boldsymbol{x}_n,\theta^{(j)})}{U'_n(\boldsymbol{x}_n, \theta^{(j)})} \]

using SpecialFunctions # Funções di e trigamma (derivadas do log da gama)


function main()

    x = [3.1, 4.2, 5.7, 2.3, 7.7, 5.1, 3.5]
    n = length(x)

    theta::Vector{Float64} = []
    theta0 = 1
    append!(theta, theta0)


    erromax = 10^(-5)
    erro = Inf
    i = 1
    # iteracoesMax = 6 # Podemos também definir apenas um erro máximo
    while erro > erromax # && i < iteracoesMax
        append!(
            theta, theta[i] - (sum(log.(x)) - n * digamma(theta[i])) /
                (-n * trigamma(theta[i]))
        )
        erro = abs(theta[i + 1] - theta[i])
        println("Erro na iteração $i: $erro")
        i += 1
    end
    println("Theta final: $(theta[length(theta)])")
    return println("Total de iterações: $i")

end

main()

Erro na iteração 1: 1.2248511275743903
Erro na iteração 2: 1.5558203731733413
Erro na iteração 3: 0.8122385697386227
Erro na iteração 4: 0.10638335531193821
Erro na iteração 5: 0.0013809916201203976
Erro na iteração 6: 2.2502724394968254e-7
Theta final: 4.700674642445657
Total de iterações: 7

print("Teste Python")
print(1 + 1)

print("Teste Python")
print(1 + 1)

Teste Python
2

cat("Teste R")
dados <- c(1, 2, 3, 4, 5)
plot(dados)

cat("Teste R")
dados <- c(1, 2, 3, 4, 5)
plot(dados)

Teste R

Note que, se \(U'_n(\pmb{x}_n, \hat{\theta}^{(j+1)}) < 0\), a estimativa \(\hat{\theta}^{(j+1)}\) será a de MV.

Seja \(\boldsymbol{X}_n\) a.a. de \(X\sim \mathrm{Exp}(\theta), \theta \in \Theta = (0,\infty)\). Encontre a estimativa de MV para \(\theta\) pelo método de Newton-Raphson considerando \(\boldsymbol{x}_n = (2, 2.5, 3, 3.5, 2, 1.5)\) e \(\hat{\theta}^{(0)} = 0.5\) Com erro máximo \(\epsilon = 10^{-5}\).

Note que \[ \begin{aligned} f_\theta(x) &= \theta \mathrm{e}^{-\theta x} \\ \mathcal{L}_{\boldsymbol{x}_n}(\theta) &= \theta^n \mathrm{e}^{-\theta\sum x_i} \\ \ln \mathcal{L}_{\boldsymbol{x}_n}(\theta) &= n \ln\theta -\theta\sum x_i \\ U_n(\boldsymbol{x}_n,\theta) &= \frac{n}{\theta} - \sum x_i \\ U_n'(\boldsymbol{x}_n,\theta) &= -\frac{n}{\theta^2} \end{aligned} \]

Logo \[ \begin{aligned} \hat{\theta}^{(j+1)} &= \hat{\theta}^{(j)} - \frac{\frac{n}{\hat{\theta}^{(j)}}- \sum x_i}{-\frac{n}{(\hat{\theta}^{(j)})^2}} \\ &= \hat{\theta}^{(j)} - \frac{[n - \hat{\theta}^{(j)}\sum x_i]\hat{\theta}^{(j)}}{-n} \end{aligned} \]

function main()

    x = [2, 2.5, 3, 3.5, 2, 1.5]
    n = length(x)

    theta::Vector{Float64} = []
    theta0 = 0.5
    append!(theta, theta0)


    erromax = 10^(-5)
    erro = Inf
    i = 1
    while erro > erromax
        append!(theta, theta[i] - ((n - sum(x) * theta[i]) * theta[i]) / (-n))
        erro = abs(theta[i + 1] - theta[i])
        println("Erro na iteração $i: $erro")
        i += 1
    end
    println("Theta final: $(theta[length(theta)])")
    return println("Total de iterações: $i")

end

main()

Erro na iteração 1: 0.10416666666666669
Erro na iteração 2: 0.01718026620370372
Erro na iteração 3: 0.0007780354808986645
Erro na iteração 4: 1.468425129103057e-6
Theta final: 0.4137931034430648
Total de iterações: 5

Note que se \(\Theta \subseteq \mathbb{R}^P\), o método é dado por \[ \theta^{(j+1)} = \theta^{(j)} - \left[U_n'(\boldsymbol{x}_n,\theta^{(j)})\right]^{-1} \cdot U_n(\boldsymbol{x}_n, \hat{\theta}^{(j)}), j = 0, 1, \dots \]

34.2.2 Método de Scoring de Fisher

Para \(\Theta \subseteq \mathbb{R}\) \[ \theta^{(j+1)} = \theta^{(j)} + \frac{U_n(\boldsymbol{x}_n,\theta^{(j)})}{I_n(\theta^{(j)})} \] Para \(\Theta \subseteq \mathbb{R}^P\) \[ \theta^{(j+1)} = \theta^{(j)} + \left[I_n(\boldsymbol{x}_n,\theta^{(j)})\right]^{-1} \cdot U_n(\boldsymbol{x}_n, \hat{\theta}^{(j)}), j = 0, 1, \dots \]

34.2.3 Método de descida do gradiente

\[ \theta^{(j+1)} = \theta^{(j)} + \delta \cdot U_n(\boldsymbol{x}_n, \hat{\theta}^{(j)}), j = 0, 1, \dots \] em que \(\delta \in (0,1)\) é a taxa de “aprendizado”.

Para o exemplo da Gama,

using SpecialFunctions # Funções di e trigamma (derivadas do log da gama)


function main()

    x = [3.1, 4.2, 5.7, 2.3, 7.7, 5.1, 3.5]
    n = length(x)

    theta::Vector{Float64} = []
    theta0 = 1
    append!(theta, theta0)
    delta = 0.01


    erromax = 10^(-5)
    erro = Inf
    i = 1
    while erro > erromax
        append!(theta, theta[i] + delta * (sum(log.(x)) - n * digamma(theta[i])))
        erro = abs(theta[i + 1] - theta[i])
        if i % 100 == 0
            println("Erro na iteração $i: $erro")
        end
        i += 1
    end
    println("Theta final: $(theta[end])")
    return println("Total de iterações: $i")

end

main()

Erro na iteração 100: 0.007983394465004956
Erro na iteração 200: 0.0013675177845966502
Erro na iteração 300: 0.0002527165874308679
Erro na iteração 400: 4.731072948604975e-5
Theta final: 4.700082915774669
Total de iterações: 494

34.2.4 Descida estocástica do gradiente

\[ \theta^{(j+1, i)} = \theta^{(j, i)} + \delta \cdot U(x_i, \hat{\theta}^{(j, i)}), j = 0, 1, \dots, i \in \{1,\dots,n\} \]

Em que \(U(x, \hat{\theta}^{(j)})\) é a função escore para uma observação. Usaremos o mesmo modelo gama.

using SpecialFunctions, Distributions, Plots

function descida_estocastica()

    x = [3.1, 4.2, 5.7, 2.3, 7.7, 5.1, 3.5]
    n = length(x)

    theta::Vector{Float64} = []
    append!(theta, 1)
    delta = 0.0001


    erro = Inf
    erromax = 10^(-5)
    j = 1
    epoca = 1
    while erro > erromax
        for i in 1:n
            append!(theta, theta[j] + delta * (log(x[i]) - digamma(theta[j])))
            erro = abs(theta[j + 1] - theta[j])
            j += 1
        end
        epoca += 1
        if epoca % 500 == 0
            println("Erro na época $epoca: $erro")
        end
    end
    epoca -= 1
    println("Theta final: $(theta[end])")
    println("Total de epocas: $epoca \n \n")

    println("Verificando se é máximo local")
    println("-n * trigamma(theta calculado) = $(-n * trigamma(theta[end]))")

    U(y) = sum(log.(x)) - n * digamma(y)
    p = plot(U, xlims = (0.1, 100), label = "U")
    vline!(theta[1:4000:end], label = "(Alguns dos) thetas calculados")
    hline!([0], label = "", color = :black)
    return display(p)
end

descida_estocastica()

Erro na época 500: 0.00011562567419964864
Erro na época 1000: 8.455622767256088e-5
Erro na época 1500: 6.523785153733641e-5
Erro na época 2000: 5.164846886707153e-5
Erro na época 2500: 4.1416753598255696e-5
Erro na época 3000: 3.337361869748534e-5
Erro na época 3500: 2.685988163664277e-5
Erro na época 4000: 2.146877836084471e-5
Erro na época 4500: 1.6932425193516565e-5
Erro na época 5000: 1.306548867541224e-5
Theta final: 3.6540367477421754
Total de epocas: 5457 
 

Verificando se é máximo local
-n * trigamma(theta calculado) = -2.201394773690926

Note que esse método é falível. Caímos em um valor que não é máximo global e diferente dos obtidos pelos outros métodos. Aumentaremos o valor de \(\delta\) e adicionaremos um máximo de 5000 épocas. Também escolheremos um valor mais alto para \(\theta_0\), como \(4\).

using SpecialFunctions, Distributions, Plots

function descida_estocastica_limitada()

    x = [3.1, 4.2, 5.7, 2.3, 7.7, 5.1, 3.5]
    n = length(x)

    theta::Vector{Float64} = []
    append!(theta, 4)
    delta = 0.01


    erro = Inf
    erromax = 10^(-5)
    j = 1
    epoca = 1
    while erro > erromax && epoca <= 5000
        for i in 1:n
            append!(theta, theta[j] + delta * (log(x[i]) - digamma(theta[j])))
            erro = abs(theta[j + 1] - theta[j])
            j += 1
        end
        epoca += 1
        if epoca % 1000 == 0
            println("Erro na época $epoca: $erro")
        end
    end
    epoca -= 1
    println("Theta final: $(theta[end])")
    println("Total de epocas: $epoca \n \n")

    println("Verificando se é máximo local")
    println("-n * trigamma(theta calculado) = $(-n * trigamma(theta[end]))")

    U(y) = sum(log.(x)) - n * digamma(y)
    p = plot(U, xlims = (0.1, 100), label = "U")
    vline!(theta[1:500:end], label = "(Alguns dos) thetas calculados")
    hline!([0], label = "", color = :black)
    return display(p)
end

descida_estocastica_limitada()

Erro na época 1000: 0.0018561619919994499
Erro na época 2000: 0.001856162087572777
Erro na época 3000: 0.001856162087572777
Erro na época 4000: 0.001856162087572777
Erro na época 5000: 0.001856162087572777
Theta final: 4.70217698128991
Total de epocas: 5000 
 

Verificando se é máximo local
-n * trigamma(theta calculado) = -1.6580912861780042