Parameters: shape \(b\), location \(\mu\), scale \(\sigma\)
Support: \(\frac{x - \mu}{\sigma} \ge 1\)
PDF: \[ f(x, b, \mu, \sigma) = \frac{b\sigma^b}{\left(x - \mu\right)^{b+1}} \] log PDF: \[ \log f(x, b, \mu, \sigma) = \log(b) + b\log(\sigma) - (b+1)\log(x - \mu) \] The log-likelihood function: \begin{equation} \ell(\textbf{x}, b, \mu, \sigma) = N \log(b) + N b \log(\sigma) - (b+1)\sum_{i=1}^{N} \log(x_i - \mu) \label{eq:pareto:ll} \end{equation} The first order derivative with respect to \(b\) is \begin{equation} \frac{\partial \ell}{\partial b}(\textbf{x}, b, \mu, \sigma) = \frac{N}{b} + N\log(\sigma) - \sum_{i=1}^{N} \log(x_i - \mu) \label{eq:pareto:bderiv} \end{equation}
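As a quick sanity check of this parametrization (not part of the derivation), the log PDF above agrees with SciPy's `pareto` distribution, which uses the same shape parameter `b` together with the standard `loc`/`scale` transformation. A minimal sketch, with made-up values:

```python
import numpy as np
from scipy.stats import pareto

# Log PDF from the formula above: log(b) + b*log(sigma) - (b + 1)*log(x - mu)
def pareto_logpdf(x, b, mu, sigma):
    return np.log(b) + b * np.log(sigma) - (b + 1) * np.log(x - mu)

b, mu, sigma = 2.5, 1.0, 3.0
x = np.array([4.5, 6.0, 10.0])   # each point satisfies (x - mu)/sigma >= 1

# The two rows should match.
print(pareto_logpdf(x, b, mu, sigma))
print(pareto.logpdf(x, b, loc=mu, scale=sigma))
```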
Without loss of generality, assume that \(x_1\) is the smallest element in \(\textbf{x}\). The condition that each data point remain in the support can be written \begin{equation} x_1 \ge \mu + \sigma \label{eq:pareto:support} \end{equation}
The parameter \(\sigma\) appears in \(\ell(\textbf{x}, b, \mu, \sigma)\) in just one place, in the term \(N b \log(\sigma)\). Since \(N b > 0\), we can make the log-likelihood as large as we like by making \(\sigma\) large. But we have to ensure that each point remains in the support, so the estimate for \(\sigma\) is determined by the boundary condition \eqref{eq:pareto:support}. In particular, if \(\mu\) is fixed, then \(\sigma = x_1 - \mu\). The location parameter \(\mu\) appears only in the sum in \eqref{eq:pareto:ll}, with the negative coefficient \(-(b+1)\), so increasing \(\mu\) decreases each \(\log(x_i - \mu)\) and increases the log-likelihood. To maximize the log-likelihood, we should therefore make \(\mu\) as large as possible, subject to the support constraint \eqref{eq:pareto:support}. So, if either \(\mu\) or \(\sigma\) is free, the following constraint must hold at the maximum likelihood estimate: \begin{equation} \sigma + \mu = x_1 \label{eq:pareto:constraint} \end{equation} When both \(\mu\) and \(\sigma\) are free, we can use \eqref{eq:pareto:constraint} to eliminate \(\mu\) from the log-likelihood function, resulting in the reduced log-likelihood function: \begin{equation} \tilde{\ell}(\textbf{x}, b, \sigma) = N \log(b) + \left((N - 1) b - 1 \right) \log(\sigma) - (b+1)\sum_{i=2}^{N} \log(x_i - x_1 + \sigma) \label{eq:pareto:llreduced} \end{equation} The first order conditions for the reduced log-likelihood function are \begin{equation} \frac{\partial \tilde{\ell}}{\partial b}(\textbf{x}, b, \sigma) = \frac{N}{b} + (N - 1)\log(\sigma) - \sum_{i=2}^{N}\log(x_i - x_1 + \sigma) = 0 \label{eq:pareto:bderiv0} \end{equation} and \begin{equation} \frac{\partial \tilde{\ell}}{\partial \sigma}(\textbf{x}, b, \sigma) = \frac{(N - 1) b - 1}{\sigma} - (b + 1)\sum_{i=2}^{N}\frac{1}{x_i - x_1 + \sigma} = 0 \label{eq:pareto:sigmaderiv0} \end{equation}
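The algebra behind \eqref{eq:pareto:llreduced} is easy to check numerically; the following short sketch (with made-up data) confirms that substituting \(\mu = x_1 - \sigma\) into \eqref{eq:pareto:ll} reproduces the reduced log-likelihood:

```python
import numpy as np

def loglike(x, b, mu, sigma):
    # Full log-likelihood: N log b + N b log sigma - (b + 1) sum(log(x_i - mu))
    n = len(x)
    return n * np.log(b) + n * b * np.log(sigma) - (b + 1) * np.sum(np.log(x - mu))

def loglike_reduced(x, b, sigma):
    # Reduced log-likelihood; x must be sorted so that x[0] = x_1.
    n = len(x)
    return (n * np.log(b) + ((n - 1) * b - 1) * np.log(sigma)
            - (b + 1) * np.sum(np.log(x[1:] - x[0] + sigma)))

x = np.sort(np.array([4.5, 6.0, 10.0, 5.2, 7.7]))
b, sigma = 2.0, 1.5
mu = x[0] - sigma                 # the boundary condition sigma + mu = x_1

print(loglike(x, b, mu, sigma))   # the two values should agree
print(loglike_reduced(x, b, sigma))
```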
Setting \eqref{eq:pareto:bderiv} equal to zero gives \begin{equation} b = \frac{N}{-N\log(\sigma) + \sum_{i=1}^{N} \log(x_i - \mu) } = \frac{1}{-\log(\sigma) + \frac{1}{N}\sum_{i=1}^{N} \log(x_i - \mu) } \label{eq:pareto:b} \end{equation}
To compute the MLE for the Pareto distribution, we have the following procedure. If \(\mu\) and \(\sigma\) are both fixed, \(b\) follows directly from \eqref{eq:pareto:b}. If exactly one of \(\mu\) and \(\sigma\) is fixed, the boundary condition \eqref{eq:pareto:constraint} determines the other, and \(b\) again follows from \eqref{eq:pareto:b}. If both \(\mu\) and \(\sigma\) are free, the first order conditions \eqref{eq:pareto:bderiv0} and \eqref{eq:pareto:sigmaderiv0} for the reduced log-likelihood must be solved, in general numerically.
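A minimal sketch of the fixed-parameter cases follows (the data are made up; the case with both \(\mu\) and \(\sigma\) free would additionally require a numerical root finder for \eqref{eq:pareto:bderiv0} and \eqref{eq:pareto:sigmaderiv0} and is not attempted here):

```python
import numpy as np

def pareto_mle(x, mu=None, sigma=None):
    """Sketch of the Pareto MLE for the cases where mu and/or sigma is fixed.

    Pass a value to hold a parameter fixed; a single free parameter is
    filled in from the boundary condition sigma + mu = x_1.
    """
    x = np.sort(np.asarray(x, dtype=float))
    x1 = x[0]

    if mu is None and sigma is None:
        raise NotImplementedError("both parameters free: solve the reduced "
                                  "first order conditions numerically")
    if mu is None:
        mu = x1 - sigma          # boundary condition sigma + mu = x_1
    elif sigma is None:
        sigma = x1 - mu          # boundary condition sigma + mu = x_1

    # Shape estimate from eq. (b).
    b = 1.0 / (np.mean(np.log(x - mu)) - np.log(sigma))
    return b, mu, sigma

# Example usage with made-up data and the location held fixed at mu = 4.
x = np.array([4.5, 6.0, 10.0, 5.2, 7.7, 4.3])
print(pareto_mle(x, mu=4.0))
```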
Parameters: shape \(a\), location \(\mu\), scale \(\sigma\)
Support: \(\mu < x \le \mu + \sigma\)
PDF: \[ f(x, a, \mu, \sigma) = \frac{a}{\sigma}\left(\frac{x - \mu}{\sigma}\right)^{a-1} = \frac{a}{\sigma^a}\left(x - \mu\right)^{a-1} \] log PDF: \[ \log f(x, a, \mu, \sigma) = \log a - a\log\sigma + (a - 1)\log(x - \mu) \] The likelihood function for the vector \(\textbf{x} = \{x_1, x_2, \ldots, x_N\}\): \[ L(\textbf{x}, a, \mu, \sigma) = \prod_{i=1}^{N} \frac{a}{\sigma^a}\left(x_i - \mu\right)^{a-1} = \frac{a^{N}}{\sigma^{Na}}\prod_{i=1}^{N} \left(x_i - \mu\right)^{a-1} \] The log-likelihood function: \[ \ell(\textbf{x}, a, \mu, \sigma) = N\log a - N a \log\sigma + (a - 1)\sum_{i=1}^{N}\log(x_i - \mu) \]
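The parametrization here matches SciPy's `powerlaw` distribution (shape `a`, with `loc` \(= \mu\) and `scale` \(= \sigma\)); a short sketch with made-up data, checking the log-likelihood above against a sum of `logpdf` values:

```python
import numpy as np
from scipy.stats import powerlaw

def powerlaw_loglike(x, a, mu, sigma):
    # N log a - N a log sigma + (a - 1) sum(log(x_i - mu)), from the formula above.
    n = len(x)
    return n * np.log(a) - n * a * np.log(sigma) + (a - 1) * np.sum(np.log(x - mu))

a, mu, sigma = 1.5, 2.0, 4.0
x = np.array([2.5, 3.0, 5.5, 4.2])   # each point satisfies mu < x <= mu + sigma

# The two values should match.
print(powerlaw_loglike(x, a, mu, sigma))
print(np.sum(powerlaw.logpdf(x, a, loc=mu, scale=sigma)))
```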
The power law distribution has some technical issues that might require that we impose additional constraints on the inputs to the MLE fit procedure.
Note that as \(\mu\) varies, the only term in \(\ell(\textbf{x}, a, \mu, \sigma)\) that changes is \((a - 1)\sum_{i=1}^{N}\log(x_i - \mu)\).
If \(a > 1\), decreasing \(\mu\) causes that term to increase, so for the MLE, make \(\mu\) as small as possible. For each \(x_{i}\) to be in the support of the distribution, we require \(x_{\max} \le \mu + \sigma\), or \(\mu \ge x_{\max} - \sigma\), so we have the constraint \[ \mu = x_{\max} - \sigma \]
If \(0 < a < 1\), the coefficient \(a - 1\) is negative, so for the MLE, we want to make \(\mu\) as large as possible. Since the support requires \(\mu < x_i\) for every observation, this leads to \[ \mu = x_{\min} \] (but see the discussion of this case below).
Note that as \(\sigma\) varies, the only term in \(\ell(\textbf{x}, a, \mu, \sigma)\) that changes is \(-N a \log \sigma\). To maximize this term, make \(\sigma\) as small as possible. To ensure that each \(x_{i}\) is in the support of the distribution, we need \(\sigma \ge x_{\max} - \mu\), so the constraint is \[ \sigma = x_{\max} - \mu \] (Since \(\mu \le x_{\min}\), \(\sigma\) can never be smaller than \(x_{\max} - x_{\min}\).)
We can find the MLE for \(a\) through the first order condition for the extremum of \(\ell(\textbf{x}, a, \mu, \sigma)\). We have \[ \frac{\partial \ell}{\partial a} = \frac{N}{a} + \sum_{i=1}^{N} \log\left(\frac{x_i - \mu}{\sigma}\right) \] By setting \(\frac{\partial \ell}{\partial a} = 0\), we obtain \[ a = \frac{-N}{\sum_{i=1}^{N} \log\left(\frac{x_i - \mu}{\sigma}\right)} \]
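For illustration, suppose \(a > 1\) and the scale \(\sigma\) is known (an assumption made here only to keep the example simple); then \(\mu = x_{\max} - \sigma\) and the formula above gives \(a\) directly. A minimal sketch:

```python
import numpy as np

def powerlaw_mle_a(x, mu, sigma):
    # Closed-form estimate a = -N / sum(log((x_i - mu) / sigma)) from above.
    x = np.asarray(x, dtype=float)
    return -len(x) / np.sum(np.log((x - mu) / sigma))

x = np.array([2.5, 3.0, 4.0, 5.5, 5.9])
sigma = 4.0                    # assume the scale is known a priori
mu = x.max() - sigma           # boundary constraint mu = x_max - sigma
print(powerlaw_mle_a(x, mu, sigma))
```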
Suppose we know a priori that \(a > 1\).
If we know a priori that \(0 < a < 1\), a problem arises. The discussion of \(\sigma\) still applies: if \(\sigma > x_{\max} - \mu\) (and \(\mu < x_{\min}\)), we can increase the likelihood by decreasing \(\sigma\). So, for a fixed \(\mu < x_{\min}\), the MLE for \(\sigma\) is \(\sigma = x_{\max} - \mu\).
The earlier discussion of \(\mu\) shows that, if \(\mu < x_{\min}\), we can increase the likelihood by increasing \(\mu\). That suggests that the MLE for \(\mu\) is \(\mu = x_{\min}\).
So it looks like the MLE for \(\mu\) and \(\sigma\) is \[ \mu = x_{\min} \] \[ \sigma = x_{\max} - x_{\min} \] The problem with this solution is that setting \(\mu = x_{\min}\) means that the term \((x_i - \mu)\) in the likelihood function is zero when \(x_i\) is \(x_{\min}\). Then, because \(a - 1 < 0\), \((x_i - \mu)^{a - 1}\) "blows up" (i.e. the PDF blows up at \(x = x_{\min}\)). Stated more carefully, the likelihood function diverges to infinity as \(\mu\) approaches \(x_{\min}\). Should we then just accept this as the MLE? The problem is that it leaves \(a\) undetermined, because the blow-up happens for any \(a\) in the interval \(0 < a < 1\).
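The divergence is easy to see numerically. In the sketch below (made-up data, \(a\) fixed in \((0, 1)\), \(\sigma = x_{\max} - \mu\)), the log-likelihood grows without bound as \(\mu\) approaches \(x_{\min}\) from below:

```python
import numpy as np

def powerlaw_loglike(x, a, mu, sigma):
    # N log a - N a log sigma + (a - 1) sum(log(x_i - mu))
    n = len(x)
    return n * np.log(a) - n * a * np.log(sigma) + (a - 1) * np.sum(np.log(x - mu))

x = np.array([2.5, 3.0, 4.0, 5.5, 5.9])
a = 0.5                                    # any a in (0, 1) shows the same behavior
for eps in [1e-1, 1e-3, 1e-6, 1e-9]:
    mu = x.min() - eps                     # mu approaching x_min from below
    sigma = x.max() - mu                   # the constraint sigma = x_max - mu
    print(eps, powerlaw_loglike(x, a, mu, sigma))
```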
Parameters: location \(\mu\), scale \(\sigma\)
The PDF: \[f(x, \mu, \sigma) = \frac{1}{\sigma}\left(\frac{x-\mu}{\sigma}\right) \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}\right) \] The likelihood function for the vector \(\textbf{x} = \{x_1, x_2, \ldots, x_N\} \): \[ L(\textbf{x}, \mu, \sigma) = \prod_{i=1}^{N}\frac{1}{\sigma}\left(\frac{x_i-\mu}{\sigma}\right) \exp\left(-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^{2}\right) \] The log-likelihood function: \[ \begin{split} \ell(\textbf{x}, \mu, \sigma) & = \sum_{i=1}^{N} \left[ -2\log\sigma + \log(x_i-\mu) - \frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^{2} \right] \\ & = -2N\log\sigma + \sum_{i=1}^{N} \left[ \log(x_i-\mu) - \frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^{2} \right] \end{split} \] Equations for the critical points: \begin{equation} \frac{\partial \ell}{\partial \sigma} = \frac{-2N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N} (x_i - \mu)^2 \end{equation} Setting \(\frac{\partial \ell}{\partial \sigma} = 0\) gives \begin{equation} \sigma^2 = \frac{1}{2N}\sum_{i=1}^{N} (x_i - \mu)^2 \end{equation} If the location parameter \(\mu\) is fixed, we're done. If \(\mu\) is not fixed, we need \(\frac{\partial \ell}{\partial \mu} \) \begin{equation} \frac{\partial \ell}{\partial \mu} = \sum_{i=1}^{N} \left[ \frac{-1}{x_i - \mu} + \frac{x_i - \mu}{\sigma^2} \right] \end{equation} Setting \(\frac{\partial \ell}{\partial \mu} = 0\) gives \begin{equation} \sum_{i=1}^{N}(x_i - \mu) - \sigma^2 \sum_{i=1}^{N}\frac{1}{x_i - \mu} = 0 \end{equation} With neither \(\mu\) nor \(\sigma\) fixed, we have two equations to solve simultaneously. There isn't an explicit solution, but we can use the expression for \(\sigma^2\) to reduce the problem to a single equation for \(\mu\) that must be solved numerically: \begin{equation} \sum_{i=1}^{N}(x_i - \mu) - \left(\frac{1}{2N}\sum_{i=1}^{N} (x_i - \mu)^2\right) \sum_{i=1}^{N}\frac{1}{x_i - \mu} = 0 \end{equation}
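(The PDF above is that of the Rayleigh distribution with a location parameter.) A rough numerical sketch of this procedure, using `scipy.optimize.brentq` to solve the final equation for \(\mu\); the bracketing interval below is a heuristic assumption, not something that follows from the derivation:

```python
import numpy as np
from scipy.optimize import brentq

def rayleigh_mle(x):
    """Sketch of the MLE for location mu and scale sigma.

    Solves the single equation for mu numerically, then recovers sigma
    from sigma^2 = (1/(2N)) * sum((x_i - mu)^2).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    def mu_equation(mu):
        # sum(x_i - mu) - sigma^2(mu) * sum(1/(x_i - mu)) = 0
        d = x - mu
        sigma2 = np.sum(d**2) / (2 * n)
        return np.sum(d) - sigma2 * np.sum(1.0 / d)

    # mu must lie below x_min; the width of the bracket is a heuristic guess.
    xmin, spread = x.min(), x.max() - x.min()
    mu = brentq(mu_equation, xmin - 100 * spread, xmin - 1e-8 * spread)
    sigma = np.sqrt(np.sum((x - mu)**2) / (2 * n))
    return mu, sigma

# Example usage with made-up data.
x = np.array([1.8, 2.3, 2.9, 3.4, 4.1, 5.0])
print(rayleigh_mle(x))
```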