References
Shape parameters are \(a\) and \(b\), which define the truncation interval \([a, b]\).
We assume \(a \lt b\).
Let \(\phi(x)\) and \(\Phi(x)\) be the PDF and CDF, respectively, of the standard normal distribution. Let \(\bar{\Phi}(x)\) be the complementary CDF of the standard normal distribution. This function, also known as the survival function, is \(1 - \Phi(x)\).
For future reference, \begin{equation} \phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \label{eq:phi} \end{equation} \begin{align} \Phi(x) &= \frac{1}{2} \left[1 + \textrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right] \\ &= 1 - \frac{1}{2}\textrm{erfc}\left(\frac{x}{\sqrt{2}}\right) \end{align} \begin{align} \bar{\Phi}(x) &= \frac{1}{2} \left[1 - \textrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right] \\ &= \frac{1}{2}\textrm{erfc}\left(\frac{x}{\sqrt{2}}\right) \end{align} where \(\textrm{erf}(x)\) is the error function, and \(\textrm{erfc}\) is the complementary error function, \(\textrm{erfc}(x) = 1 - \textrm{erf}(x)\). We will also use the scaled complementary error function \(\textrm{erfcx}(x)\), \begin{equation} \textrm{erfcx}(x) = e^{x^2}\textrm{erfc}(x) \end{equation}
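These definitions map directly onto functions in scipy.special. As a quick sanity check of the identities, here is a small sketch (not part of any implementation discussed later):

```python
import numpy as np
from scipy.special import erf, erfc, erfcx

def phi(x):
    # Standard normal PDF.
    return np.exp(-x**2/2) / np.sqrt(2*np.pi)

def Phi(x):
    # Standard normal CDF, via the error function.
    return 0.5*(1 + erf(x/np.sqrt(2)))

def Phi_bar(x):
    # Complementary CDF (survival function), via erfc.
    return 0.5*erfc(x/np.sqrt(2))
```

Note that scipy.special provides erfcx directly, so \(e^{x^2}\textrm{erfc}(x)\) never has to be formed from its two (overflowing/underflowing) factors.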
We will frequently use the expression \(\Phi(b) - \Phi(a)\) (and several other mathematically equivalent expressions), so for convenience, define \begin{equation} \Delta(a, b) = \Phi(b) - \Phi(a) = \bar{\Phi}(a) - \bar{\Phi}(b) \end{equation} \(\Delta(a, b)\) can be expressed many ways: \begin{align} \Delta(a, b) &= \frac{1}{2}\left(\textrm{erf}\left(\frac{b}{\sqrt{2}}\right) - \textrm{erf}\left(\frac{a}{\sqrt{2}}\right)\right) \\ &= \frac{1}{2}\left(\textrm{erfc}\left(\frac{a}{\sqrt{2}}\right) - \textrm{erfc}\left(\frac{b}{\sqrt{2}}\right)\right) \\ &= \frac{1}{2}\left( e^{-a^2/2}\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) -e^{-b^2/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)\right) \\ &= \frac{1}{2}e^{-a^2/2}\left(\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) -e^{(a^2-b^2)/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)\right) \\ &= \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt \label{eq:delta_integral} \end{align}
In the following formulas, \( a \le x \le b \) is assumed.
The PDF of truncated normal distribution \begin{equation} f(x, a, b) = \frac{\phi(x)}{\Delta(a, b)} \end{equation}
The CDF of the truncated normal distribution \begin{equation} F(x, a, b) = \frac{\Delta(a, x)}{\Delta(a, b)} \end{equation}
The complementary CDF (i.e. the survival function) of the truncated normal distribution \begin{equation} S(x, a, b) = \bar{F}(x, a, b) = \frac{\Delta(x, b)}{\Delta(a, b)} \end{equation}
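Taken together, a naive reference implementation of these three functions might look like the following sketch. This is exactly the straightforward form whose numerical problems are examined later in this document:

```python
import numpy as np
from scipy.special import ndtr

def _delta(a, b):
    # Naive Delta(a, b) = Phi(b) - Phi(a); loses precision in the tails.
    return ndtr(b) - ndtr(a)

def truncnorm_pdf_naive(x, a, b):
    # PDF of the standard normal truncated to [a, b].
    return np.exp(-x**2/2) / np.sqrt(2*np.pi) / _delta(a, b)

def truncnorm_cdf_naive(x, a, b):
    # CDF: Delta(a, x) / Delta(a, b).
    return _delta(a, x) / _delta(a, b)

def truncnorm_sf_naive(x, a, b):
    # Survival function: Delta(x, b) / Delta(a, b).
    return _delta(x, b) / _delta(a, b)
```
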
In the following, \(X\) refers to a random variable with the truncated standard normal distribution, and \(E\) is the expectation operator.
Noncentral moments (moments about 0) \begin{split} \mu_0' & = 1 \\ \mu_1' & = f(a, a, b) - f(b, a, b) \\ \mu_2' & = 1 + a f(a, a, b) - b f(b, a, b) \\ \mu_{k+1}' & = k\mu_{k-1}' + a^k f(a, a, b) - b^k f(b, a, b) \end{split}
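The recurrence can be sketched in Python as follows, using a naive computation of \(\Delta(a, b)\) (adequate away from the tails; the function names here are for illustration only):

```python
import numpy as np
from scipy.special import ndtr

def truncnorm_noncentral_moments(n, a, b):
    # Noncentral moments mu'_0 .. mu'_n of the truncated standard normal
    # on [a, b], via the recurrence
    #   mu'_{k+1} = k*mu'_{k-1} + a**k * f(a,a,b) - b**k * f(b,a,b)
    delta = ndtr(b) - ndtr(a)  # naive Delta(a, b)
    fa = np.exp(-a**2/2) / np.sqrt(2*np.pi) / delta  # f(a, a, b)
    fb = np.exp(-b**2/2) / np.sqrt(2*np.pi) / delta  # f(b, a, b)
    m = [1.0, fa - fb]
    for k in range(1, n):
        m.append(k*m[k-1] + a**k * fa - b**k * fb)
    return m[:n + 1]
```

For a symmetric interval the odd moments vanish, which provides an easy check of the implementation.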
Mean \begin{equation} \mu = E(X) = f(a, a, b) - f(b, a, b) = \frac{\phi(a) - \phi(b)}{\Delta(a, b)} \end{equation}
Variance \begin{align} \mu_2 = E((X - \mu)^2) &= 1 + af(a, a, b) - bf(b, a, b) - \left(f(a, a, b) - f(b, a, b)\right)^2 \\ &= 1 + \frac{a\phi(a) - b\phi(b)}{\Delta(a, b)} - \left(\frac{\phi(a) - \phi(b)}{\Delta(a, b)}\right)^2 \end{align}
Skewness (standard formula in terms of noncentral moments) \begin{equation} \tilde{\mu}_3 = \frac{\mu_3}{\sigma^3} = \frac{\mu_3' + \mu_1'(-3\mu_2' + 2\mu_1'^2)} {\left(\mu_2' - \mu_1'^2\right)^\frac{3}{2}} \end{equation}
Kurtosis (standard formula in terms of noncentral moments) \begin{equation} \tilde{\mu}_4 = \frac{\mu_4}{\sigma^4} = \frac{\mu_4' + \mu_1' (-4\mu_3' + 3\mu_1'(2\mu_2' - \mu_1'^2))} {\left(\mu_2' - \mu_1'^2\right)^2} \end{equation} The excess kurtosis is \(\tilde{\mu}_4 - 3\).
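Combining the moment recurrence with these conversion formulas gives the skewness and excess kurtosis directly. The following self-contained sketch again uses a naive \(\Delta(a, b)\), so it is only suitable away from the tails:

```python
import numpy as np
from scipy.special import ndtr

def truncnorm_skew_exkurt(a, b):
    # Skewness and excess kurtosis of the truncated standard normal on
    # [a, b], from the noncentral moments mu'_1 .. mu'_4.
    delta = ndtr(b) - ndtr(a)  # naive Delta(a, b)
    fa = np.exp(-a**2/2) / np.sqrt(2*np.pi) / delta
    fb = np.exp(-b**2/2) / np.sqrt(2*np.pi) / delta
    m = [1.0, fa - fb]
    for k in range(1, 4):
        m.append(k*m[k-1] + a**k * fa - b**k * fb)
    m1, m2, m3, m4 = m[1:]
    var = m2 - m1**2
    skew = (m3 + m1*(-3*m2 + 2*m1**2)) / var**1.5
    kurt = (m4 + m1*(-4*m3 + 3*m1*(2*m2 - m1**2))) / var**2
    return skew, kurt - 3.0  # excess kurtosis
```

On a symmetric interval the skewness is zero, and the excess kurtosis is negative (the truncated distribution is platykurtic).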
The naive implementations of the PDF, CDF and moment functions run into numerical problems. Before dealing with those, we first look at the most basic problem, that of computing \(\Delta(a, b)\).
The naive method for computing \(\Delta(a, b)\) is to use a function that computes \(\Phi(x)\), such as scipy.special.ndtr, and write ndtr(b) - ndtr(a). The problem with this implementation occurs when both \(a\) and \(b\) are positive and of sufficient magnitude that \(\Phi(a) \approx \Phi(b) \approx 1\). The subtraction results in catastrophic loss of precision.
For example, \(\Delta(9, 9.5)\) is approximately 1.118093890878478e-19, as can be verified with mpmath.ncdf:

>>> import mpmath
>>> mpmath.mp.dps = 100
>>> float(mpmath.ncdf(9.5) - mpmath.ncdf(9))
1.118093890878478e-19

With the naive calculation, we get 0:
>>> from scipy.special import ndtr
>>> ndtr(9.5) - ndtr(9)
0.0

The standard method to compute \(\Delta(a, b)\) is to switch to the equivalent expression \(\bar{\Phi}(a) - \bar{\Phi}(b)\) when both \(a\) and \(b\) are positive. This expression can be implemented using ndtr by taking advantage of the symmetry of \(\Phi(x)\):
>>> ndtr(-9) - ndtr(-9.5)
1.1180938908784698e-19

Loss of precision also occurs when \(a\) and \(b\) have "moderate" values (i.e. not far into the tails of the distribution) but are very close together. For example, suppose \(b\) is -0.1, and \(a = b - \textrm{1e-7}\). With mpmath we compute

>>> b = -0.1
>>> a = b - 1e-7
>>> float(mpmath.ncdf(b) - mpmath.ncdf(a))
3.96952545503663e-08

With ndtr,

>>> ndtr(b) - ndtr(a)
3.96952545278495e-08

which has a relative error of approximately 6e-10. That might be acceptable in some contexts, but when this result is used as a starting point for further calculations, the error might be amplified.
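The switch to the complementary CDF can be packaged in a small helper. This is a sketch of the technique, not scipy's actual implementation:

```python
from scipy.special import ndtr

def delta(a, b):
    # Delta(a, b) = Phi(b) - Phi(a), using the symmetry Phi(-x) = 1 - Phi(x)
    # to avoid subtracting two values that are both close to 1 when a and b
    # are both far in the right tail.
    if a > 0 and b > 0:
        # Phi(b) - Phi(a) = Phi_bar(a) - Phi_bar(b) = Phi(-a) - Phi(-b)
        return ndtr(-a) - ndtr(-b)
    return ndtr(b) - ndtr(a)
```

This recovers the \(\Delta(9, 9.5)\) example above to nearly full double precision, while leaving the "moderate values, tiny interval" problem unaddressed.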
The PDF can be implemented in a straightforward way: use the standard formula for \(\phi(x)\), and use a "smart" implementation of \(\Delta(a, b)\). There are a few problems that arise. First, when \(a\) and \(b\) are both far into a tail of the distribution, the straightforward calculation produces nan, because both the numerator and denominator underflow to 0:
>>> from scipy.stats import norm
>>> a = 39
>>> b = 40
>>> x = a
>>> norm.pdf(x)  # phi(x)
0.0
>>> norm.sf(a) - norm.sf(b)  # Delta(a, b)
0.0

We can compute the true value with truncnorm from mpsci.distributions (mpmath.mp.dps is still 100, as above):
>>> from mpsci.distributions import truncnorm
>>> float(truncnorm.pdf(x, a, b))
39.02560741993011
A second problem is loss of precision when the truncation interval is very small. For example:

>>> a = 1
>>> b = a + 1e-8
>>> x = a
>>> float(truncnorm.pdf(x, a, b))
100000001.10774711

The calculation using the straightforward formula gives only about 9 significant digits:

>>> norm.pdf(x) / (norm.sf(a) - norm.sf(b))
100000001.35619442
The nan problem can be avoided by rewriting the PDF in terms of \(\textrm{erfcx}\), using the last expression for \(\Delta(a, b)\) given above, so that the underflowing exponential factors cancel analytically:

import numpy as np
from scipy.special import erfcx

SQRT2 = np.sqrt(2)
SQRT2_PI = np.sqrt(2 / np.pi)

def truncnorm_pdf(x, a, b):
    if abs(a) > abs(b):
        # Use the symmetry of the distribution.  Only x**2 occurs below,
        # so x itself does not need to be negated.
        a, b = -b, -a
    d = (a - b)*((a + b)/2)  # (a**2 - b**2)/2
    t = (a - x)*((a + x)/2)  # (a**2 - x**2)/2
    a2 = a / SQRT2
    b2 = b / SQRT2
    return SQRT2_PI * np.exp(t)/(erfcx(a2) - np.exp(d)*erfcx(b2))

With this function, the result for the example given above is computed accurately:
>>> truncnorm_pdf(39, 39, 40)  # Should be 39.02560741993011
39.025607419930104

For large \(x\), \(\textrm{erfcx}(x)\) behaves like \(\frac{1}{\sqrt{\pi} x}\), so the truncation interval would have to be extremely far into the tail of the normal distribution to encounter an underflow problem in the denominator. The numerator can still underflow, but that is acceptable; it just means that the best floating point representation of the PDF is, in fact, 0.
The above reformulation does not fix the problem of loss of precision when the truncation interval is small.
>>> a = 1
>>> b = 1 + 1e-8
>>> x = a
>>> truncnorm_pdf(x, a, b)  # Should be 100000001.10774711
100000003.08114655

(The relative error in that example is actually worse than that of the straightforward calculation shown earlier.)
A possible solution to the loss of precision is to detect when it is likely to happen, and use an alternative method to compute the PDF. Possible alternatives include:
Use higher precision numbers for the intermediate calculation. Ultimately there would still be loss of precision for even smaller intervals, but something like quad-precision or double-double would cover most conceivably useful edge cases.
It would be nice if this could be implemented in C or C++ by just upgrading the type of the variables from double to long double. This won't work, however, because long double does not have a standard number of bits of precision. Depending on the platform and compiler, long double might be 80 bit "extended precision", true 128 bit quad precision, or the same 64 bit precision as double. GCC provides the type __float128, but that is a GCC extension that is not available with other compilers.
Alternatives include using Boost's multiprecision library (since we now have Boost available), or using some other high precision floating point library.
As an experiment, I implemented a relatively simple double-double class in C++ called DoubleDouble, and created a function truncnorm_pdf(x, a, b) for the DoubleDouble type. This main program

int main(void)
{
    DoubleDouble a, b, pa;

    a = DoubleDouble{1.0, 0.0};
    b = DoubleDouble{1.00000001, 0.0};  // 1 + 1e-8
    pa = truncnorm_pdf(a, a, b);
    disp("pa ", pa);

    a = DoubleDouble{39.0, 0.0};
    b = DoubleDouble{40.0, 0.0};
    pa = truncnorm_pdf(a, a, b);
    disp("pa ", pa);

    return 0;
}

prints

pa 1.00000001107747108e+08 + -4.45843655745240463e-09
pa 3.90256074199301111e+01 + -2.34094507259581111e-15

The first component of each result matches the expected double precision value exactly; the discrepancies in the last few digits are the result of the disp function printing a digit or two more than necessary.
Instead of computing the denominator with subtraction, use the integral (\ref{eq:delta_integral}) to compute the denominator. This could be done using any numerical quadrature method, such as those used in scipy.integrate.quad or in boost::math::quadrature::gauss_kronrod. We still need to avoid simultaneous underflow of the numerator and denominator; we can do this by writing \(f(x, a, b)\) as
\begin{equation}
f(x, a, b) = \frac{e^{-x^2/2}}{\int_a^b e^{-t^2/2}\,dt} = \frac{1}{\int_a^b e^{(x^2 - t^2)/2}\,dt}
\end{equation}
A Python prototype using scipy.integrate.quad is

from scipy.integrate import quad

def truncnorm_pdf_quad(x, a, b):

    def integrand(t, x):
        z = (x - t)*((x + t)/2)  # (x**2 - t**2)/2
        return np.exp(z)

    return 1/quad(integrand, a, b, args=(x,), epsrel=1e-13, epsabs=2e-15)[0]

With this function, we get an accurate value for the above example:
>>> truncnorm_pdf_quad(1, 1, 1+1e-8)  # Should be 100000001.10774711
100000001.10774708

We can verify that this method also avoids underflow:
>>> truncnorm_pdf_quad(39, 39, 40)  # Should be 39.02560741993011
39.02560741993146

A drawback of this method is that it is potentially very slow; preliminary tests with a C++ implementation based on boost::math::quadrature::gauss_kronrod were not encouraging.
If this approach is to be used, there must be a way to detect that the subtraction will result in a significant loss of precision. The code currently does this by computing the ratio of the numbers to be subtracted, and checking if the ratio is close to 1. We know that both terms are positive, and that the second term is smaller than the first, so a condition for potential loss of precision in the expression \(s_1 - s_2\) is \(s_2/s_1 \gt \frac{1}{2} \). (A higher threshold might also give acceptable results.)
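A minimal sketch of that check (the function name here is hypothetical, not taken from the actual code):

```python
def loss_of_precision_likely(s1, s2):
    # Detect potential catastrophic cancellation in s1 - s2, where both
    # terms are positive and s2 < s1: if the ratio s2/s1 exceeds 1/2,
    # at least the leading bit cancels in the subtraction.  A higher
    # threshold might also give acceptable results.
    return s2/s1 > 0.5
```

When the check fires, the implementation can fall back to one of the alternative methods (higher precision arithmetic or quadrature) described above.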
The mean is \begin{align} \mu &= \frac{\phi(a) - \phi(b)}{\Delta(a, b)} \\ &= \sqrt{\frac{2}{\pi}} \frac{e^{-a^2/2} - e^{-b^2/2}}{\textrm{erfc}\left(\frac{a}{\sqrt{2}}\right) - \textrm{erfc}\left(\frac{b}{\sqrt{2}}\right)} \\ &= \sqrt{\frac{2}{\pi}} \frac{e^{-a^2/2} - e^{-b^2/2}}{e^{-a^2/2}\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) - e^{-b^2/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)} \\ &= \sqrt{\frac{2}{\pi}} \frac{1 - e^{(a^2 -b^2)/2}}{\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) - e^{(a^2-b^2)/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)} \end{align}