References
Shape parameters are \(a\) and \(b\), which define the truncation interval \([a, b]\).
We assume \(a \lt b\).
Let \(\phi(x)\) and \(\Phi(x)\) be the PDF and CDF, respectively, of the standard normal distribution. Let \(\bar{\Phi}(x)\) be the complementary CDF of the standard normal distribution. This function, also known as the survival function, is \(1 - \Phi(x)\).
For future reference, \begin{equation} \phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \label{eq:phi} \end{equation} \begin{align} \Phi(x) &= \frac{1}{2} \left[1 + \textrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right] \\ &= 1 - \frac{1}{2}\textrm{erfc}\left(\frac{x}{\sqrt{2}}\right) \end{align} \begin{align} \bar{\Phi}(x) &= \frac{1}{2} \left[1 - \textrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right] \\ &= \frac{1}{2}\textrm{erfc}\left(\frac{x}{\sqrt{2}}\right) \end{align} where \(\textrm{erf}(x)\) is the error function, and \(\textrm{erfc}\) is the complementary error function, \(\textrm{erfc}(x) = 1 - \textrm{erf}(x)\). We will also use the scaled complementary error function \(\textrm{erfcx}(x)\), \begin{equation} \textrm{erfcx}(x) = e^{x^2}\textrm{erfc}(x) \end{equation}
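These definitions map directly onto functions in scipy.special. As a quick sanity check of the identities, here is a small sketch (not part of any implementation discussed later):

```python
import numpy as np
from scipy.special import erf, erfc, erfcx

def phi(x):
    # Standard normal PDF.
    return np.exp(-x**2/2) / np.sqrt(2*np.pi)

def Phi(x):
    # Standard normal CDF, via the error function.
    return 0.5*(1 + erf(x/np.sqrt(2)))

def Phi_bar(x):
    # Complementary CDF (survival function), via erfc.
    return 0.5*erfc(x/np.sqrt(2))
```

Note that scipy.special provides erfcx directly, so \(e^{x^2}\textrm{erfc}(x)\) never has to be formed from its two (overflowing/underflowing) factors.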
We will frequently use the expression \(\Phi(b) - \Phi(a)\) (and several other mathematically equivalent expressions), so for convenience, define \begin{equation} \Delta(a, b) = \Phi(b) - \Phi(a) = \bar{\Phi}(a) - \bar{\Phi}(b) \end{equation} \(\Delta(a, b)\) can be expressed many ways: \begin{align} \Delta(a, b) &= \frac{1}{2}\left(\textrm{erf}\left(\frac{b}{\sqrt{2}}\right) - \textrm{erf}\left(\frac{a}{\sqrt{2}}\right)\right) \\ &= \frac{1}{2}\left(\textrm{erfc}\left(\frac{a}{\sqrt{2}}\right) - \textrm{erfc}\left(\frac{b}{\sqrt{2}}\right)\right) \\ &= \frac{1}{2}\left( e^{-a^2/2}\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) -e^{-b^2/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)\right) \\ &= \frac{1}{2}e^{-a^2/2}\left(\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) -e^{(a^2-b^2)/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)\right) \\ &= \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt \label{eq:delta_integral} \end{align}
In the following formulas, \( a \le x \le b \) is assumed.
The PDF of truncated normal distribution \begin{equation} f(x, a, b) = \frac{\phi(x)}{\Delta(a, b)} \end{equation}
The CDF of the truncated normal distribution \begin{equation} F(x, a, b) = \frac{\Delta(a, x)}{\Delta(a, b)} \end{equation}
The complementary CDF (i.e. the survival function) of the truncated normal distribution \begin{equation} S(x, a, b) = \bar{F}(x, a, b) = \frac{\Delta(x, b)}{\Delta(a, b)} \end{equation}
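Taken together, a naive reference implementation of these three functions might look like the following sketch. This is exactly the straightforward form whose numerical problems are examined later in this document:

```python
import numpy as np
from scipy.special import ndtr

def _delta(a, b):
    # Naive Delta(a, b) = Phi(b) - Phi(a); loses precision in the tails.
    return ndtr(b) - ndtr(a)

def truncnorm_pdf_naive(x, a, b):
    # PDF of the standard normal truncated to [a, b].
    return np.exp(-x**2/2) / np.sqrt(2*np.pi) / _delta(a, b)

def truncnorm_cdf_naive(x, a, b):
    # CDF: Delta(a, x) / Delta(a, b).
    return _delta(a, x) / _delta(a, b)

def truncnorm_sf_naive(x, a, b):
    # Survival function: Delta(x, b) / Delta(a, b).
    return _delta(x, b) / _delta(a, b)
```
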
In the following, \(X\) refers to a random variable with the truncated standard normal distribution, and \(E\) is the expectation operator.
Noncentral moments (moments about 0) \begin{split} \mu_0' & = 1 \\ \mu_1' & = f(a, a, b) - f(b, a, b) \\ \mu_2' & = 1 + a f(a, a, b) - b f(b, a, b) \\ \mu_{k+1}' & = k\mu_{k-1}' + a^k f(a, a, b) - b^k f(b, a, b) \end{split}
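The recurrence can be sketched in Python as follows, using a naive computation of \(\Delta(a, b)\) (adequate away from the tails; the function names here are for illustration only):

```python
import numpy as np
from scipy.special import ndtr

def truncnorm_noncentral_moments(n, a, b):
    # Noncentral moments mu'_0 .. mu'_n of the truncated standard normal
    # on [a, b], via the recurrence
    #   mu'_{k+1} = k*mu'_{k-1} + a**k * f(a,a,b) - b**k * f(b,a,b)
    delta = ndtr(b) - ndtr(a)  # naive Delta(a, b)
    fa = np.exp(-a**2/2) / np.sqrt(2*np.pi) / delta  # f(a, a, b)
    fb = np.exp(-b**2/2) / np.sqrt(2*np.pi) / delta  # f(b, a, b)
    m = [1.0, fa - fb]
    for k in range(1, n):
        m.append(k*m[k-1] + a**k * fa - b**k * fb)
    return m[:n + 1]
```

For a symmetric interval the odd moments vanish, which provides an easy check of the implementation.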
Mean \begin{equation} \mu = E(X) = f(a, a, b) - f(b, a, b) = \frac{\phi(a) - \phi(b)}{\Delta(a, b)} \end{equation}
Variance \begin{align} \mu_2 = E((X - \mu)^2) &= 1 + af(a, a, b) - bf(b, a, b) - \left(f(a, a, b) - f(b, a, b)\right)^2 \\ &= 1 + \frac{a\phi(a) - b\phi(b)}{\Delta(a, b)} - \left(\frac{\phi(a) - \phi(b)}{\Delta(a, b)}\right)^2 \end{align}
Skewness (standard formula in terms of noncentral moments) \begin{equation} \tilde{\mu}_3 = \frac{\mu_3}{\sigma^3} = \frac{\mu_3' + \mu_1'(-3\mu_2' + 2\mu_1'^2)} {\left(\mu_2' - \mu_1'^2\right)^\frac{3}{2}} \end{equation}
Kurtosis (standard formula in terms of noncentral moments) \begin{equation} \tilde{\mu}_4 = \frac{\mu_4}{\sigma^4} = \frac{\mu_4' + \mu_1' (-4\mu_3' + 3\mu_1'(2\mu_2' - \mu_1'^2))} {\left(\mu_2' - \mu_1'^2\right)^2} \end{equation} The excess kurtosis is \(\tilde{\mu}_4 - 3\).
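Combining the moment recurrence with these conversion formulas gives the skewness and excess kurtosis directly. The following self-contained sketch again uses a naive \(\Delta(a, b)\), so it is only suitable away from the tails:

```python
import numpy as np
from scipy.special import ndtr

def truncnorm_skew_exkurt(a, b):
    # Skewness and excess kurtosis of the truncated standard normal on
    # [a, b], from the noncentral moments mu'_1 .. mu'_4.
    delta = ndtr(b) - ndtr(a)  # naive Delta(a, b)
    fa = np.exp(-a**2/2) / np.sqrt(2*np.pi) / delta
    fb = np.exp(-b**2/2) / np.sqrt(2*np.pi) / delta
    m = [1.0, fa - fb]
    for k in range(1, 4):
        m.append(k*m[k-1] + a**k * fa - b**k * fb)
    m1, m2, m3, m4 = m[1:]
    var = m2 - m1**2
    skew = (m3 + m1*(-3*m2 + 2*m1**2)) / var**1.5
    kurt = (m4 + m1*(-4*m3 + 3*m1*(2*m2 - m1**2))) / var**2
    return skew, kurt - 3.0  # excess kurtosis
```

On a symmetric interval the skewness is zero, and the excess kurtosis is negative (the truncated distribution is platykurtic).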
The naive implementations of the PDF, CDF and moment functions run into numerical problems. Before dealing with those, we first look at the most basic problem, that of computing \(\Delta(a, b)\).
The naive method for computing \(\Delta(a, b)\) is to use a function that computes \(\Phi(x)\), such as scipy.special.ndtr, and write ndtr(b) - ndtr(a). The problem with this implementation occurs when both \(a\) and \(b\) are positive and of sufficient magnitude that \(\Phi(a) \approx \Phi(b) \approx 1\). The subtraction results in catastrophic loss of precision.
For example, \(\Delta(9, 9.5)\) is approximately 1.118093890878478e-19, as can be verified with mpmath.ncdf:

>>> import mpmath
>>> mpmath.mp.dps = 100
>>> float(mpmath.ncdf(9.5) - mpmath.ncdf(9))
1.118093890878478e-19

With the naive calculation, we get 0:
>>> from scipy.special import ndtr
>>> ndtr(9.5) - ndtr(9)
0.0

The standard method to compute \(\Delta(a, b)\) is to switch to the equivalent expression \(\bar{\Phi}(a) - \bar{\Phi}(b)\) when both \(a\) and \(b\) are positive. This expression can be implemented using ndtr by taking advantage of the symmetry of \(\Phi(x)\):
>>> ndtr(-9) - ndtr(-9.5)
1.1180938908784698e-19

Loss of precision also occurs when \(a\) and \(b\) have "moderate" values (i.e. not far into the tails of the distribution) but are very close together. For example, suppose \(b\) is -0.1, and \(a = b - \textrm{1e-7}\). With mpmath we compute

>>> b = -0.1
>>> a = b - 1e-7
>>> float(mpmath.ncdf(b) - mpmath.ncdf(a))
3.96952545503663e-08

With ndtr,

>>> ndtr(b) - ndtr(a)
3.96952545278495e-08

which has a relative error of approximately 6e-10. That might be acceptable in some contexts, but when this result is used as a starting point for further calculations, the error might be amplified.
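The switch to the complementary CDF can be packaged in a small helper. This is a sketch of the technique, not scipy's actual implementation:

```python
from scipy.special import ndtr

def delta(a, b):
    # Delta(a, b) = Phi(b) - Phi(a), using the symmetry Phi(-x) = 1 - Phi(x)
    # to avoid subtracting two values that are both close to 1 when a and b
    # are both far in the right tail.
    if a > 0 and b > 0:
        # Phi(b) - Phi(a) = Phi_bar(a) - Phi_bar(b) = Phi(-a) - Phi(-b)
        return ndtr(-a) - ndtr(-b)
    return ndtr(b) - ndtr(a)
```

This recovers the \(\Delta(9, 9.5)\) example above to nearly full double precision, while leaving the "moderate values, tiny interval" problem unaddressed.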
The PDF can be implemented in a straightforward way: use the standard formula for \(\phi(x)\), and use a "smart" implementation of \(\Delta(a, b)\). There are a few problems that arise. First, when \(a\) and \(b\) are both far into a tail of the distribution, the straightforward calculation produces nan, because both the numerator and denominator underflow to 0:
>>> from scipy.stats import norm
>>> a = 39
>>> b = 40
>>> x = a
>>> norm.pdf(x)  # phi(x)
0.0
>>> norm.sf(a) - norm.sf(b)  # Delta(a, b)
0.0

We can compute the true value with truncnorm from mpsci.distributions (mpmath.mp.dps is still 100, as above):
>>> from mpsci.distributions import truncnorm
>>> float(truncnorm.pdf(x, a, b))
39.02560741993011
A second problem is loss of precision when the truncation interval is very small. For example:

>>> a = 1
>>> b = a + 1e-8
>>> x = a
>>> float(truncnorm.pdf(x, a, b))
100000001.10774711

The calculation using the straightforward formula gives only about 9 significant digits:

>>> norm.pdf(x) / (norm.sf(a) - norm.sf(b))
100000001.35619442
The nan problem can be avoided by rewriting the PDF in terms of \(\textrm{erfcx}\), using the last expression for \(\Delta(a, b)\) given above, so that the underflowing exponential factors cancel analytically:

import numpy as np
from scipy.special import erfcx

SQRT2 = np.sqrt(2)
SQRT2_PI = np.sqrt(2 / np.pi)

def truncnorm_pdf(x, a, b):
    if abs(a) > abs(b):
        # Use the symmetry of the distribution.  Only x**2 occurs below,
        # so x itself does not need to be negated.
        a, b = -b, -a
    d = (a - b)*((a + b)/2)  # (a**2 - b**2)/2
    t = (a - x)*((a + x)/2)  # (a**2 - x**2)/2
    a2 = a / SQRT2
    b2 = b / SQRT2
    return SQRT2_PI * np.exp(t)/(erfcx(a2) - np.exp(d)*erfcx(b2))

With this function, the result for the example given above is computed accurately:
>>> truncnorm_pdf(39, 39, 40)  # Should be 39.02560741993011
39.025607419930104

For large \(x\), \(\textrm{erfcx}(x)\) behaves like \(\frac{1}{\sqrt{\pi} x}\), so the truncation interval would have to be extremely far into the tail of the normal distribution to encounter an underflow problem in the denominator. The numerator can still underflow, but that is acceptable; it just means that the best floating point representation of the PDF is, in fact, 0.
The above reformulation does not fix the problem of loss of precision when the truncation interval is small.
>>> a = 1
>>> b = 1 + 1e-8
>>> x = a
>>> truncnorm_pdf(x, a, b)  # Should be 100000001.10774711
100000003.08114655

(The relative error in that example is actually worse than that of the straightforward calculation shown earlier.)
A possible solution to the loss of precision is to detect when it is likely to happen, and use an alternative method to compute the PDF. Possible alternatives include:
Use higher precision numbers for the intermediate calculation. Ultimately there would still be loss of precision for even smaller intervals, but something like quad-precision or double-double would cover most conceivably useful edge cases.
It would be nice if this could be implemented in C or C++ by just upgrading the type of the variables from double to long double. This won't work, however, because long double does not have a standard number of bits of precision. Depending on the platform and compiler, long double might be 80 bit "extended precision", true 128 bit quad precision, or the same 64 bit precision as double. GCC provides the type __float128, but that is a GCC extension that is not available with other compilers.
Alternatives include using Boost's multiprecision library (since we now have Boost available), or using some other high precision floating point library.
As an experiment, I implemented a relatively simple double-double class in C++ called DoubleDouble, and created a function truncnorm_pdf(x, a, b) for the DoubleDouble type. This main program

int main(void)
{
    DoubleDouble a, b, pa;

    a = DoubleDouble{1.0, 0.0};
    b = DoubleDouble{1.00000001, 0.0};  // 1 + 1e-8
    pa = truncnorm_pdf(a, a, b);
    disp("pa ", pa);

    a = DoubleDouble{39.0, 0.0};
    b = DoubleDouble{40.0, 0.0};
    pa = truncnorm_pdf(a, a, b);
    disp("pa ", pa);

    return 0;
}

prints

pa 1.00000001107747108e+08 + -4.45843655745240463e-09
pa 3.90256074199301111e+01 + -2.34094507259581111e-15

The first component of each result matches the expected double precision value exactly; the discrepancies in the last few digits are the result of the disp function printing a digit or two more than necessary.
Instead of computing the denominator with subtraction, use the integral (\ref{eq:delta_integral}) to compute the denominator. This could be done using any numerical quadrature method, such as those used in scipy.integrate.quad or in boost::math::quadrature::gauss_kronrod. We still need to avoid simultaneous underflow of the numerator and denominator; we can do this by writing \(f(x, a, b)\) as
\begin{equation}
f(x, a, b) = \frac{e^{-x^2/2}}{\int_a^b e^{-t^2/2}\,dt} = \frac{1}{\int_a^b e^{(x^2 - t^2)/2}\,dt}
\end{equation}
A Python prototype using scipy.integrate.quad is

from scipy.integrate import quad

def truncnorm_pdf_quad(x, a, b):

    def integrand(t, x):
        z = (x - t)*((x + t)/2)  # (x**2 - t**2)/2
        return np.exp(z)

    return 1/quad(integrand, a, b, args=(x,), epsrel=1e-13, epsabs=2e-15)[0]

With this function, we get an accurate value for the above example:
>>> truncnorm_pdf_quad(1, 1, 1+1e-8)  # Should be 100000001.10774711
100000001.10774708

We can verify that this method also avoids underflow:
>>> truncnorm_pdf_quad(39, 39, 40)  # Should be 39.02560741993011
39.02560741993146

A drawback of this method is that it is potentially very slow; preliminary tests with a C++ implementation based on boost::math::quadrature::gauss_kronrod were not encouraging.
If this approach is to be used, there must be a way to detect that the subtraction will result in a significant loss of precision. The code currently does this by computing the ratio of the numbers to be subtracted, and checking if the ratio is close to 1. We know that both terms are positive, and that the second term is smaller than the first, so a condition for potential loss of precision in the expression \(s_1 - s_2\) is \(s_2/s_1 \gt \frac{1}{2} \). (A higher threshold might also give acceptable results.)
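A minimal sketch of that check (the function name here is hypothetical, not taken from the actual code):

```python
def loss_of_precision_likely(s1, s2):
    # Detect potential catastrophic cancellation in s1 - s2, where both
    # terms are positive and s2 < s1: if the ratio s2/s1 exceeds 1/2,
    # at least the leading bit cancels in the subtraction.  A higher
    # threshold might also give acceptable results.
    return s2/s1 > 0.5
```

When the check fires, the implementation can fall back to one of the alternative methods (higher precision arithmetic or quadrature) described above.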
The mean is \begin{align} \mu &= \frac{\phi(a) - \phi(b)}{\Delta(a, b)} \\ &= \sqrt{\frac{2}{\pi}} \frac{e^{-a^2/2} - e^{-b^2/2}}{\textrm{erfc}\left(\frac{a}{\sqrt{2}}\right) - \textrm{erfc}\left(\frac{b}{\sqrt{2}}\right)} \\ &= \sqrt{\frac{2}{\pi}} \frac{e^{-a^2/2} - e^{-b^2/2}}{e^{-a^2/2}\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) - e^{-b^2/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)} \\ &= \sqrt{\frac{2}{\pi}} \frac{1 - e^{(a^2 -b^2)/2}}{\textrm{erfcx}\left(\frac{a}{\sqrt{2}}\right) - e^{(a^2-b^2)/2}\textrm{erfcx}\left(\frac{b}{\sqrt{2}}\right)} \end{align}