Reductions

Following the same design principles, \(\operatorname{Reduction}\) operators are implemented in the keops/core/reductions/*.h headers. Taking as input an arbitrary symbolic formula \(F\), Reduction templates encode generic Map-Reduce schemes and should implement a few standard routines.

Summation

In the case of the simple Sum reduction (implemented in the Sum_Reduction.h header), these routines can be described as follows:

  1. An InitializeReduction method, which fills up the running buffer “\(a\)” of our Map-Reduce algorithm – a vector of size F::DIM – with zeros before the start of the loop on the reduction index \(j\).

  2. A ReducePair method, which takes as input a pointer to the running buffer \(a\), a pointer to the result \(F_{i,j} = F(p^1,\dots,x^1_i,\dots,y^1_j,\dots)\) and implements the in-place reduction:

    \[\begin{aligned} a~\gets~a~+~F_{i,j}. \end{aligned}\]
  3. A FinalizeOutput method, which post-processes the buffer \(a\) before saving its value in the output array. This step is useful for argmin-like reductions, but in the case of the sum, no post-processing is needed; a simplified sketch of the three routines is given after this list.
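To make the structure of these routines concrete, here is a minimal, self-contained C++ sketch of a Sum reduction over buffers of fixed size. It is a simplified mock-up of the scheme described above, not the actual templated code of the Sum_Reduction.h header: the free functions and the DIM constant are illustrative assumptions, whereas the real routines are templated over the formula \(F\), with F::DIM playing the role of DIM.

    // Simplified sketch of a Sum reduction: not the actual KeOps template code.
    #include <cstdio>

    constexpr int DIM = 3;  // stands for F::DIM in the real, templated routines

    // 1. InitializeReduction: fill the running buffer with zeros.
    void InitializeReduction(float* acc) {
      for (int k = 0; k < DIM; k++) acc[k] = 0.0f;
    }

    // 2. ReducePair: in-place update  acc <- acc + F_ij.
    void ReducePair(float* acc, const float* Fij) {
      for (int k = 0; k < DIM; k++) acc[k] += Fij[k];
    }

    // 3. FinalizeOutput: for the sum, simply copy the buffer to the output array.
    void FinalizeOutput(const float* acc, float* out) {
      for (int k = 0; k < DIM; k++) out[k] = acc[k];
    }

    int main() {
      float acc[DIM], out[DIM];
      float rows[2][DIM] = {{1.f, 2.f, 3.f}, {10.f, 20.f, 30.f}};  // two values F_ij
      InitializeReduction(acc);
      for (auto& Fij : rows) ReducePair(acc, Fij);
      FinalizeOutput(acc, out);
      printf("%g %g %g\n", out[0], out[1], out[2]);  // prints: 11 22 33
      return 0;
    }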

The online Log-Sum-Exp trick

More interestingly, the Max_SumShiftExp_Reduction.h header implements an online version of the well-known Log-Sum-Exp trick: a factorization of the maximum in the computation of

\[\begin{aligned} \log \sum_{j=1}^\mathrm{N} \exp (F_{i,j}) ~=~ m_i~+~ \log \sum_{j=1}^\mathrm{N} \exp (F_{i,j}\,-\,m_i), ~~\text{with}~~ m_i~=~ \max_{j=1}^\mathrm{N} F_{i,j} \label{eq:logsumexp_trick}\end{aligned}\]

that allows this important quantity – the linchpin of maximum likelihood estimators and entropic Optimal Transport solvers – to be computed without numerical overflows.
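As a toy illustration (with values chosen purely for exposition), a direct evaluation of \(e^{1000}\) overflows in both single and double precision, whereas the factored form on the right-hand side is perfectly tame:

\[\begin{aligned} \log\big( e^{1000} + e^{999} \big) ~=~ 1000 \,+\, \log\big( 1 + e^{-1} \big) ~\simeq~ 1000.31. \end{aligned}\]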

Merging the content of our C++ header with that of the Python post-processing step implemented in pykeops/common/operations.py, and assuming that \(F_{i,j} = F(p^1,\dots,x^1_i,\dots,y^1_j,\dots)\) is a scalar quantity, we may describe its behaviour as follows:

  1. The InitializeReduction method ensures that our running buffer \(a\) is a vector of size 2 that encodes the current value of the inner summation as an explicit “(exponent, mantissa)” or “(maximum, residual)” pair of floating-point numbers: at any stage of the computation, the pair \((m,r)\) encodes the positive number \(e^{m}\cdot r\) with the required precision. We initially set the value of \(a\) to \((-\infty, 0)\simeq e^{-\infty}\cdot 0\).

  2. The ReducePair method takes as input a pointer to the result \(F_{i,j}\) of the computation, a pointer to the running buffer \(a = (m, r) \simeq e^m\cdot r\) and implements the in-place update:

    \[\begin{split}\begin{aligned} (m,r) ~\gets~ \begin{cases} \big( ~m~, ~\,r + \phantom{r\cdot{}} e^{F_{i,j} - m} \big) & \text{if}~ m \geqslant F_{i,j}\\ \big( F_{i,j},~ 1 + r \cdot e^{m - F_{i,j}} \big) & \text{otherwise.} \end{cases} \end{aligned}\end{split}\]

    This is a numerically stable way of writing the sum reduction:

    \[\begin{split}\begin{aligned} e^m \cdot r ~\gets~ e^m\cdot r \, +\, e^{F_{i,j}} ~=~ \begin{cases} ~e^m~\cdot(\,r+ \phantom{r\cdot{}} e^{F_{i,j}-m}) & \text{if}~ m \geqslant F_{i,j}\\ e^{F_{i,j}}\cdot (1 + r\cdot e^{m-F_{i,j}}) & \text{otherwise.} \end{cases} \end{aligned}\end{split}\]
  3. The FinalizeOutput method post-processes the buffer \(a = (m,r) \simeq e^{m}\cdot r\) by applying the final “\(\log\)” operation, returning a value of \(m+\log(r)\) for the full reduction, as summarised in the sketch below.
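The behaviour of this reduction can be summarised with the following self-contained C++ sketch. It is a plain scalar mock-up of the steps described above, not the actual code of the Max_SumShiftExp_Reduction.h header: the struct and function signatures are illustrative assumptions, and the final “\(\log\)” is applied directly in FinalizeOutput instead of being deferred to the Python post-processing step.

    // Simplified sketch of the online Log-Sum-Exp reduction for a scalar F_ij:
    // the running buffer is a (maximum, residual) pair (m, r) encoding exp(m) * r.
    // Not the actual KeOps template code.
    #include <cmath>
    #include <cstdio>
    #include <limits>

    struct MaxSumShiftExp { float m, r; };

    // InitializeReduction: (m, r) = (-inf, 0) encodes exp(-inf) * 0.
    MaxSumShiftExp InitializeReduction() {
      return { -std::numeric_limits<float>::infinity(), 0.0f };
    }

    // ReducePair: in-place, numerically stable update of (m, r) with a new value F_ij.
    void ReducePair(MaxSumShiftExp& acc, float Fij) {
      if (acc.m >= Fij) {
        acc.r += std::exp(Fij - acc.m);                // exp(m) * (r + exp(F_ij - m))
      } else {
        acc.r = 1.0f + acc.r * std::exp(acc.m - Fij);  // exp(F_ij) * (1 + r * exp(m - F_ij))
        acc.m = Fij;
      }
    }

    // FinalizeOutput: return m + log(r), i.e. the log of the sum of the exp(F_ij).
    float FinalizeOutput(const MaxSumShiftExp& acc) {
      return acc.m + std::log(acc.r);
    }

    int main() {
      // Values that would overflow a naive exp-then-sum implementation:
      float values[] = { 1000.0f, 999.0f, 998.0f };
      MaxSumShiftExp acc = InitializeReduction();
      for (float Fij : values) ReducePair(acc, Fij);
      printf("log-sum-exp = %f\n", FinalizeOutput(acc));  // approx. 1000.408
      return 0;
    }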