PyTorch API

geomloss - Geometric Loss functions, with full support of PyTorch’s autograd engine:

SamplesLoss([loss, p, blur, reach, …])

Creates a criterion that computes distances between sampled measures on a vector space.

class geomloss.SamplesLoss(loss='sinkhorn', p=2, blur=0.05, reach=None, diameter=None, scaling=0.5, truncate=5, cost=None, kernel=None, cluster_scale=None, debias=True, potentials=False, verbose=False, backend='auto')[source]

Creates a criterion that computes distances between sampled measures on a vector space.


Note: If loss is "sinkhorn" and reach is None (balanced Optimal Transport), the resulting routine will expect measures whose total masses are equal to each other.

Parameters:

  • loss (string, default = "sinkhorn") –

    The loss function to compute. The supported values are:

    • "sinkhorn": (Un-biased) Sinkhorn divergence, which interpolates between Wasserstein (blur=0) and kernel (blur= \(+\infty\) ) distances.

    • "hausdorff": Weighted Hausdorff distance, which interpolates between the ICP loss (blur=0) and a kernel distance (blur= \(+\infty\) ).

    • "energy": Energy Distance MMD, computed using the kernel \(k(x,y) = -\|x-y\|_2\).

    • "gaussian": Gaussian MMD, computed using the kernel \(k(x,y) = \exp \big( -\|x-y\|_2^2 \,/\, 2\sigma^2)\) of standard deviation \(\sigma\) = blur.

    • "laplacian": Laplacian MMD, computed using the kernel \(k(x,y) = \exp \big( -\|x-y\|_2 \,/\, \sigma)\) of standard deviation \(\sigma\) = blur.

  • p (int, default=2) –

    If loss is "sinkhorn" or "hausdorff", specifies the ground cost function between points. The supported values are:

    • p = 1: \(~~C(x,y) ~=~ \|x-y\|_2\).

    • p = 2: \(~~C(x,y) ~=~ \tfrac{1}{2}\|x-y\|_2^2\).

  • blur (float, default=.05) –

    The finest level of detail that should be handled by the loss function, in order to prevent overfitting on the samples’ locations.

    • If loss is "gaussian" or "laplacian", it is the standard deviation \(\sigma\) of the convolution kernel.

    • If loss is "sinkhorn" or "hausdorff", it is the typical scale \(\sigma\) associated to the temperature \(\varepsilon = \sigma^p\). The default value of .05 is sensible for input measures that lie in the unit square/cube.

    Note that the Energy Distance is scale-equivariant, and won’t be affected by this parameter.

  • reach (float, default=None= \(+\infty\)) – If loss is "sinkhorn" or "hausdorff", specifies the typical scale \(\tau\) associated to the constraint strength \(\rho = \tau^p\).

  • diameter (float, default=None) – A rough indication of the maximum distance between points, which is used to tune the \(\varepsilon\)-scaling descent and provide a default heuristic for clustering multiscale schemes. If None, a conservative estimate will be computed on-the-fly.

  • scaling (float, default=.5) – If loss is "sinkhorn", specifies the ratio between successive values of \(\sigma=\varepsilon^{1/p}\) in the \(\varepsilon\)-scaling descent. This parameter allows you to specify the trade-off between speed (scaling < .4) and accuracy (scaling > .9).

  • truncate (float, default=5) – If backend is "multiscale", specifies the effective support of a Gaussian/Laplacian kernel as a multiple of its standard deviation. If truncate is not None, kernel truncation steps will assume that \(\exp(-x/\sigma)\) or \(\exp(-x^2/2\sigma^2)\) are zero when \(|x| > \text{truncate} \cdot \sigma\).

  • cost (function or string, default=None) –

    If loss is "sinkhorn" or "hausdorff", specifies the cost function that should be used instead of \(\tfrac{1}{p}\|x-y\|^p\):

    • If backend is "tensorized", cost should be a python function that takes as input a (B,N,D) torch Tensor x, a (B,M,D) torch Tensor y and returns a batched cost matrix as a (B,N,M) Tensor.

    • Otherwise, if backend is "online" or "multiscale", cost should be a KeOps formula, given as a string, with variables X and Y. The default values are "Norm2(X-Y)" (for p = 1) and "(SqDist(X,Y) / IntCst(2))" (for p = 2).

  • cluster_scale (float, default=None) – If backend is "multiscale", specifies the coarse scale at which cluster centroids will be computed. If None, a conservative estimate will be computed from diameter and the ambient space’s dimension, making sure that memory overflows won’t take place.

  • debias (bool, default=True) – If loss is "sinkhorn", specifies if we should compute the unbiased Sinkhorn divergence instead of the classic, entropy-regularized “SoftAssign” loss.

  • potentials (bool, default=False) – When this parameter is set to True, the SamplesLoss layer returns a pair of optimal dual potentials \(F\) and \(G\), sampled on the input measures, instead of a differentiable scalar value. These dual vectors \((F(x_i))\) and \((G(y_j))\) are encoded as Torch tensors, with the same shape as the input weights \((\alpha_i)\) and \((\beta_j)\).

  • verbose (bool, default=False) – If backend is "multiscale", specifies whether information on the clustering and \(\varepsilon\)-scaling descent should be displayed in the standard output.

  • backend (string, default = "auto") –

    The implementation that will be used in the background; this choice has a major impact on performance. The supported values are:

    • "auto": Choose automatically, using a simple heuristic based on the inputs’ shapes.

    • "tensorized": Relies on a full cost/kernel matrix, computed once and for all and stored on the device memory. This method is fast, but has a quadratic memory footprint and does not scale beyond ~5,000 samples per measure.

    • "online": Computes cost/kernel values on-the-fly, leveraging online map-reduce CUDA routines provided by the pykeops library.

    • "multiscale": Fast implementation that scales to millions of samples in dimension 1-2-3, relying on the block-sparse reductions provided by the pykeops library.
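The \(\varepsilon\)-scaling descent controlled by the scaling, diameter and blur parameters above can be sketched in pure Python. This is an illustrative approximation, not the library's exact internal schedule:

```python
import math

def epsilon_schedule(p, diameter, blur, scaling):
    """Geometric annealing of the temperature eps = sigma**p:
    sigma decreases from `diameter` down to `blur` by factors of `scaling`."""
    sigma = diameter
    sigmas = [sigma]
    while sigma > blur:
        sigma = max(sigma * scaling, blur)
        sigmas.append(sigma)
    return [s ** p for s in sigmas]

# For measures in the unit cube, with the default blur and scaling values:
eps = epsilon_schedule(p=2, diameter=1.0, blur=0.05, scaling=0.5)
```

Smaller scaling values shorten this schedule (fewer, coarser iterations, hence speed), while values close to 1 produce a long, accurate annealing descent.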


forward(*args)[source]

Computes the loss between sampled measures.

Documentation and examples: Soon! Until then, please check the tutorials :-)