# PyTorch API

geomloss - Geometric Loss functions, with full support of PyTorch’s autograd engine:

`SamplesLoss([loss, p, blur, reach, …])`: Creates a criterion that computes distances between sampled measures on a vector space.
`class geomloss.SamplesLoss(loss='sinkhorn', p=2, blur=0.05, reach=None, diameter=None, scaling=0.5, truncate=5, cost=None, kernel=None, cluster_scale=None, debias=True, potentials=False, verbose=False, backend='auto')`

Creates a criterion that computes distances between sampled measures on a vector space.

Warning

If loss is "sinkhorn" and reach is None (balanced Optimal Transport), the resulting routine will expect measures whose total masses are equal.
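One simple way to satisfy this constraint is to normalize both weight vectors to unit total mass before evaluating the loss. A minimal sketch; the helper name is illustrative, not part of geomloss:

```python
import torch

def normalize_weights(a, b):
    """Illustrative helper (not part of geomloss): rescale two weight
    vectors to probability vectors, so that balanced Sinkhorn sees
    measures with equal total masses."""
    return a / a.sum(), b / b.sum()

# Two weight vectors with different total masses.
alpha = torch.ones(100)          # total mass 100
beta = torch.full((150,), 2.0)   # total mass 300

alpha, beta = normalize_weights(alpha, beta)
print(float(alpha.sum()), float(beta.sum()))  # both now sum to 1
```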

Parameters
• loss (string, default = "sinkhorn") –

The loss function to compute. The supported values are:

• "sinkhorn": (Unbiased) Sinkhorn divergence, which interpolates between Wasserstein (blur = 0) and kernel (blur = $$+\infty$$) distances.

• "hausdorff": Weighted Hausdorff distance, which interpolates between the ICP loss (blur = 0) and a kernel distance (blur = $$+\infty$$).

• "energy": Energy Distance MMD, computed using the kernel $$k(x,y) = -\|x-y\|_2$$.

• "gaussian": Gaussian MMD, computed using the kernel $$k(x,y) = \exp \big( -\|x-y\|_2^2 \,/\, 2\sigma^2 \big)$$ of standard deviation $$\sigma$$ = blur.

• "laplacian": Laplacian MMD, computed using the kernel $$k(x,y) = \exp \big( -\|x-y\|_2 \,/\, \sigma \big)$$ of standard deviation $$\sigma$$ = blur.

• p (int, default=2) –

If loss is "sinkhorn" or "hausdorff", specifies the ground cost function between points. The supported values are:

• p = 1: $$~~C(x,y) ~=~ \|x-y\|_2$$.

• p = 2: $$~~C(x,y) ~=~ \tfrac{1}{2}\|x-y\|_2^2$$.

• blur (float, default=.05) –

The finest level of detail that should be handled by the loss function, in order to prevent overfitting on the samples’ locations.

• If loss is "gaussian" or "laplacian", it is the standard deviation $$\sigma$$ of the convolution kernel.

• If loss is "sinkhorn" or "hausdorff", it is the typical scale $$\sigma$$ associated to the temperature $$\varepsilon = \sigma^p$$. The default value of .05 is sensible for input measures that lie in the unit square/cube.

Note that the Energy Distance is scale-equivariant, and won’t be affected by this parameter.

• reach (float, default=None= $$+\infty$$) – If loss is "sinkhorn" or "hausdorff", specifies the typical scale $$\tau$$ associated to the constraint strength $$\rho = \tau^p$$.

• diameter (float, default=None) – A rough indication of the maximum distance between points, which is used to tune the $$\varepsilon$$-scaling descent and provide a default heuristic for clustering multiscale schemes. If None, a conservative estimate will be computed on-the-fly.

• scaling (float, default=.5) – If loss is "sinkhorn", specifies the ratio between successive values of $$\sigma=\varepsilon^{1/p}$$ in the $$\varepsilon$$-scaling descent. This parameter allows you to specify the trade-off between speed (scaling < .4) and accuracy (scaling > .9).

• truncate (float, default=None= $$+\infty$$) – If backend is "multiscale", specifies the effective support of a Gaussian/Laplacian kernel as a multiple of its standard deviation. If truncate is not None, kernel truncation steps will assume that $$\exp(-x/\sigma)$$ or $$\exp(-x^2/2\sigma^2)$$ are zero when $$|x| \,>\, \text{truncate}\cdot \sigma$$.

• cost (function or string, default=None) –

If loss is "sinkhorn" or "hausdorff", specifies the cost function that should be used instead of $$\tfrac{1}{p}\|x-y\|^p$$:

• If backend is "tensorized", cost should be a Python function that takes as input a (B,N,D) torch Tensor x and a (B,M,D) torch Tensor y, and returns a batched cost matrix as a (B,N,M) Tensor.

• Otherwise, if backend is "online" or "multiscale", cost should be a KeOps formula, given as a string, with variables X and Y. The default values are "Norm2(X-Y)" (for p = 1) and "(SqDist(X,Y) / IntCst(2))" (for p = 2).

• cluster_scale (float, default=None) – If backend is "multiscale", specifies the coarse scale at which cluster centroids will be computed. If None, a conservative estimate will be computed from diameter and the ambient space’s dimension, making sure that memory overflows won’t take place.

• debias (bool, default=True) – If loss is "sinkhorn", specifies if we should compute the unbiased Sinkhorn divergence instead of the classic, entropy-regularized “SoftAssign” loss.

• potentials (bool, default=False) – When this parameter is set to True, the SamplesLoss layer returns a pair of optimal dual potentials $$F$$ and $$G$$, sampled on the input measures, instead of a differentiable scalar value. These dual vectors $$(F(x_i))$$ and $$(G(y_j))$$ are encoded as Torch tensors, with the same shape as the input weights $$(\alpha_i)$$ and $$(\beta_j)$$.

• verbose (bool, default=False) – If backend is "multiscale", specifies whether information on the clustering and $$\varepsilon$$-scaling descent should be displayed in the standard output.

• backend (string, default = "auto") –

The implementation that will be used in the background; this choice has a major impact on performance. The supported values are:

• "auto": Choose automatically, using a simple heuristic based on the inputs’ shapes.

• "tensorized": Relies on a full cost/kernel matrix, computed once and for all and stored on the device memory. This method is fast, but has a quadratic memory footprint and does not scale beyond ~5,000 samples per measure.

• "online": Computes cost/kernel values on-the-fly, leveraging online map-reduce CUDA routines provided by the pykeops library.

• "multiscale": Fast implementation that scales to millions of samples in dimension 1-2-3, relying on the block-sparse reductions provided by the pykeops library.
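To illustrate the contract that a custom cost must satisfy with the "tensorized" backend, here is a sketch that reproduces the default p = 2 ground cost $$\tfrac{1}{2}\|x-y\|_2^2$$ as a batched Python function; the function name is illustrative, and any function with the same (B,N,D) × (B,M,D) → (B,N,M) signature could be passed as cost instead:

```python
import torch

def halved_sq_dist(x, y):
    """Batched cost matrix C(x, y) = ||x - y||_2^2 / 2.

    Follows the "tensorized" backend contract described above:
    x is (B, N, D), y is (B, M, D), and the result is (B, N, M).
    This reproduces the default ground cost for p = 2.
    """
    return torch.cdist(x, y, p=2) ** 2 / 2

torch.manual_seed(0)
B, N, M, D = 4, 10, 12, 3
x = torch.randn(B, N, D)
y = torch.randn(B, M, D)
C = halved_sq_dist(x, y)
print(C.shape)  # torch.Size([4, 10, 12])
```

The quadratic memory footprint mentioned above is visible here: the full (B, N, M) matrix is materialized at once, which is fast but limits the number of samples per measure.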

`forward(*args)`

Computes the loss between sampled measures.

Documentation and examples: Soon! Until then, please check the tutorials :-)
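Until then, the quantity computed for loss="sinkhorn" with debias=False can be sketched with a few log-domain Sinkhorn iterations in plain PyTorch. This is an illustrative toy implementation of entropy-regularized OT with temperature $$\varepsilon = \text{blur}^p$$, not geomloss's scaled, debiased multiscale routine; all names below are ours:

```python
import torch

def sinkhorn_cost(a, x, b, y, blur=0.05, p=2, n_iter=200):
    """Toy log-domain Sinkhorn sketch (NOT the geomloss implementation):
    entropy-regularized OT cost <pi, C> between the weighted point clouds
    (a, x) and (b, y), with temperature eps = blur**p. Roughly what
    loss="sinkhorn" computes when debias=False."""
    eps = blur ** p
    C = torch.cdist(x, y) ** p / p           # ground cost, shape (N, M)
    log_a, log_b = a.log(), b.log()
    f = torch.zeros_like(a)                  # dual potentials F(x_i)
    g = torch.zeros_like(b)                  # dual potentials G(y_j)
    for _ in range(n_iter):                  # block-coordinate dual ascent
        f = -eps * torch.logsumexp((g + eps * log_b - C) / eps, dim=1)
        g = -eps * torch.logsumexp((f + eps * log_a - C.T) / eps, dim=1)
    # Primal transport plan pi_ij = a_i b_j exp((f_i + g_j - C_ij) / eps).
    log_pi = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return (log_pi.exp() * C).sum()

torch.manual_seed(0)
N, M = 50, 60
x, y = torch.randn(N, 2), torch.randn(M, 2)
a = torch.full((N,), 1.0 / N)   # uniform weights, equal total masses
b = torch.full((M,), 1.0 / M)
loss = sinkhorn_cost(a, x, b, y)
print(loss.item())
```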