.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "_auto_examples/performances/plot_benchmarks_ot_3D.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download__auto_examples_performances_plot_benchmarks_ot_3D.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr__auto_examples_performances_plot_benchmarks_ot_3D.py:


Wasserstein distances between large point clouds
==========================================================

Let's compare the performance of several OT solvers
on subsampled versions of the `Stanford dragon <http://graphics.stanford.edu/data/3Dscanrep/>`_,
a standard test surface made up of more than **870,000 triangles**.
In this benchmark, we measure timings on a simple registration task:
the **optimal transport of a sphere onto the (subsampled) dragon**, using
a quadratic ground cost
:math:`\text{C}(x,y) = \tfrac{1}{2}\|x-y\|^2`
in the ambient space :math:`\mathbb{R}^3`.

.. GENERATED FROM PYTHON SOURCE LINES 17-46

More precisely: having loaded and represented our 3D meshes
as discrete probability measures

.. math::
  \alpha ~=~ \sum_{i=1}^N \alpha_i\,\delta_{x_i}, ~~~
  \beta  ~=~ \sum_{j=1}^M \beta_j\,\delta_{y_j},

with **one weighted Dirac mass per triangle**, we will strive to solve the primal-dual entropic OT problem:

.. math::
  \text{OT}_\varepsilon(\alpha,\beta)~&=~
      \min_{0 \leqslant \pi \ll \alpha\otimes\beta} ~\langle\text{C},\pi\rangle
          ~+~\varepsilon\,\text{KL}(\pi,\alpha\otimes\beta) \quad\text{s.t.}~~
       \pi\,\mathbf{1} = \alpha ~~\text{and}~~ \pi^\intercal \mathbf{1} = \beta\\
   &=~ \max_{f,g} ~~\langle \alpha,f\rangle + \langle \beta,g\rangle
        - \varepsilon\langle \alpha\otimes\beta,
          \exp \tfrac{1}{\varepsilon}[ f\oplus g - \text{C} ] - 1 \rangle

as fast as possible, optimizing over **dual vectors**:

.. math::
  F_i ~=~ f(x_i), ~~~ G_j ~=~ g(y_j)

that encode an implicit transport plan:

.. math::
  \pi ~&=~ \exp \tfrac{1}{\varepsilon}( f\oplus g - \text{C})~\cdot~ \alpha\otimes\beta,\\
  \text{i.e.}~~\pi_{x_i \leftrightarrow y_j}~&=~ \exp \tfrac{1}{\varepsilon}( F_i + G_j - \text{C}(x_i,y_j))~\cdot~ \alpha_i \beta_j.
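
For intuition, here is a minimal sketch (on hypothetical toy tensors that are
not part of the benchmark below) of how dual vectors encode the primal
transport plan:

.. code-block:: Python

    import torch

    N, M, eps = 5, 4, 0.01
    x = torch.randn(N, 3)  # source points x_i
    y = torch.randn(M, 3)  # target points y_j
    a = torch.full((N,), 1 / N)  # source weights alpha_i
    b = torch.full((M,), 1 / M)  # target weights beta_j
    F = torch.zeros(N)  # dual vector F_i = f(x_i), e.g. as returned by a solver
    G = torch.zeros(M)  # dual vector G_j = g(y_j)

    # C(x_i, y_j) = |x_i - y_j|^2 / 2, as an (N,M) matrix:
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) / 2
    # pi_ij = exp((F_i + G_j - C_ij) / eps) * a_i * b_j:
    pi = ((F[:, None] + G[None, :] - C) / eps).exp() * (a[:, None] * b[None, :])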


.. GENERATED FROM PYTHON SOURCE LINES 48-52

Comparing OT solvers with each other
--------------------------------------

First, let's make some standard imports:

.. GENERATED FROM PYTHON SOURCE LINES 52-64

.. code-block:: Python


    import numpy as np
    import torch

    use_cuda = torch.cuda.is_available()
    tensor = torch.cuda.FloatTensor if use_cuda else torch.FloatTensor
    numpy = lambda x: x.detach().cpu().numpy()

    from matplotlib import pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D









.. GENERATED FROM PYTHON SOURCE LINES 65-68

This tutorial is all about highlighting the differences between
the GeomLoss solvers, packaged in the :class:`SamplesLoss <geomloss.SamplesLoss>`
layer, and a standard Sinkhorn (or *soft-Auction*) loop.

.. GENERATED FROM PYTHON SOURCE LINES 68-71

.. code-block:: Python


    from geomloss import SamplesLoss








.. GENERATED FROM PYTHON SOURCE LINES 72-77

Our baseline is provided by a simple **Sinkhorn loop**, implemented
in the **log-domain** for the sake of numerical stability.
Using the same code, we provide two backends:
a **tensorized** PyTorch implementation (which has a quadratic memory footprint)
and a **scalable** KeOps code (which has a **linear** memory footprint).

.. GENERATED FROM PYTHON SOURCE LINES 78-130

.. code-block:: Python


    from pykeops.torch import LazyTensor


    def sinkhorn_loop(a_i, x_i, b_j, y_j, blur=0.01, nits=100, backend="keops"):
        """Straightforward implementation of the Sinkhorn-IPFP-SoftAssign loop in the log domain."""

        # Compute the logarithm of the weights (needed in the softmin reduction) ---
        loga_i, logb_j = a_i.log(), b_j.log()
        loga_i, logb_j = loga_i[:, None, None], logb_j[None, :, None]

        # Compute the cost matrix C_ij = (1/2) * |x_i-y_j|^2 -----------------------
        if backend == "keops":  # C_ij is a *symbolic* LazyTensor
            x_i, y_j = LazyTensor(x_i[:, None, :]), LazyTensor(y_j[None, :, :])
            C_ij = ((x_i - y_j) ** 2).sum(-1) / 2  # (N,M,1) LazyTensor

        elif (
            backend == "pytorch"
        ):  # C_ij is a *full* Tensor, with a quadratic memory footprint
            # N.B.: The separable implementation below is slightly more efficient than:
            # C_ij = ((x_i[:,None,:] - y_j[None,:,:]) ** 2).sum(-1) / 2

            D_xx = (x_i**2).sum(-1)[:, None]  # (N,1)
            D_xy = x_i @ y_j.t()  # (N,D)@(D,M) = (N,M)
            D_yy = (y_j**2).sum(-1)[None, :]  # (1,M)
            C_ij = (D_xx + D_yy) / 2 - D_xy  # (N,M) matrix of halved squared distances

            C_ij = C_ij[:, :, None]  # reshape as a (N,M,1) Tensor

        # Setup the dual variables -------------------------------------------------
        eps = blur**2  # "Temperature" epsilon associated with our blurring scale
        F_i, G_j = torch.zeros_like(loga_i), torch.zeros_like(
            logb_j
        )  # (scaled) dual vectors

        # Sinkhorn loop = coordinate ascent on the dual maximization problem -------
        for _ in range(nits):
            F_i = -((-C_ij / eps + (G_j + logb_j))).logsumexp(dim=1)[:, None, :]
            G_j = -((-C_ij / eps + (F_i + loga_i))).logsumexp(dim=0)[None, :, :]

        # Return the dual vectors F and G, sampled on the x_i's and y_j's respectively:
        return eps * F_i, eps * G_j


    # Create a sinkhorn_solver "layer" with the same signature as SamplesLoss:
    from functools import partial

    sinkhorn_solver = lambda blur, nits, backend: partial(
        sinkhorn_loop, blur=blur, nits=nits, backend=backend
    )
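
As a quick sanity check (a hypothetical toy run, not part of the benchmark
below), the dual vectors returned by ``sinkhorn_loop`` estimate the entropic
transport cost through the dual objective
:math:`\langle \alpha,f\rangle + \langle \beta,g\rangle`,
whose correction term vanishes at convergence:

.. code-block:: Python

    N, M = 1000, 1200
    x_i = torch.randn(N, 3).type(tensor)
    y_j = torch.randn(M, 3).type(tensor)
    a_i = (torch.ones(N) / N).type(tensor)  # uniform weights on the x_i's
    b_j = (torch.ones(M) / M).type(tensor)  # uniform weights on the y_j's

    F_i, G_j = sinkhorn_loop(a_i, x_i, b_j, y_j, blur=0.1, nits=100, backend="pytorch")

    # At convergence, <alpha, F> + <beta, G> approximates OT_eps(alpha, beta):
    cost = (a_i * F_i.view(-1)).sum() + (b_j * G_j.view(-1)).sum()
    print("Estimated OT_eps:", cost.item())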









.. GENERATED FROM PYTHON SOURCE LINES 131-139

Benchmarking loops
------------------------

As usual, writing up a proper benchmark requires a lot of verbose,
not-so-interesting code. For the sake of readability, we abstract such routines
in a separate :doc:`file <./benchmarks_ot_solvers>`
where error functions, timers and Wasserstein distances are properly defined.
Feel free to have a look!

.. GENERATED FROM PYTHON SOURCE LINES 139-146

.. code-block:: Python



    from geomloss.examples.performances.benchmarks_ot_solvers import (
        benchmark_solver,
        benchmark_solvers,
    )








.. GENERATED FROM PYTHON SOURCE LINES 147-153

The GeomLoss routines rely on a **scaling** parameter to tune
the tradeoff between **speed** (scaling :math:`\rightarrow` 0)
and **accuracy** (scaling :math:`\rightarrow` 1).
Meanwhile, the Sinkhorn loop is directly controlled
by a **number of iterations** that should be chosen with respect to
the available time budget.
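
Concretely, the two "effort" knobs look as follows (a hypothetical
illustration; the actual parameter grids are defined in ``full_benchmark``
below):

.. code-block:: Python

    # GeomLoss: accuracy is driven by the annealing parameter "scaling":
    fast_ot = SamplesLoss("sinkhorn", p=2, blur=0.01, scaling=0.5, potentials=True)
    accurate_ot = SamplesLoss("sinkhorn", p=2, blur=0.01, scaling=0.99, potentials=True)

    # Naive Sinkhorn loop: accuracy is driven by the number of iterations "nits":
    fast_loop = sinkhorn_solver(blur=0.01, nits=10, backend="keops")
    accurate_loop = sinkhorn_solver(blur=0.01, nits=1000, backend="keops")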

.. GENERATED FROM PYTHON SOURCE LINES 153-227

.. code-block:: Python



    def full_benchmark(source, target, blur, maxtime=None):
        # Compute a suitable "ground truth" ----------------------------------------
        OT_solver = SamplesLoss(
            "sinkhorn",
            p=2,
            blur=blur,
            backend="online",
            scaling=0.999,
            debias=False,
            potentials=True,
        )
        _, _, ground_truth = benchmark_solver(OT_solver, blur, source, target)

        results = {}  # Dict of "timings vs errors" arrays

        # Compute statistics for the three backends of GeomLoss: -------------------

        for name in ["multiscale-1", "multiscale-5", "online", "tensorized"]:
            if name == "multiscale-1":
                backend, truncate = "multiscale", 1  # Aggressive "kernel truncation" scheme
            elif name == "multiscale-5":
                backend, truncate = "multiscale", 5  # Safe, default truncation rule
            else:
                backend, truncate = name, None

            OT_solvers = [
                SamplesLoss(
                    "sinkhorn",
                    p=2,
                    blur=blur,
                    scaling=scaling,
                    truncate=truncate,
                    backend=backend,
                    debias=False,
                    potentials=True,
                )
                for scaling in [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
            ]

            results[name] = benchmark_solvers(
                "GeomLoss - " + name,
                OT_solvers,
                source,
                target,
                ground_truth,
                blur=blur,
                display=False,
                maxtime=maxtime,
            )

        # Compute statistics for a naive Sinkhorn loop -----------------------------

        for backend in ["pytorch", "keops"]:
            OT_solvers = [
                sinkhorn_solver(blur, nits=nits, backend=backend)
                for nits in [5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
            ]

            results[backend] = benchmark_solvers(
                "Sinkhorn loop - " + backend,
                OT_solvers,
                source,
                target,
                ground_truth,
                blur=blur,
                display=False,
                maxtime=maxtime,
            )

        return results, ground_truth









.. GENERATED FROM PYTHON SOURCE LINES 228-231

Having solved the entropic OT problem with dozens of configurations,
we will display our results in an "error vs timing" log-log plot:


.. GENERATED FROM PYTHON SOURCE LINES 231-280

.. code-block:: Python



    def display_statistics(title, results, ground_truth, maxtime=None):
        """Displays a "error vs timing" plot in log-log scale."""

        curves = [
            ("pytorch", "Sinkhorn loop - PyTorch backend"),
            ("keops", "Sinkhorn loop - KeOps backend"),
            ("tensorized", "Sinkhorn with ε-scaling - PyTorch backend"),
            ("online", "Sinkhorn with ε-scaling - KeOps backend"),
            ("multiscale-5", "Sinkhorn multiscale - truncate=5 (safe)"),
            ("multiscale-1", "Sinkhorn multiscale - truncate=1 (fast)"),
        ]

        fig = plt.figure(figsize=(12, 8))
        ax = fig.subplots()
        ax.set_title(title)
        ax.set_ylabel("Relative error made on the entropic Wasserstein distance")
        ax.set_yscale("log")
        ax.set_ylim(top=1e-1, bottom=1e-3)
        ax.set_xlabel("Time (s)")
        ax.set_xscale("log")
        ax.set_xlim(left=1e-3, right=maxtime)

        ax.grid(True, which="major", linestyle="-")
        ax.grid(True, which="minor", linestyle="dotted")

        for key, name in curves:
            timings, errors, costs = results[key]
            ax.plot(timings, np.abs(costs - ground_truth), label=name)

        ax.legend(loc="upper right")


    def full_statistics(source, target, blur=0.01, maxtime=None):
        results, ground_truth = full_benchmark(source, target, blur, maxtime=maxtime)

        display_statistics(
            "Solving a {:,}-by-{:,} OT problem, with a blurring scale σ = {:}".format(
                len(source[0]), len(target[0]), blur
            ),
            results,
            ground_truth,
            maxtime=maxtime,
        )

        return results, ground_truth









.. GENERATED FROM PYTHON SOURCE LINES 281-287

Building our dataset
----------------------------

Our **source measures**: unit spheres, sampled with (roughly) the same number of points
as the target meshes:


.. GENERATED FROM PYTHON SOURCE LINES 287-292

.. code-block:: Python


    from geomloss.examples.performances.benchmarks_ot_solvers import create_sphere

    sources = [create_sphere(npoints) for npoints in [1e4, 5e4, 2e5, 8e5]]








.. GENERATED FROM PYTHON SOURCE LINES 293-294

Then, we fetch our target models from the Stanford repository:

.. GENERATED FROM PYTHON SOURCE LINES 294-309

.. code-block:: Python


    import os

    os.makedirs("data", exist_ok=True)  # make sure that the "data" folder exists

    if not os.path.exists("data/dragon_recon/dragon_vrip_res4.ply"):
        import urllib.request

        urllib.request.urlretrieve(
            "http://graphics.stanford.edu/pub/3Dscanrep/dragon/dragon_recon.tar.gz",
            "data/dragon.tar.gz",
        )

        import shutil

        shutil.unpack_archive("data/dragon.tar.gz", "data")








.. GENERATED FROM PYTHON SOURCE LINES 310-312

To read the raw ``.ply`` ASCII files, we rely on the
`plyfile <https://github.com/dranjan/python-plyfile>`_ package:

.. GENERATED FROM PYTHON SOURCE LINES 312-319

.. code-block:: Python


    from geomloss.examples.performances.benchmarks_ot_solvers import (
        load_ply_file,
        display_cloud,
    )









.. GENERATED FROM PYTHON SOURCE LINES 320-325

Our meshes are encoded using **one weighted Dirac mass per triangle**.
To keep things simple, we use as **targets** the subsamplings provided
in the reference Stanford archive. Feel free to re-run
this script with your own models!


.. GENERATED FROM PYTHON SOURCE LINES 325-338

.. code-block:: Python


    # N.B.: Since Plyfile is far from being optimized, this may take some time!
    targets = [
        load_ply_file(fname, offset=[-0.011, 0.109, -0.008], scale=0.04)
        for fname in [
            "data/dragon_recon/dragon_vrip_res4.ply",  # ~ 10,000 triangles
            "data/dragon_recon/dragon_vrip_res3.ply",  # ~ 50,000 triangles
            "data/dragon_recon/dragon_vrip_res2.ply",  # ~200,000 triangles
            #'data/dragon_recon/dragon_vrip.ply',     # ~800,000 triangles
        ]
    ]






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    File loaded, and encoded as the weighted sum of 11,102 atoms in 3D.
    File loaded, and encoded as the weighted sum of 47,794 atoms in 3D.
    File loaded, and encoded as the weighted sum of 202,520 atoms in 3D.




.. GENERATED FROM PYTHON SOURCE LINES 339-341

Finally, if we don't have access to a GPU, we subsample point clouds
while making sure that weights still sum up to one:

.. GENERATED FROM PYTHON SOURCE LINES 341-355

.. code-block:: Python



    def subsample(measure, decimation=500):
        weights, locations = measure
        weights, locations = weights[::decimation], locations[::decimation]
        weights = weights / weights.sum()
        return weights.contiguous(), locations.contiguous()


    if not use_cuda:
        sources = [subsample(s) for s in sources]
        targets = [subsample(t) for t in targets]









.. GENERATED FROM PYTHON SOURCE LINES 356-359

In this simple benchmark, we will only use the **coarse** and **medium** resolutions
of our meshes: 200,000 points should be more than enough to compute
sensible approximations of the Wasserstein distance between the Stanford dragon and a unit sphere!

.. GENERATED FROM PYTHON SOURCE LINES 359-387

.. code-block:: Python



    fig = plt.figure(figsize=(12, 12))
    ax = fig.add_subplot(1, 1, 1, projection="3d")
    display_cloud(ax, sources[0], "red")
    display_cloud(ax, targets[0], "blue")
    ax.set_title(
        "Low resolution dataset:\n"
        + "Source (N={:,}) and target (M={:,}) point clouds".format(
            len(sources[0][0]), len(targets[0][0])
        )
    )
    plt.tight_layout()

    # sphinx_gallery_thumbnail_number = 2
    fig = plt.figure(figsize=(12, 12))
    ax = fig.add_subplot(1, 1, 1, projection="3d")
    display_cloud(ax, sources[2], "red")
    display_cloud(ax, targets[2], "blue")
    ax.set_title(
        "Medium resolution dataset:\n"
        + "Source (N={:,}) and target (M={:,}) point clouds".format(
            len(sources[2][0]), len(targets[2][0])
        )
    )
    plt.tight_layout()





.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_001.png
         :alt: Low resolution dataset: Source (N=10,000) and target (M=11,102) point clouds
         :srcset: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_002.png
         :alt: Medium resolution dataset: Source (N=200,000) and target (M=202,520) point clouds
         :srcset: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_002.png
         :class: sphx-glr-multi-img





.. GENERATED FROM PYTHON SOURCE LINES 388-459

Benchmarks
----------------------------

**Choosing a temperature.**
Understood as a smooth generalization of the standard theory
of `auctions <https://en.wikipedia.org/wiki/Auction_algorithm>`_,
entropic regularization allows us to compute **tractable approximations of the Wasserstein distance**
on the GPU.

The level of approximation is set by a single parameter:
the **temperature** :math:`\varepsilon > 0`, which has the same units as the cost function :math:`\text{C}`.
With a number of iterations that scales roughly in

.. math::
  \begin{cases}
      O \Big( \frac{ \max_{i,j}\text{C}(x_i,y_j) }{ \varepsilon } \Big)
      &  \text{with the Sinkhorn and Auction algorithms} \\
      O \Big( \log \Big( \frac{ \max_{i,j}\text{C}(x_i,y_j) }{ \varepsilon } \Big) \Big)
      &  \text{using an $\varepsilon$-scaling annealing strategy,}
  \end{cases}

we may compute an approximation :math:`\text{OT}_\varepsilon` of the transport
cost with precision :math:`\simeq \varepsilon`.

**Choosing a blurring scale.**
In practice, when :math:`\text{C}(x,y) = \tfrac{1}{p}\|x-y\|^p` is the standard Wasserstein cost,
the temperature :math:`\varepsilon` is best understood through its p-th root:

.. math::
  \sigma ~=~ \sqrt[p]{\varepsilon},

the **blurring scale** of the (Laplacian if p=1, Gaussian if p=2) **Gibbs kernel**

.. math::
  k_\varepsilon(x,y) ~=~ \exp(-\text{C}(x,y)/\varepsilon)

through which the Sinkhorn algorithm interacts with our weighted point clouds.
According to the heuristics presented above, we may thus expect to solve a regularized
:math:`\text{OT}_\varepsilon` problem in

.. math::
  \begin{cases}
      O \big( ( \text{D} / \sigma )^p \big)
      &  \text{with the Sinkhorn and Auction algorithms} \\
      O \big( \log ( \text{D} / \sigma ) \big)
      &  \text{using an $\varepsilon$-scaling annealing strategy,}
  \end{cases}

with :math:`\text{D} = \max_{i,j}\|x_i-y_j\|` the **diameter** of our configuration.
We now focus on the case where **p=2**, which provides the most useful gradients
in geometric shape analysis, and discuss the performances of our routines as we change
the blurring scale :math:`\sigma = \sqrt{\varepsilon}` and the
number of samples :math:`\sqrt{MN}`.
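
For instance, with normalized shapes of diameter :math:`\text{D} \simeq 2`
and a blurring scale :math:`\sigma = \texttt{0.01}`, these heuristics predict,
up to constant factors,

.. math::
  (\text{D}/\sigma)^2 \,=\, 200^2 \,=\, 40{,}000 ~~\text{Sinkhorn iterations}
  \qquad\text{vs.}\qquad
  \log_2(\text{D}/\sigma) \,\simeq\, 8 ~~\text{annealing steps},

which is precisely the gap observed in the benchmarks below.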


High-temperature OT
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cuturi-like setting.** A `current trend <https://arxiv.org/abs/1306.0895>`_ in Machine Learning
is to rely on **large blurring scales** to compute low-resolution gradients:
**giving up on precision** is understood as a way of
`becoming robust <https://arxiv.org/abs/1810.02733>`_ to **sampling noise**
in high dimensions.

Judging from the pictures above, the Wasserstein distance between our unit sphere
and the Stanford dragon should be **of order 1** and most likely close to 0.5.
Consequently, a blurring scale set to :math:`\sigma = \texttt{0.1}`,
which corresponds to a temperature :math:`\varepsilon = \sigma^p = \texttt{0.01}`,
should allow us to emulate the typical regime of the current Machine Learning literature.



.. GENERATED FROM PYTHON SOURCE LINES 459-466

.. code-block:: Python



    maxtime = 100 if use_cuda else 1

    full_statistics(sources[0], targets[0], blur=0.10, maxtime=maxtime)





.. image-sg:: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_003.png
   :alt: Solving a 10,000-by-11,102 OT problem, with a blurring scale σ = 0.1
   :srcset: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Benchmarking the "GeomLoss - multiscale-1" family of OT solvers - ground truth = 0.560432:
    1-th solver : t = 0.0105, error on the constraints = 0.261, cost = 0.557364
    2-th solver : t = 0.0110, error on the constraints = 0.295, cost = 0.557379
    3-th solver : t = 0.0133, error on the constraints = 0.090, cost = 0.558416
    4-th solver : t = 0.0172, error on the constraints = 0.071, cost = 0.559113
    5-th solver : t = 0.0284, error on the constraints = 0.046, cost = 0.559866
    6-th solver : t = 0.0514, error on the constraints = 0.027, cost = 0.560243
    7-th solver : t = 0.2345, error on the constraints = 0.006, cost = 0.560432

    Benchmarking the "GeomLoss - multiscale-5" family of OT solvers - ground truth = 0.560432:
    1-th solver : t = 0.0110, error on the constraints = 0.123, cost = 0.557271
    2-th solver : t = 0.0116, error on the constraints = 0.113, cost = 0.557242
    3-th solver : t = 0.0145, error on the constraints = 0.090, cost = 0.558409
    4-th solver : t = 0.0184, error on the constraints = 0.071, cost = 0.559108
    5-th solver : t = 0.0295, error on the constraints = 0.046, cost = 0.559849
    6-th solver : t = 0.0540, error on the constraints = 0.027, cost = 0.560227
    7-th solver : t = 0.2462, error on the constraints = 0.006, cost = 0.560422

    Benchmarking the "GeomLoss - online" family of OT solvers - ground truth = 0.560432:
    1-th solver : t = 0.0125, error on the constraints = 0.260, cost = 0.555675
    2-th solver : t = 0.0150, error on the constraints = 0.169, cost = 0.556500
    3-th solver : t = 0.0186, error on the constraints = 0.120, cost = 0.557636
    4-th solver : t = 0.0258, error on the constraints = 0.083, cost = 0.558731
    5-th solver : t = 0.0479, error on the constraints = 0.048, cost = 0.559797
    6-th solver : t = 0.0929, error on the constraints = 0.026, cost = 0.560236
    7-th solver : t = 0.4479, error on the constraints = 0.006, cost = 0.560422

    Benchmarking the "GeomLoss - tensorized" family of OT solvers - ground truth = 0.560432:
    1-th solver : t = 0.1144, error on the constraints = 0.260, cost = 0.555675
    2-th solver : t = 0.1351, error on the constraints = 0.169, cost = 0.556500
    3-th solver : t = 0.1664, error on the constraints = 0.120, cost = 0.557635
    4-th solver : t = 0.2291, error on the constraints = 0.083, cost = 0.558731
    5-th solver : t = 0.4174, error on the constraints = 0.048, cost = 0.559797
    6-th solver : t = 0.8040, error on the constraints = 0.026, cost = 0.560236
    7-th solver : t = 3.8460, error on the constraints = 0.006, cost = 0.560422

    Benchmarking the "Sinkhorn loop - pytorch" family of OT solvers - ground truth = 0.560432:
    1-th solver : t = 0.0667, error on the constraints = 0.141, cost = 0.552788
    2-th solver : t = 0.1296, error on the constraints = 0.080, cost = 0.557057
    3-th solver : t = 0.2554, error on the constraints = 0.038, cost = 0.559497
    4-th solver : t = 0.6329, error on the constraints = 0.008, cost = 0.560371
    5-th solver : t = 1.2620, error on the constraints = 0.002, cost = 0.560429
    6-th solver : t = 2.5202, error on the constraints = 0.000, cost = 0.560432
    7-th solver : t = 6.2953, error on the constraints = 0.000, cost = 0.560432
    8-th solver : t = 12.5874, error on the constraints = 0.000, cost = 0.560432
    9-th solver : t = 25.1715, error on the constraints = 0.000, cost = 0.560432
    10-th solver : t = 64.1822, error on the constraints = 0.000, cost = 0.560432
    11-th solver : t = 129.5262, error on the constraints = 0.000, cost = 0.560432

    Benchmarking the "Sinkhorn loop - keops" family of OT solvers - ground truth = 0.560432:
    1-th solver : t = 0.0086, error on the constraints = 0.141, cost = 0.552788
    2-th solver : t = 0.0158, error on the constraints = 0.080, cost = 0.557057
    3-th solver : t = 0.0300, error on the constraints = 0.038, cost = 0.559497
    4-th solver : t = 0.0730, error on the constraints = 0.008, cost = 0.560371
    5-th solver : t = 0.1446, error on the constraints = 0.002, cost = 0.560429
    6-th solver : t = 0.2878, error on the constraints = 0.000, cost = 0.560432
    7-th solver : t = 0.7197, error on the constraints = 0.000, cost = 0.560432
    8-th solver : t = 1.4372, error on the constraints = 0.000, cost = 0.560432
    9-th solver : t = 2.8761, error on the constraints = 0.000, cost = 0.560432
    10-th solver : t = 7.1898, error on the constraints = 0.000, cost = 0.560432
    11-th solver : t = 14.3731, error on the constraints = 0.000, cost = 0.560432


    ({'multiscale-1': (array([0.01053047, 0.01100445, 0.01330638, 0.01717877, 0.02838874,
           0.05144119, 0.23454189]), array([0.26076141, 0.2952345 , 0.08960719, 0.07076596, 0.04605596,
           0.02667859, 0.0062393 ]), array([0.5573644 , 0.55737931, 0.55841631, 0.5591132 , 0.55986607,
           0.56024343, 0.56043231])), 'multiscale-5': (array([0.0109694 , 0.01157594, 0.01447248, 0.01842093, 0.02949882,
           0.05403519, 0.24620175]), array([0.12280507, 0.1125337 , 0.08969046, 0.07090177, 0.04620221,
           0.02672148, 0.00594249]), array([0.55727118, 0.55724192, 0.5584088 , 0.55910838, 0.55984944,
           0.56022739, 0.56042194])), 'online': (array([0.01245022, 0.01499605, 0.01864505, 0.02582312, 0.04789257,
           0.09294987, 0.44788909]), array([0.25995392, 0.16854912, 0.12009196, 0.08318655, 0.04838723,
           0.02601784, 0.00584987]), array([0.55567539, 0.5564999 , 0.55763555, 0.55873132, 0.55979651,
           0.56023645, 0.56042194])), 'tensorized': (array([0.11442351, 0.13506746, 0.16642499, 0.2291019 , 0.41735315,
           0.80398893, 3.84603977]), array([0.25995362, 0.16854897, 0.12009189, 0.08318652, 0.04838721,
           0.02601783, 0.00584986]), array([0.55567539, 0.5564999 , 0.55763549, 0.55873132, 0.55979657,
           0.56023645, 0.56042194])), 'pytorch': (array([6.67490959e-02, 1.29572630e-01, 2.55368233e-01, 6.32892847e-01,
           1.26195574e+00, 2.52017069e+00, 6.29529166e+00, 1.25873525e+01,
           2.51714835e+01, 6.41821637e+01, 1.29526191e+02]), array([1.40665218e-01, 8.03463757e-02, 3.80394422e-02, 7.57312402e-03,
           1.53424859e-03, 1.33747730e-04, 7.94345283e-07, 7.11446148e-07,
           7.06059154e-07, 7.05596847e-07, 7.02678108e-07]), array([0.55278832, 0.55705744, 0.55949718, 0.56037122, 0.56042886,
           0.56043214, 0.5604322 , 0.5604322 , 0.5604322 , 0.5604322 ,
           0.5604322 ])), 'keops': (array([8.57734680e-03, 1.58038139e-02, 2.99594402e-02, 7.29801655e-02,
           1.44600630e-01, 2.87828684e-01, 7.19702721e-01, 1.43718958e+00,
           2.87614989e+00, 7.18981481e+00, 1.43730516e+01]), array([1.40665025e-01, 8.03461745e-02, 3.80392149e-02, 7.57289119e-03,
           1.53401680e-03, 1.33516514e-04, 4.35923482e-07, 2.98747324e-07,
           2.98590635e-07, 2.96990834e-07, 2.95383785e-07]), array([0.55278832, 0.55705744, 0.55949718, 0.56037116, 0.56042892,
           0.5604322 , 0.5604322 , 0.56043226, 0.5604322 , 0.5604322 ,
           0.5604322 ]))}, 0.5604320764541626)



.. GENERATED FROM PYTHON SOURCE LINES 467-492

**Breakdown of the results.**
When the diameter-to-blur ratio :math:`D/\sigma` is of order 10, as is often the case in ML,
the baseline Sinkhorn algorithm **works just fine**.

As discussed in our `AiStats 2019 paper <https://arxiv.org/abs/1810.08278>`_,
improvements in this regime mostly come down to a clever low-level implementation of the
SoftMin reduction, abstracted in the `KeOps library <https://www.kernel-operations.io>`_:
**Switching from PyTorch to KeOps allows us to get a x10 speed-up and break the memory bottleneck,
but scaling strategies are overkill for this simple, low-resolution problem.**
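
To illustrate the point, here is a minimal sketch (assuming the imports above)
of the SoftMin reduction at the core of the Sinkhorn loop, written with a
*symbolic* KeOps ``LazyTensor`` that never materializes the N-by-M cost matrix:

.. code-block:: Python

    # Hypothetical toy clouds:
    x = torch.randn(10000, 3).type(tensor)
    y = torch.randn(20000, 3).type(tensor)
    eps = 0.01

    x_i = LazyTensor(x[:, None, :])  # (N,1,3) symbolic variable
    y_j = LazyTensor(y[None, :, :])  # (1,M,3) symbolic variable
    C_ij = ((x_i - y_j) ** 2).sum(-1) / 2  # (N,M,1) symbolic cost matrix

    # SoftMin over j, computed on the GPU with an O(N+M) memory footprint:
    softmin_i = -eps * (-C_ij / eps).logsumexp(dim=1)  # (N,1) torch Tensor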


Low-temperature OT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Graphics-like setting.**
Keep in mind, though, that the performance of the baseline Sinkhorn loop
**completely breaks down** as we try to reduce our blurring scale :math:`\sigma`.
In Computer Graphics and Medical Imaging, a realistic use case is to pick
a diameter-to-blur ratio :math:`D/\sigma` of order 100, which
lets us take into account the **detailed features of our shapes**:
for normalized point clouds, a value of :math:`\sigma = \texttt{0.01}`
-- which corresponds to a temperature :math:`\varepsilon = \sigma^p = \texttt{0.0001}` --
is a sensible pick.

.. GENERATED FROM PYTHON SOURCE LINES 492-495

.. code-block:: Python


    full_statistics(sources[0], targets[0], blur=0.01, maxtime=maxtime)




.. image-sg:: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_004.png
   :alt: Solving a 10,000-by-11,102 OT problem, with a blurring scale σ = 0.01
   :srcset: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Benchmarking the "GeomLoss - multiscale-1" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.0132, error on the constraints = nan, cost = 0.462160
    2-th solver : t = 0.0143, error on the constraints = nan, cost = 0.462864
    3-th solver : t = 0.0190, error on the constraints = 0.239, cost = 0.464819
    4-th solver : t = 0.0266, error on the constraints = 0.156, cost = 0.466339
    5-th solver : t = 0.0486, error on the constraints = 16.109, cost = 0.467900
    6-th solver : t = 0.0919, error on the constraints = 3.776, cost = 0.468620
    7-th solver : t = 0.4464, error on the constraints = 0.016, cost = 0.468973

    Benchmarking the "GeomLoss - multiscale-5" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.0147, error on the constraints = 1.757, cost = 0.462076
    2-th solver : t = 0.0156, error on the constraints = 0.391, cost = 0.462769
    3-th solver : t = 0.0217, error on the constraints = 0.239, cost = 0.464817
    4-th solver : t = 0.0305, error on the constraints = 0.156, cost = 0.466336
    5-th solver : t = 0.0541, error on the constraints = 0.096, cost = 0.467896
    6-th solver : t = 0.1031, error on the constraints = 0.058, cost = 0.468618
    7-th solver : t = 0.5025, error on the constraints = 0.016, cost = 0.468973

    Benchmarking the "GeomLoss - online" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.0161, error on the constraints = 39.025, cost = 0.454610
    2-th solver : t = 0.0197, error on the constraints = 0.656, cost = 0.459198
    3-th solver : t = 0.0259, error on the constraints = 0.301, cost = 0.462893
    4-th solver : t = 0.0379, error on the constraints = 0.170, cost = 0.465662
    5-th solver : t = 0.0745, error on the constraints = 0.097, cost = 0.467822
    6-th solver : t = 0.1467, error on the constraints = 0.058, cost = 0.468628
    7-th solver : t = 0.7279, error on the constraints = 0.016, cost = 0.468972

    Benchmarking the "GeomLoss - tensorized" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.1455, error on the constraints = 39.001, cost = 0.454610
    2-th solver : t = 0.1769, error on the constraints = 0.656, cost = 0.459198
    3-th solver : t = 0.2291, error on the constraints = 0.301, cost = 0.462893
    4-th solver : t = 0.3336, error on the constraints = 0.170, cost = 0.465662
    5-th solver : t = 0.6473, error on the constraints = 0.097, cost = 0.467822
    6-th solver : t = 1.2640, error on the constraints = 0.058, cost = 0.468628
    7-th solver : t = 6.2393, error on the constraints = 0.016, cost = 0.468972

    Benchmarking the "Sinkhorn loop - pytorch" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.0667, error on the constraints = 1.452, cost = 0.425475
    2-th solver : t = 0.1296, error on the constraints = 1.114, cost = 0.427754
    3-th solver : t = 0.2554, error on the constraints = 0.801, cost = 0.431099
    4-th solver : t = 0.6329, error on the constraints = 0.518, cost = 0.437559
    5-th solver : t = 1.2620, error on the constraints = 0.377, cost = 0.444029
    6-th solver : t = 2.5204, error on the constraints = 0.261, cost = 0.451269
    7-th solver : t = 6.2951, error on the constraints = 0.144, cost = 0.460108
    8-th solver : t = 12.5871, error on the constraints = 0.083, cost = 0.464981
    9-th solver : t = 25.1729, error on the constraints = 0.039, cost = 0.467867
    10-th solver : t = 68.9220, error on the constraints = 0.008, cost = 0.468915
    11-th solver : t = 129.5199, error on the constraints = 0.002, cost = 0.468986

    Benchmarking the "Sinkhorn loop - keops" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.0097, error on the constraints = 1.452, cost = 0.425475
    2-th solver : t = 0.0178, error on the constraints = 1.114, cost = 0.427754
    3-th solver : t = 0.0319, error on the constraints = 0.801, cost = 0.431099
    4-th solver : t = 0.0756, error on the constraints = 0.518, cost = 0.437559
    5-th solver : t = 0.1484, error on the constraints = 0.377, cost = 0.444029
    6-th solver : t = 0.2884, error on the constraints = 0.261, cost = 0.451269
    7-th solver : t = 0.7195, error on the constraints = 0.144, cost = 0.460108
    8-th solver : t = 1.4384, error on the constraints = 0.083, cost = 0.464981
    9-th solver : t = 2.8767, error on the constraints = 0.039, cost = 0.467867
    10-th solver : t = 7.1952, error on the constraints = 0.008, cost = 0.468915
    11-th solver : t = 14.3821, error on the constraints = 0.002, cost = 0.468986


    ({'multiscale-1': (array([0.01318431, 0.01434302, 0.01900959, 0.02657843, 0.04857612,
           0.09189606, 0.44637704]), array([           nan,            nan, 2.38995165e-01, 1.56047165e-01,
           1.61094246e+01, 3.77587056e+00, 1.55187026e-02]), array([0.46216011, 0.4628644 , 0.46481892, 0.46633884, 0.46789974,
           0.46862018, 0.46897256])), 'multiscale-5': (array([0.01467609, 0.01564527, 0.02169228, 0.03050208, 0.054075  ,
           0.10308123, 0.5025456 ]), array([1.75680923, 0.39094821, 0.23930968, 0.15618071, 0.0958772 ,
           0.05807081, 0.01552534]), array([0.46207568, 0.46276882, 0.46481746, 0.46633554, 0.4678964 ,
           0.46861836, 0.46897256])), 'online': (array([0.01614213, 0.01974177, 0.02586937, 0.03794026, 0.07445025,
           0.1467483 , 0.72789216]), array([3.90252037e+01, 6.55852556e-01, 3.01485211e-01, 1.70307517e-01,
           9.72798020e-02, 5.77256940e-02, 1.55339371e-02]), array([0.45461038, 0.45919755, 0.4628931 , 0.46566194, 0.46782231,
           0.46862802, 0.46897218])), 'tensorized': (array([0.14554787, 0.17689705, 0.22912288, 0.33364987, 0.64725423,
           1.26395082, 6.23932481]), array([3.90007858e+01, 6.55819058e-01, 3.01475048e-01, 1.70304835e-01,
           9.72797275e-02, 5.77267855e-02, 1.55369919e-02]), array([0.45461038, 0.45919755, 0.4628931 , 0.46566191, 0.46782231,
           0.46862802, 0.46897218])), 'pytorch': (array([6.67350292e-02, 1.29570246e-01, 2.55406380e-01, 6.32869244e-01,
           1.26200891e+00, 2.52043915e+00, 6.29508018e+00, 1.25871425e+01,
           2.51729498e+01, 6.89219708e+01, 1.29519886e+02]), array([1.45177937, 1.11441219, 0.80101037, 0.51840085, 0.37667969,
           0.26148969, 0.14429922, 0.08290579, 0.03945083, 0.00801208,
           0.00176246]), array([0.42547518, 0.42775351, 0.43109876, 0.43755883, 0.44402876,
           0.4512693 , 0.46010789, 0.46498135, 0.46786663, 0.46891472,
           0.46898565])), 'keops': (array([9.72986221e-03, 1.77657604e-02, 3.18520069e-02, 7.55918026e-02,
           1.48354053e-01, 2.88381100e-01, 7.19487190e-01, 1.43836379e+00,
           2.87672758e+00, 7.19517279e+00, 1.43820741e+01]), array([1.45174754, 1.11434114, 0.8009215 , 0.51827741, 0.3765409 ,
           0.26135418, 0.14416543, 0.08277068, 0.03931752, 0.00787629,
           0.00162181]), array([0.42547518, 0.42775351, 0.43109873, 0.43755886, 0.44402876,
           0.45126933, 0.46010789, 0.46498132, 0.46786663, 0.46891475,
           0.46898565]))}, 0.4689895510673523)



.. GENERATED FROM PYTHON SOURCE LINES 496-509

**Breakdown of the results.** As expected, dividing the blurring scale :math:`\sigma` by ten
leads to a **100-fold increase** in the number of iterations needed by the (simple) Sinkhorn loop...
whereas routines that rely on :math:`\varepsilon`-scaling
only experience a 2-fold slow-down!
Well documented for entropic OT since the
`1990s <https://www.ics.uci.edu/~welling/teaching/271fall09/InvidibleHandAlg.pdf>`_,
annealing strategies
are thus **critical** as soon as some level of accuracy is required.
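
For reference, the :math:`\varepsilon`-scaling heuristic boils down to running
Sinkhorn iterations on a geometrically decreasing sequence of blurring scales.
A minimal sketch (not GeomLoss's exact schedule) could read:

.. code-block:: Python

    def eps_schedule(diameter, blur, scaling=0.9):
        """Geometric sequence of temperatures, from diameter**2 down to blur**2."""
        eps_list, sigma = [], diameter
        while sigma > blur:
            eps_list.append(sigma**2)
            sigma = sigma * scaling
        return eps_list + [blur**2]

    # With diameter = 2 and blur = 0.01, about fifty temperatures are enough:
    print(len(eps_schedule(diameter=2.0, blur=0.01, scaling=0.9)))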

Going further, **adaptive clustering strategies** allow us to break the
:math:`O(NM)` complexity of exact SoftMin reductions, as discussed in
:doc:`previous tutorials <../sinkhorn_multiscale/plot_kernel_truncation>`:


.. GENERATED FROM PYTHON SOURCE LINES 509-513

.. code-block:: Python



    full_statistics(sources[2], targets[2], blur=0.01, maxtime=maxtime)




.. image-sg:: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_005.png
   :alt: Solving a 200,000-by-202,520 OT problem, with a blurring scale σ = 0.01
   :srcset: /_auto_examples/performances/images/sphx_glr_plot_benchmarks_ot_3D_005.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Benchmarking the "GeomLoss - multiscale-1" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.1416, error on the constraints = nan, cost = 0.460679
    2-th solver : t = 0.1176, error on the constraints = nan, cost = 0.461384
    3-th solver : t = 0.2324, error on the constraints = nan, cost = 0.463323
    4-th solver : t = 0.3330, error on the constraints = nan, cost = 0.464781
    5-th solver : t = 0.5262, error on the constraints = nan, cost = 0.466266
    6-th solver : t = 1.0184, error on the constraints = nan, cost = 0.466942
    7-th solver : t = 5.1198, error on the constraints = nan, cost = 0.467265

    Benchmarking the "GeomLoss - multiscale-5" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.3016, error on the constraints = 1.283, cost = 0.460613
    2-th solver : t = 0.2210, error on the constraints = 0.321, cost = 0.461309
    3-th solver : t = 0.4529, error on the constraints = 0.211, cost = 0.463319
    4-th solver : t = 0.6446, error on the constraints = 0.140, cost = 0.464776
    5-th solver : t = 0.9687, error on the constraints = 0.081, cost = 0.466257
    6-th solver : t = 1.8939, error on the constraints = 0.045, cost = 0.466935
    7-th solver : t = 9.8059, error on the constraints = 0.010, cost = 0.467263

    Benchmarking the "GeomLoss - online" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 1.4962, error on the constraints = 14.011, cost = 0.452836
    2-th solver : t = 1.8434, error on the constraints = 0.591, cost = 0.457689
    3-th solver : t = 2.4284, error on the constraints = 0.275, cost = 0.461453
    4-th solver : t = 3.5823, error on the constraints = 0.155, cost = 0.464145
    5-th solver : t = 7.0574, error on the constraints = 0.083, cost = 0.466189
    6-th solver : t = 14.0128, error on the constraints = 0.045, cost = 0.466943
    7-th solver : t = 69.2284, error on the constraints = 0.010, cost = 0.467263

    Benchmarking the "GeomLoss - tensorized" family of OT solvers - ground truth = 0.468990:
    ** Memory overflow ! **

    Benchmarking the "Sinkhorn loop - pytorch" family of OT solvers - ground truth = 0.468990:
    ** Memory overflow ! **

    Benchmarking the "Sinkhorn loop - keops" family of OT solvers - ground truth = 0.468990:
    1-th solver : t = 0.5937, error on the constraints = 1.493, cost = 0.421998
    2-th solver : t = 1.1950, error on the constraints = 1.072, cost = 0.424401
    3-th solver : t = 2.3842, error on the constraints = 0.767, cost = 0.427872
    4-th solver : t = 5.9511, error on the constraints = 0.516, cost = 0.434530
    5-th solver : t = 11.9135, error on the constraints = 0.379, cost = 0.441180
    6-th solver : t = 23.8156, error on the constraints = 0.265, cost = 0.448629
    7-th solver : t = 59.5841, error on the constraints = 0.148, cost = 0.457788
    8-th solver : t = 119.1237, error on the constraints = 0.086, cost = 0.462946


    ({'multiscale-1': (array([0.14156318, 0.1175878 , 0.23240829, 0.33296037, 0.52623773,
           1.01844049, 5.11978173]), array([nan, nan, nan, nan, nan, nan, nan]), array([0.46067935, 0.46138442, 0.46332321, 0.46478114, 0.46626621,
           0.46694231, 0.46726528])), 'multiscale-5': (array([0.30157232, 0.22103357, 0.45292592, 0.64463544, 0.96865988,
           1.89390874, 9.80593014]), array([1.28348196, 0.32087681, 0.21082613, 0.1398233 , 0.08148509,
           0.04524117, 0.00959368]), array([0.4606134 , 0.46130911, 0.46331885, 0.46477553, 0.46625695,
           0.46693456, 0.46726319])), 'online': (array([ 1.49621391,  1.84342909,  2.42844558,  3.58226895,  7.05740047,
           14.01280451, 69.22838378]), array([1.40106440e+01, 5.90897143e-01, 2.74631143e-01, 1.54590920e-01,
           8.32118616e-02, 4.48429361e-02, 9.59691778e-03]), array([0.45283607, 0.45768943, 0.46145332, 0.46414524, 0.46618912,
           0.4669432 , 0.46726292])), 'tensorized': (array([nan, nan, nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan, nan, nan])), 'pytorch': (array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])), 'keops': (array([  0.5936873 ,   1.19501662,   2.38423157,   5.95111203,
            11.91352129,  23.81562448,  59.58405256, 119.12374878,
                    nan,          nan,          nan]), array([1.49341023, 1.07193971, 0.76744741, 0.51586437, 0.37869647,
           0.26466152, 0.14763255, 0.08564569,        nan,        nan,
                  nan]), array([0.42199785, 0.4244009 , 0.42787209, 0.43452966, 0.4411799 ,
           0.4486292 , 0.45778841, 0.46294636,        nan,        nan,
                  nan]))}, 0.4689895510673523)



.. GENERATED FROM PYTHON SOURCE LINES 514-527

Relying on a coarse subsampling of the input measures,
**our 2-scale routines outperform the "online" backend** as soon as the number
of points per shape exceeds ~50,000.

All in all, in a typical shape analysis setting, **the GeomLoss routines**
**thus allow us to benefit from a x1,000+ speed-up compared with off-the-shelf**
**implementations of the Sinkhorn and Auction algorithms.**
Combining three distinct ideas (the switch from tensorized to **online** GPU routines;
simulated **annealing** strategies; adaptive **clustering** schemes) in a
single PyTorch layer, this implementation will hopefully ease the computational
burden on researchers and allow them
to focus on high-level models.


.. GENERATED FROM PYTHON SOURCE LINES 527-529

.. code-block:: Python


    plt.show()








.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (15 minutes 46.599 seconds)


.. _sphx_glr_download__auto_examples_performances_plot_benchmarks_ot_3D.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_benchmarks_ot_3D.ipynb <plot_benchmarks_ot_3D.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_benchmarks_ot_3D.py <plot_benchmarks_ot_3D.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_