{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Datasets for the benchmarks\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import os\nimport numpy as np\nimport urllib.request\n\nsynthetic = {\n    # Key : metric, Ntrain, Ntest, D\n    \"R^D a\": (\"euclidean\", 10**4, 10**4, 3),\n    \"R^D b\": (\"euclidean\", 10**6, 10**4, 3),\n    \"R^D c\": (\"euclidean\", 10**6, 10**4, 10),\n    \"R^D d\": (\"euclidean\", 10**6, 10**4, 100),\n    \"R^D e\": (\"euclidean\", 10**7, 10**4, 100),\n    \"R^D f\": (\"manhattan\", 10**6, 10**4, 10),\n    \"R^D g\": (\"manhattan\", 10**6, 10**4, 100),\n    \"S^{D-1}\": (\"angular\", 10**6, 10**4, 10),\n    \"H^D\": (\"hyperbolic\", 10**6, 10**4, 10),\n}\n\ndownloaded = {\n    # Key : metric, filename, url\n    \"MNIST a\": (\n        \"euclidean\",\n        \"mnist-784-euclidean.hdf5\",\n        \"http://ann-benchmarks.com/mnist-784-euclidean.hdf5\",\n    ),\n    \"MNIST b\": (\n        \"manhattan\",\n        \"mnist-784-euclidean.hdf5\",\n        \"http://ann-benchmarks.com/mnist-784-euclidean.hdf5\",\n    ),\n    \"GloVe25\": (\n        \"angular\",\n        \"glove-25-angular.hdf5\",\n        \"http://ann-benchmarks.com/glove-25-angular.hdf5\",\n    ),\n    \"GloVe100\": (\n        \"angular\",\n        \"glove-100-angular.hdf5\",\n        \"http://ann-benchmarks.com/glove-100-angular.hdf5\",\n    ),\n}\n\n\ndef get_dataset(key):\n    data_folder = \"benchmark_datasets/\"\n\n    filename = None\n\n    true_indices = None\n    true_values = None\n\n    if key in synthetic.keys():\n        metric, Ntrain, Ntest, D = synthetic[key]\n\n        if metric == \"hyperbolic\":\n            x_train = 0.5 + np.random.rand(Ntrain, D)\n            x_test = 0.5 + np.random.rand(Ntest, D)\n        else:\n            x_train = np.random.randn(Ntrain, D)\n            x_test = np.random.randn(Ntest, D)\n\n    else:\n        import h5py\n\n        metric, filename, url = downloaded[key]\n        filename = data_folder + filename\n\n        if not os.path.isfile(filename):\n            os.makedirs(os.path.dirname(filename), exist_ok=True)\n            urllib.request.urlretrieve(url, filename)\n\n        f = h5py.File(filename, \"r\")\n\n        x_train = f[\"train\"][()]\n        x_test = f[\"test\"][()]\n        true_indices = f[\"neighbors\"][()]\n        true_values = f[\"distances\"][()]\n\n    # With the angular metric, all the points are normalized:\n    if metric == \"angular\":\n        x_train /= np.linalg.norm(x_train, axis=1, keepdims=True)\n        x_test /= np.linalg.norm(x_test, axis=1, keepdims=True)\n\n    x_train = x_train.astype(\"float32\")\n    x_test = x_test.astype(\"float32\")\n\n    return {\n        \"train\": x_train,\n        \"test\": x_test,\n        \"metric\": metric,\n        \"output\": true_indices,\n        # \"true_distances\": true_values,\n    }\n\n\nfrom benchmark_utils import tensor\nfrom pykeops.torch import LazyTensor\n\n\ndef ground_truth(x_train, x_test, K, metric):\n    # Setup the K-NN estimator:\n    x_train = tensor(x_train)\n    x_test = tensor(x_test)\n\n    # Encoding as KeOps LazyTensors:\n    X_i = LazyTensor(x_test[:, None, :])\n    X_j = LazyTensor(x_train[None, :, :])\n\n    # Symbolic distance matrix:\n    if metric == \"euclidean\":\n        D_ij = ((X_i - X_j) ** 2).sum(-1)\n    elif metric == \"manhattan\":\n        D_ij = (X_i - X_j).abs().sum(-1)\n    elif metric == \"angular\":\n        D_ij = -(X_i | X_j)\n    elif metric == \"hyperbolic\":\n        D_ij = ((X_i - X_j) ** 2).sum(-1) / (X_i[0] * X_j[0])\n    else:\n        raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n    # K-NN query:\n    indices = D_ij.argKmin(K, dim=1)\n    return indices.cpu().numpy()\n\n\nfrom copy import deepcopy\n\n\ndef generate_samples(key):\n    dataset = get_dataset(key)\n\n    def samples(K):\n        KNN_dataset = deepcopy(dataset)\n        KNN_dataset[\"output\"] = ground_truth(\n            KNN_dataset[\"train\"], KNN_dataset[\"test\"], K, KNN_dataset[\"metric\"]\n        )\n        return KNN_dataset\n\n    return samples"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# K-Nearest Neighbors search\n\nWe compare the performance of PyTorch, JAX, KeOps, Scikit-Learn and FAISS (when applicable)\nfor K-NN queries on random samples and standard datasets.\nA detailed discussion of these results can be found in Section 5.2\nof our [NeurIPS 2020 paper](https://www.jeanfeydy.com/Papers/KeOps_NeurIPS_2020.pdf).\nGenerally speaking, generic KeOps routines are orders\nof magnitude faster and more memory-efficient than\ntheir PyTorch and JAX counterparts.\nThey perform on par\nwith the handcrafted CUDA kernels of the FAISS-Flat (bruteforce) method for problems\nwith **up to 1M samples in dimension 1 to 50-100**,\nbut are sub-optimal on larger datasets.\nCrucially, KeOps is easy to use with **any metric**:\nit provides the only competitive run times in the many settings\nthat are not supported by existing C++ libraries.\n\nIn this demo, we often use exact **bruteforce** computations\n(tensorized for PyTorch/JAX, on-the-fly for KeOps) and do not leverage any\nquantization scheme or multiscale\ndecomposition of the distance matrix.\nFirst support for these approximation strategies with KeOps is scheduled for\nMay-June 2021. Going forward, a major priority for KeOps\nis to get closer to the reference run times of the FAISS library\nin all settings.\nWe intend to provide a versatile, generic and pythonic code that is easy to\nmodify and integrate in other projects.\nHopefully, this will **stimulate research on non-Euclidean metrics**,\nsuch as hyperbolic or discrete spaces.\n\n**Note:** timings are always subject to change:\nlibraries and hardware get better with time.\nIf you find a way of improving these benchmarks, please\n[let us know](https://github.com/getkeops/keops/issues)!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n\nFirst, we load some utility routines to run and display benchmarks:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nimport torch\nfrom matplotlib import pyplot as plt\nfrom functools import partial\n\nfrom benchmark_utils import (\n full_benchmark,\n timer,\n tensor,\n int_tensor,\n jax_tensor,\n globalize,\n)\nfrom dataset_utils import generate_samples\n\nuse_cuda = torch.cuda.is_available()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then specify the values of K that we will inspect:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"Ks = [1, 10, 50, 100] # Numbers of neighbors to find"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PyTorch bruteforce implementation\n\nAs a first baseline, we benchmark a PyTorch K-NN routine on the GPU.\nWe implement a collection of distance-like matrices between\npoint clouds $(x_i)$ and $(y_j)$:\n\n- The squared **Euclidean** distance $\\|x-y\\|^2 = \\sum_k (x[k] - y[k])^2$.\n- The **Manhattan** distance $\\|x-y\\|_{L^1} = \\sum_k |x[k] - y[k]|$.\n- The **cosine similarity** $\\langle x, y\\rangle = \\sum_k (x[k] \\cdot y[k])$.\n- The **hyperbolic** distance on the Poincaré half-space $\\mathbb{H}$\n  of vectors $x$ such that $x[0] > 0$,\n  $\\text{d}_{\\mathbb{H}}(x, y) = \\text{arcosh}(1 + \\|x-y\\|^2 / (2\\,x[0]\\,y[0]))$.\n  Since $d \\mapsto \\text{arcosh}(1+d/2)$ is increasing,\n  we only compute the pseudo-distance\n  $\\|x-y\\|^2 / (x[0]\\,y[0])$.\n\n**Note:** expanding the squared norm $\\|x-y\\|^2$ as a sum\n$\\|x\\|^2 - 2 \\langle x, y \\rangle + \\|y\\|^2$ allows us\nto leverage the fast matrix-matrix product of the BLAS/cuBLAS\nlibraries. We rely on this identity whenever possible.\n\n"
]
},
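{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can verify numerically that $d \\mapsto \\text{arcosh}(1+d/2)$\nis indeed increasing, so that ranking neighbors with the pseudo-distance preserves\nthe K-NN indices of the true hyperbolic metric (a minimal, self-contained check):\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import math\n\n# A few increasing values of the pseudo-distance d = |x-y|^2 / (x[0] * y[0]):\nds = [0.1, 0.5, 1.0, 2.0, 10.0]\n\n# The corresponding \"true\" hyperbolic distances arcosh(1 + d / 2):\nhyper = [math.acosh(1 + d / 2) for d in ds]\n\n# The map is increasing: both quantities induce the same neighbor ordering.\nassert hyper == sorted(hyper)"
]
},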
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def KNN_torch_fun(x_train, x_train_norm, x_test, K, metric):\n largest = False # Default behaviour is to look for the smallest values\n\n if metric == \"euclidean\":\n x_test_norm = (x_test**2).sum(-1)\n diss = (\n x_test_norm.view(-1, 1)\n + x_train_norm.view(1, -1)\n - 2 * x_test @ x_train.t() # Rely on cuBLAS for better performance!\n )\n\n elif metric == \"manhattan\":\n diss = (x_test[:, None, :] - x_train[None, :, :]).abs().sum(dim=2)\n\n elif metric == \"angular\":\n diss = x_test @ x_train.t()\n largest = True\n\n elif metric == \"hyperbolic\":\n x_test_norm = (x_test**2).sum(-1)\n diss = (\n x_test_norm.view(-1, 1)\n + x_train_norm.view(1, -1)\n - 2 * x_test @ x_train.t()\n )\n diss /= x_test[:, 0].view(-1, 1) * x_train[:, 0].view(1, -1)\n else:\n raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n return diss.topk(K, dim=1, largest=largest).indices"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We rely on the **tensorized**\nimplementation above to define a simple K-NN query operator.\nWe follow the scikit-learn API with \"train\" and \"test\" methods:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def KNN_torch(K, metric=\"euclidean\", **kwargs):\n def fit(x_train):\n # Setup the K-NN estimator:\n x_train = tensor(x_train)\n start = timer()\n # The \"training\" time here should be negligible:\n x_train_norm = (x_train**2).sum(-1)\n elapsed = timer() - start\n\n def f(x_test):\n x_test = tensor(x_test)\n start = timer()\n\n # Actual K-NN query:\n out = KNN_torch_fun(x_train, x_train_norm, x_test, K, metric)\n\n elapsed = timer() - start\n indices = out.cpu().numpy()\n return indices, elapsed\n\n return f, elapsed\n\n return fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfortunately, the code above creates a full\n$\\mathrm{N}_{\\text{queries}}\\times \\mathrm{N}_{\\text{points}}$\ndistance matrix that may not fit in the GPU memory.\nTo work around this problem and avoid memory overflows, we benchmark a second implementation\nthat works with small batches of queries at a time:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def KNN_torch_batch_loop(K, metric=\"euclidean\", **kwargs):\n    def fit(x_train):\n        # Setup the K-NN estimator:\n        x_train = tensor(x_train)\n        Ntrain, D = x_train.shape\n        start = timer()\n        # The \"training\" time here should be negligible:\n        x_train_norm = (x_train**2).sum(-1)\n        elapsed = timer() - start\n\n        def f(x_test):\n            x_test = tensor(x_test)\n\n            # Estimate the largest reasonable batch size:\n            Ntest = x_test.shape[0]\n            av_mem = int(5e8)  # 500 MB of GPU memory per batch\n            # Remember that a vector of D float32 numbers takes up 4*D bytes:\n            Ntest_loop = min(max(1, av_mem // (4 * D * Ntrain)), Ntest)\n            Nloop = (Ntest - 1) // Ntest_loop + 1\n            out = int_tensor(Ntest, K)\n\n            start = timer()\n            # Actual K-NN query:\n            for k in range(Nloop):\n                x_test_k = x_test[Ntest_loop * k : Ntest_loop * (k + 1), :]\n                out[Ntest_loop * k : Ntest_loop * (k + 1), :] = KNN_torch_fun(\n                    x_train, x_train_norm, x_test_k, K, metric\n                )\n\n            # torch.cuda.empty_cache()\n\n            elapsed = timer() - start\n            indices = out.cpu().numpy()\n            return indices, elapsed\n\n        return f, elapsed\n\n    return fit"
]
},
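{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the memory heuristic above concrete, here is the same arithmetic on a\nhypothetical problem (the sizes below are illustrative, not benchmark settings):\neach query may allocate an intermediate buffer of Ntrain * D float32 values,\ni.e. 4 * D * Ntrain bytes, so a ~500 MB budget yields the following batch size:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"av_mem = int(5e8)  # Target of ~500 MB of GPU memory per batch\nNtrain, D, Ntest = 10**6, 10, 10**4  # Hypothetical problem sizes\n\n# Largest number of queries whose intermediate buffers fit in the budget:\nNtest_loop = min(max(1, av_mem // (4 * D * Ntrain)), Ntest)\n# Number of batches that are needed to cover all the queries:\nNloop = (Ntest - 1) // Ntest_loop + 1\n\nprint(Ntest_loop, Nloop)  # 12 queries per batch, 834 batches"
]
},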
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## JAX bruteforce implementation\n\nWe now re-implement the same method with JAX-XLA routines.\n\nNote that we run this script with the command line option\n``XLA_PYTHON_CLIENT_ALLOCATOR=platform``: this prevents JAX\nfrom locking up GPU memory and allows\nus to benchmark JAX, FAISS, PyTorch and KeOps next to each other.\nThis may impact performance - but as a general rule,\nwe found JAX to be orders of magnitude slower than PyTorch\nand KeOps in these benchmarks, even with unrestrained access to the GPU device memory.\nNeedless to say, this is subject to change with future releases:\nwe stay tuned to keep this documentation up to date and welcome\nall suggestions!\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from functools import partial\nimport jax\nimport jax.numpy as jnp\n\n\n@partial(jax.jit, static_argnums=(2, 3))\ndef knn_jax_fun(x_train, x_test, K, metric):\n if metric == \"euclidean\":\n diss = (\n (x_test**2).sum(-1)[:, None]\n + (x_train**2).sum(-1)[None, :]\n - 2 * x_test @ x_train.T\n )\n\n elif metric == \"manhattan\":\n diss = jax.lax.abs(x_test[:, None, :] - x_train[None, :, :]).sum(-1)\n\n elif metric == \"angular\":\n diss = -x_test @ x_train.T\n\n elif metric == \"hyperbolic\":\n diss = (\n (x_test**2).sum(-1)[:, None]\n + (x_train**2).sum(-1)[None, :]\n - 2 * x_test @ x_train.T\n )\n diss = diss / (x_test[:, 0][:, None] * x_train[:, 0][None, :])\n\n else:\n raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n indices = jax.lax.top_k(-diss, K)[1]\n return indices"
]
},
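{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `jax.lax.top_k` only returns the **largest** entries, the routine above negates\nthe distance matrix to retrieve the K **smallest** ones. A pure-Python illustration\nof this standard trick, on a toy list of distances:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"diss = [0.3, 0.1, 0.7, 0.2]  # Toy distances from one query to four points\nK = 2\n\n# Indices of the K largest entries of the negated list...\nneg = [-d for d in diss]\ntop_k = sorted(range(len(neg)), key=lambda i: neg[i], reverse=True)[:K]\n\n# ... are exactly the indices of the K smallest distances:\nassert top_k == [1, 3]"
]
},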
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Straightforward K-NN query, with a scikit-learn interface:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def KNN_JAX(K, metric=\"euclidean\", **kwargs):\n def fit(x_train):\n # Setup the K-NN estimator:\n start = timer(use_torch=False)\n x_train = jax_tensor(x_train)\n elapsed = timer(use_torch=False) - start\n\n def f(x_test):\n x_test = jax_tensor(x_test)\n\n # Actual K-NN query:\n start = timer(use_torch=False)\n indices = knn_jax_fun(x_train, x_test, K, metric)\n indices = np.array(indices)\n elapsed = timer(use_torch=False) - start\n return indices, elapsed\n\n return f, elapsed\n\n return fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Smarter routine, which relies on small batches to avoid memory overflows:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def KNN_JAX_batch_loop(K, metric=\"euclidean\", **kwargs):\n def fit(x_train):\n # Setup the K-NN estimator:\n start = timer(use_torch=False)\n x_train = jax_tensor(x_train)\n elapsed = timer(use_torch=False) - start\n\n def f(x_test):\n x_test = jax_tensor(x_test)\n\n # Estimate the largest reasonable batch size\n av_mem = int(5e8) # 500 Mb\n Ntrain, D = x_train.shape\n Ntest = x_test.shape[0]\n Ntest_loop = min(max(1, av_mem // (4 * D * Ntrain)), Ntest)\n Nloop = (Ntest - 1) // Ntest_loop + 1\n indices = np.zeros((Ntest, K), dtype=int)\n\n start = timer(use_torch=False)\n # Actual K-NN query:\n for k in range(Nloop):\n x_test_k = x_test[Ntest_loop * k : Ntest_loop * (k + 1), :]\n indices[Ntest_loop * k : Ntest_loop * (k + 1), :] = knn_jax_fun(\n x_train, x_test_k, K, metric\n )\n elapsed = timer(use_torch=False) - start\n return indices, elapsed\n\n return f, elapsed\n\n return fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## KeOps bruteforce implementation\n\nKeOps lets us implement a bruteforce K-NN search efficiently,\n**without having to worry about memory overflows**.\nWe perform all the \"symbolic\" computations on the distance formulas\nahead of time, using the advanced ``Vi`` and ``Vj`` helpers that are described\nin [this tutorial](../_auto_tutorials/a_LazyTensors/plot_lazytensors_c.html).\nNote that we could also rely on the simpler ``LazyTensor`` syntax,\nat the cost of a small overhead that is negligible in most settings.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pykeops.torch import Vi, Vj\n\n\ndef KNN_KeOps(K, metric=\"euclidean\", **kwargs):\n def fit(x_train):\n # Setup the K-NN estimator:\n x_train = tensor(x_train)\n start = timer()\n\n # Encoding as KeOps LazyTensors:\n D = x_train.shape[1]\n X_i = Vi(0, D) # Purely symbolic \"i\" variable, without any data array\n X_j = Vj(1, D) # Purely symbolic \"j\" variable, without any data array\n\n # Symbolic distance matrix:\n if metric == \"euclidean\":\n D_ij = ((X_i - X_j) ** 2).sum(-1)\n elif metric == \"manhattan\":\n D_ij = (X_i - X_j).abs().sum(-1)\n elif metric == \"angular\":\n D_ij = -(X_i | X_j)\n elif metric == \"hyperbolic\":\n D_ij = ((X_i - X_j) ** 2).sum(-1) / (X_i[0] * X_j[0])\n else:\n raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n # K-NN query operator:\n KNN_fun = D_ij.argKmin(K, dim=1)\n\n # N.B.: The \"training\" time here should be negligible.\n elapsed = timer() - start\n\n def f(x_test):\n x_test = tensor(x_test)\n start = timer()\n\n # Actual K-NN query:\n indices = KNN_fun(x_test, x_train)\n\n elapsed = timer() - start\n\n indices = indices.cpu().numpy()\n return indices, elapsed\n\n return f, elapsed\n\n return fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SciKit-Learn tree-based and bruteforce methods\n\nAs a standard baseline, we include the scikit-learn K-NN operators\nin our benchmark. Note that these routines only run on the CPU\nand don't perform well on high-dimensional datasets:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.neighbors import NearestNeighbors\n\n\ndef KNN_sklearn(K, metric=\"euclidean\", algorithm=None, **kwargs):\n if metric in [\"euclidean\", \"angular\"]:\n p = 2\n elif metric == \"manhattan\":\n p = 1\n else:\n raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n KNN_meth = NearestNeighbors(n_neighbors=K, algorithm=algorithm, p=p, n_jobs=-1)\n\n def fit(x_train):\n # Setup the K-NN estimator:\n start = timer()\n KNN_fun = KNN_meth.fit(x_train).kneighbors\n elapsed = timer() - start\n\n def f(x_test):\n start = timer()\n distances, indices = KNN_fun(x_test)\n elapsed = timer() - start\n\n return indices, elapsed\n\n return f, elapsed\n\n return fit\n\n\nKNN_sklearn_auto = partial(KNN_sklearn, algorithm=\"auto\")\nKNN_sklearn_ball_tree = partial(KNN_sklearn, algorithm=\"ball_tree\")\nKNN_sklearn_kd_tree = partial(KNN_sklearn, algorithm=\"kd_tree\")\nKNN_sklearn_brute = partial(KNN_sklearn, algorithm=\"brute\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## FAISS approximate and brute-force methods\n\nFinally, we include run times for the reference FAISS library:\nout of the many (excellent) packages that are showcased on the\n[ANN-benchmarks website](http://ann-benchmarks.com),\nit is probably the most popular option and the package that provides the\nbest GPU support.\n\nA first baseline method is given by the\nHierarchical Navigable Small World graphs algorithm (**HNSW**), on the **CPU**.\nNote that the reference implementation provided by the\n[Non-Metric Space Library](https://github.com/nmslib/nmslib)\nwould probably be even more efficient.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import faiss\n\n\ndef KNN_faiss_HNSW(K, metric=\"euclidean\", M=36, **kwargs):\n def fit(x_train):\n from benchmark_utils import timer\n\n D = x_train.shape[1]\n\n if metric in [\"euclidean\", \"angular\"]:\n index = faiss.IndexHNSWFlat(D, M)\n index.hnsw.efConstruction = 500\n else:\n raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n # Pre-processing:\n start = timer(use_torch=False)\n index.add(x_train)\n elapsed = timer(use_torch=False) - start\n\n # Return an operator for actual KNN queries:\n def f(x_test, efSearch=10):\n faiss.ParameterSpace().set_index_parameter(index, \"efSearch\", efSearch)\n start = timer(use_torch=False)\n distances, indices = index.search(x_test, K)\n elapsed = timer(use_torch=False) - start\n return indices, elapsed\n\n return f, elapsed\n\n return fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Choosing good parameter values for approximate nearest neighbor schemes\nis a non-trivial problem.\nTo keep things simple, we stick to the guidelines of the\nreference [ANN-Benchmarks website](https://github.com/erikbern/ann-benchmarks/blob/cb954d1af7124c201aa2c8dfc77681e639fce586/algos.yaml#L95)\nand consider two configurations with **increasing levels of precision**\nbut **slower run times**:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"KNN_faiss_HNSW_fast = partial(KNN_faiss_HNSW, M=4)\nKNN_faiss_HNSW_slow = partial(KNN_faiss_HNSW, M=36)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also benchmark two of the **fast GPU methods** provided by the FAISS library:\n\n- a **bruteforce \"Flat\"** method, with no parameters;\n- the **approximate \"IVF-Flat\"** method, with two main parameters (`nlist` and `nprobe`).\n\nCrucially, we do **not** benchmark the most advanced schemes\nprovided by FAISS, such as the quantization-based\n**IVF-PQ** algorithm. These methods are powerful and very efficient, but come with many\ncaveats and parameters to tune: we lack the expertise to\nuse them properly and leave them aside for the moment.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Load FAISS on the GPU:\n# (The library pre-allocates a cache file of around ~1 GB on the device.)\nres = faiss.StandardGpuResources()\ndeviceId = 0\n\n\ndef KNN_faiss_gpu(\n    K,\n    metric,\n    algorithm=\"flat\",\n    nlist=8192,\n    nprobe=100,\n    m=None,\n    use_float16=False,\n    **kwargs,\n):\n    def fit(x_train):\n        D = x_train.shape[1]\n\n        co = faiss.GpuClonerOptions()\n        co.useFloat16 = use_float16\n\n        if metric in [\"euclidean\", \"angular\"]:\n            if algorithm == \"flat\":\n                index = faiss.IndexFlatL2(D)  # May be used as quantizer\n                index = faiss.index_cpu_to_gpu(res, deviceId, index, co)\n\n            elif algorithm == \"ivfflat\":\n                quantizer = faiss.IndexFlatL2(D)  # Coarse quantizer for the IVF cells\n                faiss_metric = (\n                    faiss.METRIC_L2\n                    if metric == \"euclidean\"\n                    else faiss.METRIC_INNER_PRODUCT\n                )\n                index = faiss.IndexIVFFlat(quantizer, D, nlist, faiss_metric)\n                index = faiss.index_cpu_to_gpu(res, deviceId, index, co)\n\n                assert not index.is_trained\n                index.train(x_train)  # Train the coarse quantizer\n                assert index.is_trained\n\n        else:\n            raise NotImplementedError(f\"The '{metric}' distance is not supported.\")\n\n        # Pre-processing:\n        start = timer(use_torch=False)\n        index.add(x_train)\n        index.nprobe = nprobe\n        elapsed = timer(use_torch=False) - start\n\n        # Return an operator for actual KNN queries:\n        def f(x_test):\n            start = timer(use_torch=False)\n            distances, indices = index.search(x_test, K)\n            elapsed = timer(use_torch=False) - start\n            return indices, elapsed\n\n        return f, elapsed\n\n    return fit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the FAISS-Flat bruteforce routines is straightforward:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"KNN_faiss_gpu_Flat = partial(KNN_faiss_gpu, algorithm=\"flat\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the other hand, the FAISS-IVF-Flat method is a bit more complex.\nJust as we did for the HNSW algorithm, we rely on the\n[ANN-Benchmarks guidelines](https://github.com/erikbern/ann-benchmarks/blob/cb954d1af7124c201aa2c8dfc77681e639fce586/algos.yaml#L50)\nand define two routines with **increasing levels of precision**:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"KNN_faiss_gpu_IVFFlat_fast = partial(\n KNN_faiss_gpu, algorithm=\"ivfflat\", nlist=400, nprobe=1\n)\nKNN_faiss_gpu_IVFFlat_slow = partial(\n KNN_faiss_gpu, algorithm=\"ivfflat\", nlist=4096, nprobe=40\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Benchmark parameters\n\nFinally, we compare all our methods through a unified interface.\n\n**Note:** fitting KeOps, JAX, PyTorch and FAISS in a single script is not easy:\nall these libraries have different failure modes,\nwith some of the C++ errors thrown by JAX and FAISS being\nvery hard to \"catch\" in a proper Python structure.\nTo keep things simple, we use environment variables\nto make a few \"pre-compilation runs\" prior to the\nfinal benchmark that is rendered on this website.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import os\n\ngetenv = lambda s: bool(os.getenv(s, \"False\").lower() in [\"true\", \"1\"])\n\nkeops_only = getenv(\"KEOPS_DOC_PRECOMPILE\")\njax_only = getenv(\"KEOPS_DOC_PRECOMPILE_JAX\")\n\n\ndef run_KNN_benchmark(name, loops=[1]):\n # Load the dataset and some info:\n dataset = generate_samples(name)(1)\n N_train, dimension = dataset[\"train\"].shape\n N_test, _ = dataset[\"test\"].shape\n metric = dataset[\"metric\"]\n\n # Routines to benchmark:\n if keops_only:\n routines = [(KNN_KeOps, \"KeOps (GPU)\", {})]\n elif jax_only:\n routines = [(KNN_JAX_batch_loop, \"JAX (small batches, GPU)\", {})]\n else:\n routines = [\n (KNN_KeOps, \"KeOps (GPU)\", {}),\n (KNN_faiss_gpu_Flat, \"FAISS-Flat (GPU)\", {}),\n (KNN_faiss_gpu_IVFFlat_fast, \"FAISS-IVF-Flat (GPU, nprobe=1)\", {}),\n (KNN_faiss_gpu_IVFFlat_slow, \"FAISS-IVF-Flat (GPU, nprobe=40)\", {}),\n (KNN_torch, \"PyTorch (GPU)\", {}),\n (KNN_torch_batch_loop, \"PyTorch (small batches, GPU)\", {}),\n (KNN_JAX_batch_loop, \"JAX (small batches, GPU)\", {}),\n (KNN_faiss_HNSW_fast, \"FAISS-HNSW (CPU, M=4)\", {}),\n (KNN_faiss_HNSW_slow, \"FAISS-HNSW (CPU, M=36)\", {}),\n (KNN_sklearn_ball_tree, \"sklearn, Ball-tree (CPU)\", {}),\n (KNN_sklearn_kd_tree, \"sklearn, KD-tree (CPU)\", {}),\n # (KNN_sklearn_brute, \"sklearn, bruteforce (CPU)\", {}),\n ]\n\n # Actual run:\n full_benchmark(\n f\"K-NN search on {name}: {N_test:,} queries on a dataset of {N_train:,} points\\nin dimension {dimension:,} with a {metric} metric.\",\n routines,\n generate_samples(name),\n min_time=1e-4,\n max_time=1 if (keops_only or jax_only) else 10,\n loops=loops,\n problem_sizes=Ks,\n xlabel=\"Number of neighbours K\",\n frequency=True,\n ylabel=\"Queries per second (Hz = 1/s)\",\n legend_location=\"upper right\",\n linestyles=[\n \"o-\",\n \"s-\",\n \"^:\",\n \"<:\",\n \"v-\",\n \"x-\",\n \"+-\",\n \"*--\",\n \"p--\",\n \"s-.\",\n \"^-.\",\n # \"<-.\",\n ],\n )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random samples in a Euclidean space\n\nSmall dataset of **10k points in dimension 3**, as is typical in\ne.g. **shape analysis** and point cloud processing.\nIn this scenario, bruteforce approaches are most efficient:\n**KeOps slightly edges FAISS-Flat**, with both methods outperforming\nother routines by **an order of magnitude**.\nNote that the HNSW, IVF-Flat and scikit-learn functions\nincur a significant \"training\" pre-processing time,\ndetailed below the curves.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"R^D a\", loops=[10, 1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Large dataset of **1M points in dimension 3**, as is typical\nin **computer graphics**.\nIn this setting, taking some time to create a multiscale\nindex of the input dataset can be worthwhile:\nthe IVF-Flat and HNSW methods provide **faster queries** at the cost\nof significant **pre-processing times**.\nAmong \"on-the-fly\" bruteforce methods, KeOps edges\nthe FAISS-Flat routine and is the most competitive option.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"R^D b\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Large dataset of **1M points in dimension 10**,\nas can be typical in **low-dimensional machine learning**.\nIn this setting, approximate strategies such as the IVF-Flat method\nare **most competitive** - and we would expect the IVF-PQ routines to perform\neven better!\n\n**Note:** we don't display CPU-based methods with pre-processing\ntimes longer than 60s, but stress that these routines can\nprovide excellent performance in \"offline\" scenarios.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"R^D c\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Large dataset of **1M points in dimension 100**,\nwith **random Gaussian samples**.\nCrucially, when the dataset is high-dimensional and has\nlittle to no geometric structure, **bruteforce methods become relevant once again**:\nFAISS-Flat and KeOps provide the only two reasonable run times.\nAs detailed in [our high-dimensional benchmarks](plot_benchmark_high_dimension.html),\nthe cuBLAS-based routines of FAISS edge our KeOps implementation\nwhen the dimension of the ambient space D exceeds 50-100.\n\nOne of our top priorities for early 2021 is to close this gap\nwith improved CUDA schemes. Adding support for\nsome of the new hardware features of Ampere GPUs (Tensor cores,\nquantized numerical types, etc.) should also help\nto improve performance across the board.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"R^D d\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random samples in other spaces\n\n**Cosine similarity metric with 1M points in dimension 10**,\nas can be typical in low-dimensional machine learning.\nThis metric is generally well-supported by standard libraries:\nusing efficient matrix-matrix products,\nit is even easier to implement than the squared Euclidean distance.\n\nUnsurprisingly, run times follow closely the trends\nof the previous examples.\nIn dimension 10, approximate IVF-like strategies provide\nthe largest amount of queries per second.\nKeOps remains competitive among bruteforce methods,\nwithout any pre-processing time.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"S^{D-1}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The picture changes completely\nonce we start working with less common formulas\nsuch as the **Manhattan-L1 metric**.\nIn this scenario, neither cuBLAS nor FAISS can be used and\nKeOps remains the only competitive library for K-NN search on the GPU.\nThis is true with **1M points in dimension 10**:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"R^D f\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**1M points in dimension 100**, or any other dataset:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"R^D g\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same lesson holds in e.g. hyperbolic spaces.\nIn the example below, we perform K-NN queries\nfor the hyperbolic metric with **1M points in the Poincaré half-space of dimension 10**.\nThe run times for KeOps remain in line with the \"Euclidean\" benchmarks\nand **orders of magnitude faster** than standard PyTorch and JAX implementations.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"H^D\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Standard datasets\n\nThe benchmarks above were all performed on random Gaussian samples.\nThese results provide an informative baseline...\nBut in practice, most real-life datasets present a\n**geometric structure** that can be leveraged by clever algorithms.\nTo measure the performance of bruteforce and IVF-like methods in\n\"realistic\" machine learning scenarios, we now benchmark\nour routines on several [standard datasets](https://ann-benchmarks.com).\n\nFirst of all, on the well-known **MNIST collection of handwritten digits**:\n60k 28-by-28 images, encoded as vectors\nof dimension 784 and endowed with the **Euclidean metric**.\nThis dataset is relatively **small** (less than 100k training samples)\nbut **high-dimensional** (D > 50) and highly **clustered** around\na dozen prototypes (the digits 0, 1, ..., 9 and their variants).\nUnsurprisingly, it is handled much more efficiently by the FAISS routines\nthan by our bruteforce KeOps implementation.\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"MNIST a\")"
]
},
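As a point of reference, a bruteforce Euclidean K-NN query in the spirit of the routines benchmarked here can be sketched with the classical expansion |x - y|² = |x|² - 2⟨x, y⟩ + |y|², which turns the distance computation into a single matrix product. This is a NumPy sketch on MNIST-shaped random data, and `euclidean_knn` is a hypothetical name, not the benchmark's API:

```python
import numpy as np


def euclidean_knn(x_train, x_test, K):
    # |x - y|^2 = |x|^2 - 2 <x, y> + |y|^2: one large matrix product
    # instead of an explicit (Ntest, Ntrain, D) difference tensor.
    sq_dists = (
        (x_test**2).sum(-1)[:, None]
        - 2 * x_test @ x_train.T
        + (x_train**2).sum(-1)[None, :]
    )
    # argpartition extracts the K smallest entries in O(Ntrain) per query:
    knn = np.argpartition(sq_dists, K, axis=1)[:, :K]
    # Then sort those K candidates by their actual distance:
    order = np.take_along_axis(sq_dists, knn, axis=1).argsort(axis=1)
    return np.take_along_axis(knn, order, axis=1)


rng = np.random.default_rng(1)
x_train = rng.standard_normal((2000, 784))  # MNIST-like vectors of dimension 784
x_test = rng.standard_normal((10, 784))
indices = euclidean_knn(x_train, x_test, K=5)
```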
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, however, that KeOps remains the only viable option\nto work easily with less common metrics such as the Manhattan-L1 norm:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"MNIST b\")"
]
},
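As a rough sketch of what such an L1 query computes, here is a bruteforce Manhattan K-NN in pure NumPy, chunked over queries to keep memory bounded (`manhattan_knn` is a hypothetical name, not part of the benchmarked libraries):

```python
import numpy as np


def manhattan_knn(x_train, x_test, K, chunk=100):
    # Bruteforce K-NN for the L1 (Manhattan) metric:
    #   d(x, y) = sum_d |x_d - y_d|.
    # The (chunk, Ntrain, D) difference tensor is materialized one
    # chunk of queries at a time to keep memory usage bounded.
    out = []
    for start in range(0, len(x_test), chunk):
        block = x_test[start : start + chunk]
        dists = np.abs(block[:, None, :] - x_train[None, :, :]).sum(-1)
        out.append(np.argsort(dists, axis=1)[:, :K])
    return np.concatenate(out)


rng = np.random.default_rng(2)
x_train = rng.standard_normal((5000, 10))
x_test = rng.standard_normal((250, 10))
indices = manhattan_knn(x_train, x_test, K=10)
```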
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To conclude this benchmark, we evaluate our routines\non the [GloVe word embeddings](https://nlp.stanford.edu/projects/glove/)\nfor natural language processing:\n**1.2M words**, represented as vectors of **dimension 25-100** and\ncompared with each other using the **cosine similarity metric**.\n\nIn dimension 25, KeOps performs on par with the FAISS-Flat bruteforce\nroutines. Both methods are slower than IVF-like algorithms\nin terms of queries per second:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"GloVe25\")"
]
},
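On the unit sphere, maximizing the cosine similarity ⟨x, y⟩ is equivalent to minimizing the Euclidean distance, since |x - y|² = 2 - 2⟨x, y⟩ for unit vectors. An angular K-NN query thus reduces to a single matrix product after normalization, as in this NumPy sketch (`angular_knn` is a hypothetical name):

```python
import numpy as np


def angular_knn(x_train, x_test, K):
    # Normalize both sets: for unit vectors, |x - y|^2 = 2 - 2 <x, y>,
    # so the K nearest angular neighbors are the K largest dot products.
    a = x_train / np.linalg.norm(x_train, axis=1, keepdims=True)
    q = x_test / np.linalg.norm(x_test, axis=1, keepdims=True)
    sims = q @ a.T  # (Ntest, Ntrain) matrix of cosine similarities
    return np.argsort(-sims, axis=1)[:, :K]


rng = np.random.default_rng(3)
x_train = rng.standard_normal((5000, 25))  # GloVe25-like shape
x_test = rng.standard_normal((100, 25))
indices = angular_knn(x_train, x_test, K=10)
```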
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In dimension 100, the pre-processing times associated\nto IVF-like methods increase significantly while\nthe FAISS-Flat routine edges the KeOps engine\nby a sizeable margin:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"run_KNN_benchmark(\"GloVe100\")\n\nplt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 0
}PK MMXln n ! plot_benchmark_convolutions.ipynb{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Radial kernels convolutions\n\nThis benchmark compares the performances of KeOps versus Numpy and PyTorch on various radial\nkernels convolutions. Namely it computes:\n\n\\begin{align}a_i = \\sum_{j=1}^N f\\Big(\\frac{\\|x_i-y_j\\|}{\\sigma}\\Big) b_j, \\quad \\text{ for all } i=1,\\cdots,M\\end{align}\n\nwhere $f$ is a Gauss or Cauchy or Laplace or inverse multiquadric kernel. See e.g. [wikipedia](https://en.wikipedia.org/wiki/Radial_basis_function)\n"
]
},
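For concreteness, this reduction can be written in a few lines of dense NumPy for the Gaussian kernel f(r) = exp(-r²). This reference sketch materializes the full M-by-N kernel matrix, which is exactly what the KeOps backends avoid; `gaussian_conv` is a hypothetical name:

```python
import numpy as np


def gaussian_conv(x, y, b, sigma):
    # a_i = sum_j exp(-|x_i - y_j|^2 / sigma^2) * b_j,
    # i.e. a dense (M, N) kernel matrix applied to the signal b.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / sigma**2) @ b


rng = np.random.default_rng(4)
x = rng.standard_normal((500, 3))  # Target points
y = rng.standard_normal((400, 3))  # Source points
b = rng.standard_normal((400, 2))  # Source signal
a = gaussian_conv(x, y, b, sigma=2.4)
```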
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\nStandard imports:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nimport timeit\nimport matplotlib\nfrom matplotlib import pyplot as plt\nfrom pykeops.numpy.utils import np_kernel\nfrom pykeops.torch.utils import torch_kernel\nfrom pykeops.torch import Vi, Vj, Pm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Benchmark specifications:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"M = 10000 # Number of points x_i\nN = 10000 # Number of points y_j\nD = 3 # Dimension of the x_i's and y_j's\nE = 3 # Dimension of the b_j's\nREPEAT = 10 # Number of loops per test\n\ndtype = \"float32\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create some random input data:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x = np.random.randn(M, D).astype(dtype) # Target points\ny = np.random.randn(N, D).astype(dtype) # Source points\nb = np.random.randn(N, E).astype(dtype) # Source signal\nsigma = np.array([2.4]).astype(dtype) # Kernel radius"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And load it as PyTorch variables:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"try:\n import torch\n\n use_cuda = torch.cuda.is_available()\n device = \"cuda\" if use_cuda else \"cpu\"\n torchtype = torch.float32 if dtype == \"float32\" else torch.float64\n\n xc = torch.tensor(x, dtype=torchtype, device=device)\n yc = torch.tensor(y, dtype=torchtype, device=device)\n bc = torch.tensor(b, dtype=torchtype, device=device)\n sigmac = torch.tensor(sigma, dtype=torchtype, device=device)\n\nexcept:\n pass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolution Benchmarks\n\nWe loop over four different kernels:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"kernel_to_test = [\"gaussian\", \"laplacian\", \"cauchy\", \"inverse_multiquadric\"]\nkernels = {\n \"gaussian\": lambda xc, yc, sigmac: (\n -Pm(1 / sigmac**2) * Vi(xc).sqdist(Vj(yc))\n ).exp(),\n \"laplacian\": lambda xc, yc, sigmac: (\n -(Pm(1 / sigmac**2) * Vi(xc).sqdist(Vj(yc))).sqrt()\n ).exp(),\n \"cauchy\": lambda xc, yc, sigmac: (\n 1 + Pm(1 / sigmac**2) * Vi(xc).sqdist(Vj(yc))\n ).power(-1),\n \"inverse_multiquadric\": lambda xc, yc, sigmac: (\n 1 + Pm(1 / sigmac**2) * Vi(xc).sqdist(Vj(yc))\n )\n .sqrt()\n .power(-1),\n}"
]
},
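A dense NumPy counterpart of these four LazyTensor formulas can be sketched as follows. It mirrors the expressions above term by term, but is not pykeops' internal `np_kernel` helper; `radial_kernel` is a hypothetical name:

```python
import numpy as np


def radial_kernel(x, y, sigma, kernel):
    # Squared distances rescaled by sigma^2, as in the LazyTensor formulas:
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) / sigma**2
    if kernel == "gaussian":
        return np.exp(-sq)
    elif kernel == "laplacian":
        return np.exp(-np.sqrt(sq))
    elif kernel == "cauchy":
        return 1 / (1 + sq)
    elif kernel == "inverse_multiquadric":
        return 1 / np.sqrt(1 + sq)
    raise ValueError(f"Unknown kernel: {kernel}")


rng = np.random.default_rng(5)
x, y = rng.standard_normal((50, 3)), rng.standard_normal((60, 3))
K = radial_kernel(x, y, sigma=2.4, kernel="cauchy")
```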
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With four backends: Numpy, vanilla PyTorch, Generic KeOps reductions\nand a specific, handmade legacy CUDA code for kernel convolutions:\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"speed_numpy = {i: np.nan for i in kernel_to_test}\nspeed_pytorch = {i: np.nan for i in kernel_to_test}\nspeed_pykeops = {i: np.nan for i in kernel_to_test}\n\n\nprint(\"Timings for {}x{} convolutions:\".format(M, N))\n\nfor k in kernel_to_test:\n print(\"kernel: \" + k)\n\n # Pure numpy\n g_numpy = np.matmul(np_kernel(x, y, sigma, kernel=k), b)\n speed_numpy[k] = timeit.repeat(\n \"gnumpy = np.matmul( np_kernel(x, y, sigma, kernel=k), b)\",\n globals=globals(),\n repeat=5,\n number=1,\n )\n print(\"Time for NumPy: {:.4f}s\".format(np.median(speed_numpy[k])))\n\n # Vanilla pytorch (with cuda if available, and cpu otherwise)\n try:\n g_pytorch = torch_kernel(xc, yc, sigmac, kernel=k) @ bc\n torch.cuda.synchronize()\n speed_pytorch[k] = (\n np.array(\n timeit.repeat(\n \"torch_kernel(xc, yc, sigmac, kernel=k) @ bc; torch.cuda.synchronize()\",\n globals=globals(),\n repeat=REPEAT,\n number=4,\n )\n )\n / 4\n )\n\n print(\n \"Time for PyTorch: {:.4f}s\".format(np.median(speed_pytorch[k])),\n end=\"\",\n )\n print(\n \" (absolute error: \",\n np.max(np.abs(g_pytorch.cpu().numpy() - g_numpy)),\n \")\",\n )\n except:\n print(\"Time for PyTorch: Not Done\")\n\n # Keops: LazyTensors implementation (with cuda if available)\n try:\n g_pykeops = (kernels[k](xc, yc, sigmac) @ bc).cpu()\n torch.cuda.synchronize()\n speed_pykeops[k] = (\n np.array(\n timeit.repeat(\n \"kernels[k](xc, yc, sigmac) @ bc; torch.cuda.synchronize()\",\n globals=globals(),\n repeat=REPEAT,\n number=4,\n )\n )\n / 4\n )\n print(\n \"Time for KeOps LazyTensors: {:.4f}s\".format(\n np.median(speed_pykeops[k])\n ),\n end=\"\",\n )\n print(\n \" (absolute error: \",\n np.max(np.abs(g_pykeops.data.numpy() - g_numpy)),\n \")\",\n )\n except:\n print(\"Time for KeOps LazyTensors: Not Done\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Display results\n\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot violin plot\nplt.violinplot(\n list(speed_numpy.values()),\n showmeans=False,\n showmedians=True,\n)\nplt.violinplot(\n list(speed_pytorch.values()),\n showmeans=False,\n showmedians=True,\n)\nplt.violinplot(\n list(speed_pykeops.values()),\n showmeans=False,\n showmedians=True,\n)\n\nplt.xticks([1, 2, 3, 4], kernel_to_test)\nplt.yscale(\"log\")\n# plt.ylim((0, .01))\n\nplt.grid(True)\nplt.xlabel(\"kernel type\")\nplt.ylabel(\"time in s.\")\n\ncmap = plt.get_cmap(\"tab10\")\nfake_handles = [matplotlib.patches.Patch(color=cmap(i)) for i in range(4)]\n\nplt.legend(\n fake_handles,\n [\"NumPy\", \"PyTorch\", \"KeOps Lazytensors\"],\n loc=\"best\",\n)\n\nplt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 0
}