# Structure of the repository¶

But how does KeOps handle symbolic formulas on the GPU? How can its routines outperform the CUDA backends of Deep Learning frameworks by such a wide margin? To answer these questions, we need to dive into the mixed C++/Python/Matlab codebase of the KeOps package, whose structure may be summarized as follows:

• The pykeops/ folder, with its common/, numpy/ and torch/ subfolders, contains our Python wrappers and relies on the fantastic PyBind11 library.

• The keopslab/ folder provides a collection of entry points for Matlab scripts.

• The keops/ folder contains our C++ files and the associated compilation scripts. The generic KeOps engine that we are now about to discuss is implemented in the core/ subfolder which contains:

• The link_autodiff.cpp and link_autodiff.cu “main” C++ files, which define the methods that binding libraries may use to create high-level modules.

• The pack/ subfolder, which defines abstract types for lists and tuples within the C++ templating system. Using advanced concepts introduced with the C++11 revision, these headers allow us to drive the nvcc compiler with declarative “variadic templating” and generate routines that manipulate an arbitrary number of parameters, $$i$$- and $$j$$-variables.

• The autodiff/ subfolder, which defines the primitives of the KeOps symbolic syntax: variables, abstract unary and binary operations, gradients.

• The mapreduce/GpuConv*_.cu CUDA files, which implement our massively parallel Map-Reduce schemes. These files contain the core logic of the KeOps library.

• The mapreduce/CpuConv*_.cpp C++ files, which implement simple Map-Reduce schemes using standard “for” loops. They may be used to check the correctness of our parallel implementations, and provide a fallback mode for users who do not have access to a GPU.
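
As a rough illustration of what such a scheme computes, here is a minimal C++ sketch (assumed memory layout and formula, not the actual CpuConv code) of a Sum reduction over the $$j$$-variables for a Gaussian kernel product:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal CpuConv-style loop (illustrative only): for each output
// index i, an inner loop over j evaluates the formula F(x_i, y_j)
// and accumulates a Sum reduction on the fly. Here
// out_i = sum_j exp(-|x_i - y_j|^2) * b_j, with points stored
// contiguously as M*D and N*D flat arrays.
std::vector<double> cpu_conv(const std::vector<double>& x,  // M points, dim D
                             const std::vector<double>& y,  // N points, dim D
                             const std::vector<double>& b,  // N scalars
                             std::size_t D) {
  std::size_t M = x.size() / D, N = y.size() / D;
  std::vector<double> out(M, 0.0);
  for (std::size_t i = 0; i < M; ++i) {      // "Map" over i-variables
    for (std::size_t j = 0; j < N; ++j) {    // "Reduce" over j-variables
      double sq = 0.0;
      for (std::size_t d = 0; d < D; ++d) {
        double diff = x[i * D + d] - y[j * D + d];
        sq += diff * diff;
      }
      out[i] += std::exp(-sq) * b[j];        // on-the-fly accumulation
    }
  }
  return out;
}
```

Crucially, the $$M \times N$$ kernel matrix is never stored: each coefficient is consumed as soon as it is computed, which is the defining property of the KeOps Map-Reduce strategy.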

• The reductions/ subfolder, which implements the supported $$\operatorname{Reduction}$$ operations: sum, arg-min, log-sum-exp, etc.
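
The Log-Sum-Exp reduction, for instance, has to be computed in an online, numerically stable fashion; a minimal sketch of the standard running-maximum trick (illustrative code, not the KeOps implementation):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Online Log-Sum-Exp: maintain a running maximum m and a rescaled
// accumulator s such that sum_j exp(a_j) = exp(m) * s at every step,
// so that log(sum_j exp(a_j)) = m + log(s) never overflows.
double log_sum_exp(const std::vector<double>& a) {
  double m = -INFINITY, s = 0.0;
  for (double aj : a) {
    if (aj > m) {                      // new maximum: rescale accumulator
      s = s * std::exp(m - aj) + 1.0;  // exp(-inf) == 0 handles the start
      m = aj;
    } else {
      s += std::exp(aj - m);
    }
  }
  return m + std::log(s);
}
```

The same pair-of-accumulators pattern carries over to the parallel GPU reductions, where partial $$(m, s)$$ pairs computed by different threads can be merged associatively.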

• The formulas/ subfolder, which implements the atomic operations that users may combine to define vector-valued formulas $$F$$.

As evidenced here, the KeOps engine relies heavily on modern features of the C++ language: every time Genred encounters a new kind of generic operation (the values of $$\mathrm{M}$$, $$\mathrm{N}$$ and the data arrays are free to change between calls), the string that specifies the formula is parsed by the compiler and a new “.so” or “.dll” shared object is generated, before being executed on the relevant Python or Matlab tensors.