Running on GPUs

ITensor provides package extensions for running tensor operations on a variety of GPU backends. You can activate a backend by loading the appropriate Julia GPU package alongside ITensors.jl and moving your tensors and/or tensor networks to an available GPU using that package's provided conversion functions.

For example, you can load CUDA.jl to run tensor operations on NVIDIA GPUs, or Metal.jl to run them on Apple GPUs:

```julia
using ITensors

i, j, k = Index.((2, 2, 2))
A = randomITensor(i, j)
B = randomITensor(j, k)

# Perform tensor operations on CPU
A * B

###########################################
using CUDA # This will trigger the loading of `NDTensorsCUDAExt` in the background

# Move tensors to NVIDIA GPU
Acu = cu(A)
Bcu = cu(B)

# Perform tensor operations on NVIDIA GPU
Acu * Bcu

###########################################
using Metal # This will trigger the loading of `NDTensorsMetalExt` in the background

# Move tensors to Apple GPU
Amtl = mtl(A)
Bmtl = mtl(B)

# Perform tensor operations on Apple GPU
Amtl * Bmtl
```
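AMD GPUs can be used in the same way through AMDGPU.jl. The following is a sketch by analogy with the examples above, assuming AMDGPU.jl's `roc` conversion function is extended to ITensors by a corresponding `NDTensorsAMDGPUExt` extension:

```julia
using AMDGPU # Assumed to trigger the loading of `NDTensorsAMDGPUExt` in the background

# Move tensors to AMD GPU
Aroc = roc(A)
Broc = roc(B)

# Perform tensor operations on AMD GPU
Aroc * Broc
```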

Note that we highly recommend using these new package extensions rather than ITensorGPU.jl, ITensor's previous CUDA backend. The package extensions are better integrated into the main library, so they are more reliable and better supported right now. We plan to deprecate ITensorGPU.jl in the future.

GPU backends

ITensor currently provides package extensions for the following GPU backends:

  • CUDA.jl (NVIDIA GPUs)
  • AMDGPU.jl (AMD GPUs)
  • Metal.jl (Apple GPUs)

Our goal is to support all GPU backends that are supported by the JuliaGPU organization.

Some important caveats to keep in mind about the ITensor GPU backends:

  • only dense tensor operations are well supported right now. Block sparse operations (which arise when QN conservation is enabled) are under active development and either may not work or may be slower than their CPU counterparts,
  • certain GPU backends do not have native support for matrix decompositions like `svd`, `eigen`, and `qr`, in which case we perform those operations on CPU. If your calculation is dominated by those operations, there is likely no advantage to running it on GPU right now. CUDA generally has good support for native matrix decompositions, while Metal and AMD have more limited support right now, and
  • single precision (`Float32`) calculations are generally fastest on GPU, as shown in the sketch after this list.
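For example, here is a minimal sketch of running a contraction in single precision on an NVIDIA GPU. The tensors are constructed with `Float32` elements from the start, so the calculation is single precision regardless of any element type conversion `cu` may apply:

```julia
using ITensors
using CUDA

i, j, k = Index.((2, 2, 2))

# Construct tensors with single precision (Float32) elements
A = randomITensor(Float32, i, j)
B = randomITensor(Float32, j, k)

# Move the tensors to the GPU and contract them there
Acu = cu(A)
Bcu = cu(B)
Acu * Bcu
```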

The table below summarizes each backend's current capabilities.

|                            | CUDA        | ROCm        | Metal       | oneAPI |
|----------------------------|-------------|-------------|-------------|--------|
| Contractions (dense)       | ✓           | ✓           | ✓           | N/A    |
| Contractions (cuTENSOR)    | In progress | N/A         | N/A         | N/A    |
| QR (dense)                 | ✓           | On CPU      | On CPU      | N/A    |
| SVD (dense)                | ✓           | On CPU      | On CPU      | N/A    |
| Eigendecomposition (dense) | ✓           | On CPU      | On CPU      | N/A    |
| Double precision (Float64) | ✓           | ✓           | N/A         | N/A    |
| Block sparse               | In progress | In progress | In progress | N/A    |
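Decompositions use the same ITensor interface on every backend, and any CPU fallback noted above happens behind the scenes. For example, a minimal sketch of factorizing the GPU tensor `Acu` from the examples above:

```julia
# Factorize Acu with i as the row index; on backends without
# native support the decomposition itself runs on CPU
U, S, V = svd(Acu, i)
```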