Threading and performance
A plurality, if not a majority, of the time spent in MPO construction goes to the sparse QR decomposition. Loading MKL often speeds this up significantly.
    # MKL is optional: if it is not installed, Julia keeps its default BLAS.
    try
        using MKL
    catch
    end

ITensorMPOConstruction can take advantage of multiple threads in a variety of ways.
- ITensorMPS.add!(os::OpIDSum, ...) is thread-safe. This can be used to construct an OpIDSum in parallel.
- During the MPO construction, Threads.@threads is used, primarily to iterate over the connected components (i.e. QN blocks) in parallel.
- In the sparse QR decomposition that is done for every connected component. The threading here is controlled by BLAS.set_num_threads. Oversubscription is possible since this QR decomposition is itself called from within Threads.@threads.
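
The oversubscription risk described above can be checked from the REPL. The following is a minimal sketch using only Julia's standard library; it assumes that limiting BLAS to a single thread is an acceptable starting point when several Julia threads are active, which may not be the best choice on every system.

```julia
using LinearAlgebra  # provides the BLAS submodule

# The number of Julia threads is fixed at startup, e.g. `julia --threads=8`.
julia_threads = Threads.nthreads()
# The number of BLAS threads can be changed at runtime.
blas_threads = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)

# Each Julia thread may be inside a QR call at the same time, so the worst
# case is roughly julia_threads * blas_threads busy threads.
# Assumption: with more than one Julia thread, start from one BLAS thread.
if julia_threads > 1
    BLAS.set_num_threads(1)
end
```

If only one Julia thread is available, the BLAS thread count is left untouched so that the QR decomposition can still use several cores.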
Threading tips
- The MPO construction is memory-bound, and the performance gain (or loss!) from multi-threading is system- and operator-dependent.
- If the MPO has many connected components (i.e. is sparse), use Julia threads.
- If the MPO has few connected components (i.e. is dense), use BLAS threads.
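
Because the gain depends on the system and the operator, it is worth measuring rather than guessing. The sketch below times a workload under several BLAS thread counts. The function `time_blas_thread_counts` and the `workload` stand-in (a small dense QR) are hypothetical names introduced for illustration and are not part of ITensorMPOConstruction; replace `workload` with your own MPO construction call.

```julia
using LinearAlgebra

# Hypothetical stand-in workload; swap in your own MPO construction.
workload() = qr(rand(300, 300))

# Time `f` once for every BLAS thread count in `counts`.
function time_blas_thread_counts(f, counts)
    original = BLAS.get_num_threads()
    timings = Tuple{Int,Float64}[]
    try
        for n in counts
            BLAS.set_num_threads(n)
            f()                  # warm-up run so compilation is not timed
            t = @elapsed f()     # wall-clock time in seconds
            push!(timings, (n, t))
        end
    finally
        BLAS.set_num_threads(original)  # restore the previous setting
    end
    return timings
end

timings = time_blas_thread_counts(workload, [1, 2, 4])
for (n, t) in timings
    println("BLAS threads = ", n, ": ", round(t; digits=4), " s")
end
```

To compare against Julia threads, repeat the measurement after restarting Julia with a different `--threads` value, since that count cannot be changed at runtime.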