Timing and Profiling your code

It is very important to time and profile your code to make sure your code is running as fast as possible. Here are some tips on timing and profiling your code.

If you are concerned about the performance of your code, a good place to start is Julia's performance tips.

Timing and benchmarking

Julia has many nice timing tools available. Tools like @time can be used to measure the time of specific lines of code. For microbenchmarking, we recommend the BenchmarkTools package.

ProfileView.jl

Julia has a built-in profiler in the standard module Profile. You can use it as follows:

julis> using ITensors

julia> using Profile

julia> @profile include(joinpath(pkgdir(ITensors), "examples", "dmrg", "1d_heisenberg_conserve_spin.jl"))
sweeps = Sweeps
1 cutoff=1.0E-10, maxdim=10, mindim=1, noise=0.0E+00
2 cutoff=1.0E-10, maxdim=20, mindim=1, noise=0.0E+00
3 cutoff=1.0E-10, maxdim=100, mindim=1, noise=0.0E+00
4 cutoff=1.0E-10, maxdim=100, mindim=1, noise=0.0E+00
5 cutoff=1.0E-10, maxdim=200, mindim=1, noise=0.0E+00

After sweep 1 energy=-137.995732867390 maxlinkdim=10 maxerr=1.93E-02 time=0.862
After sweep 2 energy=-138.801057557054 maxlinkdim=20 maxerr=3.37E-05 time=1.126
After sweep 3 energy=-138.940075984826 maxlinkdim=91 maxerr=9.99E-11 time=1.880
After sweep 4 energy=-138.940086063995 maxlinkdim=99 maxerr=1.00E-10 time=3.033
After sweep 5 energy=-138.940086076330 maxlinkdim=95 maxerr=9.97E-11 time=2.824
Final energy = -138.940086076330

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
   ╎6576 @Base/client.jl:485; _start()
   ╎ 6576 @Base/client.jl:302; exec_options(opts::Base.JLOptions)
   ╎  6576 @Base/client.jl:372; run_main_repl(interactive::Bool, quiet::Bool, b...
   ╎   6576 @Base/essentials.jl:706; invokelatest
   ╎    6576 @Base/essentials.jl:708; #invokelatest#2
   ╎     6576 @Base/client.jl:387; (::Base.var"#874#876"{Bool, Bool, Bool})(REPL...
   ╎    ╎ 6576 .../stdlib/v1.6/REPL/src/REPL.jl:305; run_repl(repl::REPL.AbstractREPL, consumer::Any)
   ╎    ╎  6576 ...stdlib/v1.6/REPL/src/REPL.jl:317; run_repl(repl::REPL.AbstractREPL, consumer::...
[...]

We don't actually recommend printing it since the output is extremely long and difficult to read. Instead, it is best to use a visualization tool to visualize the profile, so you can see at a glance what parts of your code are taking up the most time.

For a quick and easy way of seeing how fast different parts of your code are, we highly recommend ProfileView.jl. All you have to do is load ProfileView and append a simple macro @profview:

julia> using ITensors

julia> using ProfileView

julia> @profview include(joinpath(pkgdir(ITensors), "examples", "dmrg", "1d_heisenberg_conserve_spin.jl"));
sweeps = Sweeps
1 cutoff=1.0E-10, maxdim=10, mindim=1, noise=0.0E+00
2 cutoff=1.0E-10, maxdim=20, mindim=1, noise=0.0E+00
3 cutoff=1.0E-10, maxdim=100, mindim=1, noise=0.0E+00
4 cutoff=1.0E-10, maxdim=100, mindim=1, noise=0.0E+00
5 cutoff=1.0E-10, maxdim=200, mindim=1, noise=0.0E+00

After sweep 1 energy=-137.995732867390 maxlinkdim=10 maxerr=1.93E-02 time=0.977
After sweep 2 energy=-138.801057557054 maxlinkdim=20 maxerr=3.37E-05 time=1.252
After sweep 3 energy=-138.940075984826 maxlinkdim=91 maxerr=9.99E-11 time=2.263
After sweep 4 energy=-138.940086063995 maxlinkdim=99 maxerr=1.00E-10 time=2.938
After sweep 5 energy=-138.940086076330 maxlinkdim=95 maxerr=9.97E-11 time=2.988
Final energy = -138.940086076330

A window will pop up with a "flame graph", where the width of a bar corresponds to the percentage of time the function call took, and as you go up in the graph you see timings for more and more nested functions. The graph will still look quite complicated for this case, but at larger bond dimensions you should see that certain functions like matrix multiplications and decompositions will start to dominate.

TimerOutputs.jl

For more focused timing of specific parts of your code, we recommend the package TimerOutputs. This can help measure lines of code that are called repeatedly, where you want to know the accumulated time. We have some timers defined internally in ITensors.jl for the dmrg function which can be called as follows:

julia> using ITensors

julia> ITensors.TimerOutputs.enable_debug_timings(ITensors)
timeit_debug_enabled (generic function with 1 method)

julia> ITensors.TimerOutputs.reset_timer!(ITensors.NDTensors.timer)
 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations      
                   ──────────────────────   ───────────────────────
 Tot / % measured:     59.4μs / 0.00%              992B / 0.00%    

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 ──────────────────────────────────────────────────────────────────

julia> include("1d_heisenberg_conserve_spin.jl")
sweeps = Sweeps
1 cutoff=1.0E-10, maxdim=10, mindim=1, noise=0.0E+00
2 cutoff=1.0E-10, maxdim=20, mindim=1, noise=0.0E+00
3 cutoff=1.0E-10, maxdim=100, mindim=1, noise=0.0E+00
4 cutoff=1.0E-10, maxdim=100, mindim=1, noise=0.0E+00
5 cutoff=1.0E-10, maxdim=200, mindim=1, noise=0.0E+00

After sweep 1 energy=-137.995732867390 maxlinkdim=10 maxerr=1.93E-02 time=0.597
After sweep 2 energy=-138.801057557054 maxlinkdim=20 maxerr=3.37E-05 time=0.798
After sweep 3 energy=-138.940075984826 maxlinkdim=91 maxerr=9.99E-11 time=1.285
After sweep 4 energy=-138.940086063995 maxlinkdim=99 maxerr=1.00E-10 time=1.878
After sweep 5 energy=-138.940086076330 maxlinkdim=95 maxerr=9.97E-11 time=1.936
Final energy = -138.940086076330

julia> ITensors.NDTensors.timer
 ────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             14.4s / 45.0%           9.59GiB / 95.1%    

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 dmrg: eigsolve             990    4.72s  72.7%  4.76ms   7.27GiB  79.7%  7.52MiB
 dmrg: replacebond!         990    1.31s  20.2%  1.32ms   1.12GiB  12.2%  1.15MiB
 dmrg: position!            990    360ms  5.55%   364μs    571MiB  6.11%   590KiB
 dmrg: psi[b]*psi[b+1]      990    106ms  1.63%   107μs    182MiB  1.95%   188KiB
 ────────────────────────────────────────────────────────────────────────────────