https://pytorch.org/tutorials/recipes/recipes/benchmark.html#pytorch-benchmark
It looks a bit over-complicated at first glance (why it provides classes for random tensor generation, I have no idea), but it has a few very nice features:
- Automated num_threads handling
- Automated CUDA synchronization
- Report generation, storing the results, comparing the results
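
To make the list above concrete, here is a minimal sketch using `torch.utils.benchmark.Timer` and `Compare`. The `batched_dot` workload, the tensor shape and the labels are made up for illustration, and the run counts are arbitrary. It runs on CPU only, so the CUDA synchronization is not exercised here.

```python
import torch
import torch.utils.benchmark as benchmark


def batched_dot(a, b):
    """Row-wise dot product; a hypothetical workload used only for illustration."""
    return (a * b).sum(dim=-1)


# Small CPU tensor (assumed shape) so the sketch runs quickly anywhere.
x = torch.randn(1000, 64)

results = []
for num_threads in (1, 2):
    timer = benchmark.Timer(
        stmt="batched_dot(x, x)",
        globals={"batched_dot": batched_dot, "x": x},
        num_threads=num_threads,  # Timer sets the thread count for this measurement
        label="batched_dot",
        sub_label="1000x64",
        description="mul + sum",
    )
    # timeit(100) runs the statement 100 times and returns a Measurement.
    results.append(timer.timeit(100))

# Compare collects the measurements and prints them as one table.
compare = benchmark.Compare(results)
compare.print()
```

Each `Measurement` can also be kept around (for example pickled) and fed to `Compare` later, which is roughly what "storing and comparing the results" amounts to.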
But I suppose there is nothing wrong with just using %%timeit and manually setting num_threads.
#deep_learning