Spark in me - Internet, data science, math, deep learning, philosophy (@snakers4)

Speeding Up Your PyTorch Networks for CPU Inference

Key ingredients:

- PyTorch native network
- CPU inference / deploy
- JIT, ONNX, int8 quantization

Some notes on how much you can speed up your networks, mostly out of the box and with very few tweaks. These conclusions hold for very small networks (1M params, 10-30 layers) and medium-sized networks (20M params, 20-40 layers):

- JIT alone can give you up to a 30% boost. With smaller batch sizes (and feature map sizes) the boost is smaller, 5-10%. The boost saturates beyond a certain batch size / feature map size;

- int8 quantization alone can also give you up to a 30% boost. Same caveats as with JIT;

- JIT + int8 combined give total speed-ups of up to 50%, and the speed-ups are more even across small batches and feature maps;

- ONNX is generally faster than PyTorch out of the box, and the difference is most pronounced for small feature maps, e.g. you can get a 40% speed-up for a small batch and zero speed-up for a large batch;

- ONNX + int8 does not seem to work in PyTorch at the moment. We have not tried porting networks manually from ONNX to quantized ONNX.
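
For reference, the JIT, int8 and JIT + int8 variants from the list above can be sketched roughly like this. This is a minimal sketch under assumptions, not the setup the numbers above were measured on: `TinyNet` and `mean_latency` are hypothetical names, the model is a toy MLP, and dynamic quantization of `nn.Linear` layers is just one of PyTorch's int8 modes (the notes above do not say which mode was used). It also assumes a PyTorch build with a quantized CPU engine available.

```python
import time

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model; not one of the networks measured above."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        return self.layers(x)


def mean_latency(model, x, runs=50, warmup=10):
    """Average seconds per forward pass, measured after a few warm-up calls."""
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / runs


torch.manual_seed(0)
model = TinyNet().eval()
x = torch.randn(8, 128)  # assumed batch size and feature size

# JIT: trace the float model with an example input.
jit_model = torch.jit.trace(model, x)

# int8: dynamic quantization, Linear weights are stored as int8.
# quantize_dynamic works on a copy, so `model` stays in float32.
int8_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# JIT + int8: trace the already quantized model.
jit_int8_model = torch.jit.trace(int8_model, x)

variants = {
    "baseline": model,
    "jit": jit_model,
    "int8": int8_model,
    "jit + int8": jit_int8_model,
}
for name, variant in variants.items():
    print(f"{name}: {mean_latency(variant, x) * 1e3:.3f} ms per batch")
```

The timings depend heavily on the hardware, thread settings, batch size and feature map size, so treat the printed numbers as a way to compare variants on your own machine rather than as a reproduction of the percentages above.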

We are not comparing apples to apples here, but ONNX inference with quantization seems the most promising, given its wide support of back-ends.

#deep_learning