Key ingredients:
- PyTorch native network
- CPU inference / deploy
- JIT, ONNX, int8 quantization
Some notes on how much you can speed up your networks mostly out of the box, with very few tweaks. These conclusions hold for very small networks (1M params, 10-30 layers) and medium-sized networks (20M params, 20-40 layers):
- Just using JIT can give you up to a 30% boost. With smaller batch sizes (and feature map sizes) the boost is smaller, 5-10%. The boost saturates at a certain batch size / feature map size;
- Just using int8 quantization can give you up to a 30% boost. Same caveats as with JIT;
- Same with JIT + int8: total speed-ups of up to 50%, with more even speed-ups for small batches and feature maps;
- Using ONNX is generally faster than PyTorch out of the box, but the gain is most pronounced for small feature maps, e.g. you can get a 40% speed-up for a small batch and zero speed-up for a large batch;
- ONNX + int8 does not seem to work in PyTorch now. We have not tried porting networks manually from ONNX to quantized ONNX.
We are not comparing apples to apples here, but ONNX inference with quantization seems the most promising, given its wide support of back-ends.
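A rough sketch of how such a comparison could be measured on CPU. Everything here is an assumption for illustration: the toy model, the helper names (`median_latency`, `speedup_pct`, `compare_variants`), and the choice of dynamic int8 quantization of `Linear` layers, which is simply the shortest int8 variant to write (the notes do not say which quantization mode was used). The percentage is computed as throughput gain over eager PyTorch, which is also an assumption about how the numbers above were defined.

```python
import statistics
import time


def median_latency(fn, warmup=10, iters=50):
    """Median wall-clock seconds per call of fn(), measured after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup_pct(baseline_s, candidate_s):
    """Throughput gain in percent: 30.0 means 1.3x as many calls per second."""
    return (baseline_s / candidate_s - 1.0) * 100.0


def compare_variants(batch_size=8, in_features=256):
    """Time eager, JIT, int8 and JIT + int8 variants of a toy model on CPU."""
    # Imported lazily so the timing helpers above work without PyTorch.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    # Hypothetical toy model, not one of the networks from the notes.
    model = nn.Sequential(
        nn.Linear(in_features, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 10),
    ).eval()
    x = torch.randn(batch_size, in_features)

    with torch.no_grad():
        # Assumed mode: dynamic int8 quantization of the Linear layers only.
        int8_model = torch.quantization.quantize_dynamic(
            model, {nn.Linear}, dtype=torch.qint8
        )
        variants = {
            "eager": model,
            "jit": torch.jit.trace(model, x),
            "int8": int8_model,
            "jit+int8": torch.jit.trace(int8_model, x),
        }
        latency = {
            name: median_latency(lambda m=m: m(x)) for name, m in variants.items()
        }

    # Speed-up of each variant relative to eager PyTorch, in percent.
    return {name: speedup_pct(latency["eager"], t) for name, t in latency.items()}
```

Calling `compare_variants(batch_size=1)` and `compare_variants(batch_size=256)` is one way to see how the boost depends on batch size. The numbers from a toy model like this will not match the ones above.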
#deep_learning