Spark in me - Internet, data science, math, deep learning, philosophy

2440 @snakers4

A channel about topics I find interesting: the internet, statistics, data science. No ads and no bullshit.

6 years ago
Getting The Most Out of AMP

We have been digging into how to use AMP (automatic mixed precision) properly. Surprise, surprise:

- It works better with large, wide networks
- It works poorly with separable convolutions
- You need somewhat more involved design considerations than just "have your channels divisible by 8":

For matrix multiplication:
On FP16 inputs, all three dimensions (M, N, K) must be multiples of 8.

For convolution:
On FP16 inputs, input and output channels must be multiples of 8.
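
A quick way to sanity-check a layer against the two rules above. This is only a sketch in plain Python: the helper names (`is_multiple_of_8`, `matmul_is_aligned`, `conv_is_aligned`) are made up for illustration and are not part of any library.

```python
# Hypothetical helpers (not a library API): check the FP16 alignment rules
# quoted above for matrix multiplication and convolution.

def is_multiple_of_8(n: int) -> bool:
    """True if n is divisible by 8."""
    return n % 8 == 0


def matmul_is_aligned(m: int, n: int, k: int) -> bool:
    """An (M x K) @ (K x N) product: all three dimensions must be multiples of 8."""
    return all(is_multiple_of_8(d) for d in (m, n, k))


def conv_is_aligned(in_channels: int, out_channels: int) -> bool:
    """Convolution: input and output channel counts must be multiples of 8."""
    return is_multiple_of_8(in_channels) and is_multiple_of_8(out_channels)


# Example shapes, chosen arbitrarily for illustration.
aligned_matmul = matmul_is_aligned(1024, 512, 768)
misaligned_matmul = matmul_is_aligned(1024, 512, 30)
first_rgb_conv = conv_is_aligned(3, 64)  # 3 input channels breaks the rule
```

Note that a typical first convolution over RGB input (3 channels) fails this check, so "channels divisible by 8" is not automatic even in an otherwise well-aligned network.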

Also:

Prefer dense math operations.
For example, vanilla convolutions have much higher arithmetic intensity than depth-wise separable convolutions.
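
To get a feel for the gap, here is a back-of-the-envelope estimate of arithmetic intensity (FLOPs per byte moved). It is a rough model with loud assumptions: stride 1, "same" padding, FP16 tensors (2 bytes per element), each tensor read or written exactly once, and no caching effects. The function names and the example shape are mine, not from the NVIDIA guide.

```python
# Rough arithmetic-intensity model (an assumption-laden sketch, not a profiler).
BYTES_PER_ELEMENT = 2  # FP16


def conv_intensity(h: int, w: int, c_in: int, c_out: int, k: int) -> float:
    """FLOPs per byte for a standard k x k convolution on an h x w feature map."""
    flops = 2 * h * w * c_in * c_out * k * k  # one multiply + one add per MAC
    elements = h * w * c_in + k * k * c_in * c_out + h * w * c_out
    return flops / (elements * BYTES_PER_ELEMENT)


def depthwise_intensity(h: int, w: int, c: int, k: int) -> float:
    """FLOPs per byte for a depth-wise k x k convolution (one filter per channel)."""
    flops = 2 * h * w * c * k * k
    elements = h * w * c + k * k * c + h * w * c
    return flops / (elements * BYTES_PER_ELEMENT)


# Example: 56 x 56 feature map, 128 channels, 3 x 3 kernel (arbitrary choice).
standard = conv_intensity(56, 56, 128, 128, 3)
depthwise = depthwise_intensity(56, 56, 128, 3)
ratio = standard / depthwise
```

Under these assumptions the depth-wise layer does far fewer FLOPs per byte it touches, which is consistent with it being limited by memory bandwidth rather than by math throughput.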

Also:

Choose mini-batch to be a multiple of 8
Choose linear layer dimensions to be a multiple of 8
Choose convolution layer channel counts to be a multiple of 8
For classification problems, pad vocabulary to be a multiple of 8
For sequence problems, pad the sequence length to be a multiple of 8
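
The padding rules in this list boil down to "round up to the next multiple of 8". A minimal sketch in plain Python; `round_up_to_multiple` and `pad_sequence` are hypothetical helper names, and `pad_id=0` is an assumed padding token id.

```python
# Hypothetical helpers for the "pad to a multiple of 8" rules above.
from typing import List


def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_sequence(tokens: List[int], pad_id: int = 0, multiple: int = 8) -> List[int]:
    """Right-pad a token list so its length is a multiple of `multiple`."""
    target = round_up_to_multiple(len(tokens), multiple)
    return tokens + [pad_id] * (target - len(tokens))


# Example: a vocabulary of 50257 entries would be padded up to the next multiple of 8.
padded_vocab_size = round_up_to_multiple(50257)
padded_tokens = pad_sequence([11, 42, 7])
```

The same `round_up_to_multiple` call can be used when picking a mini-batch size or a linear layer width; the extra vocabulary rows or padding positions are simply never used as targets.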

For details, please see:
- https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html


#deep_learning