Spark in me - Internet, data science, math, deep learning, philosophy (@snakers4)

Some Additional Thoughts on DDP

The DDP docs say that you cannot run multiple DDP processes on one GPU (otherwise you would have to use their RPC framework, which is too much hassle and complication for me personally, at least for now!).

Turns out you can. But the speedup was negligible in my case:

- GPU utilization went from 70-80% (1 process per GPU) to 90-100%;
- Total epoch time decreased by 3-5%;
- Interestingly, I compared 2 DDP workers on 2 GPUs vs 4 DDP workers on 2 GPUs and 3 DDP workers on 2 GPUs (1 on the master GPU, 2 on the other), and 3 workers were much slower, so the bottleneck is probably compute, not communication (we will see with Ampere GPUs!);
- Following advice from Nvidia, I also tried MPS (which is supposed to help several processes run smoothly on one GPU), but I just could not make it work with DDP: it failed with cryptic errors, at first after torch.cuda.empty_cache() and then just randomly. Sad times.
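
For reference, the three worker layouts compared above can be written down as a rank-to-GPU mapping. This is only a minimal sketch: the helper name `assign_ranks` and the "fill GPUs in order" rule are my assumptions for illustration, not something taken from the DDP docs or from the actual training script.

```python
from typing import List


def assign_ranks(workers_per_gpu: List[int]) -> List[int]:
    """Return the GPU index for each DDP rank, filling GPUs in order.

    Hypothetical helper: workers_per_gpu[i] is how many DDP processes
    should share GPU i.
    """
    devices: List[int] = []
    for gpu_index, n_workers in enumerate(workers_per_gpu):
        devices.extend([gpu_index] * n_workers)
    return devices


# The three setups from the list above (2 GPUs in every case).
layouts = {
    "2 workers / 2 GPUs": assign_ranks([1, 1]),
    "4 workers / 2 GPUs": assign_ranks([2, 2]),
    "3 workers / 2 GPUs": assign_ranks([1, 2]),  # 1 on master GPU, 2 on the other
}

for name, devices in layouts.items():
    # devices[rank] is the GPU that process `rank` would use.
    print(name, devices)
```

In a real run, each process would presumably pick its GPU with `torch.cuda.set_device(devices[rank])` and pass `device_ids=[devices[rank]]` when wrapping the model in `DistributedDataParallel`. The uneven `[1, 2]` layout makes the slowdown plausible: DDP synchronizes gradients on every step, so the worker that has a GPU to itself likely ends up waiting for the two that share one.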

#deep_learning