Spark in me - Internet, data science, math, deep learning, philosophy (@snakers4)

A channel about topics that interest me: the Internet, statistics, data science. No ads and no bullshit.

@snakers4, 6 years ago

Trying PyTorch DDP Again

Just a quick note. DDP expects a backward pass on every worker in each iteration (or on none of them). If only some workers call backward, they block in the gradient all-reduce waiting for the others, and training hangs.
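A minimal sketch of one way to keep workers in lockstep, assuming a DDP-wrapped model and a training loop that wants to skip batches with a non-finite loss. The skip policy and the helper names all_ranks_finite, train_step and smoke_test are hypothetical, not from the post. Instead of each worker deciding on its own, the workers agree through a MIN all-reduce, so either all of them call backward or none do.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def all_ranks_finite(loss: torch.Tensor) -> bool:
    """True only if the loss is finite on *every* rank.

    Each rank contributes 1 (finite) or 0 (inf/NaN); the MIN all-reduce
    hands every rank the same answer, so they all take the same branch.
    """
    flag = torch.tensor([1.0 if torch.isfinite(loss).all() else 0.0], device=loss.device)
    dist.all_reduce(flag, op=dist.ReduceOp.MIN)
    return bool(flag.item() == 1.0)


def train_step(model, optimizer, x, y) -> bool:
    """Hypothetical DDP step: runs backward on all ranks or on none."""
    optimizer.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(model(x), y)
    # A purely local `if not torch.isfinite(loss): skip` would let the other
    # ranks enter backward and wait in the gradient all-reduce forever.
    if not all_ranks_finite(loss):
        return False  # every rank skips this batch together
    loss.backward()  # DDP all-reduces gradients here, on every rank
    optimizer.step()
    return True


def smoke_test() -> tuple:
    """One-process gloo group on CPU, only to exercise the code path."""
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group("gloo", init_method=f"file://{init_file}", rank=0, world_size=1)
    try:
        torch.manual_seed(0)
        model = DDP(nn.Linear(3, 1))
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        ok = train_step(model, opt, torch.randn(8, 3), torch.randn(8, 1))
        bad = train_step(model, opt, torch.full((8, 3), float("nan")), torch.randn(8, 1))
        return ok, bad
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(smoke_test())  # expected: (True, False)
```

In a real multi-GPU run the same train_step would be used inside a script launched with torchrun, one process per GPU; the all-reduce of a single float per step is cheap compared with the gradient all-reduce it protects.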

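So do not forget to use a grad scaler (GradScaler) with native PyTorch AMP: when FP16 gradients overflow, it skips the optimizer step on its own after backward has already run on every worker.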

In my particular case, DDP ran with AMP alone, but once I added the grad scaler, training stopped exploding / de-syncing and started converging even faster. If only I had GPUs with FP16 support =)

I guess nice work, Nvidia?

#deep_learning