Spark in me - Internet, data science, math, deep learning, philosophy

2440 @snakers4

A channel about topics that interest me: the internet, statistics, data science. No ads and no bullshit.

6 years ago
Notes from captain obvious:

When comparing two GPUs with Tensor Cores, one of the single best indicators of each GPU's performance is its memory bandwidth;

Most computation time on GPUs is memory access;

A100 compared to the V100 is 1.70x faster for NLP and 1.45x faster for computer vision;

The 3-slot design of the RTX 3090 makes 4x GPU builds problematic. Possible solutions are 2-slot variants or the use of PCIe extenders;

4x RTX 3090 will need more power than any standard power supply unit on the market can provide right now (this is BS, though power connectors may be an issue; I have a 2000W PSU);

With BF16 precision, training might be more stable than with FP16 precision while providing the same speedups;

The new fan design for the RTX 30 series features both a blower fan and a push/pull fan;

350W TDP;

Compared to an RTX 2080 Ti, the RTX 3090 yields a speedup of 1.57x for convolutional networks and 1.5x for transformers while having a 15% higher release price. Thus the Ampere RTX 30 series delivers a pretty substantial improvement over the Turing RTX 20 series;

PCIe 4.0 and PCIe lanes do not matter in 2x GPU setups. For 4x GPU setups, they still do not matter much;

NVLink is not useful, except for GPU clusters;

No info about the power connector. But I believe the first gaming GPUs use 2x 6-pin plus maybe some adapter;

Despite heroic software engineering efforts, AMD GPUs + ROCm will probably not be able to compete with NVIDIA for at least 1-2 years, due to the lack of a community and of a Tensor Core equivalent;

You will need 50+ Gbit/s network cards to gain speedups if you want to parallelize across machines;

So if you expect to run deep learning models for more than 300 days, it is better to buy a desktop than to use AWS spot instances (also fuck off AWS and Nvidia with the SLA about data centers).
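
The buy-versus-rent point is a break-even calculation, which can be roughly sketched as below. This is a back-of-the-envelope model only, not the method behind the 300-day figure. The function name `break_even_days` and all of its parameters are hypothetical, and the numbers in the usage line are placeholders picked for round arithmetic, not real desktop or AWS prices.

```python
def break_even_days(desktop_cost_usd, cloud_usd_per_hour, hours_per_day=24.0,
                    desktop_power_kw=0.0, electricity_usd_per_kwh=0.0):
    """Days of use after which owning a desktop is cheaper than renting.

    All inputs are assumptions supplied by the caller; the model ignores
    resale value, maintenance, and spot-price fluctuations.
    """
    # What the cloud instance costs per day of use.
    cloud_per_day = cloud_usd_per_hour * hours_per_day
    # What running the desktop costs per day in electricity.
    power_per_day = desktop_power_kw * electricity_usd_per_kwh * hours_per_day
    # Money saved per day by owning instead of renting.
    saving_per_day = cloud_per_day - power_per_day
    if saving_per_day <= 0:
        raise ValueError("renting is never more expensive per day with these inputs")
    return desktop_cost_usd / saving_per_day


# Placeholder inputs: a 2000 USD desktop vs. a 1 USD/hour instance used 8 hours a day.
print(break_even_days(2000, 1.0, hours_per_day=8))
```

Plug in your own hardware price, instance price, and daily usage; the fewer hours per day you actually train, the longer the desktop takes to pay for itself.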