Spark in me - Internet, data science, math, deep learning, philosophy, page 27, all channel posts


Getting The Most Out of AMP

We dug deep into how to use AMP properly. Surprise, surprise:

- It works better with large, wide networks
- It works poorly with separable convolutions
- It needs more involved design considerations than just "have your channels divisible by 8":

For matrix multiplication:
On FP16 inputs, all three dimensions (M, N, K) must be multiples of 8.

For convolution:
On FP16 inputs, input and output channels must be multiples of 8.

Also:

Prefer dense math operations.
For example, vanilla convolutions have much higher arithmetic intensity than depth-wise separable convolutions.
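To make "arithmetic intensity" concrete, here is a rough back-of-the-envelope model (FLOPs divided by bytes moved to and from memory). The layer size (56x56, 256 channels, 3x3 kernel) and the assumption that input, output and weights each cross memory exactly once are ours, chosen for illustration only; conv_arithmetic_intensity is a hypothetical helper.

```python
def conv_arithmetic_intensity(h, w, c_in, c_out, k, depthwise=False, bytes_per_elem=2):
    """Approximate FLOPs per byte of memory traffic for one conv layer.

    Assumptions: stride 1 and 'same' padding (output is h x w), batch size 1,
    FP16 storage (2 bytes per element), and input, output and weights each
    read or written exactly once.
    """
    if depthwise:
        if c_in != c_out:
            raise ValueError("this sketch assumes a depth-wise conv with c_in == c_out")
        macs = h * w * c_out * k * k          # each output channel sees one input channel
        weight_elems = c_out * k * k
    else:
        macs = h * w * c_out * c_in * k * k   # each output channel sees all input channels
        weight_elems = c_out * c_in * k * k
    flops = 2 * macs                          # one multiply + one add per MAC
    traffic = (h * w * c_in + h * w * c_out + weight_elems) * bytes_per_elem
    return flops / traffic


dense = conv_arithmetic_intensity(56, 56, 256, 256, 3)
dw = conv_arithmetic_intensity(56, 56, 256, 256, 3, depthwise=True)
print(f"dense 3x3: {dense:.1f} FLOPs/byte, depth-wise 3x3: {dw:.1f} FLOPs/byte")
```

Under this model the vanilla 3x3 conv does roughly 840 FLOPs per byte versus about 4.5 for the depth-wise one, so the depth-wise layer is memory-bound and tensor cores have little to speed up. This only covers the depth-wise half of a separable conv; the 1x1 point-wise half is a dense op.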

Also:

- Choose mini-batch to be a multiple of 8
- Choose linear layer dimensions to be a multiple of 8
- Choose convolution layer channel counts to be a multiple of 8
- For classification problems, pad vocabulary to be a multiple of 8
- For sequence problems, pad the sequence length to be a multiple of 8
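A minimal plain-Python sketch of the padding rules above. round_up and pad_sequence_batch are hypothetical helper names, and pad_id=0 assumes token 0 is your padding token (padded positions still need to be masked in the loss).

```python
def round_up(n: int, multiple: int = 8) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_sequence_batch(seqs, pad_id=0, multiple=8):
    """Right-pad token-id lists to a common length that is a multiple of 8."""
    target = round_up(max(len(s) for s in seqs), multiple)
    return [list(s) + [pad_id] * (target - len(s)) for s in seqs]


if __name__ == "__main__":
    print(round_up(30522))                    # e.g. a vocabulary size -> 30528 output classes
    print(round_up(60))                       # a mini-batch of 60 -> 64
    print(pad_sequence_batch([[1, 2, 3], [4]]))
```

Padding the vocabulary just adds a few output classes that never appear as targets, so it costs a little memory but keeps the final linear layer tensor-core friendly.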

Please see
- https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html

#deep_learning


Trying Out New Ampere GPUs and MIG (RU)

Playing With New Nvidia Ampere-Based GPUs and Trying Out MIG

https://habr.com/ru/post/530986/

Please like / share / repost!

#hardware
#deep_learning


First Experience With A100 GPUs

(0)
Under 100% load they are indeed 15-20 degrees cooler, i.e. 60-70 °C (similar to the 3090).

(1)

./gpu_burn 120, Gflop/s:

- 1080 Ti ~8,000-8,500
- Titan X (Maxwell) ~4,300
- 3090 (Ampere) ~16,500
- A100 (without MIG) ~16,700

./gpu_burn -tc 120, Gflop/s:

- 3090 (Ampere) ~38,500
- A100 (without MIG) ~81,500
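To put the gpu_burn numbers above into ratios: this is plain arithmetic on the figures in this post, with the 1080 Ti taken as the midpoint of its 8,000-8,500 range (our assumption).

```python
# Gflop/s from the gpu_burn runs above (1080 Ti = midpoint of 8,000-8,500).
default = {"1080 Ti": 8250, "Titan X (Maxwell)": 4300, "3090": 16500, "A100": 16700}
tensor_cores = {"3090": 38500, "A100": 81500}  # runs with -tc

ratios = {
    "3090 vs 1080 Ti (default)": default["3090"] / default["1080 Ti"],
    "A100 vs 3090 (default)": default["A100"] / default["3090"],
    "-tc gain on 3090": tensor_cores["3090"] / default["3090"],
    "-tc gain on A100": tensor_cores["A100"] / default["A100"],
    "A100 vs 3090 (-tc)": tensor_cores["A100"] / tensor_cores["3090"],
}

for name, r in ratios.items():
    print(f"{name}: {r:.2f}x")
```

Without -tc the two Ampere cards come out roughly tied (about 1.01x); with -tc the A100 pulls ahead of the 3090 by about 2.1x.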

(2)
Using MIG is fairly straightforward, but, as expected, it does not work properly with gpu-burn out of the box.

The most interesting thing, of course, is to test MIG 2 / 3 / 7 setups against 2x 3090 / 1080 Ti / Titan X.

#deep_learning


2020 DS / ML Digest 13

Highlights:

- Silero Models now have an experimental Ukrainian model
- CV inference 101
- High-Resolution 3D Human Digitization
- Background Features in Google Meet
- How to Build an Open-Domain Question Answering System?
- A case for … Keeping encryption elitist
- Objectron dataset
- See the posts above about the 3090 ... and, hopefully, new posts comparing Titan X / 1080 Ti / 3090 / A100 =)

Please like / share / repost!

https://spark-in.me/post/2020_ds_ml_digest_13

#digest


Some More Observations About 3090

- torch.cuda.empty_cache() does not seem to do anything for networks with variable depth / sequence length / girth

- DDP + AMP seems to be 3x slower instead of 2x faster (lol) for some networks; we are looking for the cause

- For some networks, 2x speed bump using AMP out of the box

- Now DDP prevents me from using 2 processes on 1 GPU, failing with:

RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1603729096996/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8

- They seem much more efficient at parallelizing and keeping utilization high (80-100%): the same networks train ~2-3x faster than on Titan X (Maxwell) and 1080 Ti without any tweaks to the code

- The same networks use more RAM on the 3090 than on the 1080 Ti (?)

- I was somewhat afraid that these cards would be under-utilized (50%), but they are just faster. Magic
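A small diagnostic for the torch.cuda.empty_cache() observation, assuming the caching allocator is the culprit: empty_cache() only hands unoccupied cached blocks back to the driver, so it cannot shrink memory held by live tensors, and with variable-size inputs the cache is likely to be refilled on the next batches anyway. Comparing memory_allocated() (live tensors) with memory_reserved() (what the allocator holds) before and after the call shows which one you are looking at. cuda_memory_report and empty_cache_effect are hypothetical helpers.

```python
import torch


def cuda_memory_report(device=None):
    """Allocated vs. reserved CUDA memory in MiB, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        "allocated": torch.cuda.memory_allocated(device) / mib,  # held by live tensors
        "reserved": torch.cuda.memory_reserved(device) / mib,    # held by the caching allocator
    }


def empty_cache_effect():
    before = cuda_memory_report()
    # Releases only cached blocks that no live tensor is using.
    torch.cuda.empty_cache()
    after = cuda_memory_report()
    return before, after


if __name__ == "__main__":
    print(empty_cache_effect())
```

If "allocated" stays high after the call, the memory belongs to live tensors (activations, optimizer state, leftover references) and empty_cache() cannot help.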

#deep_learning


First Experience With 3090 GPUs

(0)
Under 100% load they are indeed 15-20 degrees cooler.

(1)
Lol, gpu-burn shows strange results with default settings: less than half the Gflop/s of the 1080 Ti

./gpu_burn 600, Gflop/s:

- 1080 Ti ~8,000-8,500
- Titan X (Maxwell) ~4,300
- 3090 (Ampere) ~3,000

./gpu_burn -tc 600, Gflop/s:

- 3090 (Ampere) ~3,000

Idk, maybe it's me, maybe it's gpu-burn; I need to test on real tasks!

PS
I had an old image; maybe bumping CUDA / cuDNN will help.

#deep_learning


#интересно
Turns out, iPavlov is no more...
https://www.facebook.com/olga.kairova/posts/10157719960593034


Some Additional Thoughts on DDP

The DDP docs say that you cannot use multiple DDP processes on one GPU (otherwise you would have to use their RPC framework, which, for me personally, is too much hassle and complexity, at least for now!).

Turns out you can. But the speed-up was negligible in my case:

- GPU utilization went from 70-80% with 1 process per GPU to 90-100% with more than 1 process per GPU;
- Total epoch time decreased by 3-5%;
- Interestingly, I tried 2 DDP workers on 2 GPUs vs 4 DDP workers on 2 GPUs and 3 DDP workers on 2 GPUs (1 on the master GPU, 2 on the other one), and 3 workers were much slower, so the bottleneck is probably compute, not communication (we will see with Ampere GPUs!);
- Following advice from Nvidia, I also tried MPS (which is supposed to help several processes run smoothly on one GPU), but I just could not make it work with DDP: it failed with cryptic errors, at first after torch.cuda.empty_cache() and then just randomly. Sad times.
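The worker layouts from this experiment (e.g. 3 workers on 2 GPUs, 1 on the master GPU and 2 on the other) boil down to a rank-to-GPU mapping. A minimal round-robin sketch; assign_devices, workers_per_gpu and the offset parameter are hypothetical helpers, not the code used for these runs.

```python
from collections import Counter


def assign_devices(world_size, n_gpus, offset=0):
    """Round-robin mapping of DDP ranks to GPU indices.

    `offset` rotates the mapping, e.g. so that GPU 0 gets fewer workers
    when world_size is not a multiple of n_gpus.
    """
    return [(rank + offset) % n_gpus for rank in range(world_size)]


def workers_per_gpu(world_size, n_gpus, offset=0):
    """How many DDP ranks end up on each GPU index."""
    return dict(Counter(assign_devices(world_size, n_gpus, offset)))


if __name__ == "__main__":
    for ws in (2, 3, 4):
        print(ws, assign_devices(ws, 2, offset=1), workers_per_gpu(ws, 2, offset=1))
```

In each worker you would then call torch.cuda.set_device() with the rank's entry before building the model and wrapping it in DistributedDataParallel(model, device_ids=[...]). Keep in mind that newer NCCL versions may reject several ranks on one GPU with an "invalid usage" error, as in the 3090 notes above.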

#deep_learning


https://pc-01.tech/vram-drive2/


2020-11-03 [Experimental] Ukrainian Model V1 Released

- An experimental model
- Trained from a small community contributed corpus
- New Full model size reduced to 85 MB
- New Quantized model is only 25 MB
- No TF or ONNX models
- Will be re-released upon the V3 release as a model fine-tuned from one trained on a larger Russian corpus

https://github.com/snakers4/silero-models


Silero Models EN V2 Released

Almost forgot to announce it!

- New EN V2 model - https://github.com/snakers4/silero-models/issues/20#issuecomment-720932378
- Quality benchmarks - https://github.com/snakers4/silero-models/wiki/Quality-Benchmarks#en-v2

A minor release, i.e. other models are not affected.

The English model was made much more robust to certain dialects and should generalize much better overall.


Nice Links About PyTorch Training Optimization

(continuing the last several posts)

- https://sergey.party/2020/10/13/pytorch-performance-guide.html
- https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html
- https://github.com/pytorch/pytorch/wiki/Operators-with-Channels-Last-support
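Following the memory format tutorial linked above, switching a conv net to channels_last amounts to converting both the model and its inputs. A minimal sketch with a toy two-layer conv net (the layer sizes are arbitrary assumptions); any speed-up should be expected mainly on tensor-core GPUs with AMP, and operators without channels_last support (see the wiki page above) may return outputs in the default layout.

```python
import torch
import torch.nn as nn

# Toy model; the architecture is an arbitrary example.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
)
# Converts 4D parameters (conv weights) to NHWC strides in place.
model = model.to(memory_format=torch.channels_last)

# Inputs need the same treatment; shape stays (N, C, H, W), only strides change.
x = torch.randn(8, 3, 32, 32).to(memory_format=torch.channels_last)
out = model(x)
print(out.shape, x.is_contiguous(memory_format=torch.channels_last))
```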

#deep_learning


Trying PyTorch DDP Again

Just a quick note: DDP expects a backward pass on every worker (or on none of them). Otherwise it hangs.

So do not forget to use the grad scaler with native PyTorch AMP.

In my particular case, DDP ran with AMP, but only after I added the grad scaler did it stop exploding / de-syncing, and it also started converging even faster. If only I had GPUs with FP16 support =)
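For reference, the standard native AMP pattern with a grad scaler looks roughly like this. A minimal sketch with a toy linear model (our assumption); on a machine without CUDA both autocast and the scaler are disabled and the loop runs in plain FP32.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setup; a real network and data loader would go here.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision only on the GPU in this sketch

model = nn.Linear(64, 8).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# With enabled=False the scaler passes calls straight through.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass under autocast (float16 by default on CUDA; no-op if disabled).
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = F.mse_loss(model(x), y)
    # Scale the loss so small FP16 gradients do not underflow to zero.
    scaler.scale(loss).backward()
    # Unscales the gradients; skips optimizer.step() if they contain inf/NaN.
    scaler.step(optimizer)
    # Adjusts the scale factor for the next iteration.
    scaler.update()
    return loss.item()


if __name__ == "__main__":
    x = torch.randn(32, 64, device=device)
    y = torch.randn(32, 8, device=device)
    for step in range(5):
        print(step, train_step(x, y))
```

Note that backward() is still called on every rank even when the scaler later decides to skip the optimizer step, which keeps DDP's gradient all-reduce in sync.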

I guess nice work, Nvidia?
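On the hang itself: DDP all-reduces gradients inside backward(), so if one rank skips backward() (say, on a NaN loss) while the others call it, the others block forever. A minimal sketch, assuming you want to skip bad batches: make the skip decision collective by all-reducing a 0/1 flag with MAX. agree_to_skip is a hypothetical helper, not part of PyTorch.

```python
import torch
import torch.distributed as dist


def agree_to_skip(local_skip: bool, device: str = "cpu") -> bool:
    """Return True on every rank if *any* rank wants to skip this batch.

    With the NCCL backend, pass the rank's CUDA device so the flag tensor
    lives on the GPU. Without an initialized process group (e.g. a
    single-process run) the local decision is returned as-is.
    """
    flag = torch.tensor([int(local_skip)], device=device)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(flag, op=dist.ReduceOp.MAX)
    return bool(flag.item())
```

Inside the training loop this would look like `if agree_to_skip(not torch.isfinite(loss).item(), device): continue`, followed by the usual backward() on all ranks.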

#deep_learning


Torch Dataloader With Workers Leaking RAM

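Anyone who has worked with HUGE datasets has faced this issue. It is caused by Python itself (most likely copy-on-write: merely reading Python objects updates their reference counts, so each forked worker gradually copies the memory pages holding the dataset). If you have faced it, you know what I am talking about.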

I do not claim this to be a definitive solution, but it worked for me.

import time
import random
import string
from multiprocessing import Manager

import torch
from torch.utils.data import Dataset, DataLoader


def id_gen(size=6,
           chars=string.ascii_uppercase):
    # Random uppercase string, used as a fake file path
    return ''.join(random.choice(chars)
                   for _ in range(size))


class DataIter(Dataset):
    def __init__(self):
        # Keep the data in a separate Manager process instead of a plain
        # dict, so workers fetch items over IPC instead of touching
        # millions of Python objects inherited from the parent process.
        # The proxy keeps a reference to the manager, so it stays alive.
        m = Manager()
        self.data = m.dict({i: {'key': random.random(),
                                'path': id_gen(size=10)}
                            for i in range(1000000)})

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]  # one IPC round trip per item
        return torch.tensor(data['key']), data['path']


if __name__ == '__main__':
    train_data = DataIter()

    train_loader = DataLoader(train_data,
                              batch_size=60,
                              shuffle=False,
                              drop_last=False,
                              pin_memory=False,
                              num_workers=10)

    tic = time.time()

    for i, item in enumerate(train_loader):
        if (i + 1) % 1000 == 0:
            toc = time.time()
            print(f"Time for 1000 batches in {toc - tic} s")
            tic = time.time()

Be careful with the manager dict, though. It behaves like a dict, but iterating over its keys is slow, because every access goes through inter-process communication.

If you just need the whole dict, fetch it in one go as a single object (e.g. with .copy(), which returns a plain dict); that is fast.
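A small sketch of the difference, assuming a toy Manager dict: iterating over keys and indexing costs one IPC round trip per access, while DictProxy.copy() fetches the whole dict as an ordinary dict in a single call. build_and_read and snapshot are hypothetical names.

```python
from multiprocessing import Manager


def snapshot(shared):
    """Fetch a whole Manager dict in one IPC round trip (returns a plain dict)."""
    return shared.copy()


def build_and_read(n=1000):
    with Manager() as m:
        shared = m.dict({i: {'key': float(i), 'path': f'p{i}'} for i in range(n)})
        # Slow pattern: one request to the manager process per key.
        slow = {k: shared[k] for k in shared.keys()}
        # Fast pattern: one request, then plain local dict access.
        fast = snapshot(shared)
    return slow, fast


if __name__ == '__main__':
    slow, fast = build_and_read()
    print(slow == fast, type(fast).__name__)
```

The snapshot is a copy: later writes to the shared dict will not show up in it, so this suits read-mostly data like a dataset index.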

#pytorch
#deep_learning


2020 DS / ML Digest 2

Highlights:

- New STT benchmarks from FAIR
- Analysis of GPT-2 by thegradient
- Google’s Meena, a 2.6 billion parameter end-to-end trained neural conversational model (not AGI ofc)
- OpenAI now uses PyTorch
- LaserTag - cool idea on…


https://youtu.be/CyfBEULRprM


2020 DS / ML Digest 12

Highlights:

- Neural network visualization tool
- Russian large GPT by Sber
- Some tests of 3090
- Large radiology dataset
- New wave of space-tech
- Containerization landscape

Please like / share / repost!

https://spark-in.me/post/2020_ds_ml_digest_12

#digest
