Spark in me - Internet, data science, math, deep learning, philosophy. Page 25

2440 @snakers4

A channel about topics that interest me - internet - statistics - data science. No ads or bullshit.

  • Spark in me - Internet, data science, math, deep learning, philosophy

    Ukrainian Open STT 1000 Hours

    Following the path of Open STT in Russian, now you can enjoy a similar dataset in Ukrainian:

    - Torrent Link
    - GitHub Link

    Congratulations to our Ukrainian friends on finally publishing a diverse, easily downloadable dataset!

    Their pages / dataset UX are still a bit rough around the edges, but given how slowly Common Voice, for example, accumulates data (130 hours for Russian and 43 hours for Ukrainian), UA Open STT and Open STT remain the best resources for their respective languages to date.

    Also, unlike the majority of STT datasets, which (i) are behind a paywall or sponsored by corporations, (ii) have limited scope / domains, or (iii) fit some sort of agenda (e.g. use more GPUs than necessary, use our bloated tools, etc.), this dataset is legit made by real people.

    Also recently corporations have taken up the trend of rehashing publicly available data, which is cool, but unique data is still nowhere to be seen for obvious reasons (except for Common Voice, which is decent only for English).

    #dataset
  • Spark in me - Internet, data science, math, deep learning, philosophy

    Excel in Notebooks

    This notebook tool looks awesome.
    It lets you replicate most of the useful Excel functionality inside a notebook, directly on pandas dataframes:

    - https://github.com/quantopian/qgrid

    Was looking for something like this for a long time!

    Similar tools I saw before required some fiddling (just search for spreadsheets / Excel on the channel), while this one just works with an existing pandas dataframe! You can also hack in HTML elements, and there are ample callbacks for your custom functionality.

    #data_science
  • Spark in me - Internet, data science, math, deep learning, philosophy

    MS continues to develop their ONNX runtime; they just rolled out a new website:

    - https://www.onnxruntime.ai/

    Looks like a heavy nod towards PyTorch instructions
  • Spark in me - Internet, data science, math, deep learning, philosophy

    I just love enterprise-grade hardware
  • Spark in me - Internet, data science, math, deep learning, philosophy

    The multiple pinned posts feature is awesome. Please suggest which posts to pin (send a link).
  • Spark in me - Internet, data science, math, deep learning, philosophy

    📌

    A small addition to the above post. I did not mention storinators (which are just configurable servers in a custom case ofc).

    The cheapest storinator with 16 drive bays and 10 gig ethernet goes for US$3.5k+ w/o drives.

    So DIY with 10+ drives is 5x cheaper even with 10 gig network / ECC RAM.

    If you need 60 drives then there are no options though.
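
    The "5x cheaper" estimate can be checked with a few lines of arithmetic. This is only a rough sketch using the figures quoted in these posts (US$3.5k+ for the 16-bay storinator, US$600-700 for a bare-minimum DIY build with 8-16 bays, all without drives). Pairing the lowest DIY price with the highest bay count is an optimistic assumption made here to get bounds, not a quote for a real build.

```python
# Figures quoted in these posts, all without drives (USD).
storinator_price = 3500                    # cheapest 16-bay storinator, "3.5k+"
storinator_bays = 16
diy_price_low, diy_price_high = 600, 700   # bare-minimum DIY build
diy_bays_low, diy_bays_high = 8, 16

# How many times cheaper the DIY build is (lower and upper bound).
ratio_min = storinator_price / diy_price_high
ratio_max = storinator_price / diy_price_low

# Cost per drive bay. The DIY bounds pair the extremes, which is an assumption.
storinator_per_bay = storinator_price / storinator_bays
diy_per_bay_worst = diy_price_high / diy_bays_low
diy_per_bay_best = diy_price_low / diy_bays_high

print(f"DIY is {ratio_min:.1f}x-{ratio_max:.1f}x cheaper")
print(f"per bay: storinator ${storinator_per_bay:.2f}, "
      f"DIY ${diy_per_bay_best:.2f}-${diy_per_bay_worst:.2f}")
```

    Since "3.5k+" is a lower bound on the storinator price, the real gap is likely at least this large.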
  • Spark in me - Internet, data science, math, deep learning, philosophy

    ....

    A solution? A set of OSS components and off-the-shelf available tools:

    - Any MB with at least one or two 10 Gbit/s ethernet ports (the most expensive part) and at least 8-10 SATA ports (PCIe risers can add 2-4 more ports per riser). There are several older mATX SuperMicro boards and a lot of newer overpriced "gaming" boards in this segment

    - The cheapest 10 Gbit/s switch or some used "large" switch if you know where / how to buy one

    - Any off-the-shelf Fractal Design computer case with 8-10 HDD bays (do not forget the 2 "free" 5.25-inch bays that can be used for 3.5" drives!) or similar (Fractal Design cases are a bit expensive, but the build quality is awesome)

    - Any suitable RAM / CPU (for SuperMicro you will have to buy Xeon and ECC RAM)

    - Linux as OS (any flavour you like), mdadm for RAID arrays, samba for local sharing

    - Just mount your drives locally via fstab (just an example):

    //192.168.2.5/share /mnt/share/ cifs username=YOUR_USER,password=YOUR_PASS,iocharset=utf8,uid=YOUR_UID,gid=YOUR_GID,dir_mode=0775,file_mode=0775


    - Use any other OSS Linux software, change it as you wish, use any drives you want!

    - You can have mdadm, zfs, luks, lvm - whatever you want in any combination.

    If you just count the bare minimum cost w/o drives, you can probably get away with US$600-700 per 8-16 drives.

    #hardware
  • Spark in me - Internet, data science, math, deep learning, philosophy

    Local Network Storage Solution - Hacked

    When you do a lot of experiments, depending on your situation, local network shared storage (with or w/o SSD cache) and / or a smaller local SSD / NVMe drive can usually cover your needs.

    If you google something like "NAS", you will find ridiculously overpriced / underpowered devices, both in B2C and enterprise segments.

    All jokes aside, you may find "small" devices for 4 disks starting from US$500-700 with laughable specs (e.g. no 10 Gbit/s ethernet), but with a lot of old proprietary bloatware (e.g. clunky GUIs not updated for years).

    In the "serious" segment, a rack-mounted 12-drive device may cost you north of US$7-10k (luckily with 10 Gbit/s ethernet). All these prices are without drives. This is totally insane and ridiculous!

    ....
  • Spark in me - Internet, data science, math, deep learning, philosophy

    I usually do not repost off topic content, but this one is just too good.

    TL;DR - corrupt politicians and idiots dismantle nuclear plants (Germany) and overinvest in expensive bs "renewables" (USA), when nuclear has always been the best real available option.

    This has always been obvious to me just from eyeballing the numbers; looks like now there is a proper video:

    https://youtu.be/Jzfpyo-q-RM
  • Spark in me - Internet, data science, math, deep learning, philosophy

    Interesting Loss Weighting Idea - Gradient Adaptive Factor

    When you have 2+ losses in your NN, sometimes loss weighting is not really straightforward. Usually total loss is:

    loss = loss_0 + lambda_1 * loss_1 + ...


    Of course you can tune these "lambdas" manually or using some naïve NAS (or some ad hoc heuristic, e.g. "this loss is more important"), but all these approaches have 2 drawbacks:

    - Slow / compute intensive / ad hoc;
    - There is no guarantee that these values are always optimal.

    Usually when something is not stable (and multiple losses often explode on init) some sort of adaptive clipping is employed. I just stumbled upon a technique called Gradient Adaptive Factor, see an example here.

    The idea is simple - balance your losses so that their gradient sizes are roughly similar.
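
    To make the idea concrete, here is a minimal pure-Python sketch of the balancing step. It is not the implementation from the example linked above. The function names and the two toy losses are hypothetical, and the gradients are written out by hand, whereas in a real network you would take them from autograd with respect to the shared parameters.

```python
import math

def grad_norm(grad):
    """L2 norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))

def gradient_adaptive_factor(grad_main, grad_aux, eps=1e-12):
    """Weight for the auxiliary loss so that its gradient norm matches the main one."""
    return grad_norm(grad_main) / (grad_norm(grad_aux) + eps)

# Toy setup with a shared parameter vector w (hypothetical losses):
#   loss_0 = sum(w_i ** 2)               -> d loss_0 / d w_i = 2 * w_i
#   loss_1 = 100 * sum((w_i - 1) ** 2)   -> d loss_1 / d w_i = 200 * (w_i - 1)
w = [0.5, -1.0, 2.0]
grad_0 = [2.0 * wi for wi in w]
grad_1 = [200.0 * (wi - 1.0) for wi in w]

# lambda_1 is recomputed from the current gradients and treated as a constant
# (no gradient flows through it).
lambda_1 = gradient_adaptive_factor(grad_0, grad_1)
scaled_grad_1 = [lambda_1 * g for g in grad_1]

# Gradient of the total loss: loss_0 + lambda_1 * loss_1
total_grad = [g0 + g1 for g0, g1 in zip(grad_0, scaled_grad_1)]

print(f"lambda_1 = {lambda_1:.4f}")
print(f"|grad_0| = {grad_norm(grad_0):.4f}, "
      f"|lambda_1 * grad_1| = {grad_norm(scaled_grad_1):.4f}")
```

    Here the second loss has gradients 100x larger than the first, so the factor comes out at about 0.01 and both terms contribute equally sized gradients. In practice the factor would be recomputed every step or every few steps, which is one assumption among several possible schedules.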

    #deep_learning
  • Spark in me - Internet, data science, math, deep learning, philosophy

    2021 DS / ML Digest 01

    Highlights:

    - Rethinking evaluation in ASR
    - DALL·E: Creating Images from Text
    - Deformable Neural Radiance Fields
    - Analysis of 100 Weeks of Curated AI News

    Please like / share / repost!

    https://spark-in.me/post/2021_ds_ml_digest_01

    #digest
  • Spark in me - Internet, data science, math, deep learning, philosophy

    IceGiant ProSiphon Elite: giant thermosiphon cooler goes on sale

    Back in December 2019 we published a news item about the development of the IceGiant ProSiphon Elite, which promised to bring about a revolution. The giant thermosiphon cooler is aimed at cooling HEDT (high-end desktop) processors. Sales were originally planned for spring 2020, but something went wrong. Today, however, we are glad to report that the IceGiant ProSiphon Elite can now be bought at retail.

    Unlike classic heat pipes, the thermosiphon concept separates the evaporation and condensation zones, which lets the system work more efficiently under extreme load (we described the design in the original news item). The cooling loop runs on the laws of physics, so it needs no pump. Compared to liquid cooling, this approach reduces complexity and extends service life. It also eliminates the often annoying pump noise.

    Visually, the ProSiphon Elite could be mistaken for a 240 mm radiator, were it not for the base plate, which can be mounted on AMD AM4, TR4 and sTRX4 sockets (so even Threadripper is supported) and on Intel LGA 1200, 115x, 1366, 2011(-3) and 2066 sockets. Memory modules under the cooler must not exceed 48 mm in height. As can be seen, the cooler consists of vertical cooling fins, and three horizontal condensation zones are also visible. Four 120 mm fans in a push-pull configuration blow air through the cooler. The PWM fans run at up to 2,300 rpm.
  • Spark in me - Internet, data science, math, deep learning, philosophy

    When deciding how to cool a new Threadripper I considered this thing (it is pricey, huge and impossible to buy), but in the end just opted for a dual-fan Noctua.

    This is an exciting option for high-end workstations (32 or 64 cores). Has anyone actually tried this thing?

    I have not yet seen real independent reviews ...
  • Spark in me - Internet, data science, math, deep learning, philosophy

    Silero VAD Article on Habr (RU)

    - https://habr.com/ru/post/537274/

    Its contents are mostly an adapted and translated version of the silero-vad README page.

    If you have a Habr account, please ⬆️
  • Spark in me - Internet, data science, math, deep learning, philosophy

    will be doing a proper PR release soon
    this instrument is free under MIT license =)