Spark in me - Internet, data science, math, deep learning, philosophy

2440 @snakers4

A channel about topics I find interesting: the internet, statistics, data science. No ads, no bullshit.

4 years ago
TensorStore for High-Performance, Scalable Array Storage

In ML training engineering, things get complicated once you deal with datasets of 100M+ items. Of course, you can get away with basic tools like Redis or Python's manager; PyTorch even has its own version of Redis.

Surprisingly, if you just implement a naïve disk database (i.e. hashed subfolders with a separately stored index), then with a sufficiently large dataset of small files you can run out of inodes. Of course, you can easily implement a simple custom chunking strategy (e.g. packing text data into a dataframe).

I wonder if this tool can help with this part:

- ai.googleblog.com/2022/09…nce.html

If anyone has experience, please share.
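To make the inode point concrete, here is a minimal sketch of both layouts from the post, using only the Python standard library. All names (`hashed_path`, `ChunkedWriter`, `read_record`) and the choice of MD5 plus a JSON index are my own assumptions for illustration; none of this is TensorStore's API. The naive layout costs one inode per record, while the chunked one costs one inode per chunk file plus one for the index.

```python
import hashlib
import json
from pathlib import Path


def hashed_path(root: Path, key: str) -> Path:
    """Naive layout: one file per record under two levels of hashed subfolders."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


class ChunkedWriter:
    """Hypothetical helper that packs many small records into a few chunk files.

    Assumes `root` starts out empty, so byte offsets can be tracked in memory.
    """

    def __init__(self, root: Path, records_per_chunk: int = 1000):
        self.root = root
        self.records_per_chunk = records_per_chunk
        self.index = {}   # key -> (chunk file name, byte offset, byte length)
        self.count = 0    # records written so far
        self.offset = 0   # write position inside the current chunk file
        root.mkdir(parents=True, exist_ok=True)

    def add(self, key: str, payload: bytes) -> None:
        if self.count % self.records_per_chunk == 0:
            self.offset = 0  # this record starts a new chunk file
        name = f"chunk_{self.count // self.records_per_chunk:06d}.bin"
        # Reopening per record keeps the sketch short; a real writer
        # would keep the current chunk file open.
        with open(self.root / name, "ab") as f:
            f.write(payload)
        self.index[key] = (name, self.offset, len(payload))
        self.offset += len(payload)
        self.count += 1

    def save_index(self) -> None:
        """Store the index separately from the data, as a single JSON file."""
        (self.root / "index.json").write_text(json.dumps(self.index))


def read_record(root: Path, index: dict, key: str) -> bytes:
    """Random access: seek to the record's offset inside its chunk file."""
    name, offset, length = index[key]
    with open(root / name, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

With `records_per_chunk=1000`, 100M records end up in about 100k chunk files instead of 100M individual files. The trade-off is that updating or deleting a single record is no longer a plain file operation.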
TensorStore for High-Performance, Scalable Array Storage

Posted by Jeremy Maitin-Shepard and Laramie Leavitt, Software Engineers, Connectomics at Google Many exciting contemporary applications o...

Google AI Blog