Spark in me - Internet, data science, math, deep learning, philosophy (@snakers4)

TensorStore for High-Performance, Scalable Array Storage

In ML training engineering, things get complicated once you deal with datasets of 100M+ items. Of course, you can get away with basic tools like Redis or Python's multiprocessing manager, and PyTorch even has its own version of Redis.

Surprisingly, if you just implement a naïve disk database (i.e. hashed subfolders with a separately stored index), a sufficiently large dataset made of small files can run you out of inodes, since every file and every subfolder takes one. Of course, you can easily implement a simple custom chunking strategy (e.g. packing text data into a dataframe).

I wonder if this tool can help with this part.

- ai.googleblog.com/2022/09…nce.html

If anyone has experience, please share.
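To make the chunking idea above concrete, here is a minimal sketch of what such a custom strategy could look like. It does not use TensorStore. It is plain standard-library Python, and the names `write_chunked`, `read_record`, `chunk_path` and the `index.json` file are hypothetical, made up for illustration. It packs many small text records into a few large chunk files and keeps a separate index of byte offsets, so the inode count grows with the number of chunks rather than with the number of records.

```python
import json
import os
import tempfile


def chunk_path(root, chunk_id):
    # Hypothetical naming scheme: one binary file per chunk.
    return os.path.join(root, f"chunk_{chunk_id:06d}.bin")


def write_chunked(root, records, records_per_chunk=1000):
    """Pack (key, text) pairs into chunk files.

    Returns an index {key: [chunk_id, byte_offset, byte_length]} and also
    stores it next to the chunks as index.json (an assumed layout).
    """
    os.makedirs(root, exist_ok=True)
    index = {}
    fh = None
    try:
        for i, (key, text) in enumerate(records):
            chunk_id = i // records_per_chunk
            if i % records_per_chunk == 0:
                # Start a new chunk file every `records_per_chunk` records.
                if fh is not None:
                    fh.close()
                fh = open(chunk_path(root, chunk_id), "wb")
            data = text.encode("utf-8")
            index[key] = [chunk_id, fh.tell(), len(data)]
            fh.write(data)
    finally:
        if fh is not None:
            fh.close()
    with open(os.path.join(root, "index.json"), "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index


def read_record(root, index, key):
    """Random access to one record: seek to its offset inside its chunk."""
    chunk_id, offset, length = index[key]
    with open(chunk_path(root, chunk_id), "rb") as fh:
        fh.seek(offset)
        return fh.read(length).decode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        records = [(f"doc{i}", f"text number {i}") for i in range(2500)]
        index = write_chunked(root, records, records_per_chunk=1000)
        print(len(os.listdir(root)))  # 4: three chunk files plus index.json
        print(read_record(root, index, "doc1234"))  # text number 1234
```

With one file per record, the same 2500 records would need 2500 inodes plus the subfolders. Here they need four. The trade-off is that the whole index lives in memory as one JSON object, which at 100M+ keys would probably have to be replaced by something more compact, and updating a single record means rewriting its chunk.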