en_v6 speech-to-text models
📎 Please see the metrics here
📎 A large number of new validation datasets were added for dialects and VoIP
📎 The model family now includes variations of small and xlarge models
📎 Single-digit quality gains for both CE and EE models; the gains are less pronounced for EE models
🗜 The best gains are in the xsmall models, which will not be public for the time being. They have almost reached the small models in quality while being 2x smaller (14M params)
⚠️ The models seem to fit the data quite well, but the returns are diminishing compared to V3 => V4 => V5. We are already investigating radically new ways to make the models better, stay tuned
📦 We have also started packaging the utils for the public Silero models as a pip package (it will work similarly to torch.hub.load)
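
For reference, here is a minimal sketch of the existing torch.hub.load route that the pip package is meant to resemble. The repo name, entry point, and the (model, decoder, utils) return value follow the public silero-models README. The helper names `stt_hub_kwargs` and `load_stt` are hypothetical and exist only for this illustration. The pip package interface is not shown because it has not been published, and which model version the hub call serves is not stated here.

```python
def stt_hub_kwargs(language="en"):
    """Build keyword arguments for torch.hub.load (hypothetical helper)."""
    return {
        "repo_or_dir": "snakers4/silero-models",  # public Silero models repo
        "model": "silero_stt",                    # speech-to-text entry point
        "language": language,
    }


def load_stt(language="en", device="cpu"):
    """Load a public STT model through torch.hub (hypothetical helper).

    Requires torch and network access on first use. Returns the model,
    the decoder, and the tuple of utils.
    """
    import torch  # imported lazily so the kwargs helper works without torch

    kwargs = stt_hub_kwargs(language)
    kwargs["device"] = torch.device(device)
    model, decoder, utils = torch.hub.load(**kwargs)
    return model, decoder, utils


# Example usage (downloads the model, so it is left commented out):
# model, decoder, utils = load_stt("en")
# read_batch, split_into_batches, read_audio, prepare_model_input = utils
```

The utils tuple returned by the hub call is the part that the pip package is expected to ship.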