Improved Text Recapitalization and Repunctuation Model for 4 Languages
- The model can now handle long inputs of up to 512 tokens (ca. 150 words);
- Inputs longer than 150 words are automatically processed in chunks;
- Bugs affecting newer PyTorch versions have been fixed;
- The model was trained longer, with larger batches;
- The model size was slightly reduced, to 85 MB;
- The remaining model optimizations were judged too costly to maintain.
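The notes do not say how long inputs are split. Below is a minimal Python sketch of chunked processing. It assumes plain word-based splitting at the 150-word limit. The names `split_into_chunks` and `process_long_text` are hypothetical, and `enhance` stands in for whatever per-chunk model call is actually used. The real implementation may instead split on the 512-token limit or at sentence boundaries.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 150) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words.

    Assumption: naive word-count splitting; a real model may count tokens instead.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def process_long_text(text: str, enhance: Callable[[str], str], max_words: int = 150) -> str:
    """Run a per-chunk function (hypothetical stand-in for the model) and join the results."""
    return " ".join(enhance(chunk) for chunk in split_into_chunks(text, max_words))


if __name__ == "__main__":
    # 320 dummy words are split into chunks of 150, 150 and 20 words.
    sample = " ".join(f"w{i}" for i in range(320))
    print([len(c.split()) for c in split_into_chunks(sample)])  # → [150, 150, 20]

    # Toy stand-in for the model: capitalize the first letter and add a period.
    toy_enhance = lambda c: c[:1].upper() + c[1:] + "."
    print(process_long_text("hello world foo bar", toy_enhance, max_words=2))  # → Hello world. Foo bar.
```

Splitting purely by word count can cut a sentence in half, so punctuation near chunk boundaries may be less accurate. Splitting at sentence-like boundaries, or overlapping chunks, are common ways to reduce this.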