togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Language: Python
Stars: 453 | Issues: 3 | Forks: 25
github.com/togethe…ama-Data