FMInference/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Up to 100x faster than other offloading systems.
Language: Python
#chatgpt #deep_learning #gpt_3 #high_throughput #large_language_models #machine_learning #offloading #opt
Stars: 1799 Issues: 11 Forks: 72
https://github.com/FMInference/FlexGen