Mainstream overnight: a single GPU can now run GPT-3-class models, up to 100x faster


FlexGen: https://github.com/FMInference/FlexGen

The project describes itself this way: "Running large language models like OPT-175B/GPT-3 on a single GPU. Up to 100x faster than other offloading systems."

Hardware: an NVIDIA T4 (16GB) instance on GCP (Google Cloud Platform) with 208GB of DRAM and 1.5TB of SSD. Whether that much memory is really needed is doubtful to me.

2.3k stars in just 12 hours. So this is what international hype looks like. (Tactical lean-back.)
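On the question of whether that much memory is needed, a rough estimate of the weight footprint gives some intuition. The sketch below is only back-of-the-envelope arithmetic, not anything taken from FlexGen itself. It assumes about 175 billion parameters, 2 bytes per parameter for FP16, a hypothetical 4-bit format at 0.5 bytes per parameter, and decimal gigabytes. It ignores the KV cache and activations, which add to the total.

```python
# Back-of-the-envelope estimate of weight storage for a 175B-parameter model.
# Assumptions (not from the FlexGen repository): parameter count, byte widths,
# decimal GB, and no accounting for KV cache or activations.

PARAMS = 175e9   # assumed parameter count for OPT-175B
GPU_GB = 16      # NVIDIA T4 memory, from the quoted setup
DRAM_GB = 208    # host DRAM, from the quoted setup
SSD_GB = 1500    # 1.5TB SSD, from the quoted setup


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Return the storage needed for the weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


fp16_gb = weights_gb(PARAMS, 2.0)   # FP16: 2 bytes per parameter
int4_gb = weights_gb(PARAMS, 0.5)   # hypothetical 4-bit format

# Do the weights fit in GPU memory plus host DRAM, without touching the SSD?
fp16_fits_without_ssd = fp16_gb <= GPU_GB + DRAM_GB
int4_fits_without_ssd = int4_gb <= GPU_GB + DRAM_GB

print(f"FP16 weights:  {fp16_gb:.1f} GB, fits in GPU+DRAM: {fp16_fits_without_ssd}")
print(f"4-bit weights: {int4_gb:.1f} GB, fits in GPU+DRAM: {int4_fits_without_ssd}")
```

Under these assumptions the FP16 weights alone come to about 350GB. That exceeds the 224GB of GPU memory and DRAM combined, so a large DRAM pool plus SSD offloading looks less excessive than it first seems.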
