Case Study | Tags: llm, text to speech, h100, mlops
SAIL Optimizes Orpheus-TTS For Higher Throughput
SAIL evaluated the publicly available Orpheus-TTS deployment (served via Baseten) and applied system-level optimizations to characterize and improve its real-time inference performance. The baseline sustained about 24 concurrent real-time streams per NVIDIA H100 GPU; after optimization, the same GPU sustained 216 streams (~10×). For capacity equivalent to 100 baseline GPUs, this reduces the equivalent annual accelerator spend from about $1.4M to about $140k.
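The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above and makes two assumptions that the summary does not state: accelerator spend scales linearly with GPU count, and the per-GPU annual cost is the value implied by dividing $1.4M by 100 GPUs. The function name `estimate_fleet` and its parameters are hypothetical, introduced here for illustration. The exact throughput ratio is 216 / 24 = 9×, so sizing a fleet in whole GPUs gives a figure somewhat above the rounded $140k, which corresponds to a flat 10× reduction.

```python
import math


def estimate_fleet(baseline_streams, optimized_streams, baseline_gpus, baseline_annual_spend):
    """Back-of-the-envelope capacity and cost estimate (hypothetical helper).

    Assumes spend scales linearly with GPU count; the per-GPU cost is
    derived from the quoted totals, not taken from a price list.
    """
    speedup = optimized_streams / baseline_streams            # per-GPU throughput ratio
    total_streams = baseline_streams * baseline_gpus          # workload the baseline fleet serves
    gpus_needed = math.ceil(total_streams / optimized_streams)  # whole GPUs after optimization
    cost_per_gpu_year = baseline_annual_spend / baseline_gpus   # implied, not stated in the summary
    return {
        "speedup": speedup,
        "total_streams": total_streams,
        "gpus_needed": gpus_needed,
        "cost_per_gpu_year": cost_per_gpu_year,
        "optimized_annual_spend": gpus_needed * cost_per_gpu_year,
        "flat_10x_annual_spend": baseline_annual_spend / 10,  # the rounded headline figure
    }


# Figures quoted in the summary: 24 -> 216 streams per H100, 100 GPUs, about $1.4M per year.
result = estimate_fleet(
    baseline_streams=24,
    optimized_streams=216,
    baseline_gpus=100,
    baseline_annual_spend=1_400_000,
)

for key, value in result.items():
    print(f"{key}: {value}")
```

Under these assumptions, the 2,400 streams served by 100 baseline GPUs fit on 12 optimized GPUs, so the whole-GPU estimate is about $168k per year, in the same range as the rounded $140k figure.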


