For sequence learning tasks that rely on recurrent neural networks, scale is both the key to accuracy and the bane of speed. We'll take existing state-of-the-art language modeling techniques and speed them up by orders of magnitude without losing accuracy. The tactics include injecting flexibility into NVIDIA's black-box cuDNN LSTM; replacing the LSTM with the more parallelizable and customizable Quasi-Recurrent Neural Network (QRNN); reducing the softmax bottleneck with the adaptive softmax; and investigating per-function efficiency on the GPU using the NVIDIA Visual Profiler. The end result is a general and scalable language model framework that can achieve state-of-the-art quality on the WikiText-103 dataset (103 million words) in under 12 hours on a single NVIDIA Volta V100. The resulting PyTorch codebase is open source for experimentation and extension.
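To see why the adaptive softmax helps at this vocabulary scale, a back-of-the-envelope sketch is useful: a full softmax pays for a projection over the entire vocabulary on every token, while an adaptive softmax (Grave et al.) puts frequent words in a small head and relegates rare words to lower-dimensional tail clusters that are only evaluated when a token actually falls in them. The hidden size, cutoffs, and tail-frequency numbers below are illustrative assumptions, not values from this post; only the WikiText-103 vocabulary size is taken from the text.

```python
# Rough per-token FLOP comparison: full softmax vs. adaptive softmax.
# All sizes except VOCAB are hypothetical, chosen only for illustration.
HIDDEN = 1024              # assumed RNN hidden size d
VOCAB = 267_735            # approximate WikiText-103 vocabulary size
CUTOFFS = [20_000, 60_000] # assumed head/tail cluster boundaries
TAIL_FREQ = [0.10, 0.03]   # assumed fraction of tokens in each tail cluster
DIV = 4                    # each tail cluster shrinks the hidden size by 4x

def full_softmax_flops(d, vocab):
    # One d x V projection per token, every token.
    return d * vocab

def adaptive_softmax_flops(d, vocab, cutoffs, tail_freq, div):
    # Head: the frequent words plus one "cluster token" per tail cluster.
    flops = d * (cutoffs[0] + len(cutoffs))
    bounds = cutoffs + [vocab]
    for i, p in enumerate(tail_freq):
        d_tail = d // (div ** (i + 1))
        cluster_size = bounds[i + 1] - bounds[i]
        # Project d -> d_tail, then d_tail -> cluster vocabulary; only the
        # fraction p of tokens landing in this cluster pays the cost.
        flops += p * (d * d_tail + d_tail * cluster_size)
    return flops

full = full_softmax_flops(HIDDEN, VOCAB)
adaptive = adaptive_softmax_flops(HIDDEN, VOCAB, CUTOFFS, TAIL_FREQ, DIV)
print(f"full softmax:     {full / 1e6:.1f} MFLOPs/token")
print(f"adaptive softmax: {adaptive / 1e6:.1f} MFLOPs/token")
print(f"speedup: {full / adaptive:.1f}x")
```

Under these assumed cutoffs the adaptive softmax does roughly an order of magnitude less work per token, which is why shrinking the softmax matters as much as speeding up the recurrent layers themselves.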