Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

by Yu Emma Wang, Gu-Yeon Wei, David Brooks

Released as an article.

2019  

Abstract

Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms.
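ParaDnn's central idea is to sweep model hyperparameters across ranges, generating a family of end-to-end models rather than benchmarking a fixed handful. A minimal sketch of that parameter-sweep approach for FC models is shown below; the function name, field names, and ranges are illustrative assumptions, not ParaDnn's actual API.

```python
from itertools import product

def fc_config_sweep(layer_counts=(4, 8),
                    nodes_per_layer=(256, 1024),
                    batch_sizes=(64, 512)):
    """Enumerate fully connected (FC) model configurations by taking the
    Cartesian product of hyperparameter ranges (toy values, not ParaDnn's)."""
    for layers, nodes, batch in product(layer_counts, nodes_per_layer, batch_sizes):
        yield {"layers": layers, "nodes_per_layer": nodes, "batch_size": batch}

# 2 x 2 x 2 = 8 distinct FC model configurations in this toy sweep
configs = list(fc_config_sweep())
```

Each generated configuration would then be instantiated as a concrete model and run on the target platform, so coverage of the design space grows multiplicatively with each swept parameter.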

Archived Files and Locations

application/pdf, 2.6 MB
Available from arxiv.org (repository) and web.archive.org (web archive)
Type: article
Stage: submitted
Date: 2019-07-24
Version: v1
Language: en
arXiv: 1907.10701v1