Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon
Era
by
Ardavan Pedram, Stephen Richardson, Sameh Galal, Shahar Kvatinsky, and
Mark A. Horowitz
2016
Abstract
The key challenge to improving performance in the age of Dark Silicon is how
to leverage transistors when they cannot all be used at the same time. In
modern SoCs, these transistors are often used to create specialized
accelerators, which improve energy efficiency for some applications by 10-1000x.
While this might seem like the magic bullet we need, for most CPU applications
more energy is dissipated in the memory system than in the processor, so these
large gains in efficiency are only possible if the DRAM and memory hierarchy
are mostly idle. We refer to this desirable state as Dark Memory, and it
occurs only for applications with an extreme form of locality.
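The reasoning above can be illustrated with a back-of-the-envelope sketch. It assumes a simple additive model in which energy per op is the compute energy divided by the accelerator gain, plus an unchanged memory energy. The function names and all numbers are hypothetical placeholders, not figures from the paper.

```python
def system_energy_per_op(compute_pj, memory_pj, accel_gain):
    """Energy per op (pJ) under an assumed additive model.

    Only the compute part is reduced by the accelerator; the memory
    part is left unchanged. All inputs are hypothetical.
    """
    return compute_pj / accel_gain + memory_pj


def overall_gain(compute_pj, memory_pj, accel_gain):
    """Ratio of baseline energy per op to accelerated energy per op."""
    baseline = compute_pj + memory_pj
    return baseline / system_energy_per_op(compute_pj, memory_pj, accel_gain)


# Hypothetical workload in which memory dissipates more energy than compute.
busy = overall_gain(compute_pj=40.0, memory_pj=60.0, accel_gain=100.0)
# Same compute cost, but the memory system is almost idle ("dark").
dark = overall_gain(compute_pj=40.0, memory_pj=0.1, accel_gain=100.0)
print(f"memory-bound gain: {busy:.2f}x, dark-memory gain: {dark:.2f}x")
```

Under these assumed numbers, a 100x accelerator yields less than a 2x system-level gain while memory is busy, and most of the 100x only when memory energy is close to zero.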
To show our findings, we introduce Pareto curves in the energy/op and
mm^2/(ops/s) metric space for compute units, accelerators, and on-chip
memory/interconnect. These Pareto curves allow us to solve the power-,
performance-, and area-constrained optimization problem to determine which
accelerators should be used, and how to set their design parameters to optimize
the system. This analysis shows that memory accesses create a floor to the
achievable energy-per-op. Thus, high performance requires Dark Memory, which in
turn requires co-design of the algorithm, for parallelism and locality, with the
hardware.
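As a minimal sketch of the kind of Pareto analysis described above, the following code extracts the non-dominated design points in an (energy/op, area per throughput) space and then picks the smallest-area design that meets an energy budget. The design names, their values, and both helper functions are hypothetical; this is not the authors' optimization procedure.

```python
def pareto_frontier(points):
    """Return points not dominated in (energy_per_op, area_per_throughput).

    Lower is better on both axes. A point is dominated if another point is
    no worse on both axes and strictly better on at least one.
    """
    frontier = []
    best_area = float("inf")
    # Sort by energy, then area, so each candidate only has to beat the
    # smallest area seen so far among points with lower or equal energy.
    for name, energy, area in sorted(points, key=lambda p: (p[1], p[2])):
        if area < best_area:
            frontier.append((name, energy, area))
            best_area = area
    return frontier


def smallest_area_within_energy(frontier, max_energy):
    """Pick the frontier point with the least area under an energy budget."""
    feasible = [p for p in frontier if p[1] <= max_energy]
    return min(feasible, key=lambda p: p[2]) if feasible else None


# Hypothetical design points: (name, energy per op, area per unit throughput).
designs = [("A", 1.0, 9.0), ("B", 2.0, 4.0), ("C", 3.0, 5.0), ("D", 5.0, 1.0)]
frontier = pareto_frontier(designs)
choice = smallest_area_within_energy(frontier, max_energy=2.5)
print([name for name, _, _ in frontier], choice)
```

Design C is dropped because B is better on both axes; the remaining points trade energy against area, and the budget selects among them.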
arXiv:1602.04183v2