Paper Details
Reference:
Reza Mokhtari and Michael Stumm,
"S-L1: A software-based GPU L1 cache that outperforms the hardware L1 for data processing applications",
In Proceedings of the 2015 International Symposium on Memory Systems (MEMSYS'15), Washington DC, USA, Association for Computing Machinery, October, 2015, pp. 121–132.
Abstract:
Implementing a GPU L1 data cache entirely in software to usurp the hardware L1 cache sounds counter-intuitive. However, we show how a software L1 cache can perform significantly better than the hardware L1 cache for data-intensive streaming (i.e., "Big-Data") GPGPU applications. Hardware L1 data caches can perform poorly on current GPUs, because the size of the L1 is far too small and its cache line size is too large given the number of threads that typically need to run in parallel.
Our paper makes two contributions. First, we experimentally characterize the performance behavior of modern GPU memory hierarchies and in doing so identify a number of bottlenecks. Secondly, we describe the design and implementation of a software L1 cache, S-L1. On ten streaming GPGPU applications, S-L1 performs 1.9 times faster, on average, when compared to using the default hardware L1, and 2.1 times faster, on average, when compared to using no L1 cache.
Keywords:
GPU L1 cache, GPU memory hierarchy, GPU memory management, data-intensive streaming, Big-Data GPGPU application
Reference Info:
DOI: 10.1145/2818950.2818969
ISBN: 9781450336048
OCLC: 6011156941
BibTeX:
@inproceedings(Mokhtari-MemSys15,
  author    = {Reza Mokhtari and Michael Stumm},
  title     = {{S-L1}: {A} software-based {GPU L1} cache that outperforms the hardware {L1} for data processing applications},
  booktitle = {Proceedings of the 2015 International Symposium on Memory Systems (\textbf{MEMSYS'15})},
  location  = {Washington DC, USA},
  publisher = {Association for Computing Machinery},
  month     = {October},
  year      = {2015},
  pages     = {121--132},
  doi       = {10.1145/2818950.2818969},
  isbn      = {9781450336048},
  keywords  = {GPU L1 cache, GPU memory hierarchy, GPU memory management, data-intensive streaming, Big-Data GPGPU application}
)