Communication efficient work distributions in stencil operation based applications
Authors
Schneible, Joseph
Říha, Lubomír
Malik, Maria
El-Ghazawi, Tarek
Alexandru, Andrei
Publisher
Wiley
Abstract
In recent years, the use of accelerators in conjunction with CPUs, known as heterogeneous computing, has brought about significant performance increases for scientific applications. One of the best examples of this is lattice quantum chromodynamics (QCD), a stencil operation based simulation. These simulations have a large memory footprint, necessitating the use of many graphics processing units (GPUs) in parallel. This requires the use of a heterogeneous cluster with one or more GPUs per node. In order to obtain optimal performance, it is necessary to determine an efficient communication pattern between GPUs on the same node and between nodes. In this paper, we present a performance model based method for minimizing the communication time of applications with stencil operations, such as lattice QCD, on heterogeneous computing systems with a non-blocking InfiniBand interconnection network. The proposed method is able to increase the performance of the most computationally intensive kernel of lattice QCD by 25% due to improved overlapping of communication and computation. We also demonstrate that the aforementioned performance model and efficient communication patterns can be used to determine a cost efficient heterogeneous system design for stencil operation based applications.
Citation
Concurrency and Computation: Practice and Experience. 2015, vol. 27, issue 13, p. 3262-3280.