Communication efficient work distributions in stencil operation based applications

Authors

Schneible, Joseph
Říha, Lubomír
Malik, Maria
El-Ghazawi, Tarek
Alexandru, Andrei

Publisher

Wiley

Abstract

In recent years, the use of accelerators in conjunction with CPUs, known as heterogeneous computing, has brought about significant performance increases for scientific applications. One of the best examples of this is lattice quantum chromodynamics (QCD), a stencil operation based simulation. These simulations have a large memory footprint necessitating the use of many graphics processing units (GPUs) in parallel. This requires the use of a heterogeneous cluster with one or more GPUs per node. In order to obtain optimal performance, it is necessary to determine an efficient communication pattern between GPUs on the same node and between nodes. In this paper, we present a performance model based method for minimizing the communication time of applications with stencil operations, such as lattice QCD, on heterogeneous computing systems with a non-blocking InfiniBand interconnection network. The proposed method is able to increase the performance of the most computationally intensive kernel of lattice QCD by 25% due to improved overlapping of communication and computation. We also demonstrate that the aforementioned performance model and efficient communication patterns can be used to determine a cost efficient heterogeneous system design for stencil operation based applications.

Citation

Concurrency and Computation: Practice and Experience. 2015, vol. 27, issue 13, p. 3262-3280.