Kernel density estimation in accelerators: Implementation and performance evaluation

Abstract

Kernel density estimation (KDE) is a popular technique for estimating the probability density function of a random variable. It is considered a fundamental data smoothing algorithm and is a common building block in many scientific applications. In previous work we presented S-KDE, an efficient algorithmic approach to computing KDE that outperformed other state-of-the-art implementations, providing accurate results in much shorter execution times. Its parallel implementation targeted multi- and many-core processors. In this work we present an OpenCL implementation of S-KDE that targets modern accelerators in a portable way. We test our implementation on three accelerators from different manufacturers, achieving speedups of around 5× over a hand-tuned serial version of S-KDE. We also analyze the performance of the code on these accelerators to determine to what extent it exploits their capabilities.
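
For readers unfamiliar with KDE, the following minimal sketch shows the textbook one-dimensional estimator, which averages a kernel centered on each sample and scales it by a bandwidth. It is only an illustration of the general technique, not S-KDE or its OpenCL implementation. The Gaussian kernel, the bandwidth value, the sample data, and the names `gaussian_kernel` and `naive_kde` are all assumptions made for this example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def naive_kde(samples, x, bandwidth):
    """Evaluate a 1-D kernel density estimate at a single point x.

    Direct evaluation of the textbook estimator: the cost is proportional
    to the number of samples for every evaluation point. This is NOT S-KDE.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in samples)
    return total / (len(samples) * bandwidth)


# Hypothetical data and bandwidth, chosen only for illustration.
samples = [-1.2, -0.4, 0.0, 0.3, 1.1]
step = 0.01
grid = [-5.0 + step * i for i in range(1001)]
density = [naive_kde(samples, x, 0.5) for x in grid]

# A valid density should integrate to roughly 1 over a wide enough interval.
area = sum(density) * step
```

Evaluating the estimator on a grid of m points with n samples costs on the order of n·m kernel evaluations when done this directly, and each grid point can be computed independently of the others.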