We are grateful to NVIDIA Corporation for donating GPU cards, including a Tesla K40 and a GTX Titan X (Maxwell), which we use in our research projects.

Parallelizing Irregular Algorithms on Heterogeneous Platforms

We are exploring irregular algorithms and how to map them to parallel platforms using parallel programming models such as OpenMP and OpenACC. Prior work includes parallelizing MIT’s serial sFFT algorithm on multicore platforms and GPUs using OpenMP and CUDA, respectively. Current work builds on those results and aims to develop high-level, directive-based software that uses OpenACC to parallelize sFFT on massively parallel processors.

Parallelizing Bioinformatics Applications

We are building a project called AccSequencer, which uses directive-based models such as OpenMP and OpenACC to align thousands of gene queries against the human genome in a relatively short period of time. We plan to use directive-based models instead of a low-level proprietary language such as CUDA, both to flatten the steep learning curve and to target multiple platforms.

Creating Evaluation and Verification Suite for OpenMP and OpenACC

This project creates a set of test cases for OpenACC 2.5 and OpenMP 4.5 and beyond to validate and verify various compiler implementations, providing a means of diagnosing bugs within the compiler’s design and identifying ambiguities within the specification of the programming models.
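To give a sense of what a single test case in such a suite might look like, here is a minimal sketch in C. It is not taken from the project's actual suite: the function name `test_parallel_for_reduction`, the input pattern, and the pass/fail convention (0 for pass, 1 for fail) are all assumptions made for illustration. The idea is to compute a result serially, compute it again under a directive, and report any mismatch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical test case (not from the actual suite): checks that an
 * OpenMP `parallel for` with a `reduction(+:sum)` clause yields the same
 * result as a serial loop. Returns 0 on pass, 1 on failure. */
int test_parallel_for_reduction(int n)
{
    assert(n > 0);  /* precondition: non-empty input */

    int *data = malloc((size_t)n * sizeof *data);
    if (data == NULL) {
        return 1;  /* treat allocation failure as a test failure */
    }

    /* Fill the input with a simple, deterministic pattern. */
    for (int i = 0; i < n; ++i) {
        data[i] = i % 7;
    }

    /* Serial reference result. */
    long expected = 0;
    for (int i = 0; i < n; ++i) {
        expected += data[i];
    }

    /* Result computed under the directive being tested. If the compiler
     * is invoked without OpenMP support, the pragma is ignored and the
     * loop simply runs serially. */
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }

    free(data);

    if (sum != expected) {
        fprintf(stderr, "reduction test failed: got %ld, expected %ld\n",
                sum, expected);
        return 1;
    }
    return 0;
}
```

A driver would call each such function and tally the return values. Comparing against a serial reference keeps the test independent of any particular compiler, which is what lets the same case be run across several implementations.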

Deep Learning for the CANDLE Benchmark

In this project we are exploring the design of deep learning models for the three pilot case studies of the CANDLE benchmark set, which are of interest to NCI and the DOE labs. The project aims to build predictive models that predict drug response for new tumor types, models that help explain the mutations responsible for cancer, and models that mine large volumes of patient data to study patterns across tumor types.

Exploring Tool Support for Parallel Codes

We are working with profiling tools such as TAU, Score-P, and nvprof to profile large scientific applications. After installing these profilers on our local community cluster, we were able to generate profiles of short sample codes. We also used visualization tools such as Cube and the NVIDIA Visual Profiler. Our goal moving forward is to profile much more complex codes in order to determine the best way to optimize them.

Scalable Graph Analytics and Machine Learning on Distributed, Heterogeneous Systems

We are leveraging distributed programming frameworks (such as Apache Spark) and high-level accelerator frameworks and libraries (such as OpenACC, OpenMP, and PyOpenCL) to bridge the gap between Big Data and HPC. We are applying our techniques to graph analytics and machine learning codes to demonstrate scalable performance on real-world applications. Our goal is to develop techniques that allow programmers to achieve scalable performance on distributed, heterogeneous systems using high-level languages and libraries.