Parallelizing Irregular Algorithms on Heterogeneous Platforms

We are exploring irregular algorithms and how to map them to parallel platforms using parallel programming models such as OpenMP and OpenACC. Prior work includes parallelizing MIT’s serial sFFT (sparse fast Fourier transform) algorithm on multicore platforms and GPUs using OpenMP and CUDA, respectively. Current work builds on those results and aims to create high-level, directive-based software that uses OpenACC to parallelize sFFT on massively parallel processors.

Parallelizing Bioinformatics Applications

We are building a project called AccSequencer, which uses directive-based models such as OpenMP and OpenACC to align thousands of gene queries against the human genome in a relatively short time. We use an algorithm similar to those in BWA and BarraCUDA, but we exploit the power of the GPU, which makes our code more scalable than BWA. We plan to use directive-based models rather than a low-level proprietary language such as CUDA, both to reduce the steep learning curve and to target multiple platforms.

Creating an Evaluation and Verification Suite for OpenACC

We are creating a set of tests for the OpenACC specification to guide both users of OpenACC in the details of its use and compiler developers who are working to support it. The suite can also be used to verify compiler conformance, providing a means of diagnosing bugs in a compiler’s implementation.

Deep Learning for Image Classification

We are exploring the creation of high-level software abstractions for GPUs using deep learning frameworks such as Caffe, Theano, and Torch in order to classify millions of images. We use a state-of-the-art embedded board, the NVIDIA Jetson Tegra TX1. The goal is for scientists not to have to learn a low-level language such as CUDA to classify large-scale image collections, but instead to invest their time in doing science and developing new algorithms.

Exploring Parallel Programming on Novel Platforms

Current and future parallel computer architectures form a diverse landscape of heterogeneous hardware that uses different types of memory and cores. To program these architectures, there is a growing set of parallel programming models that range from low-level to high-level approaches. This poses a challenge for programmers who want to exploit today’s or tomorrow’s parallel architectures, mainly because an existing application must often be rewritten (completely or partially) to exploit the targeted hardware. We explore ways to use compiler technology and parallel programming models to create compiler-based tools that help maintain a single code base and achieve some degree of performance portability.

Exploring Tool Support for Parallel Codes

We are working with profiling tools such as TAU, Score-P, and nvprof in order to profile large scientific applications. After installing these profilers on our local community cluster, we were able to generate profiles of short sample codes. We have also used visualization tools such as Cube and the NVIDIA Visual Profiler. Our goal moving forward is to profile much more complex codes in order to determine how best to optimize them.

Scalable Graph Analytics and Machine Learning on Distributed, Heterogeneous Systems

We are leveraging distributed programming frameworks (such as Apache Spark) and high-level accelerator frameworks and libraries (such as OpenACC, OpenMP, and PyOpenCL) to bridge the gap between Big Data and HPC. We are applying our techniques to graph analytics and machine learning codes to demonstrate scalable performance on real-world applications. Our goal is to develop techniques that allow programmers to achieve scalable performance on distributed, heterogeneous systems using high-level languages and libraries.