Blog on OpenACC Validation and Verification (V&V) Testsuite

July 15, 2018

Purpose of the V&V Testsuite

A comprehensive, community-driven OpenACC validation suite is an essential tool for computing facilities that procure and evaluate large production and experimental systems. It lets them test OpenACC compiler implementations for their coverage of a given specification, their conformance to that specification, and their consistency with the definitions of its features.

Testsuite available on GitHub

The OpenACC Validation and Verification Testsuite is available on GitHub for anyone to download and use.

Contact

If you are interested in contributing functional test cases or use cases to the testsuite, or if you have questions, please contact Sunita Chandrasekaran (schandra@udel.edu) and Kyle Friedline (utimatu@udel.edu) at the University of Delaware or make a pull request on our GitHub.

Project Goal

Throughout the development of the testsuite for OpenACC, our focus has been on reliability. The goal is to ensure that kernels translate properly to device code without any sort of failure. This is a challenge, however, given the large scope of the OpenACC specifications and the difficulty of porting standard Fortran/C/C++ code to GPUs not only conformantly, but also efficiently.

In order to provide this reliability, we have designed the testsuite to find, as far as possible, bugs in compiler implementations of OpenACC features for CPUs and GPUs, as well as ambiguities in the definitions of features within the specification.

In many cases, it is evident that some portions of the OpenACC specification cannot be tested in isolation from the rest of the feature set, so we often employ a hierarchy of features. For example, it is impossible to test the operation of kernels without some management of memory on the device. However, it is possible to test memory management without the use of kernels. In this way, even though we cannot write purely isolated tests, we can still find the root cause of a failure by checking whether a prerequisite of the test also failed or only the test itself failed.
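The prerequisite reasoning above can be sketched in a few lines of C. This is only an illustration of the idea, not code from the testsuite: the `test_result` record, its fields, and the `root_cause` helper are hypothetical names, and we assume each test has at most one direct prerequisite.

```c
/* Hypothetical result record; the names are illustrative, not from the testsuite. */
typedef enum { TEST_PASS = 0, TEST_FAIL = 1 } test_status;

typedef struct {
    const char *name;      /* label for the test */
    int prerequisite;      /* index of the test this one depends on, or -1 */
    test_status status;    /* outcome of running the test */
} test_result;

/*
 * For a failed test, follow the chain of failed prerequisites and return
 * the index of the last failed test in that chain, which is the most
 * likely root cause. Returns -1 if the test at `index` passed.
 */
int root_cause(const test_result *results, int index)
{
    if (results[index].status == TEST_PASS) {
        return -1;
    }
    int cause = index;
    int parent = results[cause].prerequisite;
    while (parent >= 0 && results[parent].status == TEST_FAIL) {
        cause = parent;
        parent = results[cause].prerequisite;
    }
    return cause;
}
```

If a kernels test fails and the memory-management test it depends on also failed, `root_cause` points at the memory-management test; if the prerequisite passed, it points at the kernels test itself.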

OpenACC 2.6

With the release of the OpenACC 2.6 specification, we have begun the process of covering the new features of the 2.6 spec while continuing to test the 1.0, 2.0, and 2.5 specification features. With the addition of reference counts, and with the original forms of the data management clauses removed and replaced by their present_or counterparts, many of our tests from the original OpenACC testsuite for 1.0 no longer function as they should. In addition to these changes in the specifications, the scope of the target architectures has also shifted to include devices with shared memory. These differences change the shape of many of our tests and require updated forms of the tests in the 1.0 testsuite.

In order to provide a clear picture of the detected faults, instead of just testing each directive and clause, we often aim to test the clauses in chunks of their expected functionality. While there are many compilers, such as the Cray compiler, OpenARC, Omni, RoseACC and OpenUH, the only compilers that claim to support OpenACC 2.5 are GCC and PGI, of which only PGI supports 2.6. For that reason, our results focus on the PGI and GCC compilers only.

Fig. 1 below shows the current pass/fail results of the PGI 18.1 and GCC 7.1.1-20170802 compilers when run against our testsuite targeting an NVIDIA P100. While this is a simplified view of the actual results, it demonstrates to some degree how these two compilers perform.

Many of the issues with GCC in the Data and Enter/Exit Data directives come from the fact that GCC (as of version 7.1.1-20170802) still does not support the reference counting behaviour and breaks when multiple entries or exits of the same data are encountered. In addition, the Runtime Library and Atomic tests have not been completely audited for conformance and accuracy.
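To make the reference counting behaviour concrete, here is a minimal sketch of the kind of check that exposes it. It is not a test from the suite: the function name and array size are our own, and the `baseline` check is a heuristic we assume for spotting shared-memory targets, where acc_is_present may report data as present regardless. When compiled without OpenACC support, the device section is skipped entirely.

```c
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

#define N 1024

/* Returns the number of detected errors, or -1 if allocation fails. */
int test_reference_counting(void)
{
    int err = 0;
    double *a = malloc(N * sizeof(double));
    if (a == NULL) {
        return -1;
    }
    for (int i = 0; i < N; ++i) {
        a[i] = (double)i;
    }

#ifdef _OPENACC
    /* Assumed heuristic: if the data is reported present before any data
       directive, the target most likely shares memory with the host. */
    int baseline = acc_is_present(a, N * sizeof(double));

    /* Two entries of the same data: the reference count should reach 2. */
    #pragma acc enter data copyin(a[0:N])
    #pragma acc enter data copyin(a[0:N])

    /* First exit: the count drops to 1, so the data must still be present. */
    #pragma acc exit data delete(a[0:N])
    if (!acc_is_present(a, N * sizeof(double))) {
        err++;
    }

    /* Second exit: the count drops to 0, so the data should be gone. */
    #pragma acc exit data delete(a[0:N])
    if (!baseline && acc_is_present(a, N * sizeof(double))) {
        err++;
    }
#endif

    free(a);
    return err;
}
```

An implementation without reference counting would typically fail at the first check, because the first exit data would already deallocate the array.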

Apart from these two categories, the rest of the tests are available on our GitHub. While development continues in a separate repository, tests are added to the public repository as they pass our quality review standards. In addition to testing these compilers, we also provide the testsuite so that users can compare how compilers perform when porting codes to various hardware architectures.

Fig. 2 below shows the difference in PGI 18.1’s conformance between accelerator types. Though we would have liked to be able to show the same differentiation on hardware for GCC, we were very limited in the systems that had GCC (with OpenACC support) installed.

While at the moment we are developing only tests that attempt to isolate a particular clause and directive, we will also be looking to provide a suite of use-case tests with larger and more complex kernels that better imitate the situations encountered in the real world. If you would like to contribute to the testsuite development, we encourage you to make a pull request and contribute your use cases or your tests for specific directives and clauses.

Two aspects of the testsuite development:

A meta-analysis of how we translate specifications to tests, with parallel_firstprivate.c as an example
A hands-on example of what a test looks like, with enter_data_if as an example

A meta-analysis of how we translate specifications to tests:

For this example, we use the test parallel_firstprivate.c, which tests the firstprivate clause on the parallel directive. We do not write any tests of the operation of the parallel directive itself, since that functionality is covered in tests such as parallel.c and parallel_default_copy.c. Additionally, other aspects of the test, such as data transfers and loops with gang/worker clauses, are tested elsewhere without depending on the firstprivate clause.

In this test, we also do not (initially) test the firstprivate clause in a normal use case. Instead, we divide the test into sections. The first section uses the firstprivate clause in a situation that tests the “first” nature of the clause: if the data in the firstprivate clause is not initialized to the values on the host, then the test will fail. The second section uses the firstprivate clause in a situation that tests the “private” nature of the clause: if the data is not privatized across the gangs, then we would expect this portion of the test to fail.

This gives clearer results that show not only which directives/clauses do not conform to the specifications, but also which of their attributes fail to work properly. This can benefit both compiler developers and those planning to use these compilers, as they can quickly see which features of OpenACC are fully supported by each compiler and whether the implementations are sufficient for their purposes.
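The two sections described above might look roughly like the following sketch. It is not the actual parallel_firstprivate.c: the function name, the sizes `GANGS` and `N`, and the use of integer data are assumptions made to keep the example short and exactly checkable. Without an OpenACC compiler the pragmas are ignored and the code runs serially on the host with the same expected results.

```c
#include <stdlib.h>

#define GANGS 10   /* iterations of the gang loop (assumed size) */
#define N 64       /* elements handled per gang iteration (assumed size) */

/* Returns the number of mismatches, or -1 if allocation fails. */
int test_parallel_firstprivate(void)
{
    int err = 0;
    int c[N];
    int *a = malloc(GANGS * N * sizeof(int));
    int *b = malloc(GANGS * N * sizeof(int));
    int *d = malloc(GANGS * N * sizeof(int));
    if (a == NULL || b == NULL || d == NULL) {
        free(a); free(b); free(d);
        return -1;
    }
    for (int i = 0; i < GANGS * N; ++i) {
        a[i] = i;
        b[i] = 2 * i + 1;
        d[i] = 0;
    }
    for (int y = 0; y < N; ++y) {
        c[y] = y + 1;
    }

    /* Section 1, the "first" nature: c is only read, so the result is
       correct only if each private copy starts with the host values. */
    #pragma acc data copyin(a[0:GANGS*N], b[0:GANGS*N]) copy(d[0:GANGS*N])
    {
        #pragma acc parallel firstprivate(c)
        {
            #pragma acc loop gang
            for (int x = 0; x < GANGS; ++x) {
                #pragma acc loop worker
                for (int y = 0; y < N; ++y) {
                    d[x * N + y] = a[x * N + y] + b[x * N + y] + c[y];
                }
            }
        }
    }
    for (int x = 0; x < GANGS; ++x) {
        for (int y = 0; y < N; ++y) {
            if (d[x * N + y] != a[x * N + y] + b[x * N + y] + c[y]) err++;
        }
    }

    /* Section 2, the "private" nature: each gang iteration overwrites c
       and then reads it back. If gangs shared one copy of c, they would
       overwrite each other's values and d would be wrong. */
    #pragma acc data copyin(a[0:GANGS*N], b[0:GANGS*N]) copy(d[0:GANGS*N])
    {
        #pragma acc parallel firstprivate(c)
        {
            #pragma acc loop gang
            for (int x = 0; x < GANGS; ++x) {
                #pragma acc loop worker
                for (int y = 0; y < N; ++y) {
                    c[y] = a[x * N + y] - b[x * N + y];
                }
                #pragma acc loop worker
                for (int y = 0; y < N; ++y) {
                    d[x * N + y] = a[x * N + y] + b[x * N + y] + c[y];
                }
            }
        }
    }
    for (int i = 0; i < GANGS * N; ++i) {
        if (d[i] != 2 * a[i]) err++;   /* a + b + (a - b) == 2a */
    }

    free(a); free(b); free(d);
    return err;
}
```

Keeping the two sections separate means a failure in only the second check points at privatization rather than at initialization.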

A hands-on example of what a test looks like:

We develop each test to adhere closely to the specification's definitions yet also to be compatible with all the architectures that the testsuite may be run on. Below is an example of a test of the if clause on an enter data directive.

In this example, we use the enter data directive with both the copyin clause and the if clause. Even though we are trying to isolate the if clause on the enter data directive, we need to use at least copyin or create to allocate data on the device, if only to test whether the operation did or did not happen, depending on the argument in the if clause. To check this, we use the acc_is_present routine from the runtime library. This would be an issue if the testing of acc_is_present itself also required the use of the enter data directive with an if clause. However, the test of acc_is_present only uses the enter data directive with the copyin clause.

The test uses an array as the subject of the copyin clause on three separate occasions. The first is the case when the argument of the if clause is 0. Line 908 of the OpenACC specification (V 2.6) says, “When the condition [argument] in the if clause evaluates to zero in C or C++ … no device memory will be allocated or deallocated, and no data will be moved.” So at this point in the execution, there have been no operations to allocate the data on the device. If we then use the acc_is_present runtime routine, which we have tested elsewhere, the returned value should be 0, since the data is not present on the device. If it does not return zero, we increment our error counter. In the second use of the enter data directive with the if clause, we instead use 1 as the argument.

The specification reads, “When the condition [argument] evaluates to nonzero in C or C++ … the data will be allocated or deallocated and moved as specified.” After this executes, the data should be treated as specified, or in this case, copied in. It makes no difference here whether or not the data is actually copied, since we only care about whether or not the operation occurred, for which we are using acc_is_present as an indicator. This time, we test to make sure that the data is present.

However, we cannot guarantee that the data will be indicated as present by the acc_is_present routine. Why? The acc_is_present routine “tests whether the specified host data is present on the device.” In the case when no device is attached, or at least no device with separate memory, it could be interpreted that the behaviour of the routine is not specified: the data, while not on a device (which suggests the return value should be zero), is also present where any parallel or kernels region would execute if called at this point (which suggests the return value should be non-zero).

To bypass the vague language, we copy in our array one more time, this time omitting the if clause. Since the enter data directive with the copyin clause has already been tested, we assume that this operation works properly. Now, if the value returned from acc_is_present changes, then we would assume that the first enter data did not work. If it does not change, then we would assume that, to the best of our knowledge, the test operated to a similar measurable degree as the enter data directive without the if clause. This method is not perfect and leaves many possibilities for the test to pass while the tested directive/clause does not operate properly. However, it is only one example of a check that could be combined with others to better flesh out the operation of the directive/clause, and it is only one of what is now over 100 isolation tests in our public repository (and 600 in our private repository).
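The three uses of the array described above can be sketched as follows. This is a reconstruction from the description, not the test as it appears in the repository: the function name, the array size, and the cleanup with the finalize clause are assumptions. When compiled without OpenACC support, the device section is skipped and the function reports no errors.

```c
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

#define N 1024

/* Returns the number of detected errors, or -1 if allocation fails. */
int test_enter_data_if(void)
{
    int err = 0;
    double *a = malloc(N * sizeof(double));
    if (a == NULL) {
        return -1;
    }
    for (int i = 0; i < N; ++i) {
        a[i] = (double)i;
    }

#ifdef _OPENACC
    /* Use 1: the condition is zero, so nothing should be allocated.
       Note that on a shared-memory target acc_is_present may still
       report the data as present, as discussed above. */
    #pragma acc enter data copyin(a[0:N]) if(0)
    if (acc_is_present(a, N * sizeof(double)) != 0) {
        err++;
    }

    /* Use 2: the condition is nonzero, so the copyin should happen.
       We record what acc_is_present reports instead of asserting on it. */
    #pragma acc enter data copyin(a[0:N]) if(1)
    int with_if = acc_is_present(a, N * sizeof(double));

    /* Use 3: the same directive without the if clause, tested elsewhere. */
    #pragma acc enter data copyin(a[0:N])
    int without_if = acc_is_present(a, N * sizeof(double));

    /* If the reported presence changed, the enter data with if(1)
       did not behave like the plain enter data. */
    if (with_if != without_if) {
        err++;
    }

    /* Assumed cleanup: finalize drops the data regardless of how many
       times it was entered. */
    #pragma acc exit data delete(a[0:N]) finalize
#endif

    free(a);
    return err;
}
```

Comparing the two recorded values, rather than requiring a specific return value, is what lets the check sidestep the ambiguity in how acc_is_present behaves without a separate-memory device.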

This project is a work in progress, so please expect GitHub to be populated with more tests soon. If you have any questions, please drop us a note!

Publications

Kyle Friedline, Sunita Chandrasekaran, Graham Lopez, Oscar Hernandez, “OpenACC 2.5 Validation Testsuite targeting multiple architectures”, In Proceedings of the 2nd International Workshop on Performance Portable Programming Models for Accelerators, LNCS, volume 10524, pp 557-575, 2017
Rengan Xu, Cheng Wang, Sunita Chandrasekaran, Barbara Chapman, “An OpenACC 1.0 Validation Suite”, In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), MTAAP 2014, pp 1407-1416, 2014