LLM4VV


 

 

LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation

PI: Sunita Chandrasekaran

Students: Zack Sollenberger, Saieda Ali Zada, Rahul Patel

Past Students: Christian Munley, Aaron Jarmusch

Funding Agency and Period: OpenACC: 07/2017 – present

Project Duration: 04/30/23 – present

 

Project Summary:

 

Large language models (LLMs) are a powerful new tool for a wide range of natural-language applications, and they demonstrate impressive code generation abilities. The goal of this project is to automatically generate tests and use them to validate and verify compiler implementations of directive-based parallel programming paradigms. To do so, we have explored agentic AI systems that generate tests, review them autonomously, and revise them in an iterative feedback loop. We are also exploring retrieval-augmented generation (RAG) so that the tool can continue to produce high-quality, usable compiler tests as new specifications are released.
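The generate, review, and revise loop described above can be illustrated with a minimal sketch. This is not the project's implementation: the names `refine_test`, `Review`, `toy_generate`, and `toy_review` are hypothetical, and the two toy functions stand in for LLM calls so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    """Outcome of reviewing one candidate test (hypothetical structure)."""
    passed: bool
    feedback: str


def refine_test(
    feature: str,
    generate: Callable[[str, Optional[str]], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> Optional[str]:
    """Generate a test, review it, and revise until accepted or out of rounds."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(feature, feedback)  # first round has no feedback
        verdict = review(candidate)
        if verdict.passed:
            return candidate
        feedback = verdict.feedback  # feed the critique into the next round
    return None  # no candidate was accepted within the round limit


# Toy stand-ins (assumptions): a real system would call an LLM in each of
# these, and would likely also compile and run the candidate test.
def toy_generate(feature: str, feedback: Optional[str]) -> str:
    body = "#pragma acc " + feature + "\nfor (int i = 0; i < n; ++i) a[i] = b[i];"
    if feedback is not None:
        # Revision step: add the result check the reviewer asked for.
        body += "\nreturn check(a, b, n);"
    return body


def toy_review(source: str) -> Review:
    if "check(" in source:
        return Review(True, "")
    return Review(False, "test never verifies its result")


accepted = refine_test("parallel loop", toy_generate, toy_review)
```

In this toy run the first candidate is rejected because it never verifies its result, and the revised candidate is accepted on the second round. Capping the number of rounds keeps the loop from running indefinitely when the reviewer never accepts a candidate.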

 

Our most recent paper, “LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests,” by Zachariah Sollenberger, Rahul Patel, Saieda Ali Zada, and Sunita Chandrasekaran, was the only undergraduate-led paper accepted and published at High Performance Computing (HiPC) 2025. The paper refined our agentic pipeline into a Dual-LLM Framework consisting of a Generative Agent, which generates compiler tests for directive-based parallel programming models, and a Discriminative Agent, which validates them. It evaluated multiple LLMs against a custom benchmark to determine which would be the best candidates to fine-tune into these agents; the best performers were Deepseek-Coder-33B and Qwen2.5-Coder-32B.

 

Publications:

 

Zachariah Sollenberger, Rahul Patel, Saieda Ali Zada, & Sunita Chandrasekaran, “LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests,” 2025 IEEE 32nd International Conference on High Performance Computing, Data, and Analytics (HiPC), 2025, pp. 333-342. 🔗 DOI

 

Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, & Sunita Chandrasekaran, “LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites,” SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2024, pp. 1885-1893. 🔗 DOI

 

Christian Munley, Aaron Jarmusch, & Sunita Chandrasekaran, “LLM4VV: Developing LLM-driven testsuite for compiler validation,” Future Generation Computer Systems, vol. 160, pp. 1–13.