LLM4VV
LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation
PI: Sunita Chandrasekaran
Students: Zack Sollenberger, Saieda Ali Zada, Rahul Patel
Past Students: Christian Munley, Aaron Jarmusch
Funding Agency and Period: OpenACC: 07/2017 – present
Project Duration: 04/30/23 – present
Project Summary:
Large language models (LLMs) are a powerful new tool for a wide range of natural-language applications and demonstrate impressive code generation abilities. The goal of this project is to automatically generate tests and use them to validate and verify compiler implementations of directive-based parallel programming paradigms. To do so, we have explored agentic AI systems that generate tests, review them autonomously, and revise them in an iterative feedback loop. Additionally, we are exploring retrieval-augmented generation (RAG) to ensure that the tool can continue to produce high-quality, usable compiler tests as new specifications are released.
Our most recent paper, “LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests,” by Zachariah Sollenberger, Rahul Patel, Saieda Ali Zada, and Sunita Chandrasekaran, was the only undergraduate-led paper accepted and published at High Performance Computing (HiPC) 2025. The paper formalized our agentic pipeline as a Dual-LLM Framework consisting of a Generative Agent and a Discriminative Agent, which respectively generate and validate compiler tests for directive-based parallel programming models. It evaluated multiple LLMs against a custom benchmark to determine which would be the best candidates to fine-tune into these agents; the best performers were Deepseek-Coder-33B and Qwen2.5-Coder-32B.
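To make the generate, review, and revise loop concrete, here is a minimal Python sketch of how a Generative Agent and a Discriminative Agent might interact. It is an illustration under stated assumptions, not the project's actual implementation: the names `Review`, `generate_test`, `stub_generator`, and `stub_discriminator` are hypothetical, and the stub agents stand in for real LLM calls so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    """Verdict returned by the (hypothetical) Discriminative Agent."""
    passed: bool
    feedback: str


# Hypothetical agent signatures; in a real system these would wrap LLM calls.
Generator = Callable[[str, Optional[str], Optional[Review]], str]
Discriminator = Callable[[str, str], Review]


def generate_test(
    feature: str,
    generate: Generator,
    review: Discriminator,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate a test, review it, and revise until accepted or out of rounds."""
    candidate: Optional[str] = None
    verdict: Optional[Review] = None
    for round_no in range(1, max_rounds + 1):
        # The generator sees its previous attempt and the reviewer's feedback.
        candidate = generate(feature, candidate, verdict)
        verdict = review(feature, candidate)
        if verdict.passed:
            return candidate, round_no
    return None, max_rounds  # no candidate was accepted


def stub_generator(feature: str, previous: Optional[str], review: Optional[Review]) -> str:
    """Placeholder generator: adds a return statement once the reviewer asks for one."""
    body = "#pragma acc parallel loop\nfor (int i = 0; i < n; ++i) a[i] = i;"
    if review is not None and "return" in review.feedback:
        body += "\nreturn err;"
    return body


def stub_discriminator(feature: str, source: str) -> Review:
    """Placeholder reviewer: accepts only tests that report an error count."""
    if "return err;" not in source:
        return Review(False, "test must return an error count")
    return Review(True, "ok")


test_source, rounds = generate_test("acc parallel loop", stub_generator, stub_discriminator)
```

With these stubs, the first candidate is rejected and the second, revised with the reviewer's feedback, is accepted. Capping the loop with `max_rounds` is one possible design choice to keep a pair of disagreeing agents from iterating indefinitely.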
Publications:
Zachariah Sollenberger, Rahul Patel, Saieda Ali Zada, & Sunita Chandrasekaran, “LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests,” 2025 IEEE 32nd International Conference on High Performance Computing, Data, and Analytics (HiPC), 2025, pp. 333-342.
Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, & Sunita Chandrasekaran, “LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites,” SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2024, pp. 1885-1893.
Christian Munley, Aaron Jarmusch, & Sunita Chandrasekaran, “LLM4VV: Developing LLM-driven testsuite for compiler validation,” Future Generation Computer Systems, 160, pp. 1-13.