NVIDIA Legate for Scalable Python Computing: cuPyNumeric, Custom Tasks, Legate Boost, and Advanced Profiling


 

PI: Sunita Chandrasekaran

UDEL Students: Nathan Graddon, Nihaal Surpani, Jay Patel

Funding Agency: NVIDIA

Duration: 10/01/2024 – 12/03/2025

 

Our students worked on a four-part tutorial project focused on NVIDIA Legate, cuPyNumeric, Legate Tasks, Legate Boost, and advanced profiling/debugging workflows. The project was designed to help users understand how Python-based numerical and machine learning workloads can scale across CPUs, GPUs, and multi-node systems using the Legate ecosystem.

  

Project Summary:

 

The NVIDIA Legate / cuPyNumeric tutorial project was divided into four parts. Part 1, completed by Jay Patel, introduced cuPyNumeric (formerly named cuNumeric) as a NumPy-compatible library that allows familiar Python array programs to scale across CPUs, GPUs, and distributed systems. This section covered setup, basic usage, and examples such as matrix multiplication, conjugate gradient, and other numerical workloads.
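As a rough illustration of the kind of workload Part 1 describes, the sketch below implements a textbook conjugate gradient solver using only NumPy-style array operations. It is not taken from the tutorial itself. It assumes the package is importable as `cupynumeric` and falls back to plain NumPy when it is not; the function name `conjugate_gradient`, its parameters, and the small test matrix are hypothetical choices made for this example.

```python
# Assumption: cuPyNumeric is importable as `cupynumeric`; otherwise use NumPy.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual
    p = r.copy()               # initial search direction
    rs_old = float(r @ r)      # squared residual norm
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = A @ p
        alpha = rs_old / float(p @ Ap)   # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs_old) * p    # next conjugate direction
        rs_old = rs_new
    return x


# Hypothetical test problem: a small 1D Laplacian, which is symmetric
# positive definite, with a right-hand side of all ones.
n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print("solved:", bool(np.allclose(A @ x, b, atol=1e-6)))
```

The solver body contains nothing specific to either library, which is the point of a NumPy-compatible interface: only the import line changes.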

  

Part 2, completed by Jay Patel and Nihaal Surpani, focused on extending cuPyNumeric with Legate Tasks. This section explained how users can create custom scalable operations using the @task decorator, CPU/GPU task variants, input and output arrays, and partitioning constraints such as align and broadcast.

  

Part 3, completed by Nihaal Surpani, covered Legate Boost, a scalable gradient boosting library built on the Legate runtime. This section showed how Legate can be applied to machine learning workflows such as regression and classification, allowing models to train across CPUs, GPUs, and distributed resources with a familiar Python interface.

  

Part 4, completed by Nathan Graddon, covered Advanced Topics in cuPyNumeric: Profiling and Debugging. This section focused on using the Legate profiler to analyze performance, identify bottlenecks, compare efficient and inefficient cuPyNumeric code, and understand debugging strategies for memory issues such as out-of-memory errors.
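To make the "efficient versus inefficient code" comparison concrete, here is a minimal sketch of one common pattern: an element-by-element Python loop next to the equivalent whole-array expression. This is an assumed example, not code from the tutorial. The function names are hypothetical, the import falls back to NumPy if `cupynumeric` is unavailable, and the wall-clock timer is only a stand-in for the Legate profiler, since execution under Legate may be asynchronous and simple timers can mislead.

```python
import time

# Assumption: cuPyNumeric is importable as `cupynumeric`; otherwise use NumPy.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def scaled_sum_loop(x, y, a):
    """Inefficient: one tiny array operation per element."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = a * x[i] + y[i]
    return out


def scaled_sum_vectorized(x, y, a):
    """Efficient: a single whole-array expression."""
    return a * x + y


n = 2000
x = np.arange(n, dtype=np.float64)
y = np.ones(n, dtype=np.float64)

t0 = time.perf_counter()
slow = scaled_sum_loop(x, y, 2.0)
t1 = time.perf_counter()
fast = scaled_sum_vectorized(x, y, 2.0)
t2 = time.perf_counter()

# Both versions compute the same values; only the cost differs.
print("results match:", bool(np.allclose(slow, fast)))
print(f"loop: {t1 - t0:.6f} s, vectorized: {t2 - t1:.6f} s")
```

In a profile, the loop version would be expected to show up as a long run of very small operations, while the vectorized version appears as a few large ones.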

  

Together, the four parts provide a complete learning path for using NVIDIA Legate and cuPyNumeric: starting with basic scalable NumPy-style programming, moving into custom task development, applying Legate to machine learning, and finally learning how to profile, debug, and optimize real applications.

  

Official Documentation:

 

cuPyNumeric Tutorial

Extend cuPyNumeric with Legate Tasks

Advanced Topics in cuPyNumeric: Profiling and Debugging