NVIDIA cuPyNumeric 25.03: The Future of Accelerated Computing
NVIDIA cuPyNumeric is a distributed, accelerated drop-in replacement for NumPy built on the Legate framework. Version 25.03 is a milestone release: the full stack is now open source, the package can be installed with pip, and HDF5 I/O is supported natively.
Full Stack Now Open Source
With cuPyNumeric 25.03, NVIDIA has open-sourced the Legate framework and runtime layer that underpin cuPyNumeric. The stack is now released under the Apache 2.0 license, reflecting NVIDIA’s commitment to transparency, collaboration, and reproducibility in the scientific community. Contributors can now read the codebase, audit components, and extend functionality across the whole stack, which invites innovation and broader community engagement.
Simplified Installation with pip Support
Previously, cuPyNumeric installations primarily relied on conda. However, with version 25.03, users can now enjoy the convenience of installing cuPyNumeric via pip, using the straightforward command:
pip install nvidia-cupynumeric
This addition simplifies the setup process significantly, making it easier to integrate cuPyNumeric into various workflows, virtual environments, and continuous integration (CI) pipelines. The package on PyPI is designed to be multinode and multirank capable, allowing developers to harness the power of cuPyNumeric not only on single-node systems but also across multi-GPU multinode clusters.
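Once the package is installed, a quick smoke test can confirm that the drop-in import works. The sketch below is illustrative rather than official: the `smoke_test` function and `BACKEND` variable are hypothetical names, and the NumPy fallback is an assumption added so the same script also runs on a machine without cuPyNumeric.

```python
# Prefer cuPyNumeric when it is installed; otherwise fall back to NumPy so the
# same script still runs on a machine without the package.
try:
    import cupynumeric as np
    BACKEND = "cupynumeric"
except ImportError:
    import numpy as np
    BACKEND = "numpy"


def smoke_test(n: int = 1_000) -> float:
    """Run a few array operations and return a scalar result."""
    x = np.arange(n, dtype=np.float64)
    y = np.ones(n, dtype=np.float64)
    # Element-wise arithmetic followed by a reduction.
    return float(np.sum(2.0 * x + y))


if __name__ == "__main__":
    print(f"backend: {BACKEND}, result: {smoke_test()}")
```

Because the array code only goes through the `np` alias, switching between the two libraries is a one-line change at the import.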
Example Installation on SLURM Clusters
For those eager to get started, here’s a quick guide to installing and running cuPyNumeric on SLURM clusters:
Step 1: Environment Setup
After logging into your cluster, the first step is to load essential environment modules, including CUDA and MPI, which are crucial for executing cuPyNumeric in a multinode or multirank environment. If these modules are not available, consider reaching out to your system administrator for assistance.
module purge # clear existing modules
module load cuda # load CUDA toolkit
module load openmpi # load Open MPI
Next, create and activate a virtual environment (recommended for isolation):
python -m venv legate
source legate/bin/activate
Step 2: Package Installation
With the environment set up, install cuPyNumeric and Legate using pip:
pip install legate nvidia-cupynumeric
Step 3: Run Applications
You can allocate interactive compute nodes using the srun command:
# Request 2 nodes with 8 GPUs each for 30 minutes and start an interactive shell.
# Replace partition-name with a partition available on your cluster.
srun -p partition-name \
     -N 2 \
     --gres=gpu:8 \
     --time=00:30:00 \
     --pty bash
Finally, run a cuPyNumeric program with:
# 8 GPUs per process, 1 process per node, 2 nodes in total (matches -N), launched with MPI.
legate --gpus 8 \
       --ranks-per-node 1 \
       --nodes 2 \
       --launcher mpirun \
       ./prog.py
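The article does not show what prog.py contains. As a rough illustration only, a program of this kind might look like the sketch below: a few Jacobi iterations on a 2-D grid, written entirely with NumPy-style slicing. The function names, grid size, and boundary condition are all assumptions, and the NumPy fallback is there only so the sketch can be tried on a laptop.

```python
try:
    import cupynumeric as np  # assumption: installed as in Step 2
except ImportError:
    import numpy as np  # fallback so the sketch runs without cuPyNumeric


def jacobi_step(grid):
    """Return a new grid whose interior cells are the mean of their 4 neighbours."""
    new = grid.copy()
    new[1:-1, 1:-1] = 0.25 * (
        grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
    )
    return new


def run(n=64, steps=10):
    """Relax an n x n grid with a hot top edge for a fixed number of steps."""
    grid = np.zeros((n, n), dtype=np.float64)
    grid[0, :] = 1.0  # top edge held at 1, the other edges stay at 0
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid


if __name__ == "__main__":
    result = run()
    print("mean value:", float(result.mean()))
```

Nothing in the program refers to GPUs or nodes; in this model those choices are made on the legate command line rather than in the code.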
For batch jobs, save a script such as the following and submit it with sbatch:
#!/bin/bash
#SBATCH --job-name=cupynumeric
#SBATCH --nodes=2
#SBATCH --gres=gpu:8
#SBATCH --time=00:30:00
module load cuda openmpi
source legate/bin/activate
legate --gpus 8 \
       --ranks-per-node 1 \
       --nodes ${SLURM_NNODES} \
       --launcher mpirun \
       ./prog.py
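If jobs are launched from a Python driver rather than a shell script, the same command line can be assembled programmatically. The helper below is a hypothetical sketch, not part of Legate: `build_legate_command` is an invented name, and it only uses the flags that appear in the commands above, reading the node count from the `SLURM_NNODES` variable that Slurm sets inside a job.

```python
import os
import shlex


def build_legate_command(program, gpus_per_rank=8, ranks_per_node=1, env=None):
    """Build the legate argv list, taking the node count from SLURM_NNODES."""
    env = os.environ if env is None else env
    # Outside a Slurm job the variable is absent, so default to a single node.
    nodes = int(env.get("SLURM_NNODES", "1"))
    return [
        "legate",
        "--gpus", str(gpus_per_rank),
        "--ranks-per-node", str(ranks_per_node),
        "--nodes", str(nodes),
        "--launcher", "mpirun",
        program,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_legate_command("./prog.py")))
```

Returning a list instead of a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.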
For more detailed instructions, check out the cuPyNumeric 25.03 installation guide.
Native HDF5 I/O Support
Another standout feature in cuPyNumeric 25.03 is native support for HDF5 over GPUDirect Storage. HDF5 is widely used across scientific computing, so this support lets cuPyNumeric programs exchange large datasets with existing tools. Users can now keep complex data structures on disk in a compact, portable, and performant format, which is especially valuable for high-performance computing and data-intensive applications.
Here’s a quick example of how to use HDF5 with cuPyNumeric:
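from legate.core.io.hdf5 import from_file
import cupynumeric as np
# Read the datasets named "x" and "y" from the HDF5 file
x = from_file("data.h5", dataset_name="x")
y = from_file("data.h5", dataset_name="y")
# Convert the loaded data to cuPyNumeric arrays
xx = np.asarray(x)
yy = np.asarray(y)
# In-place AXPY update: yy = a * xx + yy
a = 8675.309
yy[:] = a * xx + yy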
This feature makes it easier for researchers and developers to read large datasets directly into cuPyNumeric arrays.
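The example above expects a file named data.h5 containing datasets "x" and "y" of matching shape. One way to produce such a file for experimentation is sketched below. It assumes the separate h5py package is installed (it is not part of cuPyNumeric), and the `write_example_file` helper, array length, and random contents are purely illustrative.

```python
import numpy as np


def write_example_file(path="data.h5", n=1024, seed=0):
    """Create an HDF5 file holding two float64 datasets named "x" and "y"."""
    # Assumption: h5py is installed. It is imported here rather than at the
    # top so this module can still be loaded on systems without it.
    import h5py

    rng = np.random.default_rng(seed)
    with h5py.File(path, "w") as f:
        f.create_dataset("x", data=rng.random(n))
        f.create_dataset("y", data=rng.random(n))
    return path


if __name__ == "__main__":
    print("wrote", write_example_file())
```

Plain NumPy is used here on purpose: the file is written once on the host, and only the read path is meant to go through Legate.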
Get Started with cuPyNumeric 25.03
NVIDIA cuPyNumeric 25.03 is set to strengthen the foundation for both research and production environments. For those interested in diving deeper into the new features and capabilities offered in this release, detailed release notes are available. The cuPyNumeric team is eager to engage with the growing community and welcomes feedback, contributions, and ideas for future updates. Join the conversation by submitting issues directly to the nv-legate/cupynumeric GitHub repository.
With the advancements in cuPyNumeric 25.03, the future of accelerated computing looks brighter than ever. Whether you’re a researcher pushing the boundaries of science or a developer looking to optimize your workflows, cuPyNumeric is a tool worth exploring.