Migrating the Jacobi CUDA Graphs from CUDA* to SYCL*

ID 730033
Updated 7/6/2023
Version Latest
Public

Overview

The Jacobi CUDA Graphs sample reports the number of iterations needed to solve a system of linear equations using the Jacobi iterative method. This sample includes the migration of explicit CUDA* Graph API calls, such as cudaGraphCreate(), cudaGraphAddMemcpyNode(), and cudaGraphInstantiate(), to SYCL*.

To do so, the migrated code uses the Taskflow parallel programming model, which manages a task dependency graph. The sample is implemented in SYCL by migrating the original CUDA source code, and it offloads computations to a CPU, GPU, or accelerator.

 

| Area | Description |
| --- | --- |
| What you will learn | Migrate and optimize the Jacobi CUDA Graphs sample from CUDA to SYCL. |
| Time to complete | 15 minutes |
| Category | Concepts and Functionality |

Key Implementation Details

The Jacobi CUDA Graphs computations happen inside two kernels: the Jacobi Method kernel and the Final Error kernel. A reduction over the elements is performed to obtain the final error or sum value.
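To make the two-step structure concrete, here is a minimal host-side C++ sketch of a Jacobi sweep followed by an error reduction. It is not the sample's kernel code: the function names (jacobi_sweep, final_error, jacobi_solve), the dense row-major matrix layout, and the choice of error measure (sum of absolute differences between successive iterates) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One Jacobi sweep over a dense n x n system stored in row-major order:
// x_new[i] = (b[i] - sum over j != i of A[i][j] * x[j]) / A[i][i].
void jacobi_sweep(const std::vector<double>& A, const std::vector<double>& b,
                  const std::vector<double>& x, std::vector<double>& x_new) {
    const std::size_t n = b.size();
    assert(A.size() == n * n && x.size() == n && x_new.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        double sigma = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) sigma += A[i * n + j] * x[j];
        }
        x_new[i] = (b[i] - sigma) / A[i * n + i];
    }
}

// Error reduction (assumed measure): sum of |x_new[i] - x[i]|.
double final_error(const std::vector<double>& x, const std::vector<double>& x_new) {
    assert(x.size() == x_new.size());
    double err = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) err += std::fabs(x_new[i] - x[i]);
    return err;
}

// Repeats sweeps until the error drops below tol or max_iters is reached.
// Returns the number of sweeps performed; x holds the latest iterate.
int jacobi_solve(const std::vector<double>& A, const std::vector<double>& b,
                 std::vector<double>& x, double tol, int max_iters) {
    std::vector<double> x_new(x.size());
    for (int k = 1; k <= max_iters; ++k) {
        jacobi_sweep(A, b, x, x_new);
        const double err = final_error(x, x_new);
        x.swap(x_new);
        if (err < tol) return k;
    }
    return max_iters;
}
```

Calling jacobi_solve on a diagonally dominant system returns the iteration count, which is the quantity the sample reports. On a device, each loop over i would typically map to one work-item, and the error sum would be computed with a parallel reduction rather than a serial loop.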

In this sample, the vectors are loaded into shared memory for faster access, and thread blocks are partitioned into tiles. Then, a reduction of input data is performed in each of the partitioned tiles using sub-group primitives. These intermediate results are added to a final sum variable via an atomic add operation.
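The tile-then-atomic pattern described above can be sketched on the host as follows. This is an illustration only: the names atomic_add and tiled_sum are hypothetical, the tiles are processed one after another rather than concurrently, and a plain loop stands in for the sub-group primitives used on the device.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Adds value to an atomic double using a compare-exchange loop.
// (std::atomic<double>::fetch_add requires C++20; this form works from C++11.)
void atomic_add(std::atomic<double>& target, double value) {
    double expected = target.load();
    while (!target.compare_exchange_weak(expected, expected + value)) {
        // On failure, expected is refreshed with the current value; retry.
    }
}

// Splits data into tiles of tile_size elements, reduces each tile to a
// partial sum, and adds each partial sum to a shared total atomically.
double tiled_sum(const std::vector<double>& data, std::size_t tile_size) {
    assert(tile_size > 0);
    std::atomic<double> total(0.0);
    for (std::size_t start = 0; start < data.size(); start += tile_size) {
        const std::size_t end = std::min(start + tile_size, data.size());
        double partial = 0.0;  // stands in for the per-tile sub-group reduction
        for (std::size_t i = start; i < end; ++i) partial += data[i];
        atomic_add(total, partial);  // one atomic update per tile
    }
    return total.load();
}
```

The point of the pattern is that only one atomic update is issued per tile instead of one per element, which keeps contention on the shared sum low when many tiles run at once.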

The computation kernels are scheduled using one of two alternative host functions:

  • Host function JacobiMethodGpuCudaGraphExecKernelSetParams(), which uses explicit CUDA Graph APIs.
  • Host function JacobiMethodGpu(), which uses regular CUDA APIs to launch kernels.
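The difference between the two scheduling styles can be sketched with a small hand-rolled task graph in standard C++. This is neither the CUDA Graph API nor the Taskflow API: the TaskGraph type, the run_direct and run_with_graph functions, and the stage names (reset_sum, jacobi_kernel, copy_sum_to_host) are hypothetical placeholders chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal task graph: nodes hold work plus indices of the
// nodes they depend on.
struct TaskGraph {
    struct Node {
        std::function<void()> work;
        std::vector<std::size_t> deps;  // nodes that must run before this one
    };
    std::vector<Node> nodes;

    // Adds a node whose dependencies must already be in the graph.
    std::size_t add(std::function<void()> work, std::vector<std::size_t> deps = {}) {
        for (std::size_t d : deps) assert(d < nodes.size());
        nodes.push_back({std::move(work), std::move(deps)});
        return nodes.size() - 1;
    }

    // Dependencies always point to earlier nodes, so insertion order is a
    // valid topological order and the graph can be replayed as-is.
    void run() const {
        for (const Node& n : nodes) n.work();
    }
};

// Direct style: every stage is issued individually on each launch.
std::vector<std::string> run_direct(int launches) {
    std::vector<std::string> log;
    for (int i = 0; i < launches; ++i) {
        log.push_back("reset_sum");
        log.push_back("jacobi_kernel");
        log.push_back("copy_sum_to_host");
    }
    return log;
}

// Graph style: the stages and their dependencies are declared once,
// then the whole graph is replayed for each launch.
std::vector<std::string> run_with_graph(int launches) {
    std::vector<std::string> log;
    TaskGraph g;
    const std::size_t reset = g.add([&log] { log.push_back("reset_sum"); });
    const std::size_t kernel = g.add([&log] { log.push_back("jacobi_kernel"); }, {reset});
    g.add([&log] { log.push_back("copy_sum_to_host"); }, {kernel});
    for (int i = 0; i < launches; ++i) g.run();
    return log;
}
```

Both functions produce the same sequence of stages. The graph style separates describing the work from executing it, which is the property that lets a runtime validate and set up the dependency structure once and reuse it across iterations.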

Original CUDA source files: JacobiCudaGraphs.

Migrated SYCL source files, including step-by-step instructions: guided_JacobiCudaGraphs_SYCLmigration.

References