Migration Workflow Guidelines#
Overview#
The CUDA* to SYCL* code migration workflow consists of the following high-level stages:
Stage 1: Prepare for Migration. Prepare your project and configure the tool for a successful migration.
Stage 2: Migrate Your Code. Review tool options and migrate your code with the tool.
Stage 3: Review the Migrated Code. Review and manually convert any unmigrated code.
Stage 4: Build the New SYCL Code Base. Build your project with the migrated code.
Stage 5: Validate the New SYCL Application. Validate your new SYCL application to check for correct functionality after migration.
This document describes the steps in each stage with general recommendations and optional steps.
Note
CUDA* API migration support is broad but not complete. If you encounter CUDA APIs that were not migrated due to a lack of tool support, please report it to the Migrating to SYCL forum or priority support. Alternatively, submit an issue or contribute to the SYCLomatic project. This helps prioritize which CUDA APIs will be supported in future releases.
Prerequisites#
Install SYCLomatic. The latest prebuilt release of SYCLomatic is available from oneapi-src/SYCLomatic.
You can also review the repo README for build and installation instructions.
Set up the tool environment. Refer to Get Started with SYCLomatic for setup instructions.
Stage 1: Prepare for Migration#
Before migrating your CUDA code to SYCL, prepare your CUDA source code for the migration process.
Prepare Your CUDA Project#
Before migration, it is recommended to prepare your CUDA project to minimize errors during migration:
Make sure your CUDA source code has no syntax errors.
Make sure your CUDA source code is Clang compatible.
Fix Syntax Errors#
If your original CUDA source code has syntax errors, it may result in unsuccessful migration.
Before you start migration, make sure that your original CUDA source code builds and runs correctly:
Compile your original source code using the compiler defined for your original CUDA project.
Run your compiled application and verify that it functions as expected.
When your code compiles with no build errors and you have verified that your application works as expected, your CUDA project is ready for migration.
Clang Compatibility#
SYCLomatic uses the latest version of the Clang* parser to analyze your CUDA source code during migration. The Clang parser isn’t always compatible with the NVIDIA* CUDA compiler driver (nvcc). The tool will provide errors about incompatibilities between nvcc and Clang during migration.
In some cases, additional manual edits to the CUDA source may be needed before migration. For example:
The Clang parser may require namespace qualification in certain usage scenarios where nvcc does not.
The Clang parser may require additional forward class declarations where nvcc does not.
Whitespace within the triple angle brackets of a kernel invocation is tolerated by nvcc but not by Clang. For example,
cuda_kernel<< <num_blocks, threads_per_block>> >(args…)
is accepted by nvcc, but the Clang parser requires the spaces to be removed.
If you run the migration tool on CUDA source code that has unresolved incompatibilities between nvcc and Clang parsers, you will get a mixture of errors in the migration results:
Clang errors, which must be resolved in the CUDA source code
DPCT warnings, which must be resolved in the migrated SYCL code
For detailed information about dialect differences between Clang and nvcc, refer to llvm.org’s Compiling CUDA with clang page.
Run CodePin to Capture Application Signature#
CodePin is a feature that helps reduce the effort of debugging inconsistencies in runtime behavior. CodePin generates reports from the CUDA and SYCL programs that, when compared, can help identify the source of divergent runtime behavior.
Enable the CodePin tool during migration to capture the project signature; the signature is used later to validate the application after migration.
Enable CodePin with the --enable-codepin option.
For detailed information about debugging using the CodePin tool, refer to Debug Migrated Code Runtime Behavior.
Configure the Tool#
CUDA header files used by your project must be accessible to the tool. If you have not already done so, configure the tool and ensure header files are available.
Refer to Get Started with SYCLomatic for installation and setup information.
Record Compilation Commands#
Use intercept-build to generate a compilation database that captures the detailed build options for your project. The migration tool uses build information from the database (such as header file paths, include paths, macro definitions, compiler options, and the original compiler) to guide the migration of your CUDA code.
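For reference, a compilation database is a JSON array with one entry per compiled source file. A minimal hand-written sketch (the directory, flags, and file names here are illustrative, not from a real project):

```json
[
  {
    "directory": "/path/to/project",
    "command": "nvcc -I/path/to/project/include -DUSE_FAST_MATH -c kernel.cu -o kernel.o",
    "file": "kernel.cu"
  }
]
```

Each entry records where a file was compiled, with exactly which command, which is what the tool replays during migration.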
If your development environment prevents you from using intercept-build, use the alternate method described in Generate a Compilation Database with Other Build Systems.
Note
If you need to re-run your migration after the original migration and the CUDA build script has changed, you need to either
re-run intercept-build to get an updated compilation database to use in your migration or
manually update the compilation database to capture the changes from the updated CUDA build script.
Set Up Revision Control#
After migration, the recommendation is to maintain and develop your migrated application in SYCL to avoid vendor lock-in, though you may choose to continue your application development in CUDA. Continuing to develop in CUDA will result in the need to migrate from CUDA to SYCL again.
Revision control allows comparison between versions of migrated code, which can help you decide what previous manual changes to the SYCL code you want to merge into the newly migrated code.
Make sure to have revision control for your original CUDA source before the first migration. After the first migration, be sure to place the migrated SYCL code, with all subsequent manual SYCL changes, under revision control as well.
Run Analysis Mode#
You can use Analysis Mode to generate a report before migration that indicates how much of your code will be fully migrated, how much will be partially migrated, and the estimated manual effort needed to complete the migration. This can help you estimate the work required for your migration.
Stage 2: Migrate Your Code#
Plan Your Migration#
Before executing your migration, review the available tool features and options that can be used to plan your specific migration.
Migration Rules#
The tool uses a default set of migration rules for all migrations. If default rules do not give the migration results you need, you can define custom rules for your migration. This is helpful in multiple scenarios, for example:
After migration, you discover multiple instances of similar or identical CUDA source code that were not migrated, and you know how the CUDA source code should be migrated to SYCL. In this case, you can define a custom rule and re-run the migration for better results. This is useful for Incremental Migration or scenarios where you may run multiple migrations over time.
You know before migration that some code patterns in your original CUDA source will not be accurately migrated to SYCL using the built-in rules. In this case, you can define a custom migration rule to handle specific patterns in your CUDA source during migration.
For detailed information about defining custom rules, refer to Migration Rules.
For working examples of custom rules, refer to the optional predefined rules located in the extensions/opt_rules folder on the installation path of the tool.
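A user-defined rule is described in a YAML file that you pass to the tool. A minimal sketch, modeled on the predefined rules shipped with the tool (the rule name and the replacement chosen here are illustrative assumptions, not output from a real migration):

```yaml
# Rewrites a CUDA macro that the default rules do not handle the way you want.
- Rule: rule_forceinline     # unique rule name
  Kind: Macro                # this rule rewrites a macro token
  Priority: Takeover         # override the default migration behavior
  In: __forceinline__        # CUDA pattern to match
  Out: inline                # replacement emitted in the SYCL code
```

Refer to Migration Rules for the full set of supported rule kinds and fields.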
Incremental Migration#
SYCLomatic provides incremental migration, which automatically merges the results from multiple migrations into a single migrated project.
Incremental migration can be used to:
migrate a CUDA* project incrementally, for example 10 files at a time
migrate new CUDA files into an already migrated project
migrate multiple code paths
Incremental migration is enabled by default. Disable it with the --no-incremental-migration option.
For detailed information and examples of incremental migration, refer to Incremental Migration.
Command-Line Options#
SYCLomatic provides many command-line options to direct your migration. Refer to the Alphabetical Option List for a full list of all available command-line options.
Buffer vs USM Code Generation#
Intel supports both the buffer and USM models in the SYCL/oneAPI context. Some oneAPI libraries work better with one model than the other, so the choice of model may be a design consideration when configuring your migration. USM is used by default, but the buffer model may be a better fit for some projects.
The buffer model sets up a 1- to 3-dimensional array (buffer) and accesses its components via a C++ accessor class. This grants more control over the exact nature and size of the allocated memory, and over how the host and offload target compute units access it. However, the buffer model can also add class-management overhead, which can require more manual intervention and may yield lower performance.
USM (unified shared memory) is a newer model, introduced in SYCL 2020. USM is a pointer-based memory management model that uses the malloc_device/malloc_shared/malloc_host allocator functions, similar to how C++ code typically handles memory when no GPU offload is involved. Choosing the USM model can make it easier to extend existing code and to migrate from CUDA. However, management of the USM memory space is largely handled by the SYCL runtime, which reduces the developer's granularity of control.
For more information on USM versus buffer modes, see the following sections of the GPU Optimization Guide:
Unified Shared Memory Allocations
Buffer Accessor Modes
What to Expect in Migrated Code#
When the tool migrates CUDA code to SYCL code, it inserts diagnostic messages as comments in the migrated code. The DPCT diagnostic messages are logged as comments in the migrated source files and output as warnings to the console during migration. These messages identify areas in the migrated code that may require your attention to make the code SYCL compliant or correct. This step is detailed in Stage 3: Review the Migrated Code.
The migrated code also uses DPCT helper functions to provide utility support for the generated SYCL code. The helper functions use the dpct:: namespace. Helper function source files are located at <tool-installation-directory>/latest/include/dpct.
DPCT helper functions can be left in migrated code but should not be used in new SYCL code. Use standard SYCL and C++ when writing new code. For information about the DPCT namespace, refer to the DPCT Namespace Reference.
Run Migration#
After reviewing the available migration tool functionality and options, run your migration.
You can run the tool from the command line.
If your project uses a Makefile or CMake file, use the corresponding option to automatically migrate the file to work with the migrated code:
To migrate a Makefile, use the --gen-build-scripts option.
To migrate a CMake file, use the --migrate-build-script or --migrate-build-script-only option. (Note that these options are experimental.)
For example:
c2s -p compile_commands.json --in-root ../../.. --gen-helper-function --gen-build-scripts
This example migrate command:
uses the tool alias c2s (dpct can also be used)
uses a compilation database, specified with the -p option
specifies the source to be migrated with the --in-root option
instructs the tool to generate helper function files with the --gen-helper-function option
instructs the tool to migrate the Makefile with the --gen-build-scripts option
Sample projects demonstrate migration of CUDA code with the tool, targeting both Intel and NVIDIA* hardware.
Stage 3: Review the Migrated Code#
After running SYCLomatic, manual editing is usually required before the migrated SYCL code can be compiled. DPCT warnings are logged as comments in the migrated source files and output to the console during migration. These warnings identify the portions of code that require manual intervention. Review these comments and make the recommended changes to ensure the migrated code is consistent with the original logic.
For example, this original CUDA* code:
void foo() {
  float *f;
  cudaError_t err = cudaMalloc(&f, 4);
  printf("%s\n", cudaGetErrorString(err));
}
results in the following migrated SYCL code:
void foo() {
  float *f;
  int err = (f = (float *)sycl::malloc_device(4, dpct::get_default_queue()), 0);
  /*
  DPCT1009:1: SYCL uses exceptions to report errors and does not use the error
  codes. The original code was commented out and a warning string was inserted.
  You need to rewrite this code.
  */
  printf("%s\n",
         "cudaGetErrorString is not supported" /*cudaGetErrorString(err)*/);
}
Note the DPCT1009 warning inserted where additional review is needed.
For a detailed explanation of the comments, including suggestions to fix the issues, refer to the Diagnostics Reference.
At this stage, you may observe that the same DPCT warnings were generated repeatedly in your code or that the same manual edits were needed in multiple locations to fix a specific pattern in your original source code. Consider defining the manual edits needed to fix repeated DPCT warnings as a user-defined migration rule. This allows you to save your corrections and automatically apply them to a future migration of your CUDA source.
Stage 4: Build the New SYCL Code Base#
After you have completed any manual migration steps, build your converted code.
Install New SYCL Code Base Dependencies#
Converted code makes use of oneAPI library APIs and Intel SYCL extensions. Before compiling, install the appropriate oneAPI libraries and a compiler that supports the Intel SYCL extensions.
If your CUDA source uses … | … install this oneAPI library |
---|---|
cuBLAS, cuFFT, cuRAND, cuSolver, cuSparse | Intel® oneAPI Math Kernel Library (oneMKL) |
Thrust, CUB | Intel® oneAPI DPC++ Library (oneDPL) |
cuDNN | Intel® oneAPI Deep Neural Network Library (oneDNN) |
NCCL | Intel® oneAPI Collective Communications Library (oneCCL) |
Compilers that support the Intel SYCL extensions include the Intel® oneAPI DPC++/C++ Compiler and the open-source oneAPI DPC++ compiler.
Most libraries and the Intel® oneAPI DPC++/C++ Compiler are included in the Intel® oneAPI Base Toolkit (Base Kit). Libraries and the compiler are also available as stand-alone downloads.
Compile for Intel CPU and GPU#
If your program targets Intel GPUs, install the latest Intel GPU drivers before compiling.
Use your updated Makefile or CMake file to build your program, or compile it
manually at the command line using a compiler that supports the Intel SYCL extensions.
Make sure that all linker and compilation commands use the -fsycl compiler option with the C++ driver. For example:
icpx -fsycl migrated-file.cpp
For detailed information about compiling with the Intel® oneAPI DPC++/C++ Compiler, refer to the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference.
Compile for AMD* or NVIDIA* GPU#
If your program targets AMD* or NVIDIA GPUs, install the appropriate Codeplay* plugin for the target GPU before compiling. Instructions for installing the AMD and NVIDIA GPU plugins, as well as how to compile for those targets, can be found in the Codeplay plugin documentation:
Install the oneAPI for AMD GPUs plugin from Codeplay.
Install the oneAPI for NVIDIA GPUs plugin from Codeplay.
Stage 5: Validate the New SYCL Application#
After you have built your converted code, validate your new SYCL application to check for correct functionality after migration.
Use a Debugger to Validate Migrated Code#
After you have successfully compiled your new SYCL application, run the app in debug mode using a debugger such as Intel Distribution for GDB to verify that your application runs as expected after migration.
Learn more about Debugging with Intel Distribution for GDB.
Use CodePin to Validate Migrated Code#
If the CodePin feature was enabled during migration, the project signature is logged at execution time.
The signature contains the data value of each execution checkpoint, which can be verified manually or with an auto-analysis tool.
For detailed information about debugging using the CodePin tool, refer to Debug Migrated Code Runtime Behavior.
Optimize Your Code#
Optimize your migrated code for Intel GPUs using Intel® tools such as Intel® VTune™ Profiler and Intel® Advisor. These tools help identify areas of code to improve for optimizing your application performance.
Additional hardware- or library-specific optimization information is available:
For detailed information about optimizing your code for Intel GPUs, refer to the oneAPI GPU Optimization Guide.
For detailed information about optimizing your code for AMD GPUs, refer to the Codeplay AMD GPU Performance Guide.
For detailed information about optimizing your code for NVIDIA GPUs, refer to the Codeplay NVIDIA GPU Performance Guide.
Find More#
Content | Description |
---|---|
Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference | Developer guide and reference for the Intel® oneAPI DPC++/C++ Compiler. |
| The SYCL 2020 Specification PDF. |
| Intel branded C++ compiler built from the open-source oneAPI DPC++ Compiler, with additional Intel hardware optimization. |
| Open-source Intel LLVM-based compiler project that implements compiler and runtime support for the SYCL* language. |
| Sample CUDA projects with instructions on migrating to SYCL using the tool. |
Guided migration samples | Guided migration of two sample NVIDIA CUDA projects. |
| A Jupyter* Notebook that guides you through the migration of a simple example and four step-by-step sample migrations from CUDA to SYCL. |
| Catalog of CUDA projects that have been migrated to SYCL. |
| Forum to get assistance when migrating your CUDA code to SYCL. |
| Intel® oneAPI Math Kernel Library tool to help determine how to include oneMKL libraries for your specific use case. |
| Tutorial describing the basic scenarios of debugging applications using Intel® Distribution for GDB*. |
| Tutorials demonstrating an end-to-end workflow using Intel® VTune™ Profiler that you can ultimately apply to your own applications. |