ILGPU

Upgrade: v0.3.X to v0.5.X

The ILGPU compiler has been redesigned and no longer relies on LLVM. The CUDA SDK is also no longer required to compile and run ILGPU kernels. As a result, there are no native library dependencies (apart from the actual GPU drivers) or environment variables to manage.

ILGPU features a new parallel processing model. Code generation and transformation phases can run in parallel, which reduces compile time and improves overall performance. However, parallel code generation in the frontend module is disabled by default in the current release (beta version). It can be enabled via the enumeration flag ContextFlags.EnableParallelCodeGenerationInFrontend.

The flags enumeration CompileUnitFlags has been replaced by the enumeration ContextFlags. Unlike previous versions, these flags are passed directly to the main constructor ILGPU.Context and affect the entire code generation process. Note that the CompileUnitFlags.UseGPUMath flag has no counterpart in the new version, since all mathematical operations are automatically mapped to the corresponding XMath counterparts.
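As a minimal sketch of how the flags described above reach the context, the snippet below passes a ContextFlags value to the ILGPU.Context constructor. It assumes a constructor overload that accepts a single ContextFlags argument; the class name Program is purely illustrative.

```csharp
using ILGPU;

class Program
{
    static void Main()
    {
        // Assumption: Context offers an overload that takes a ContextFlags value.
        // The flag below opts in to parallel code generation in the frontend,
        // which is disabled by default in this release.
        var flags = ContextFlags.EnableParallelCodeGenerationInFrontend;

        using (var context = new Context(flags))
        {
            // All kernels compiled through this context are affected by the flags.
        }
    }
}
```

Because the flags belong to the context rather than to a compile unit, they only need to be set once per context.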

The GPUMath class has been renamed to XMath to reflect its cross-platform math capabilities (community request). Note that XMath will be replaced by a new high-performance math library in a future release.

The high-level algorithms Warp.Reduce and Warp.AllReduce have been moved to the ILGPU Lightning/Algorithm library.

New GPU debugging and profiling features have been added. Refer to the documentation for more information.

Refer to the updated samples in the GitHub repository for more information.

ArrayViews and VariableViews

The ArrayView and VariableView structures have been adapted to the C# 'ref' features. This renders the explicit Load and Store methods obsolete. In addition, all methods that previously accepted parameters of type VariableView<X> now accept parameters of type ref X. This applies, for example, to all methods of the class Atomic.


class ...
{
    static void ...(...)
    {
        // Old way (obsolete and no longer supported)
        ArrayView<int> someView = ...
        var variableView = someView.GetVariableView(X);
        // Atomic.Add takes the target and the value to add
        Atomic.Add(variableView, 1);
        ...
        variableView.Store(42);

        // New way
        ArrayView<int> someView = ...
        // The target is now passed by reference
        Atomic.Add(ref someView[X], 1);
        ...
        someView[X] = 42;

        // or
        ref var variable = ref someView[X];
        variable = 42;

        // or
        var variableView = someView.GetVariableView(X);
        variableView.Value = 42;
    }
}


Shared Memory

The general concept of shared memory has been redesigned. The previous model required SharedMemoryAttribute attributes on the specific parameters that should be allocated in shared memory. The new model uses the static class SharedMemory to allocate this kind of memory procedurally within the scope of a kernel. This simplifies programming and kernel-delegate creation, and it enables non-kernel methods to allocate their own pool of shared memory.

Note that array lengths must be constants in this ILGPU version. Hence, dynamic allocation of shared memory is not currently supported.

The kernel loader methods LoadSharedMemoryKernelX and LoadSharedMemoryStreamKernelX have been removed. They are no longer required, since a kernel does not have to declare its shared memory allocations in the form of additional parameters.


class ...
{
    static void SharedMemoryKernel(GroupedIndex index, ...)
    {
        // Allocate an array of 32 integers
        ArrayView<int> sharedMemoryArray = SharedMemory.Allocate<int>(32);

        // Allocate a single variable of type long in shared memory
        ref long sharedMemoryVariable = ref SharedMemory.Allocate<long>();

        ...
    }
}


CPU Debugging

Starting a kernel in debug mode is a common task that developers perform many times a day. Although ILGPU has been optimized for performance, you may not want to wait a few milliseconds every time you start your program to debug a kernel on the CPU. For this reason, the context flag ContextFlags.SkipCPUCodeGeneration has been added. It suppresses IR code generation for CPU kernels and uses the .NET runtime directly. Warning: this skips the general kernel analysis and verification checks, so it should only be used by experienced users.
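One possible way to limit this flag to debugging sessions is sketched below. The use of the DEBUG compilation symbol, the ContextFlags.None fallback, and a Context constructor overload that takes a single ContextFlags value are assumptions made for illustration, not a recommendation from the ILGPU authors.

```csharp
using ILGPU;

class Program
{
    static void Main()
    {
        // Assumption: only skip CPU code generation in debug builds, so that
        // release builds keep the kernel analysis and verification checks.
#if DEBUG
        var flags = ContextFlags.SkipCPUCodeGeneration;
#else
        var flags = ContextFlags.None;
#endif

        using (var context = new Context(flags))
        {
            // Create a CPU accelerator on this context and debug kernels as usual.
        }
    }
}
```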

Internals

The old LLVM-based concept of CompileUnit objects is obsolete and has been replaced by a completely new IR. The new IR uses IRContext objects to manage IR objects that are derived from the class ILGPU.IR.Node. Unlike previous versions, an IRContext is not tied to a specific Backend instance and can be reused across different hardware architectures.

The global optimization process can be controlled with the enumeration OptimizationLevel, which is passed to the ILGPU.Context constructor. If the optimization level is not specified explicitly, it is determined by the current build mode (either Debug or Release).
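A short sketch of selecting the level explicitly follows. It assumes a Context constructor overload that takes the context flags followed by an OptimizationLevel, and an enumeration member named Release that mirrors the build mode mentioned above; check both against the API of the release you are using.

```csharp
using ILGPU;

class Program
{
    static void Main()
    {
        // Assumption: the overload Context(ContextFlags, OptimizationLevel) exists
        // and OptimizationLevel.Release is a valid member.
        // This requests release-level optimizations even in a debug build.
        using (var context = new Context(ContextFlags.None, OptimizationLevel.Release))
        {
            // Kernels compiled through this context use the requested level.
        }
    }
}
```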

Upgrade: v0.1.X to v0.2.X

If you rely on the LightningContext class (of ILGPU.Lightning v0.1.X) for high-level kernel loading or other high-level operations, you will have to adapt your projects to the API changes. The new API does not require a LightningContext instance. Instead, all operations are extension methods on the ILGPU Accelerator class. This simplifies programming and makes the general API more consistent. Furthermore, kernel caching and convenient kernel loading are now part of the ILGPU runtime system and do not require any ILGPU.Lightning operations. If you use the low-level kernel-loading functionality of the ILGPU runtime system (in order to avoid an additional library dependency on ILGPU.Lightning), you will also benefit from the new API changes.

Note that all functions from v0.1.X will still work to ensure backwards compatibility. However, they will be removed in future versions.

The Obsolete Lightning Context

The LightningContext class is obsolete and will be removed in future versions. It encapsulated an ILGPU Accelerator instance and provided useful kernel caching and loading features. Moreover, all extension functions (such as sorting) were based on a LightningContext.

We recommend that you replace all occurrences of a LightningContext with an ILGPU Accelerator. Furthermore, replace the LightningContext creation code with the appropriate accelerator construction from ILGPU. Note that kernel caching and loading are now provided natively by an Accelerator object.


class ...
{
    public static void Main(string[] args)
    {
        // Create the required ILGPU context
        using (var context = new Context())
        {
            // Deprecated code snippets for creating a LightningContext
            var ... = LightningContext.CreateCPUContext(context);
            var ... = LightningContext.CreateCudaContext(context);
            var ... = LightningContext.Create(context, acceleratorId);

            // New version: use default ILGPU accelerators and perform
            // all required operations on an accelerator instance.
            var ... = new CPUAccelerator(context);
            var ... = new CudaAccelerator(context);
            var ... = Accelerator.Create(context, acceleratorId);


            // Old sample for an Initialize command
            var lc = LightningContext.Create(context, ...);
            lc.Initialize(targetView);

            // New version
            var accl = Accelerator.Create(context, acceleratorId);
            accl.Initialize(targetView);
        }
    }
}