A Modern GPU Compiler for .Net Programs

developed by Marcel Köster

ILGPU

A modern, lightweight & fast GPU compiler for high-performance .Net programs


ILGPU is a new JIT (just-in-time) compiler for high-performance GPU programs (also known as kernels) written in .Net-based languages. ILGPU is written entirely in C# without any native dependencies, which allows you to write GPU programs that are truly portable. It combines the convenience of C++ AMP with the high performance of CUDA. Functions used in the scope of kernels do not have to be annotated (plain C# functions work as they are) and are allowed to work on value types. All kernels (including all hardware features like shared memory, atomics and warp shuffles) can be executed and debugged on the CPU using the integrated multi-threaded CPU accelerator. And the best feature: it's free! ILGPU is released under the University of Illinois/NCSA Open Source License.
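To give a rough idea of what this looks like in practice, here is a minimal sketch of an unannotated kernel running on the CPU accelerator. It assumes the v0.8-era API (names such as `Index1`, `Allocate<T>` and `GetAsArray` may differ in other versions); the class and method names `SimpleKernelSketch`, `AddConstantKernel` and `Run` are made up for illustration. Treat it as a sketch under these assumptions and check the samples for the version you use.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.CPU;

public static class SimpleKernelSketch
{
    // A plain static C# method used as a kernel. It carries no attributes.
    // Assumption: Index1 is the 1D index type of the v0.8-era API.
    static void AddConstantKernel(Index1 index, ArrayView<int> data, int constant)
    {
        data[index] = index + constant;
    }

    // Hypothetical helper: runs the kernel on the CPU accelerator
    // and returns the results as a managed array.
    public static int[] Run(int length, int constant)
    {
        using (var context = new Context())
        using (var accelerator = new CPUAccelerator(context))
        {
            // Load the method as an implicitly grouped kernel.
            var kernel = accelerator.LoadAutoGroupedStreamKernel<
                Index1, ArrayView<int>, int>(AddConstantKernel);

            using (var buffer = accelerator.Allocate<int>(length))
            {
                // Launch one logical thread per buffer element.
                kernel(buffer.Length, buffer.View, constant);

                // Wait until the kernel has finished before reading back.
                accelerator.Synchronize();
                return buffer.GetAsArray();
            }
        }
    }

    public static void Main()
    {
        int[] result = Run(1024, 42);
        Console.WriteLine(result[10]);
    }
}
```

Because the example uses the CPU accelerator, a breakpoint inside `AddConstantKernel` can be hit with an ordinary .Net debugger.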

ILGPU is a free and non-sponsored project. It is being developed by a professional and passionate compiler, GPU and computer graphics developer. Support the project with contributions or some small donations in order to speed up the development process and to keep the project alive.

ILGPU Nuget Package   ILGPU.Algorithms Nuget Package   ILGPU.SharpDX Nuget Package

Immediate Assistance via Discord / Weekly Talk-To-The-Dev(s) Meeting

The ILGPU community will try to provide immediate help, feedback and suggestions via Discord ASAP. Log on to the server and you can get started right away. Alternatively, you can always send an email via the contact page.

There will be a weekly talk-to-the-dev(s) meeting on the Discord server every Wednesday from 9-10pm UTC+2 (the first meeting will take place on March 20th 2020). Don't hesitate to join the meeting if you have any questions or suggestions or just want to talk to one of the developers (including me).

New Release Cycles and Enhanced Progress Visibility

An updated ILGPU version will be available every month.

Starting with ILGPU v0.8.1, which will be released at the end of June, an updated ILGPU version will be available every month. As the community grows, new features will be tracked explicitly on the GitHub issues page and linked to the milestone to which they belong. Each new feature on this page is marked with a specific difficulty level (beginner, intermediate or advanced). If you plan to contribute to the ILGPU project, we strongly recommend that you take a closer look at the beginner issues first.

New ILGPU Major Version (v0.8.0) released on May 15th

A new major release of the ILGPU compiler is available.

This release provides revised kernel launchers for explicitly grouped kernels, on-the-fly specialization of kernels using dynamic partial evaluation and support for dynamic shared memory (CPU and Cuda accelerators only). It also includes significant performance improvements for generated Cuda and OpenCL kernels using a novel internal optimization and transformation pipeline. Furthermore, support for enum-value interop, unmanaged buffers and linear arrays to realize allocations in local memory has been added. Note that we have revised all test cases to provide you with a stable and good programming and development experience. All samples, the class reference, documentation and the upgrade guide have been updated.

Special thanks to MoFtZ for contributing to this release. MoFtZ worked on a huge variety of different issues in the OpenCL Backend, the PTX Backend and the internal IR.

Special thanks to the whole ILGPU community for providing feedback, submitting issues and new feature requests.

For a more detailed list of changes and updates, refer to the release notes. For a detailed list of individual contributions, refer to the public GitHub commit history.

New ILGPU Beta Version (v0.8.0-beta2) released on April 26th

A new beta release of the ILGPU compiler is available.

This release provides significant performance improvements of generated Cuda and OpenCL kernels. It also features support for enum-value interop and linear arrays to realize allocations in local memory. All samples (branch v08) and the upgrade guide have been updated.

For a more detailed list of changes and updates refer to the release notes.

New ILGPU Beta Version (v0.8.0-beta1) released on March 10th

A new beta release of the ILGPU compiler is available.

This release provides revised kernel launchers for explicitly grouped kernels, on-the-fly specialization of kernels using dynamic partial evaluation and support for dynamic shared memory (CPU and Cuda accelerators only). All samples (branch v08) and the upgrade guide have been updated.

The main documentation page will be updated in the next few days. Until then, refer to the upgrade guide, the change logs and the samples.

For a more detailed list of changes and updates refer to the release notes.

High Performance

Short kernel compilation, dispatch and execution times. Furthermore, type-safe kernel delegates avoid boxing.

High Convenience

Use the power of C# or VB.Net to write high-level kernels and execute them on the GPU. No need to program C++, Cuda or OpenCL.

CPU Accelerator

Single- or multi-threaded execution of kernels on the CPU. This is also useful for debugging or emulation of specific target platforms.

Advanced Debugging

High-level kernel debugging using your favorite .Net debugger. Furthermore, the single-threaded execution feature lets you focus on the algorithm instead of the parallelism.

No Function Annotations

Functions do not have to be annotated in order to use them in the scope of kernels.

Any-CPU Builds

Compile your applications for Any CPU. ILGPU will automatically adjust everything else for x86 or x64 platforms.

Implicitly Grouped Kernels

Focus on the algorithm and not on the details. Implicitly grouped kernels let you implement high-level kernels without paying attention to low-level index computations or tiling.
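The difference can be illustrated without any GPU at all. The following plain C# sketch emulates both styles with ordinary loops: the explicitly grouped variant has to derive a global index from a group index and a thread index and guard against the padding threads of the last group, while the implicitly grouped variant only ever sees the final index. All names (`GroupingSketch`, `ExplicitlyGrouped`, `ImplicitlyGrouped`) are hypothetical and not part of ILGPU.

```csharp
using System;
using System.Linq;

public static class GroupingSketch
{
    // Explicit grouping: every "thread" computes its own global index
    // and has to skip indices that lie beyond the end of the data.
    public static void ExplicitlyGrouped(int[] data, int groupSize)
    {
        // Round up so that the last, partially filled group is included.
        int numGroups = (data.Length + groupSize - 1) / groupSize;
        for (int groupIdx = 0; groupIdx < numGroups; ++groupIdx)
        {
            for (int threadIdx = 0; threadIdx < groupSize; ++threadIdx)
            {
                int globalIdx = groupIdx * groupSize + threadIdx;
                if (globalIdx >= data.Length)
                    continue; // padding thread in the last group
                data[globalIdx] = 2 * globalIdx;
            }
        }
    }

    // Implicit grouping: the body only sees the global index.
    // Tiling and bounds handling are left to the runtime (emulated by the loop).
    public static void ImplicitlyGrouped(int[] data)
    {
        for (int index = 0; index < data.Length; ++index)
            data[index] = 2 * index;
    }

    public static void Main()
    {
        var a = new int[10];
        var b = new int[10];
        ExplicitlyGrouped(a, 4); // 3 groups of 4 threads, 2 of them idle
        ImplicitlyGrouped(b);
        Console.WriteLine(a.SequenceEqual(b)); // prints True
    }
}
```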

Multi-dimensional Indices

Multi-dimensional index types simplify address computations and kernel writing.
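As a small illustration of the address arithmetic such index types take off your hands, here is a plain C# sketch of mapping a 2D index to a linear offset and back. It assumes a row-major layout with x varying fastest, which is an assumption made for this example and not a statement about ILGPU's internal layout; `IndexSketch`, `Linearize` and `Reconstruct` are hypothetical names.

```csharp
using System;

public static class IndexSketch
{
    // Maps a 2D index to a linear offset (assumed layout: row-major, x fastest).
    public static int Linearize(int x, int y, int width) => y * width + x;

    // Inverse mapping: recovers the 2D index from a linear offset.
    public static (int X, int Y) Reconstruct(int linear, int width) =>
        (linear % width, linear / width);

    public static void Main()
    {
        const int width = 4, height = 3;
        var image = new float[width * height];

        // Without a 2D index type, every access needs the manual computation.
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                image[Linearize(x, y, width)] = 10 * y + x;

        var (rx, ry) = Reconstruct(Linearize(3, 2, width), width);
        Console.WriteLine($"{rx}, {ry}"); // prints 3, 2
    }
}
```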

Array Views

No pointer arithmetic and dramatically simplified index computations due to views to memory regions.

Shared Memory

Support for shared (scratch-pad) memory in kernels via array views. Static or dynamic allocation of shared memory is supported.

Atomics and Low-Level Intrinsics

Easy access to atomic functions and low-level intrinsics like warp shuffles. All functions are supported during CPU debugging.

High-Performance Math Functions

Default math functions and operations are mapped to high-performance math functions. Furthermore, there is support for fast math and forced 32-bit math to avoid doubles.

Comparison to C++ AMP and Cuda

Features                                                  | ILGPU     | C++ AMP   | Cuda
----------------------------------------------------------+-----------+-----------+----------
.Net Code                                                 | checkmark | empty     | empty
C++ Code                                                  | empty     | checkmark | checkmark
Function Annotations Required                             | empty     | checkmark | checkmark
NVIDIA GPUs                                               | checkmark | checkmark | checkmark
AMD GPUs                                                  | scheduled | scheduled | scheduled
Intel GPUs                                                | scheduled | checkmark | scheduled
High-Level Abstractions (Implicitly Grouped Kernels, ...) | checkmark | checkmark | empty
Low-Level Intrinsics                                      | checkmark | empty     | checkmark
High-Performance Math Functions                           | checkmark | empty     | checkmark
Cross-Platform Support                                    | scheduled | checkmark | checkmark
Single-Compilation Cross-Platform Support                 | checkmark | empty     | empty
Direct Multi-GPU Support                                  | checkmark | empty     | checkmark
Convenient Algorithm Debugging                            | checkmark | checkmark | empty
Debugging on GPU Hardware                                 | scheduled | empty     | checkmark
Kernel Profiling                                          | scheduled | checkmark | checkmark
CPU Runtime                                               | checkmark | checkmark | empty
CPU Runtime with Shared Memory and Low-Level Intrinsics   | checkmark | empty     | empty
SIMD CPU Runtime                                          | scheduled | checkmark | empty

Yellow checkmarks indicate partial or limited support.
Features marked with a red checkmark will be available in the future. Check the Roadmap for details.

Comparison to other GPU compilers for .Net

Features                                                  | ILGPU     | Commercial Competitors
----------------------------------------------------------+-----------+-----------------------
Function Annotations Required                             | empty     | checkmark
NVIDIA GPUs                                               | checkmark | checkmark
AMD GPUs                                                  | checkmark | empty
Intel GPUs                                                | checkmark | empty
High-Level Abstractions (Implicitly Grouped Kernels, ...) | checkmark | checkmark
Low-Level Intrinsics                                      | checkmark | checkmark
High-Performance Math Functions                           | checkmark | checkmark
Avoids Boxing                                             | checkmark | checkmark
Cross-Platform Support                                    | checkmark | checkmark
DotNetCore Support                                        | checkmark | empty
Convenient Algorithm Debugging                            | checkmark | empty
Debugging on GPU Hardware                                 | checkmark | checkmark
Kernel Profiling                                          | checkmark | checkmark
CPU Runtime                                               | checkmark | empty
CPU Runtime with Shared Memory and Low-Level Intrinsics   | checkmark | empty
Debug Assertions                                          | checkmark | checkmark
Classes                                                   | checkmark | checkmark
Lambda Functions                                          | checkmark | checkmark

Yellow checkmarks indicate partial or limited support.
Features marked with a red checkmark will be available in the future. Check the Roadmap for details.

Frequently Asked Questions

Are exceptions supported?

Exceptions require support for exception handlers and limited support for reference types. Changes to the "intended" control flow (which can be caused by exceptions) are currently not supported. However, there might be a conversion phase in the future that converts several exceptions into debug assertions.

What about debug assertions?

Debug assertions are supported on all accelerators. Note that debug assertions are not available in Release mode.

Are class types supported? And what about lambda functions?

Reference types are currently not supported. However, limited support for reference types will be added in the future. This will also allow the implementation of delegates.

Lambda functions (or delegates in general) are currently not supported since they require limited support for reference types and custom code-transformation passes. Support for lambda functions will be added in the future.

Can I debug a kernel on the GPU?

There is basic support for hardware-based kernel debugging and profiling. However, CPU-based kernel debugging is recommended in all cases due to the advanced debugging and testing capabilities.

What about .Net Standard support?

The new ILGPU version supports .Net 4.7, .Net Standard 2.0 (e.g. .Net Core 2.0) and .Net Standard 2.1 (e.g. .Net Core 3.0).

What about Linux and Mac support?

ILGPU supports .Net Core, which allows writing portable .Net applications. Since ILGPU is written in C# and does not rely on native libraries in the current version, kernels can be run on all .Net Core compatible platforms. This allows you to compile your application (including GPU code) only once.
