- Extended kernel loaders with additional delegate overloads. (Community Request)
- Fixed invalid loading of debug symbols from dynamic assemblies.
- Added support for basic line-based GPU debugging and profiling.
- Added support for jagged arrays. (Community Request)
- Redesigned support for sub-warp shuffles. (Community Request)
- Redesigned implicit stream launchers. (Community Request)
- Fixed several code generation issues. (see GitHub)
- Redesigned all required transformations and code generators.
- Redesigned IR in order to significantly improve compilation time and memory consumption.
- Removed all native library dependencies (including LLVM).
- Redesigned huge parts of the compiler.
- Added conceptually new (experimental) IR.
- Introduced generic data views in order to generate code for low-level (e.g. PTX) and high-level (e.g. Vulkan) targets.
- Adapted IL-Frontend to generate IR code instead of LLVM-IR.
- Added new code-transformation phases to optimize code.
- Added support for parallel code generation.
- Adapted support for portable PDBs.
- Added support for .NET Standard 2.0. (Community Request)
- Added first support for specializing kernels during compilation.
- Updated accelerator caching functionality. (Community Request)
- Improved multithreading support. (Community Request)
- Added automatic disposal of kernels, memory buffers and accelerator streams. (Community Request)
- Integrated basic support for portable PDBs.
- Added native build scripts for Linux operating systems.
- Added support for Linux operating systems in DLLLoader and PTXBackend.
- Fixed invalid code generation of non-zero (true) branches.
- Added convenient kernel loading and caching to accelerator classes.
- Added properties to query the maximum number of threads of an accelerator.
- Added Disposed event to Accelerator.
- Added support for Cuda 9.0.
- Added new cross-platform Cuda API.
- Added integer-division operators to GPUMath.
- Added RadToDeg and DegToRad conversion methods to GPUMath.
- Added support for .NET Core 2.0.
- Removed LLVMSharp dependency.
- Enhanced SSA code generation.
- Fixed invalid constant generation of padded structures.
- Updated CompilerServices.Unsafe dependency to version 4.4.0.
- Fixed invalid code generation of float-based Atomic.Min/Atomic.Max functions.
- Added support for nullable types in kernels.
- Fixed invalid return value of atomic add in CPU mode.
- Fixed invalid resolving of generic virtual methods.
- Fixed critical thread-divergence issues in CPUAccelerator.
- Added additional checks to avoid group-barrier functions in implicitly grouped kernels.
- Fixed critical issue in kernel-launcher code generation in CPUAccelerator.
- Fixed invalid loading of double constants in Force32BitFloats mode.
- Fixed invalid code generation of some math intrinsics (Atan, Atan2, Pow).
- Fixed wrong view dimension in GetRowView.
- Added atomics for index types.
- Added new debug views for generic array views in CPU mode.
- Added additional operators to index types.
- Added min/max functions to index types.
- Added new clamp functions to GPUMath.
- Added support for IntPtr.ToPointer functions.
- Enhanced reduction interface.
- Fixed invalid ArgumentOutOfRangeException check in MemoryBuffer.
- Fixed critical issue in ArrayView3D<T>.GetSliceView.
- Fixed critical issue in ArrayView<T>.GetRowView.
- Replaced internal IL-assembly intrinsics with the official Unsafe package.
- Added debug and release versions to NuGet package. Reason: Exceptions are not allowed in GPU code, but debug assertions are. Release builds do not contain assertions. Hence, debug assemblies are required for proper error messages in GPU kernels during development.
- Added feature to force all floating-point operations to 32-bit (even math intrinsics): CompileUnitFlags.Force32BitFloats.
- Fixed invalid math-intrinsic annotation.
- Fixed invalid error messages in debug assertions on PTX-based devices.
- Fixed critical issues in array code generation.