CUDA kernel files

The CUDA API is an extension of the C/C++ programming language and, like C/C++, CUDA uses header files. Each material model consists of two files, both defined within the same namespace:

  • kernel_mat_XXXX.h (header)
  • kernel_mat_XXXX.cu (source)

Below is an example of our typical implementation, with the model logic omitted to keep the focus on the interface. Further details are provided in subsequent subsections.

kernel_mat_XXXX.h

The header file introduces a namespace and provides forward declarations of its host functions, enabling other source files to call them; a sketch of such a call site follows the listing below.

namespace mat_kfc
{

void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream);

}
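
For illustration only, a call site in another source file might look like the following sketch. The wrapper function updateMaterial and the way UserMatHost, UserMatDevice, and the stream are set up are assumptions made for the example, not part of the interface above.

#include "kernel_mat_XXXX.h"

// Hypothetical call site: the host/device structs and the stream are assumed
// to be prepared elsewhere in the solver.
void updateMaterial(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    // Dispatch this material model's update on the given stream.
    mat_kfc::mat_user(host, device, stream);
}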

kernel_mat_XXXX.cu

The source file is wrapped in the same namespace as the header file. It contains the CUDA kernel function, marked with the __global__ qualifier, and the host function mat_user, which launches the kernel. A sketch of the omitted kernel body and the block-count helper is given after the listing.

Additionally, the file includes a __constant__ declaration of the material properties array. By placing it in the __constant__ memory space, frequently accessed read-only data (such as the material parameters) is cached on the device.

namespace mat_kfc
{
__constant__ double cmat[200];

__global__
void kernel_sanders(UserMatDevice device)
{
    /* material model logic omitted; see the sketch after the listing */
}

void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    const unsigned int block_size = 128;
    const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);

    // Copy the material parameters into __constant__ memory, then launch the kernel on the same stream.
    cudaMemcpyToSymbolAsync(cmat, host.p_material, sizeof(double) * 200, 0, cudaMemcpyHostToDevice, stream);
    kernel_sanders<<<num_blocks, block_size, 0, stream>>>(device);
}
} // namespace mat_kfc
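
As a rough sketch of the parts the listing leaves out: calcNumBlocks is assumed to be a simple ceiling division, and the kernel body typically maps one thread to one task, guards against out-of-range indices in the last block, and reads the material parameters from cmat. The sketch assumes device.num_tasks is also accessible on the device, and the parameter layout (cmat[0] as a Young's modulus) is purely illustrative.

// Assumed helper: ceiling division so that num_blocks * block_size covers all tasks.
unsigned int calcNumBlocks(unsigned int num_tasks, unsigned int block_size)
{
    return (num_tasks + block_size - 1) / block_size;
}

__global__
void kernel_sanders(UserMatDevice device)
{
    // One thread per task; the guard handles the last, partially filled block.
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= device.num_tasks)
        return;

    // Material parameters are read from __constant__ memory.
    const double youngs_modulus = cmat[0];  // illustrative layout, not the actual one

    // ... constitutive update for task tid using the arrays in `device` (model specific) ...
}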