CUDA kernel files
CUDA C++ extends the C / C++ programming language and, just like C / C++, uses header files. Each material model consists of two files, whose contents are defined within the same namespace:
kernel_mat_XXXX.h (header)
kernel_mat_XXXX.cu (source)
Below is an example of our typical implementation, with the model logic omitted to keep the focus on the interface. Further details are provided in subsequent subsections.
kernel_mat_XXXX.h
The header file introduces a namespace and provides forward declarations of its host functions so that other files can call them.
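#include <cuda_runtime.h> // provides cudaStream_t
namespace mat_kfc
{
// Host-side entry point that launches the material kernel on the given stream.
void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream);
}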
kernel_mat_XXXX.cu
The source file is wrapped in the same namespace as the header file. It contains the CUDA kernel function, marked with the __global__ qualifier, and a host function, mat_user, which launches the kernel.
Additionally, the file includes a __constant__ declaration of the material properties array. Data in the __constant__ memory space is read-only inside kernels and is served through a dedicated cache, which suits frequently accessed values (such as material parameters) that every thread reads.
namespace mat_kfc
{
__constant__ double cmat[200];
__global__
void kernel_sanders(UserMatDevice device)
{
/*
⣿⣿⣿⣿⣿⣿⣿⡿⢟⣋⣭⣥⣭⣭⣍⡉⠉⠙⠛⠻⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⡏⠁⠠⠶⠛⠻⠿⣿⣿⣿⣿⣷⡄⠄⠄⠄⠄⠉⠻⢿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⠟⠄⢀⡴⢊⣴⣶⣶⣾⣿⣿⣿⣿⢿⡄⠄⠄⠄⠄⠄⠄⠙⢿⣿⣿⣿
⣿⣿⡿⠁⠄⠙⡟⠁⣾⣿⣿⣿⣿⣿⣿⣿⣿⣎⠃⠄⠄⠄⠄⠄⠄⠄⠈⢻⣿⣿
⣿⡟⠄⠄⠄⠄⡇⠰⠟⠛⠛⠿⠿⠟⢋⢉⠍⢩⣠⡀⠄⠄⠄⠄⠄⠄⠄⠄⢹⣿
⣿⠁⠄⠄⠄⠄⠰⠁⣑⣬⣤⡀⣾⣦⣶⣾⣖⣼⣿⠁⠄⠄⠄⠄⠄⠄⠄⠄⠄⢿
⡏⠄⠄⠄⠄⠄⠄⠄⠨⣿⠟⠰⠻⠿⣣⡙⠿⣿⠋⠄⢀⡀⣀⠄⣀⣀⢀⣀⣀⢸
⡇⠄⠄⠄⠄⠄⠄⠄⠄⣠⠄⠚⠛⠉⠭⣉⢁⣿⠄⢀⡿⢾⣅⢸⡗⠂⢿⣀⡀⢸
⡇⠄⠄⠄⠄⠄⠄⠄⠄⠘⢧⣄⠄⣻⣿⣿⣾⠟⣀⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⢸
⣿⠄⠄⠄⠄⠄⠄⠄⠄⢠⡀⠄⠄⣿⣿⠟⢁⣴⣿⢸⡄⠄⢦⣤⣤⣤⣤⣄⡀⣼
⣿⣧⠄⠄⠄⠄⠄⠄⢠⡸⣿⠒⠄⠈⠛⠄⠁⢹⡟⣾⡇⠄⠈⢿⣿⣿⣿⣿⣿⣿
⣿⣿⣧⣠⣴⣦⠄⠄⢸⣷⡹⣧⣖⡔⠄⠱⣮⣻⣷⣿⣿⠄⠄⠘⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⡇⠄⠄⠸⠿⠿⠚⠛⠁⠂⠄⠉⠉⡅⢰⡆⢰⡄⠄⠘⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣷⣤⡀⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⣿⠄⣷⠘⣧⣠⣾⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣷⣦⣤⣄⣀⣀⡀⠄⣀⣀⣹⣦⣽⣾⣿⣿⣿⣿⣿⣿⣿⣿
*/
}
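void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    // Launch configuration: 128 threads per block, block count derived from the task count.
    const unsigned int block_size = 128;
    const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);
    // Copy the 200 material parameters into constant memory on the given stream.
    cudaMemcpyToSymbolAsync(cmat, host.p_material, sizeof(double) * 200, 0, cudaMemcpyHostToDevice, stream);
    // Same stream as the copy, so the kernel starts only after the copy has finished.
    kernel_sanders<<<num_blocks, block_size, 0, stream>>>(device);
}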
} // namespace mat_kfc
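The host function above relies on calcNumBlocks, which is not defined in this section. The following is a minimal sketch of what such a helper could look like, assuming it performs a ceiling division so that every task is covered by at least one thread. The body is an assumption for illustration, not the project's actual implementation. It is plain host-side C++, so it also compiles inside a .cu file.

```cpp
// Hypothetical stand-in for calcNumBlocks (the real helper is defined elsewhere).
// Returns the smallest number of blocks of `block_size` threads that covers
// `num_tasks` work items. Assumes block_size > 0.
constexpr unsigned int calcNumBlocks(unsigned int num_tasks, unsigned int block_size)
{
    // Quotient plus one extra block for any remainder. This form avoids the
    // overflow that (num_tasks + block_size - 1) / block_size can hit for
    // task counts near the top of the unsigned range.
    return num_tasks / block_size + (num_tasks % block_size != 0 ? 1u : 0u);
}

// Compile-time checks of the rounding behaviour.
static_assert(calcNumBlocks(1000, 128) == 8, "partial block rounds up");
static_assert(calcNumBlocks(1024, 128) == 8, "exact multiple adds no extra block");
static_assert(calcNumBlocks(1, 128) == 1, "a single task still needs one block");
```

With this rounding, 1000 tasks at a block size of 128 launch 8 blocks, or 1024 threads, so the last 24 threads have no task and the kernel has to guard against indices at or beyond num_tasks. A task count of zero yields zero blocks, which is not a valid launch configuration, so a caller may want to return early in that case.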