CUDA programming model
Kernels
Functions marked with __global__ are kernels. A kernel is a function that runs on the GPU and is executed in parallel by many threads; it is launched from host code with an execution configuration that specifies how many thread blocks, and how many threads per block, should run it. Kernels are the core mechanism by which you offload computation from the CPU
(host) to the GPU (device).
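As a minimal sketch of what this looks like in practice, the snippet below defines a trivial kernel and launches it from a host-side function. The names fill_index and launch_fill_index, the buffer d_out, and the block size of 256 are placeholders chosen for this illustration; d_out is assumed to point to device memory holding at least n ints.

    __global__
    void fill_index(int* out, int n) {
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        if (idx < n) out[idx] = idx;   // each thread writes its own global index
    }

    // Host-side wrapper: pick a block size and launch enough blocks to cover n elements.
    void launch_fill_index(int* d_out, int n) {
        int threads_per_block = 256;
        int num_blocks = (n + threads_per_block - 1) / threads_per_block;
        fill_index<<<num_blocks, threads_per_block>>>(d_out, n);
    }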
Device functions
A __device__ function in CUDA is executed on the GPU and is callable only from code that is already running on the device, that is, from __global__ kernels or from other __device__ functions. It cannot be called directly from host
(CPU) code unless it is also marked with __host__; with both qualifiers the compiler builds a host version and a device version, making the function callable from both the host and the device.
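As a brief sketch of the combined qualifiers (the function name clamp01 is invented for this illustration), the helper below can be called from ordinary host code as well as from inside a kernel:

    // Compiled for both the CPU and the GPU; usable on either side.
    __host__ __device__
    float clamp01(float x) {
        if (x < 0.0f) return 0.0f;
        if (x > 1.0f) return 1.0f;
        return x;
    }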
Example
    // Device-only helper: callable from kernels and from other __device__ functions.
    __device__
    int sum(int x, int y) {
        return x + y;
    }

    // Kernel: each thread adds one pair of elements of the input vectors.
    __global__
    void kernel_add_vectors(int* a, int* b, int* c, int num_tasks) {
        // Global thread index across all blocks.
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        // Guard against the extra threads launched when num_tasks is not a
        // multiple of the block size.
        if (idx >= num_tasks) return;
        c[idx] = sum(a[idx], b[idx]);
    }
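To complete the picture, here is a sketch of host code that could drive kernel_add_vectors: it allocates device buffers, copies the inputs over, launches the kernel, and copies the result back. It assumes the kernel and sum above are in the same .cu file; the buffer names, the problem size N, and the block size of 256 are assumptions made for this example, and error checking of the CUDA API calls is omitted for brevity.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const int N = 1024;                 // assumed problem size for this sketch
        size_t bytes = N * sizeof(int);

        // Host input/output buffers.
        int* h_a = (int*)malloc(bytes);
        int* h_b = (int*)malloc(bytes);
        int* h_c = (int*)malloc(bytes);
        for (int i = 0; i < N; ++i) { h_a[i] = i; h_b[i] = 2 * i; }

        // Device buffers.
        int *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);

        // Copy inputs to the device.
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        // Launch enough blocks of 256 threads to cover N elements.
        int threads = 256;
        int blocks = (N + threads - 1) / threads;
        kernel_add_vectors<<<blocks, threads>>>(d_a, d_b, d_c, N);

        // Copy the result back (this copy also waits for the kernel to finish).
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[10] = %d\n", h_c[10]);    // expected: 10 + 20 = 30

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }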
More information
For a more comprehensive understanding of CUDA, we recommend exploring the official NVIDIA CUDA documentation, which provides detailed guides, tutorials and reference materials.