CUDA programming model

Kernels

Functions marked with __global__ are kernels. A kernel is a function that runs on the GPU and is executed in parallel by many threads. Kernels are the core mechanism by which you offload computation from the CPU (host) to the GPU (device): they are launched from host code with an execution configuration (the <<<...>>> syntax) that specifies how many thread blocks, and how many threads per block, will run the function.
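
As a minimal sketch of this parallelism (the kernel name hello_kernel and the 2-block, 4-thread configuration are chosen purely for illustration), each launched thread computes its own global index and prints it:

#include <cstdio>

// Each of the 2 * 4 = 8 launched threads executes this function once.
__global__
void hello_kernel() {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    printf("Hello from thread %d\n", idx);
}

int main() {
    // <<<blocks, threads_per_block>>> is the execution configuration.
    hello_kernel<<<2, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel (and its printf output) to finish
    return 0;
}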

Functions

A __device__ function in CUDA executes on the GPU and is callable only from other device code, that is, from __global__ kernels or from other __device__ functions. It cannot be called from host (CPU) code unless it is also marked with __host__, in which case the compiler builds both a host and a device version, making the function callable from both sides.
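
As a minimal sketch of the dual qualifier (the function name clamp01 is purely illustrative), a function marked with both __host__ and __device__ can be used in regular CPU code as well as inside kernels:

__host__ __device__
float clamp01(float x) {
    // Compiled for both the host and the device.
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}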

Example

// Device helper: callable only from GPU code.
__device__
int sum(int x, int y) {
    return x + y;
}

// Kernel: each thread adds one pair of vector elements.
__global__
void kernel_add_vectors(int* a, int* b, int* c, int num_tasks) {
    // Compute this thread's global index across all blocks.
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= num_tasks) return;  // guard threads that fall past the end of the data

    c[idx] = sum(a[idx], b[idx]);
}
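
The kernel is only half of the picture: the host still has to allocate device memory, copy the inputs over, launch the kernel, and copy the result back. A minimal sketch of that workflow (the buffer names h_a, h_b, h_c and the block size of 256 are illustrative; input initialization and error checking are omitted for brevity):

#include <cuda_runtime.h>

int main() {
    const int num_tasks = 1024;
    const size_t bytes = num_tasks * sizeof(int);

    // Host buffers (filling h_a and h_b with data is omitted).
    int *h_a = new int[num_tasks];
    int *h_b = new int[num_tasks];
    int *h_c = new int[num_tasks];

    // Allocate device memory.
    int *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Copy inputs from host to device.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all elements.
    const int threads_per_block = 256;
    const int blocks = (num_tasks + threads_per_block - 1) / threads_per_block;
    kernel_add_vectors<<<blocks, threads_per_block>>>(d_a, d_b, d_c, num_tasks);

    // Copy the result back to the host.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    // Release device and host memory.
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}

Note that the kernel launch itself is asynchronous with respect to the host; the cudaMemcpy back to h_c on the default stream acts as a synchronization point before the results are read.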

More information

For a more comprehensive understanding of CUDA, we recommend exploring the official NVIDIA CUDA documentation, which provides detailed guides, tutorials, and reference materials.