Thread identifier
CUDA threads are organized hierarchically into blocks, and these blocks are further organized into a grid. To efficiently map threads to data elements in parallel programming, CUDA provides a way to identify each thread uniquely using three built-in variables:
threadIdx.x: This variable identifies the thread's index within a block. In a 1D block, threadIdx.x ranges from 0 to blockDim.x - 1.
blockDim.x: This variable represents the total number of threads in each block along the x-axis. It is a constant value for all blocks within the grid.
blockIdx.x: This variable identifies the block's index within the grid along the x-axis. In a 1D grid, blockIdx.x ranges from 0 to gridDim.x - 1, where gridDim.x is the total number of blocks in the grid along the x-axis.
Compute a unique index
Each thread in the grid can be uniquely identified by a combination of these three variables. The most common way to compute a unique index for a thread in a 1D execution is by combining these values as follows:
int idx = threadIdx.x + blockIdx.x * blockDim.x;
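As a minimal sketch of the formula above, the following program has each thread write its own global index into an array. The kernel name record_index and the launch shape of 2 blocks with 4 threads each are assumptions chosen for illustration. Because the array has exactly one slot per launched thread, no bounds check is needed here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its global index at that position.
__global__ void record_index(int *out)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    out[idx] = idx;
}

int main()
{
    const int blocks = 2;             // becomes gridDim.x (illustrative value)
    const int threads_per_block = 4;  // becomes blockDim.x (illustrative value)
    const int n = blocks * threads_per_block;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return 1;

    record_index<<<blocks, threads_per_block>>>(d_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    int h_out[n];
    if (cudaMemcpy(h_out, d_out, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_out);

    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);
    printf("\n");
    return 0;
}
```

With this launch shape, the thread with threadIdx.x = 1 in the block with blockIdx.x = 1 computes 1 + 1 * 4 = 5, so the eight threads together cover the indices 0 through 7 with no gaps and no overlaps.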
Preventing out-of-bounds access
The number of threads you launch often does not match the number of tasks or data elements you want to process, because
threads are launched in whole blocks. For example, if you have num_tasks tasks but launch more threads than that, the
surplus threads have no valid data to process, and indexing an array with their idx would read or write out of bounds.
By checking if idx is greater than or equal to num_tasks, we can ensure that only the threads with valid work continue
processing. This check effectively "turns off" threads that don't have anything to do, allowing the program to run
safely and correctly.
if (idx >= num_tasks) return;
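The index computation and the bounds check can be put together as in the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: the kernel name double_elements, the values num_tasks = 10 and 4 threads per block, and the doubling operation are all hypothetical. The block count is rounded up so that every element is covered, which launches 12 threads for 10 elements and leaves 2 threads for the check to turn off.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each of the num_tasks elements in place.
__global__ void double_elements(float *data, int num_tasks)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= num_tasks) return;  // surplus threads exit before touching memory
    data[idx] *= 2.0f;
}

int main()
{
    const int num_tasks = 10;         // illustrative value
    const int threads_per_block = 4;  // illustrative value
    // Round up to whole blocks: (10 + 3) / 4 = 3 blocks, i.e. 12 threads.
    const int blocks = (num_tasks + threads_per_block - 1) / threads_per_block;

    float h_data[num_tasks];
    for (int i = 0; i < num_tasks; ++i) h_data[i] = (float)i;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, sizeof(h_data)) != cudaSuccess) return 1;
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    double_elements<<<blocks, threads_per_block>>>(d_data, num_tasks);

    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    for (int i = 0; i < num_tasks; ++i) printf("%.0f ", h_data[i]);
    printf("\n");
    return 0;
}
```

Without the check, the threads with idx 10 and 11 would write past the end of the 10-element allocation. Passing num_tasks as a kernel argument keeps the guard tied to the actual data size rather than to the launch configuration.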