Data access pattern

In UserMatDevice, arrays with the dp_ prefix are device pointers: they refer to memory allocated on the GPU and are passed to the kernel as arguments.

Per-thread arrays

Each thread is responsible for exactly one element of the array and accesses it at its own unique index, idx.

// Read
int fail = dp_internal_fail[idx];

// Write
dp_internal_fail[idx] = fail;
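
As a rough illustration of the one-element-per-thread pattern, the following host-side C++ sketch emulates a kernel with a plain loop, one iteration per "thread". The names update_fail and launch_update_fail and the damage threshold are hypothetical and not part of UserMatDevice; only the indexing mirrors the snippets above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a kernel body: "thread" idx reads and writes
// only element idx of the per-thread array.
void update_fail(std::vector<int>& internal_fail,
                 const std::vector<double>& damage, int idx) {
    assert(idx >= 0 && idx < static_cast<int>(internal_fail.size()));
    int fail = internal_fail[idx];   // Read
    if (damage[idx] >= 1.0) {        // illustrative logic only
        fail = 1;
    }
    internal_fail[idx] = fail;       // Write
}

// Emulates a kernel launch on the host: one loop iteration per thread.
void launch_update_fail(std::vector<int>& internal_fail,
                        const std::vector<double>& damage) {
    assert(damage.size() >= internal_fail.size());
    for (int idx = 0; idx < static_cast<int>(internal_fail.size()); ++idx) {
        update_fail(internal_fail, damage, idx);
    }
}
```

Because no two threads touch the same index, the iterations are independent and could run in any order.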

Per-thread tensor arrays

Here each thread owns a fixed-size block of data rather than a single value. The block consists of six consecutive entries that hold the components of one tensor, and the block for thread idx starts at offset idx * 6.

double stress[6];

// Read
for (int i = 0; i < 6; ++i) {
    stress[i] = dp_stress[idx * 6 + i];
}

// Write
for (int i = 0; i < 6; ++i) {
    dp_stress[idx * 6 + i] = stress[i];
}
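
The blocked layout can be wrapped in small helpers so that the offset arithmetic lives in one place. This is a minimal host-side sketch, assuming a flat buffer with six entries per thread. The names load_tensor, store_tensor and kTensorSize are hypothetical and not part of UserMatDevice.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kTensorSize = 6;  // components per tensor, as in the text

// Copy the six components owned by "thread" idx into a local array.
std::array<double, kTensorSize> load_tensor(const std::vector<double>& buf, int idx) {
    assert(idx >= 0 && (idx + 1) * kTensorSize <= static_cast<int>(buf.size()));
    std::array<double, kTensorSize> t{};
    for (int i = 0; i < kTensorSize; ++i) {
        t[i] = buf[idx * kTensorSize + i];
    }
    return t;
}

// Write the six components back to the block owned by "thread" idx.
void store_tensor(std::vector<double>& buf, int idx,
                  const std::array<double, kTensorSize>& t) {
    assert(idx >= 0 && (idx + 1) * kTensorSize <= static_cast<int>(buf.size()));
    for (int i = 0; i < kTensorSize; ++i) {
        buf[idx * kTensorSize + i] = t[i];
    }
}
```

With this layout, thread 0 owns entries 0 to 5, thread 1 owns entries 6 to 11, and so on, so the blocks of different threads never overlap.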

Per-thread state variable arrays

These arrays follow the same pattern as per-thread tensor arrays, but the block length is num_history instead of 6. The value of num_history is obtained from the UserMatDevice properties struct.

// Read
double epsp   = dp_history[idx * num_history + 0];
double damage = dp_history[idx * num_history + 1];

// Write
dp_history[idx * num_history + 0] = epsp;
dp_history[idx * num_history + 1] = damage;
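
One way to keep the slot numbers readable is to name them. The sketch below assumes the slot assignment used in the snippet above (0 for epsp, 1 for damage). The enum HistorySlot and the helper history_index are hypothetical and not part of UserMatDevice.

```cpp
#include <cassert>

// Hypothetical names for the history slots used in the snippet above.
enum HistorySlot { EPSP = 0, DAMAGE = 1 };

// Flat index of history slot `slot` in the block owned by "thread" idx.
inline int history_index(int idx, int num_history, int slot) {
    assert(idx >= 0);
    assert(slot >= 0 && slot < num_history);
    return idx * num_history + slot;
}
```

A read then becomes dp_history[history_index(idx, num_history, EPSP)], which makes it harder to mix up slots when the number of history variables grows.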

Per-thread 3x3 matrix arrays

These arrays use a strided layout. Instead of storing the nine components of one thread's matrix contiguously, the array stores component i of every thread together, so the components belonging to a single thread are spaced stride entries apart. Use the stride variable defined in the UserMatDevice properties struct.

double fmat[9];

// Read
for (int i = 0; i < 9; ++i) {
    fmat[i] = dp_f_mat[stride * i + idx];
}

// Write
for (int i = 0; i < 9; ++i) {
    dp_f_mat[stride * i + idx] = fmat[i];
}
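
To make the difference from the blocked layout concrete, the following host-side sketch computes both index forms. It assumes that stride is at least the number of threads, so that slots never overlap; this is not stated in the text above. The helper names strided_index, blocked_index and load_matrix are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kMatSize = 9;  // components of a 3x3 matrix

// Strided layout: component i of "thread" idx lives at stride * i + idx.
// Assumption: idx < stride, so different threads never share a slot.
inline int strided_index(int idx, int i, int stride) {
    assert(idx >= 0 && idx < stride);
    assert(i >= 0 && i < kMatSize);
    return stride * i + idx;
}

// Blocked layout (as used for tensors), shown only for comparison.
inline int blocked_index(int idx, int i) {
    return idx * kMatSize + i;
}

// Gather the nine strided components of one thread into a local array.
std::array<double, kMatSize> load_matrix(const std::vector<double>& f_mat,
                                         int idx, int stride) {
    assert(stride * kMatSize <= static_cast<int>(f_mat.size()));
    std::array<double, kMatSize> fmat{};
    for (int i = 0; i < kMatSize; ++i) {
        fmat[i] = f_mat[strided_index(idx, i, stride)];
    }
    return fmat;
}
```

In the strided form, neighbouring threads reading the same component i touch neighbouring entries, whereas in the blocked form they are nine entries apart. On GPUs, adjacent threads accessing adjacent addresses can generally be served with fewer memory transactions, which is a likely motivation for this layout.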

Subgroup-shared array (element level)

// Read
int eroded = dp_eroded[idx / num_ip];

// Write
dp_eroded[idx / num_ip] = eroded;

Important: All threads that share the same value of idx / num_ip access the same entry, so reading or writing a subgroup-shared array may be subject to race conditions.
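
The sharing can be seen by counting how many threads map to each entry. This host-side sketch assumes that num_ip is the number of threads (for example, integration points) per element, which is an interpretation of the snippet rather than something stated in the text. The helper names element_index and threads_per_slot are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Entry of the subgroup-shared array used by "thread" idx.
// Integer division groups num_ip consecutive threads onto one entry.
inline int element_index(int idx, int num_ip) {
    assert(idx >= 0 && num_ip > 0);
    return idx / num_ip;
}

// Count how many threads map to each shared entry.
std::vector<int> threads_per_slot(int num_threads, int num_ip) {
    assert(num_threads >= 0 && num_ip > 0);
    std::vector<int> counts((num_threads + num_ip - 1) / num_ip, 0);
    for (int idx = 0; idx < num_threads; ++idx) {
        ++counts[element_index(idx, num_ip)];
    }
    return counts;
}
```

With eight threads and num_ip equal to 4, threads 0 to 3 share entry 0 and threads 4 to 7 share entry 1. Any entry with a count above one is a location where concurrent writes can conflict.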

Curve arrays

Curve arrays are used with the load_curve function, which loads a curve from a curve array into a variable. See the load_curve documentation for more information.

double sigy1 = mat::load_curve(dp_curve_data, dp_curve_val, cmat.idlc, epsp);