Data access pattern
In UserMatDevice, arrays with the dp_ prefix are device pointers: they refer to memory allocated on the GPU.
These device pointers are passed to the kernel as arguments.
Per-thread arrays
Each thread reads and writes exactly one element of the array, at its own unique index idx.
// Read
int fail = dp_internal_fail[idx];
// Write
dp_internal_fail[idx] = fail;
Per-thread tensor arrays
Here each thread owns a fixed-size block of data rather than a single value: six consecutive entries that together hold the components of one tensor.
double stress[6];
// Read
for (int i = 0; i < 6; ++i) {
    stress[i] = dp_stress[idx * 6 + i];
}
// Write
for (int i = 0; i < 6; ++i) {
    dp_stress[idx * 6 + i] = stress[i];
}
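The block layout above can be checked on the host without a GPU. The following is a minimal sketch, not part of the UserMatDevice API: a std::vector stands in for dp_stress, and read_tensor, write_tensor and kTensorSize are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of components per tensor, matching the loops above.
constexpr int kTensorSize = 6;
using Tensor = std::array<double, kTensorSize>;

// Hypothetical helper: copy the six components owned by thread idx
// out of the flat array (host-side stand-in for dp_stress).
inline Tensor read_tensor(const std::vector<double>& flat, int idx) {
    assert(idx >= 0);
    const std::size_t base = static_cast<std::size_t>(idx) * kTensorSize;
    assert(base + kTensorSize <= flat.size());
    Tensor t{};
    for (int i = 0; i < kTensorSize; ++i) {
        t[static_cast<std::size_t>(i)] = flat[base + static_cast<std::size_t>(i)];
    }
    return t;
}

// Hypothetical helper: write the six components back to the same block.
inline void write_tensor(std::vector<double>& flat, int idx, const Tensor& t) {
    assert(idx >= 0);
    const std::size_t base = static_cast<std::size_t>(idx) * kTensorSize;
    assert(base + kTensorSize <= flat.size());
    for (int i = 0; i < kTensorSize; ++i) {
        flat[base + static_cast<std::size_t>(i)] = t[static_cast<std::size_t>(i)];
    }
}
```

With this layout, thread idx touches only entries idx * 6 through idx * 6 + 5, so blocks belonging to different threads never overlap.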
Per-thread state variable arrays
These arrays are laid out like per-thread tensor arrays, except that each thread's block has length num_history instead of 6.
The value of num_history is obtained from the UserMatDevice properties struct.
// Read
double epsp = dp_history[idx * num_history + 0];
double damage = dp_history[idx * num_history + 1];
// Write
dp_history[idx * num_history + 0] = epsp;
dp_history[idx * num_history + 1] = damage;
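The literal offsets 0 and 1 above are easy to mix up as a model gains more state variables. One possible way to keep them readable is to name each slot, as in the host-side sketch below. HistorySlot, kEpsp, kDamage and history_index are hypothetical names, not part of the library, and the slot order is assumed to match the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slot names. The order must match how the material model
// actually stores its state variables (0 = epsp, 1 = damage above).
enum HistorySlot : int { kEpsp = 0, kDamage = 1 };

// Flat index of one state variable for thread idx, using the same
// formula as above: idx * num_history + slot.
inline std::size_t history_index(int idx, int num_history, int slot) {
    assert(idx >= 0 && slot >= 0 && slot < num_history);
    return static_cast<std::size_t>(idx) * static_cast<std::size_t>(num_history)
         + static_cast<std::size_t>(slot);
}

// Host-side stand-in for one thread updating its own state variables.
inline void store_state(std::vector<double>& history, int idx, int num_history,
                        double epsp, double damage) {
    history[history_index(idx, num_history, kEpsp)] = epsp;
    history[history_index(idx, num_history, kDamage)] = damage;
}
```

In a kernel, the same enum could be used directly in place of the literal offsets, for example dp_history[idx * num_history + kDamage].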
Per-thread 3x3 matrix arrays
These arrays use a strided layout. The nine components of a thread's matrix are not stored consecutively;
they are separated by a uniform interval, stride, so component i of thread idx is located at stride * i + idx.
Use the stride variable defined in the UserMatDevice properties struct.
double fmat[9];
// Read
for (int i = 0; i < 9; ++i) {
    fmat[i] = dp_f_mat[stride * i + idx];
}
// Write
for (int i = 0; i < 9; ++i) {
    dp_f_mat[stride * i + idx] = fmat[i];
}
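To see how the strided layout differs from the contiguous blocks used for tensors, the two index formulas can be compared on the host. This is a minimal sketch under stated assumptions: strided_index and contiguous_index are hypothetical helpers, and the sketch assumes idx is always smaller than stride so that components of different threads do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Components in a 3x3 matrix, matching the loops above.
constexpr int kMatSize = 9;

// Index of component i for thread idx in the strided layout above.
// Assumption: idx < stride, otherwise entries of different threads collide.
inline std::size_t strided_index(std::size_t stride, int i, int idx) {
    assert(i >= 0 && i < kMatSize);
    assert(idx >= 0 && static_cast<std::size_t>(idx) < stride);
    return stride * static_cast<std::size_t>(i) + static_cast<std::size_t>(idx);
}

// For comparison only: where the same component would sit if each thread
// owned a contiguous block of nine values (this is NOT the layout used
// for the 3x3 matrix arrays).
inline std::size_t contiguous_index(int i, int idx) {
    assert(i >= 0 && i < kMatSize && idx >= 0);
    return static_cast<std::size_t>(idx) * kMatSize + static_cast<std::size_t>(i);
}
```

For a fixed component i, neighbouring threads are one entry apart in the strided layout and nine entries apart in the contiguous one. Adjacent threads touching adjacent addresses is the pattern GPUs can typically coalesce into fewer memory transactions, which is a likely motivation for the strided layout.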
Subgroup-shared array (element level)
// Read
int eroded = dp_eroded[idx / num_ip];
// Write
dp_eroded[idx / num_ip] = eroded;
Important: All threads with the same value of idx / num_ip (integer division) access the same array entry, so reads and writes to subgroup-shared arrays may be subject to race conditions.
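The sharing can be illustrated on the host. The sketch below assumes that num_ip is the number of threads (integration points) per element, which the source does not state explicitly. element_index and flag_eroded are hypothetical helpers, and the loop is a sequential stand-in for a kernel launch, not real concurrent code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: entry shared by all threads of one element.
// Integer division rounds down, so threads 0..num_ip-1 map to entry 0,
// threads num_ip..2*num_ip-1 map to entry 1, and so on.
inline int element_index(int idx, int num_ip) {
    assert(idx >= 0 && num_ip > 0);
    return idx / num_ip;
}

// Sequential stand-in for the kernel: each "thread" only ever raises its
// element's flag to 1 and never writes a previously read value back.
// The result therefore does not depend on the order in which threads run.
inline std::vector<int> flag_eroded(const std::vector<int>& ip_failed, int num_ip) {
    assert(num_ip > 0);
    assert(ip_failed.size() % static_cast<std::size_t>(num_ip) == 0);
    std::vector<int> eroded(ip_failed.size() / static_cast<std::size_t>(num_ip), 0);
    for (int idx = 0; idx < static_cast<int>(ip_failed.size()); ++idx) {
        if (ip_failed[static_cast<std::size_t>(idx)] != 0) {
            eroded[static_cast<std::size_t>(element_index(idx, num_ip))] = 1;
        }
    }
    return eroded;
}
```

By contrast, the plain read-then-write pattern shown above can lose an update if another thread of the same element writes between the read and the write. Writing only when the flag must change is one way to reduce that risk. Where a true read-modify-write is needed on the device, a CUDA atomic such as atomicMax would be one option to consider.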
Curve arrays
Curve arrays are used with the load_curve function, which reads a value from a curve stored in the curve arrays into a variable.
See the load_curve documentation for more information.
double sigy1 = mat::load_curve(dp_curve_data, dp_curve_val, cmat.idlc, epsp);