CPU memory access

To maximize performance, consider copying the data a function needs from heap memory (the data arrays of UserMatCPU) into local stack variables at the beginning of the function. Small local variables can usually be kept in registers or in the L1 cache, and the compiler does not have to reload them after each write through a pointer, so working on them is typically much faster than repeatedly accessing the same data on the heap. Compute on the local copies and write results back to the heap once at the end; this minimizes the overhead of frequent heap memory accesses.
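The load, compute, write-back pattern can be sketched as follows. This is a minimal illustration, not the actual UserMatCPU interface: the struct `MatDataSketch`, its `props` and `state` arrays, the constant `kComp`, and the function `updateElement` are all hypothetical names chosen for the example, and the arithmetic is a placeholder for real work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for UserMatCPU; the real class layout is not shown here.
struct MatDataSketch {
    std::vector<double> props;  // heap array: [0] = scale, [1] = offset (assumed)
    std::vector<double> state;  // heap array: kComp values per element (assumed)
};

constexpr std::size_t kComp = 6;  // values per element, assumed for illustration

void updateElement(MatDataSketch& m, std::size_t elem) {
    assert(m.props.size() >= 2);
    assert((elem + 1) * kComp <= m.state.size());

    // 1. Load everything needed from the heap into stack locals, once.
    const double scale = m.props[0];
    const double offset = m.props[1];
    double* heapState = m.state.data() + elem * kComp;
    double s[kComp];
    for (std::size_t i = 0; i < kComp; ++i) {
        s[i] = heapState[i];
    }

    // 2. Do the repeated work on the local copies only.
    for (int pass = 0; pass < 3; ++pass) {
        for (std::size_t i = 0; i < kComp; ++i) {
            s[i] = scale * s[i] + offset;
        }
    }

    // 3. Write the results back to the heap, once.
    for (std::size_t i = 0; i < kComp; ++i) {
        heapState[i] = s[i];
    }
}
```

An optimizing compiler may already perform part of this transformation on its own, so it is worth measuring before and after. The explicit local copy tends to help most when the heap arrays are read and written many times inside the function.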