Introduction
IMPETUS Solver provides a framework for custom material models through dynamic-link libraries (DLLs) on Windows or shared object files on Linux. Users can compile their own user-defined materials, which are then linked to the software at the start of a simulation. This approach offers several advantages, including seamless compatibility with any version of the software, provided there have been no major changes to the interface.
Users can take advantage of the latest software updates without requiring significant modifications to their custom material models. By incorporating specialized knowledge and expertise into their simulations, users can leverage proprietary material properties or complex constitutive relationships that may not be readily available in standard material models.
The ability to link up to eight user-defined material models also provides flexibility in modeling complex systems comprising diverse materials. This feature enables users to tailor the solver's behavior to meet specific simulation requirements, thereby increasing its overall utility and effectiveness.
Licence
This IMPETUS User material package and documentation are released under the MIT License.
MIT License
Copyright (c) 2024 IMPETUS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Installation
To compile and build this framework, you need to have the necessary compilation tools installed. This includes standard compilers for the languages used in the framework.
Prerequisites
Windows
- Visual Studio 2022 (with Visual C++)
- NVIDIA CUDA Toolkit (CUDA 12.x)
Linux
- GNU Make (make)
- GNU Compiler Collection (GCC) (g++)
- NVIDIA CUDA Toolkit (CUDA 12.x)
Structure
This framework contains the following folders and files:
```
user-material-models/
├── docs/
├── example-models/
├── output/
├── user-material/
│   ├── gpu_error.cpp
│   ├── gpu_error.h
│   ├── interface.cpp
│   ├── interface.h
│   ├── kernel_mat_XXXX.cu
│   ├── kernel_mat_XXXX.h
│   ├── mat_user_defines.h
│   ├── mat_user_functions.h
│   ├── mat_user_structs_cpu.h
│   ├── mat_user_structs_gpu.h
│   ├── mat_user.h
│   ├── mat_XXXX.cpp
│   ├── mat_XXXX.h
│   ├── user-material.vcxproj.filters
│   └── user-material.vcxproj
├── user-material.sln
├── CMakeLists.txt
└── Makefile
```
- The `docs` directory is where this documentation is located.
- The `example-models` directory contains example models that can be used to test the provided user material models.
- The `output` directory serves as the destination for the compiled DLL / shared object file if compilation is successful.
- The `user-material` directory contains the C++ and CUDA code.
- The `user-material.sln` file is a Visual Studio 2022 solution file. It contains the compilation and execution configurations for the framework.
- The `CMakeLists.txt` file provides a set of rules for building the user material project using CMake.
- The `Makefile` provides a set of rules for building the user material project under Linux. It contains the compilation and execution configurations for the framework.
Compilation
Windows
This guide explains how to compile a "Release" version in Visual Studio, producing a DLL file named `user-material.dll` in the `output` folder.
1. **Open the Solution in Visual Studio**
   - Launch Visual Studio.
   - Navigate to `File` > `Open` > `Project/Solution...`.
   - Browse to and select the solution file (`user-material.sln`), then click `Open`.
2. **Select the "Release" Configuration**
   - In the toolbar, find the "Solution Configurations" dropdown (usually located at the top of the window).
   - Click the dropdown and select `Release`. This setting ensures that the build is optimised for performance.
3. **Build the Solution**
   - In the toolbar, click `Build` > `Build Solution`.
   - Visual Studio will compile the solution using the "Release" configuration.
4. **Verify the Output**
   - After the build completes successfully, navigate to the output directory `output\`.
   - Confirm that the `user-material.dll` file is present.
Linux
This guide explains how to compile your project on Linux using `make`. The output will be a shared library (the DLL equivalent) named `user-material.so`, located in the `output` directory.

1. **Navigate to the Project Directory**
   - Open a terminal window.
   - Use the `cd` command to navigate to the directory containing your `Makefile`:
     ```
     cd /path/to/your/project
     ```
2. **Run the `make` Command**
   - Type `make` in the terminal and press `Enter`. This compiles the project using the settings defined in the `Makefile`:
     ```
     make
     ```
3. **Verify the Output**
   - Once the build completes, list the `output/` directory to confirm that the `user-material.so` file has been generated:
     ```
     ls output/
     ```
   - You should see `user-material.so` in the directory.
Usage
The parameters for user-defined material models are defined using the *MAT_USER_[X] command in the input deck. See the *MAT_USER_[X] documentation for more information.
Start simulation with your DLL / shared object in the GUI
1. In the IMPETUS Solver GUI, navigate to `Solve`, then in the Ribbon menu click the `Start simulation` button.
2. Browse for your input file.
3. Check the "Use defined material model" check box.
4. Browse for your DLL / shared object file.
5. Click the `Start` button to start the simulation.
Start simulation with your DLL / shared object in CLI
To start simulations from the CLI, use the `impstart` command, which is located in `[installation_dir]/Solver/gui`. This tool adds simulations to the job queue; their status can be seen in the Solve tab of the IMPETUS Solver GUI.
Usage
impstart [options] INPUTFILE
`impstart` is a command line tool; its options are listed by executing it without arguments. Use the `--material-path` option to specify where to find the material DLL / shared object file.
Example
impstart --material-path /path/to/user-material.so /path/to/input-file.k
Overview
Our user-defined material models framework is built on a C++ architecture, making it easy for users to develop their own custom material models. The framework provides a seamless interface for implementing material behavior that is thoroughly integrated into the software.

Our framework also incorporates a GPU-accelerated implementation that leverages NVIDIA's CUDA architecture. By offloading computationally intensive material model evaluations to the GPU, users can achieve substantial speedups over traditional CPU-based implementations. Our CUDA interface is easy to use and enables users to simulate complex material behavior at unprecedented scales and speeds.
MatUser class
To create your own custom material model using our framework, derive a new class from the MatUser base class, which
serves as a foundation for defining user-defined materials. The MatUser class provides a basic structure through its
virtual functions, enabling you to override and customize the behavior of your material model.
To get started, create a new C++ class that inherits from MatUser. For example:
#include "mat_user.h"
class MatMyModel : public MatUser {
// Your implementation goes here...
};
Configuration
Each material must define its material configuration. This is done by setting the class member variables in the constructor of your material model class. See Material configuration section for more information on how to configure a material model.
Virtual functions to override
void runInit(UserMatInitCPU data) const
This function is invoked during the initialization phase of the simulation. You can use it to set up any necessary data structures or perform other initialization tasks, such as setting the initial state of the material and, if required, calculating material parameters.
void runMat(UserMatCPU data) const
This function is executed at each time step of the simulation if GPU acceleration is not available / enabled. It effectively defines the behavior of the material. The constitutive response, failure criteria, and post-failure behavior are all critical components that can be tailored to capture the unique characteristics of various materials.
void runInitGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const
Currently not used; reserved for future implementation.
void runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const
This is similar to runMat but for GPU-accelerated computation. Unlike the CPU-based execution, where member functions can be defined directly within the class structure, CUDA kernel implementations must reside in separate files due to the language's inherent limitations. As a result, this function acts as an interface, invoking the relevant GPU-accelerated implementation from the external file.
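Putting these pieces together, a minimal derived class might look as follows. This is an illustrative sketch only: the stand-in struct and base-class definitions below are simplified placeholders for the real declarations in `mat_user.h`, and `MatMyModel` with its trivial strain-copying response is hypothetical.

```cpp
// Simplified stand-ins for the framework types declared in mat_user.h.
// The real structs carry many more fields; these are illustrative only.
struct UserMatInitCPU { double* p_history; };
struct UserMatCPU     { double* p_stress; double* p_strain; };
struct UserMatHost    {};
struct UserMatDevice  {};
using cudaStream_t = void*;  // stand-in so the sketch compiles without CUDA

// Simplified MatUser base class exposing the four virtual hooks.
class MatUser {
public:
    virtual ~MatUser() = default;
    virtual void runInit(UserMatInitCPU) const {}
    virtual void runMat(UserMatCPU) const {}
    virtual void runInitGPU(UserMatHost, UserMatDevice, cudaStream_t) const {}
    virtual void runMatGPU(UserMatHost, UserMatDevice, cudaStream_t) const {}
};

// A hypothetical user model overriding the two CPU hooks.
class MatMyModel : public MatUser {
public:
    void runInit(UserMatInitCPU data) const override {
        data.p_history[0] = 0.0;  // e.g. reset effective plastic strain
    }
    void runMat(UserMatCPU data) const override {
        // Trivial placeholder response: copy strain into stress.
        for (int i = 0; i < 6; ++i) data.p_stress[i] = data.p_strain[i];
    }
};
```

The solver only ever sees the model through `MatUser*`, which is why the hooks are virtual.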
Material model interface
The material model interface is a set of C functions that are used to configure, initialize and run a material model.
These functions are implemented in interface.cpp.
Overview of interface functions
Each user material consists of five functions, all of which are called by IMPETUS Solver:

- `mat_user_X_config`: Gets the configuration parameters required for the material model.
- `init_mat_user_X`: Initializes the material state.
- `mat_user_X`: Evaluates the material response.
- `init_mat_user_X_gpu`: Currently not used; reserved for a future GPU implementation.
- `mat_user_X_gpu`: Evaluates the material behavior on a GPU device.
Implementing Your Own User Material
1. Create an instance of your derived `MatUser` class.
2. Choose an available slot (e.g., `mat_user_1`).
3. Link the following functions with your material model's corresponding functions:
   - `mat_user_X_config`
   - `init_mat_user_X`
   - `mat_user_X`
   - `init_mat_user_X_gpu`
   - `mat_user_X_gpu`
Example Use Case
Suppose you want to link your user material (e.g. MatMetal). You would:
1. Create an instance of the `MatMetal` class (don't forget the namespace).
2. Choose slot `mat_user_1`.
3. Link each of the functions for this material ID with its corresponding class functions:
static user_metal::MatMetal metal;
void mat_user_1_config(UserMatConfig* config, UserMatProp* properties)
{
metal.getConfig(config, properties);
}
void init_mat_user_1(UserMatInitCPU data)
{
metal.runInit(data);
}
void mat_user_1(UserMatCPU data)
{
metal.runMat(data);
}
void init_mat_user_1_gpu(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
metal.runInitGPU(host, device, stream);
}
void mat_user_1_gpu(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
metal.runMatGPU(host, device, stream);
}
Namespaces
Namespaces allow us to use the same name for different functions or variables in different contexts without causing ambiguity. This is particularly useful when working with different material models, such as metal and rubber, which may share common functionality but require distinct implementations.

For instance, consider two material models, `mat_metal` and `mat_rubber`, each of which might have a function named `execute`. Without namespaces, defining both functions in the global scope would result in a naming conflict and compilation errors. By placing each `execute` function within its respective namespace, we can easily differentiate between the two implementations. This not only improves code organization and maintainability but also reduces the likelihood of bugs and errors.
namespace user_metal
{
void execute()
{
}
} // namespace user_metal
namespace user_rubber
{
void execute()
{
}
} // namespace user_rubber
void function()
{
// Call metal execution
user_metal::execute();
// Call rubber execution
user_rubber::execute();
}
Tensor and Matrix order
Tensor Notation
All symmetric tensors in IMPETUS Solver use an alternative ordering of the Voigt notation.
\(
\begin{bmatrix}
\sigma_{1} & \sigma_{6} & \sigma_{5} \\
& \sigma_{2} & \sigma_{4} \\
& & \sigma_{3}
\end{bmatrix}
\)
Traditional Voigt notation: 11, 22, 33, 23, 13, 12.
\(
\begin{bmatrix}
\sigma_{1} & \sigma_{4} & \sigma_{6} \\
& \sigma_{2} & \sigma_{5} \\
& & \sigma_{3}
\end{bmatrix}
\)
Our definition is: 11, 22, 33, 12, 23, 13.
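To make the difference between the two orderings concrete, the following sketch remaps a symmetric-tensor vector from traditional Voigt ordering to the IMPETUS ordering. The helper name and lookup table are hypothetical, not part of the framework.

```cpp
#include <array>

// Component order conventions for a symmetric tensor:
//   traditional Voigt : 11, 22, 33, 23, 13, 12
//   IMPETUS Solver    : 11, 22, 33, 12, 23, 13
// voigt_to_impetus[i] gives the index in the Voigt-ordered array that
// supplies component i of the IMPETUS-ordered array.
constexpr std::array<int, 6> voigt_to_impetus = {0, 1, 2, 5, 3, 4};

std::array<double, 6> toImpetusOrder(const std::array<double, 6>& voigt)
{
    std::array<double, 6> out{};
    for (int i = 0; i < 6; ++i)
        out[i] = voigt[voigt_to_impetus[i]];
    return out;
}
```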
Matrix access order
Matrices are stored in column major order.
\(
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_{33}
\end{bmatrix}
\)
The storage order of the matrix is: 11, 21, 31, 12, 22, 32, 13, 23, 33.
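Given that storage order, element (row, col) of a 3x3 matrix stored as a flat array of 9 doubles lives at index `col * 3 + row` (0-based). A small illustrative helper (hypothetical, not part of the framework):

```cpp
// Column-major access into a 3x3 matrix stored as a flat array of 9 doubles.
// Element (row, col) lives at index col * 3 + row (0-based indices).
double at(const double* m, int row, int col)
{
    return m[col * 3 + row];
}
```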
Material configuration
To properly configure each material, you must implement its properties. This is achieved through the constructor of your material model class, where you can set the relevant class member variables.
The following sections will provide detailed information on the parameters that need to be configured for each material. By following these guidelines and setting the required parameters in the constructor of your material model class, you can ensure that each material is accurately represented and prepared for simulation.
Material properties
Class member variables
bool gpu_enabled;
int num_hist;
int pos_epsp;
int pos_temp;
int pos_texture;
int pos_damage;
int erode_flag;
int pos_erode_flag;
std::vector<int> curve;
std::vector<std::pair<int, std::string>> contour;
Description
bool gpu_enabled
Set to true if GPU acceleration should be used, otherwise false.
int num_hist
The number of state variables used by each material model. The state variable array (p_history) is a container where
one can store data for each integration point. These variables can change throughout the simulation.
int pos_epsp
Specifies where in the state variables array (p_history) the effective plastic strain variable is located. This is
also used for damage properties.
int pos_temp
Thermal definitions, e.g. from *INITIAL_TEMPERATURE, can be obtained through the pos_temp value. The value set specifies where in the state variables array (p_history) to set the initial temperature. This is also used for damage properties.
int pos_texture
When implementing material models using fiber direction (for example fiber composites) with
*INITIAL_MATERIAL_DIRECTION_WRAP, a texture location must be set. The location tells IMPETUS Solver where in the state
variables array (p_history) to set the initial fiber direction. It requires 9 slots (3 vectors) in the state variables
array (from the specified index). Make sure to allocate enough space when setting num_hist.
int pos_damage
The location in the state variable array (p_history) where damage is stored.
int erode_flag
Specifies how the material model should handle damage evolution.

- 0: no element erosion / node splitting
- 1: element erosion
- 2: node splitting

int pos_erode_flag
Specifies the position of the erosion flag in the material parameter list (p_cmat / cmat). If pos_erode_flag is set, the value at this position in cmat will be copied to cmat[80].
std::vector<int> curve
Specifies where in the material parameter list (p_cmat / cmat) the curve / function IDs are located. Up to 50 curves can be specified per material model. Use `curve.emplace_back(id)` to add new entries.
std::vector<std::pair<int, std::string>> contour
Set a specific state variable for output, which can then be visualised as a contour plot in the GUI. Use
contour.emplace_back(std::make_pair(id, "Contour plot name")); to add new entries.
Stiffness properties
Note: This is optional. `NO_REDUCTION` is applied by default.
Setting the stiffness arrays and these options allows IMPETUS Solver to control the critical time step calculations. If set, reduction operations (per element) will be performed for the specified data array(s).
Options for each stiffness variable:

- `NO_REDUCTION`: No changes are made to the original values
- `MAX_REDUCTION`: The maximum value reduction is applied for each element
- `MIN_REDUCTION`: The minimum value reduction is applied for each element

These reduction operations affect the following data arrays defined in the `UserMatDevice` struct:

- `stiff_shear` affects `device.dp_shear`
- `stiff_bulk` affects `device.dp_bulk`
- `stiff_xi` affects `device.dp_xi`
- `stiff_bfac` affects `device.dp_bfac`
Class member variables
int stiff_shear
int stiff_bulk
int stiff_xi
int stiff_bfac
Description

int stiff_shear
Setting this property applies the selected reduction operation to the `device.dp_shear` array.

int stiff_bulk
Setting this property applies the selected reduction operation to the `device.dp_bulk` array.

int stiff_xi
Setting this property applies the selected reduction operation to the `device.dp_xi` array.

int stiff_bfac
Setting this property applies the selected reduction operation to the `device.dp_bfac` array.

For each property, the available options are `NO_REDUCTION`, `MAX_REDUCTION` and `MIN_REDUCTION`, as described above.
Damage properties
User material models can be combined with the *PROP_DAMAGE_[XXXX] commands, enabling the use of the built-in damage and failure criterion models. This feature is ideal if one wants to use well-known, well-defined damage models.

When a damage ID is specified, our software automatically uses the built-in implementation. The only requirement is that the positions of certain properties are defined. Some of the variables, such as pos_epsp and pos_temp, overlap with "Material properties"; they are used for multiple purposes.
Class member variables
int pos_epsp
int pos_depsp
int pos_rate
int pos_temp
int pos_evol
int pos_damage1
int pos_damage2
Description
int pos_epsp
Location of epsp in the state variables array.
int pos_depsp
Location of depsp in the state variables array.
int pos_rate
Location of rate in the state variables array.
int pos_temp
Location of temp in the state variables array.
int pos_evol
Location of evol in the state variables array.
int pos_damage1
Location of damage1 in the state variables array.
int pos_damage2
Location of damage2 in the state variables array.
Example
mat_metal.cpp
MatMetal::MatMetal()
{
// Enable GPU implementation of this material model
gpu_enabled = true;
// Material configuration
num_hist = 6; // Number of state variables per integration point
pos_epsp = 0; // For prop damage, sensor output
pos_temp = 4; // For prop damage, prop thermal
pos_texture = -1; // For texture, not used by this material model
pos_damage = 1; // For damage, position in the state variable array
erode_flag = 2; // Erosion flag (0=no erosion, 1=element erosion, 2=node splitting)
pos_erode_flag = -1; // Not used, since this material model does not use element erosion
pos_depsp = 5; // For prop damage
pos_rate = -1; // For prop damage
pos_evol = -1; // For prop damage
pos_damage1 = 1; // For prop damage
pos_damage2 = 2; // For prop damage
// Curve definitions
curve.emplace_back(6); // A curve ID is present at position 6 in the material parameter array (cmat)
// Add the location in the state variable array and the output name for contour plot
contour.emplace_back(std::make_pair(0, "Effective plastic strain"));
contour.emplace_back(std::make_pair(1, "Damage"));
contour.emplace_back(std::make_pair(4, "Temperature"));
// Stiffness reduction (for GPU only)
stiff_shear = NO_REDUCTION;
stiff_bulk = NO_REDUCTION;
stiff_xi = NO_REDUCTION;
stiff_bfac = NO_REDUCTION;
}
Material initialization
This function is called during the initialization phase of the simulation. You can use it to set up any necessary data structures or perform other initialization tasks, such as setting the initial state of the material and, if required, calculating material parameters.

This initialization function is called for both the CPU and GPU implementations. Future versions will enable the runInitGPU function when GPU acceleration is utilized.
runInit()
void runInit(UserMatInitCPU data) const
data
- Type: UserMatInitCPU
- Description: User material host data structure. See UserMatInitCPU for more information on the data structure.
Example
void MatMetal::runInit(UserMatInitCPU data) const
{
double young = data.p_cmat[1];
double pr = data.p_cmat[2];
double bulk = young / (3.0 * (1.0 - 2.0 * pr));
double G = young / (2.0 * (1.0 + pr));
// Set material properties
data.p_cmat[43] = bulk;
// Stress-strain curve
int idlc = static_cast<int>(round(data.p_cmat[6]));
// Yield stress
double sigy0 = data.p_history[3];
// Set initial yield stress (if not already defined)
if (sigy0 == 0.0) {
// load material properties from database
sigy0 = mat::load_curve(data.p_curve_data, data.p_curve_val, idlc, 0.0);
data.p_history[3] = sigy0;
}
// Stiffness matrix
data.p_stiffness[0] = G;
data.p_stiffness[1] = bulk;
}
Material response
Material response is the central mechanism for defining a material's complex behavior under varying conditions. Key aspects include the constitutive response, which governs how the material reacts to stress and strain, and the failure criteria. These components can be tailored to accurately capture the unique properties of your materials.
This framework offers two interfaces, one for CPU (C++) and another for GPU (CUDA).
CPU implementation
This function is called for each integration point during each time step of the simulation on the CPU. It defines the behavior of the material, such as its constitutive response, failure criteria, and post-failure behavior.
runMat()
void runMat(UserMatCPU data) const
data
- Type: UserMatCPU
- Description: User material data structure. See UserMatCPU for more information on the data structure.
Example
void MatRubber::runMat(UserMatCPU data) const
{
// Load material parameters
double C1 = data.p_cmat[6];
double C2 = data.p_cmat[7];
double bulk = data.p_cmat[43];
double fmat[9], c[6], eval[3];
// Load deformation gradient matrix
for (int i = 0; i < 9; ++i) {
fmat[i] = data.p_f_mat[i];
}
// Volumetric strain and pressure
double evol = data.p_strain[0] + data.p_strain[1] + data.p_strain[2];
double pressure = -bulk * evol;
mat::cauchy_green_tensor(fmat, c);
// Re-using fmat array as eigen vector array since fmat array is no longer needed
double* evec = fmat;
// Get principal stretches (squared)
mat::calc_eigen_values(c, eval);
mat::calc_eigen_vectors(c, eval, evec);
double alpha = 2.0 * C1 * (eval[0] - eval[2]) - 2.0 * C2 * (1.0 / eval[0] - 1.0 / eval[2]);
double beta = 2.0 * C1 * (eval[1] - eval[2]) - 2.0 * C2 * (1.0 / eval[1] - 1.0 / eval[2]);
eval[0] = (2.0 * alpha - 1.0 * beta) / 3.0 - pressure;
eval[1] = (-1.0 * alpha + 2.0 * beta) / 3.0 - pressure;
eval[2] = (-1.0 * alpha - 1.0 * beta) / 3.0 - pressure;
// Re-using c array as local stress array since c array is no longer needed
double* stress = c;
for (int i = 0; i < 6; ++i) {
stress[i] = 0.0;
}
for (int i = 0; i < 3; ++i) {
int vector_offset = i * 3;
stress[0] += eval[i] * pow(evec[vector_offset + 0], 2);
stress[1] += eval[i] * pow(evec[vector_offset + 1], 2);
stress[2] += eval[i] * pow(evec[vector_offset + 2], 2);
stress[3] += eval[i] * evec[vector_offset + 0] * evec[vector_offset + 1];
stress[4] += eval[i] * evec[vector_offset + 1] * evec[vector_offset + 2];
stress[5] += eval[i] * evec[vector_offset + 2] * evec[vector_offset + 0];
}
// Save eigen values for contour plot output
for (int i = 0; i < 3; ++i) {
data.p_history[i] = eval[i];
}
// Save new stress
for (int i = 0; i < 6; ++i) {
data.p_stress[i] = stress[i];
}
}
CPU memory access
In order to maximize performance, consider adopting a data transfer approach that involves loading data from heap memory
(data arrays from UserMatCPU) into local stack variables at beginning of a function. By doing so, you can take
advantage of the much faster memory access speeds from the stack compared to accessing data stored on the heap. This
strategy allows for efficient processing and minimizes overhead associated with frequent heap memory accesses.
Data access pattern
In the UserMatCPU, arrays with the p_ prefix are data pointers to arrays used for calculation of material response.
Since arrays are structured per integration point for CPU implementation, there is no need to calculate global offsets.
For instance, p_strain[0] through p_strain[5] relate to the current integration point.
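The stack-loading pattern described above can be sketched as follows. The struct here is a simplified, hypothetical stand-in for `UserMatCPU`, and the doubled-shear "response" is a placeholder, not a real constitutive model:

```cpp
// Illustrative sketch of the stack-loading pattern. The struct below is a
// simplified stand-in for UserMatCPU (hypothetical field subset).
struct UserMatCPU {
    const double* p_cmat;    // material parameters (heap)
    const double* p_strain;  // strain components for this integration point
    double*       p_stress;  // stress components to update
};

void runMatSketch(UserMatCPU data)
{
    // Load heap data into local stack variables once, up front...
    const double G = data.p_cmat[0];
    double strain[6];
    for (int i = 0; i < 6; ++i) strain[i] = data.p_strain[i];

    // ...operate only on the fast local copies...
    double stress[6];
    for (int i = 0; i < 6; ++i) stress[i] = 2.0 * G * strain[i];

    // ...and write the results back to the heap once at the end.
    for (int i = 0; i < 6; ++i) data.p_stress[i] = stress[i];
}
```

The point is that the hot loop touches only stack locals; the heap pointers are dereferenced once on entry and once on exit.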
GPU implementation
This function is called for each integration point during each time step of the simulation when GPU acceleration is enabled. It defines the behavior of the material, such as its constitutive response, failure criteria, and post-failure behavior.

CUDA kernels (GPU code) cannot be implemented as member functions of classes. The solution here is to implement them in separate files and invoke them from the class function. To protect against naming conflicts, we define the same namespace for both the class and the kernel that belong to each other.
runMatGPU()
void runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const
host
- Type: UserMatHost
- Description: User material host data structure. See UserMatHost for more information on the data structure.
device
- Type: UserMatDevice
- Description: User material device data structure. See UserMatDevice for more information on the data structure.
stream
- Type: cudaStream_t
- Description: CUDA stream used by the material to perform its calculations.
Example
The following example calls the function mat_user(), which is part of the same namespace. We do not have to qualify the call with the namespace, since the class itself already belongs to it. In our provided sample, the CUDA implementation is in a file called kernel_mat_rubber.cu.
void MatRubber::runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const
{
mat_user(host, device, stream);
}
CUDA programming model
Kernels
Functions marked with __global__ are kernels. A kernel is a special type of function that runs on the GPU and is
executed in parallel by many threads. Kernels are the core mechanism by which you offload computation from the CPU
(host) to the GPU (device).
Functions
A __device__ function in CUDA is a type of function that is executed on the GPU and is callable only from other GPU
functions, such as __global__ or other __device__ functions. These functions cannot be called directly from the host
(CPU) code, unless they are also marked with __host__. This will make them callable from both the host and the device.
Example
__device__
int sum(int x, int y) {
return x + y;
}
__global__
void kernel_add_vectors(int* a, int* b, int *c, int num_tasks) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx >= num_tasks) return;
c[idx] = sum(a[idx], b[idx]);
}
More information
For a more comprehensive understanding of CUDA, we recommend exploring the official NVIDIA CUDA documentation, which provides detailed guides, tutorials and reference materials.
CUDA kernel files
The CUDA API is an extension of the C / C++ programming language and just like C / C++, CUDA uses header files. Each material model consists of two files, both defined within the same namespace:
kernel_mat_XXXX.h(header)kernel_mat_XXXX.cu(source)
Below is an example of our typical implementation, with logical code omitted for clarity and to focus on the interface. Further details will be provided in subsequent subsections.
kernel_mat_XXXX.h
The header file introduces a namespace and provides forward declarations of its host functions, enabling other files to call them.
namespace mat_kfc
{
void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream);
}
kernel_mat_XXXX.cu
The source file is wrapped in the same namespace as the header file. It contains the CUDA kernel function with the `__global__` prefix, and a host function. The host function mat_user is used to launch the kernel function. Additionally, the file includes a `__constant__` declaration of the material parameters array. By utilizing the `__constant__` memory space, we effectively cache frequently accessed data (such as material parameters).
namespace mat_kfc
{
__constant__ double cmat[200];
__global__
void kernel_sanders(UserMatDevice device)
{
/*
⣿⣿⣿⣿⣿⣿⣿⡿⢟⣋⣭⣥⣭⣭⣍⡉⠉⠙⠛⠻⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⡏⠁⠠⠶⠛⠻⠿⣿⣿⣿⣿⣷⡄⠄⠄⠄⠄⠉⠻⢿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⠟⠄⢀⡴⢊⣴⣶⣶⣾⣿⣿⣿⣿⢿⡄⠄⠄⠄⠄⠄⠄⠙⢿⣿⣿⣿
⣿⣿⡿⠁⠄⠙⡟⠁⣾⣿⣿⣿⣿⣿⣿⣿⣿⣎⠃⠄⠄⠄⠄⠄⠄⠄⠈⢻⣿⣿
⣿⡟⠄⠄⠄⠄⡇⠰⠟⠛⠛⠿⠿⠟⢋⢉⠍⢩⣠⡀⠄⠄⠄⠄⠄⠄⠄⠄⢹⣿
⣿⠁⠄⠄⠄⠄⠰⠁⣑⣬⣤⡀⣾⣦⣶⣾⣖⣼⣿⠁⠄⠄⠄⠄⠄⠄⠄⠄⠄⢿
⡏⠄⠄⠄⠄⠄⠄⠄⠨⣿⠟⠰⠻⠿⣣⡙⠿⣿⠋⠄⢀⡀⣀⠄⣀⣀⢀⣀⣀⢸
⡇⠄⠄⠄⠄⠄⠄⠄⠄⣠⠄⠚⠛⠉⠭⣉⢁⣿⠄⢀⡿⢾⣅⢸⡗⠂⢿⣀⡀⢸
⡇⠄⠄⠄⠄⠄⠄⠄⠄⠘⢧⣄⠄⣻⣿⣿⣾⠟⣀⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⢸
⣿⠄⠄⠄⠄⠄⠄⠄⠄⢠⡀⠄⠄⣿⣿⠟⢁⣴⣿⢸⡄⠄⢦⣤⣤⣤⣤⣄⡀⣼
⣿⣧⠄⠄⠄⠄⠄⠄⢠⡸⣿⠒⠄⠈⠛⠄⠁⢹⡟⣾⡇⠄⠈⢿⣿⣿⣿⣿⣿⣿
⣿⣿⣧⣠⣴⣦⠄⠄⢸⣷⡹⣧⣖⡔⠄⠱⣮⣻⣷⣿⣿⠄⠄⠘⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⡇⠄⠄⠸⠿⠿⠚⠛⠁⠂⠄⠉⠉⡅⢰⡆⢰⡄⠄⠘⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣷⣤⡀⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⣿⠄⣷⠘⣧⣠⣾⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣷⣦⣤⣄⣀⣀⡀⠄⣀⣀⣹⣦⣽⣾⣿⣿⣿⣿⣿⣿⣿⣿
*/
}
void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
const unsigned int block_size = 128;
const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);
cudaMemcpyToSymbolAsync(cmat, host.p_material, sizeof(double) * 200, 0, cudaMemcpyHostToDevice, stream);
kernel_sanders<<<num_blocks, block_size, 0, stream>>>(device);
}
} // namespace mat_kfc
CUDA kernel launch
mat_user()
__constant__ double cmat[200];
void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
cudaMemcpyToSymbolAsync(cmat, host.p_material, sizeof(double) * 200, 0, cudaMemcpyHostToDevice, stream);
const unsigned int block_size = 128;
const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);
kernel_material<<<num_blocks, block_size, 0, stream>>>(device);
kernel_error_check(stream, __FILE__, __LINE__);
}
Copy material parameters array to constant memory space
To use the material parameters in the kernel, we copy the data from host to device, more specifically the __constant__
memory space. This is done using the cudaMemcpyToSymbolAsync function
(read more here).
Launching the kernel
Number of blocks and threads per block
Note: When launching a CUDA kernel, the total number of threads launched is determined by the product of the block size and the number of blocks. As a result, more threads are usually launched than necessary to process the data, since execution occurs at the block level rather than individual threads.
const unsigned int block_size = 128;
const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);
This code executes a GPU kernel to perform parallel computations on a large dataset. In CUDA, thread launching is
achieved by executing multiple blocks, each with a specified size. The block size should be a multiple of 32, with a
maximum of 1024 threads per block. Here, we use 128 threads per block and utilize the calcNumBlocks macro to calculate
the number of blocks required based on the block size and the number of tasks to be processed.
Kernel launch
Note: CUDA kernels are launched asynchronously, meaning that the host does not wait for the kernel to finish before continuing. To ensure that the kernel has finished before continuing with the application, synchronization is required. This can be done by calling the cudaStreamSynchronize function from the CUDA Runtime API.
kernel_material<<<num_blocks, block_size, 0, stream>>>(device);
kernel_material<<<num_blocks, block_size, 0, stream>>>(device); is the kernel launch statement. The syntax <<<...>>> is
called the execution configuration, and it specifies how the kernel should be executed on the GPU. Any call to a
__global__ function must specify the execution configuration.
To specify the execution configuration, insert an expression of the form
<<< gridDim, blockDim, sharedMem, stream >>> immediately after the function name and before the opening parenthesis of
the argument list, where:
gridDim: The number of blocks in the grid (calculated by the calcNumBlocks macro).
blockDim: The number of threads in each block (128 in this example).
sharedMem: The number of bytes of shared memory that is dynamically allocated per block.
stream: A CUDA stream object that manages the execution of the kernel. stream is an optional argument which defaults to 0.
Error checking and synchronisation
kernel_error_check(stream, __FILE__, __LINE__);
Finally, we perform error checking using the kernel_error_check wrapper function. It checks for CUDA error codes,
printing an error message with file and line number if one occurs. This function also invokes stream synchronization
(it waits until the kernel has finished before continuing).
Thread identifier
CUDA threads are organized hierarchically into blocks, and these blocks are further organized into a grid. To efficiently map threads to data elements in parallel programming, CUDA provides a way to identify each thread uniquely using three built-in variables:
threadIdx.x: This variable identifies the thread's index within a block. In a 1D block, threadIdx.x ranges from 0 to blockDim.x - 1.
blockDim.x: This variable represents the total number of threads in each block along the x-axis. It is a constant value for all blocks within the grid.
blockIdx.x: This variable identifies the block's index within the grid along the x-axis. In a 1D grid, blockIdx.x ranges from 0 to gridDim.x - 1, where gridDim.x is the total number of blocks in the grid along the x-axis.
Compute a unique index
Each thread in the grid can be uniquely identified by a combination of these three variables. The most common way to compute a unique index for a thread in a 1D execution is by combining these values as follows:
int idx = threadIdx.x + blockIdx.x * blockDim.x;
Preventing out-of-bounds access
The number of threads you launch might not perfectly match the number of tasks or data elements you want to process. For
example, if you have a small number of tasks (num_tasks), but you launch a large number of threads, some of those
threads would end up with no valid data to process.
By checking if idx is greater than or equal to num_tasks, we can ensure that only the threads with valid work continue
processing. This check effectively "turns off" threads that don't have anything to do, allowing the program to run
safely and correctly.
if (idx >= num_tasks) return;
Memory access
General memory access guidelines
The preferred strategy is to load data from global memory at the beginning of a kernel, store it locally, and write it back at the end. This is mainly due to the significant difference in access speed between the different types of memory.
Global memory latency
Global memory is the main memory of the GPU, and it has a high latency, typically around 400 to 800 clock cycles. If every thread in a CUDA kernel were to access global memory frequently, it would lead to a significant slowdown due to this latency.
Local and shared memory speed
Local memory and shared memory are much faster than global memory. Shared memory, in particular, is on-chip and has a latency comparable to registers (typically around 1 clock cycle). By loading data from global memory into shared memory or registers at the beginning of a kernel, threads can then operate on this data using the much faster local or shared memory.
By working with data locally (in registers or shared memory) during most of the kernel's execution, we minimize the need to repeatedly access global memory. This not only reduces latency but also reduces contention and traffic on the memory bus. Writing back the results to global memory at the end of the kernel also allows for a more structured approach.
Further reading
For more information on memory access, see section 7.2, Variable Memory Space Specifiers, in the CUDA C++ Programming Guide.
Data access pattern
In the UserMatDevice, arrays with the dp_ prefix are device pointers, referring to memory allocated on the GPU.
Device pointers are passed as arguments to the kernel.
Per-thread arrays
Each thread is responsible for one element of the array, meaning that each thread operates on a unique index within the array.
// Read
int fail = dp_internal_fail[idx];
// Write
dp_internal_fail[idx] = fail;
Per-thread tensor arrays
In contrast to the per-thread arrays above, where each thread operates on a single value, here each thread processes a fixed-size block of data: six values that collectively represent a tensor's components.
double stress[6];
// Read
for (int i = 0; i < 6; ++i) {
    stress[i] = dp_stress[idx * 6 + i];
}
// Write
for (int i = 0; i < 6; ++i) {
    dp_stress[idx * 6 + i] = stress[i];
}
Per-thread state variable arrays
Similar to per-thread tensor arrays, but with a fixed-size block length of num_history. This value is obtained from
the UserMatDevice properties struct.
// Read
double epsp = dp_history[idx * num_history + 0];
double damage = dp_history[idx * num_history + 1];
// Write
dp_history[idx * num_history + 0] = epsp;
dp_history[idx * num_history + 1] = damage;
Per-thread 3x3 matrix arrays
In this case, we employ a stride pattern. Stride access is a memory access pattern where threads access memory locations
at uniform intervals, rather than sequential locations. Use the stride variable defined in the
UserMatDevice properties struct.
double fmat[9];
// Read
for (int i = 0; i < 9; ++i) {
    fmat[i] = dp_f_mat[stride * i + idx];
}
// Write
for (int i = 0; i < 9; ++i) {
    dp_f_mat[stride * i + idx] = fmat[i];
}
Subgroup-shared array (element level)
// Read
int eroded = dp_eroded[idx / num_ip];
// Write
dp_eroded[idx / num_ip] = eroded;
Important: Accessing subgroup-shared arrays may be subject to race conditions when reading or writing.
Curve arrays
Curve arrays are used with the load_curve function to load a value from a curve into a variable. See load_curve in the
C++ / CUDA API section for more information.
double sigy1 = mat::load_curve(dp_curve_data, dp_curve_val, cmat.idlc, epsp);
Property coupling
This section explains how to integrate user material models with damage, thermal and equation-of-state properties.
Damage properties
User material models can be combined with the *PROP_DAMAGE_[XXXX] commands, enabling the use of the built-in damage
failure criterion models. This feature is ideal if one wants to use well-known and well-defined damage models.
When a damage ID is specified, our software automatically uses the built-in implementation. The only requirement is
that the positions of certain properties are defined. Some of the variables, such as pos_epsp and pos_temp, overlap
with "Material properties"; they are used for multiple purposes.
Class member variables
int pos_epsp
int pos_depsp
int pos_rate
int pos_temp
int pos_evol
int pos_damage1
int pos_damage2
Description
int pos_epsp
Location of epsp in the state variables array.
int pos_depsp
Location of depsp in the state variables array.
int pos_rate
Location of rate in the state variables array.
int pos_temp
Location of temp in the state variables array.
int pos_evol
Location of evol in the state variables array.
int pos_damage1
Location of damage1 in the state variables array.
int pos_damage2
Location of damage2 in the state variables array.
Use of thermal property
When a thermal command ID is detected in the input data, the solver stores the relevant thermal properties in the
material properties (host) array. The positions used for this are 70-74.
C++
double hexp = data.p_cmat[70]; // Heat expansion coefficient
double Cp = data.p_cmat[71]; // Thermal heat capacity
double hcond = data.p_cmat[72]; // Thermal conductivity
double k = data.p_cmat[73]; // Taylor-Quinney coefficient
double Tref = data.p_cmat[74]; // Reference temperature for heat expansion
CUDA
Since we're using __constant__ memory that resides in file scope, we can access the array directly from the kernel.
double hexp = cmat[70]; // Heat expansion coefficient
double Cp = cmat[71]; // Thermal heat capacity
double hcond = cmat[72]; // Thermal conductivity
double k = cmat[73]; // Taylor-Quinney coefficient
double Tref = cmat[74]; // Reference temperature for heat expansion
Use of equation-of-state
The equation-of-state (EOS) ID determines the EOS used with the material model. We currently only support *EOS_GRUNEISEN.
GPU
The EOS calculates the pressure at a given volume. This pressure is needed by the material model. You must handle this pressure manually in your material model, replacing the pressure that's normally part of the stress tensor.
To check if EOS is enabled, use the eos parameter and the dp_eos_pressure array in the UserMatDevice struct.
// Pressure based on volumetric strain or EOS
if (eos == 0) {
    pressure = -cmat.bulk * (dp_strain[offset + 0] + dp_strain[offset + 1] + dp_strain[offset + 2]);
} else {
    pressure = dp_eos_pressure[idx];
}
CPU
The calculated pressure from EOS is applied automatically to the stress tensor and requires no actions in the implementation.
Examples
We provide four examples of material models, each with its own implementation.
Metal 🤘
The metal material model is designed to capture the elastic-plastic behavior of metallic materials. This model is well-suited for simulating materials such as steel, aluminium, and other common metallic alloys.
Command structure
*MAT_USER_1
mid, dens, young, pr, did, tid, eos
idlc, epsf
Source code
user-material/kernel_mat_metal.cu
user-material/kernel_mat_metal.h
user-material/mat_metal.cpp
user-material/mat_metal.h
Concrete
The concrete material model is a simplified version of the Holmquist-Johnson-Cook concrete model. The model includes a pressure dependent yield surface and inelastic compaction of the material.
Command structure
*MAT_USER_2
mid, dens, G
A, B, n, f_c, T, epsf_min, p_c, p_l
epsv_c_l, D1, D2, K
Source code
user-material/kernel_mat_concrete.cu
user-material/kernel_mat_concrete.h
user-material/mat_concrete.cpp
user-material/mat_concrete.h
Rubber
The rubber material model is a hyperelastic implementation that describes the non-linear elastic behavior of rubber-like materials. This is essentially the classical Mooney-Rivlin model.
Command structure
*MAT_USER_3
mid, dens, bulk
C1, C2
Source code
user-material/kernel_mat_rubber.cu
user-material/kernel_mat_rubber.h
user-material/mat_rubber.cpp
user-material/mat_rubber.h
Orthotropic
The orthotropic material model is specialised for materials that have different properties in three mutually perpendicular directions. It can be used for composite materials like wood and laminated composites, where directional dependency of mechanical properties is significant.
Command structure
*MAT_USER_4
mid, dens
E1, E2, G12, pr12, pr23
c, cdec, Xt, Xc, Yt, Yc, beta, S
erode, residual
Source code
user-material/kernel_mat_orthotropic.cu
user-material/kernel_mat_orthotropic.h
user-material/mat_orthotropic.cpp
user-material/mat_orthotropic.h
C++ API
These are the built-in API calls and data structures provided for the C++ interface.
Error check
This function performs error checking on CUDA kernels, printing an error message with file and line number if an error occurs. It also waits until the kernel has finished before continuing with the program.
void kernel_error_check(cudaStream_t stream, const char* file, int line)
stream
- CUDA stream.
file
- File name of source file.
line
- Line number.
Data structures
These data structures are mainly used in the interface between IMPETUS Solver and the user material interface.
UserMatCPU
Structure
int* p_internal_fail
int* p_face_update
int* p_curve_data
double* p_curve_val
double* p_strain
double* p_dstrain
double* p_stress
double* p_f_mat
double* p_u_mat
double* p_history
double* p_cmat
double* p_stiffness
double elem_volume0
double elem_volume
double ip_volume0
double ip_volume
int elem_id
int ip_id
int type
double dt1
double tt
int part_id
int* p_face_ids
double* p_face_area
double* p_face_normal
Variables
int* p_internal_fail
Integration point failure flag.
int* p_face_update
Force update of faces (used with erosion).
int* p_curve_data
Curve data, used together with load_curve.
double* p_curve_val
Curve data, used together with load_curve.
double* p_strain
Strain tensor.
double* p_dstrain
Strain increment tensor.
double* p_stress
Stress tensor.
double* p_f_mat
Deformation gradient matrix.
double* p_u_mat
Right stretch tensor.
double* p_history
History state variable array.
double* p_cmat
Material parameters array.
double* p_stiffness
Used for time step calculation.
0 = shear
1 = bulk
2 = xi
3 = bfac
double elem_volume0
Initial volume of the element.
double elem_volume
Current volume of the element.
double ip_volume0
Initial volume of the integration point.
double ip_volume
Current volume of the integration point.
int elem_id
Element ID.
int ip_id
Integration point ID.
int type
Element type.
double dt1
Current time step size.
double tt
Current time.
int part_id
Part ID.
int* p_face_ids
Array of (maximum 3) face IDs tied to the current integration point. Non-zero value for each external face.
double* p_face_area
Area of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.
double* p_face_normal
Normal vector of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.
UserMatInitCPU
Structure
int* p_curve_data
double* p_curve_val
double* p_strain
double* p_stress
double* p_coord
double* p_history
double* p_cmat
double* p_stiffness
int elem_id
int ip_id
int type
int part_id
int* p_face_ids
double* p_face_area
double* p_face_normal
Variables
int* p_curve_data
Curve data, used together with load_curve.
double* p_curve_val
Curve data, used together with load_curve.
double* p_strain
Strain tensor.
double* p_stress
Stress tensor.
double* p_coord
Node coordinate.
double* p_history
History state variable array.
double* p_cmat
Material parameters array.
double* p_stiffness
Used for time step calculation.
0 = shear
1 = bulk
2 = xi
3 = bfac
int elem_id
Element ID.
int ip_id
Integration point ID.
int type
Element type.
int part_id
Part ID.
int* p_face_ids
Array of (maximum 3) face IDs tied to the current integration point. Non-zero value for each external face.
double* p_face_area
Area of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.
double* p_face_normal
Normal vector of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.
UserMatDevice
Structure
int* dp_internal_fail
int* dp_eroded
int* dp_curve_data
double* dp_curve_val
double* dp_strain
double* dp_dstrain
double* dp_stress
double* dp_f_mat
double* dp_u_mat
double* dp_init_volume
double* dp_volume
double* dp_history
double* dp_eos_pressure
double* dp_cmat
double* dp_shear
double* dp_bulk
double* dp_xi
double* dp_bfac
int eos
int num_history
size_t num_ip
size_t num_tasks
size_t stride
double dt0
double dt1
double tt
Variables
int* dp_internal_fail
Integration point failure flag.
Access pattern: Per-thread arrays
int* dp_eroded
Element eroded flag.
Access pattern: Subgroup-shared array (element level)
int* dp_curve_data
Curve data.
Access pattern: Curve arrays
double* dp_curve_val
Curve data.
Access pattern: Curve arrays
double* dp_strain
Strain tensor.
Access pattern: Per-thread tensor arrays
double* dp_dstrain
Strain increment tensor.
Access pattern: Per-thread tensor arrays
double* dp_stress
Stress tensor.
Access pattern: Per-thread tensor arrays
double* dp_f_mat
Deformation gradient matrix. This array is in stride format.
Access pattern: Per-thread 3x3 matrix arrays
double* dp_u_mat
Right stretch tensor.
Access pattern: Per-thread tensor arrays
double* dp_init_volume
Initial volume of integration point.
Access pattern: Per-thread arrays
double* dp_volume
Current volume of integration point.
Access pattern: Per-thread arrays
double* dp_history
State variable array.
Access pattern: Per-thread arrays
double* dp_eos_pressure
If the Grüneisen equation of state is active, this array holds the pressure.
Access pattern: Per-thread arrays
double* dp_cmat
Not used; currently a nullptr.
Note: Reserved for future implementation.
double* dp_shear
Shear array used for critical time step calculation.
Access pattern: Per-thread arrays
double* dp_bulk
Bulk array used for critical time step calculation.
Access pattern: Per-thread arrays
double* dp_xi
Xi array used for critical time step calculation.
Access pattern: Per-thread arrays
double* dp_bfac
Bfac array used for critical time step calculation.
Access pattern: Per-thread arrays
int eos
Flag to indicate whether Grüneisen equation-of-state is active or not.
Use this flag to determine whether to use pressure from dp_eos_pressure or not.
int num_history
Number of state variables for the current material model.
size_t num_ip
Number of integration points of the current element.
size_t num_tasks
Number of threads to be launched for the current material model.
size_t stride
Stride used for arrays in stride format.
double dt0
Previous time step.
double dt1
Current time step.
double tt
Current time.
UserMatHost
Structure
double* p_material
Variables
double* p_material
Material parameters (host array).
C++ / CUDA API
These built-in APIs provide pre-defined functions for performing mathematical and material-related tasks. All functions
are declared as both __host__ and __device__, which means they work in both C++ and CUDA code. The implementations of
the functions are in mat_user_functions.h and are part of the mat namespace. To use them, prepend mat:: to each
function name.
double sig_eff = mat::mat_effective_stress(stress);
Cauchy-Green tensor
The Cauchy-Green tensor describes the deformation of a material body, playing a key role in nonlinear elasticity and finite deformations.
template <typename T>
inline __host__ __device__ void cauchy_green_tensor(T* mat, T* c)
Parameters
mat
- Input: 3x3 matrix
c
- Output: C (symmetric tensor)
Cross product
This function returns the cross product of a1 and a2 in a3.
template <typename T>
inline __host__ __device__ void cross_product(T* a1, T* a2, T* a3)
Parameters
a1
- Input: vector 1
a2
- Input: vector 2
a3
- Output: vector 3
Effective strain rate
This function calculates and returns the effective strain rate.
template <typename T>
inline __host__ __device__ T mat_effective_strain_rate(T* strain_rate)
Parameters
strain_rate
- Input: Strain rate tensor
Returns
Effective strain rate
Effective stress
This function calculates and returns the effective stress.
template <typename T>
inline __host__ __device__ T mat_effective_stress(T* stress)
Parameters
stress
- Input: Stress tensor
Returns
Effective stress
Eigenvalues
This function calculates and returns the eigenvalues eval
of the input tensor s.
template <typename T>
inline __host__ __device__ void calc_eigen_values(T* s, T* eval)
Parameters
s
- Input: Tensor to be decomposed into eigenvalues
eval
- Output: Eigenvalues of the input tensor
Eigenvectors
This function calculates and returns the eigenvectors evec
of the input tensor s and its eigenvalues eval.
template <typename T>
inline __host__ __device__ void calc_eigen_vectors(T* s, T* eval, T* evec)
Parameters
s
- Input: Tensor to be decomposed into eigenvalues
eval
- Input: Eigenvalues of the input tensor
evec
- Output: Eigenvectors of the input tensor
Invert 3x3 matrix
Invert a 3x3 matrix.
template <typename T>
inline __host__ __device__ double invert_3x3(T* mat, T* mat_inv)
Parameters
mat
- Input: 9-component 1D (3x3 matrix) in column major order
mat_inv
- Output: 9-component 1D (3x3 matrix) inverted matrix in column major order
Returns
Determinant of the input matrix.
Load Curve / Function
Returns the ordinate value at x of the function or curve with the specified id.
template <typename T1, typename T2>
inline __host__ __device__ double load_curve(T1* curve_data, T2* curve_val, int id, double x)
Parameters
curve_data
- Input: Array of curve data
curve_val
- Input: Array of curve values
id
- Input: Function or Curve ID
x
- Input: Value of x in function f(x)
Returns
The ordinate value of x.
Transform tensor
Transform tensor to a fiber direction.
template <typename T, int transpose>
inline __host__ __device__ void transform_tensor(T* s, T* r, T* b)
Note: This function requires explicit specification of the template parameters typename T and int transpose.
Parameters
s
- Input: Tensor to be transformed
r
- Input: fiber direction
b
- Output: Transformed tensor
von Mises (step 1)
von Mises yield criterion (step 1)
template <typename T>
inline __host__ __device__ void mat_yield_von_mises_1(T shear, T sigy0, T* sig_eff, T* stress,
T* deps, int* yield)
Parameters
shear
- Input: Shear
sigy0
- Input: Current yield stress
sig_eff
- Output: Effective stress
stress
- Input: Stress tensor
deps
- Output: Strain increment
yield
- Output: Flag to indicate if we have plasticity or not
von Mises (step 2)
von Mises yield criterion (step 2)
template <typename T>
inline __host__ __device__ void mat_yield_von_mises_2(T shear, T sigy0, T sigy1, T* sig_eff, T* stress,
T* deps, T* epsp, T* depsp)
Parameters
shear
- Input: Shear
sigy0
- Input: Current yield stress
sigy1
- Input: Yield stress at strain increment
sig_eff
- Output: Effective stress
stress
- Input: Stress tensor
deps
- Output: Strain increment
epsp
- Output: Effective plastic strain
depsp
- Output: Effective plastic strain increment