Introduction

IMPETUS Solver provides a framework for custom material models through dynamic-link library (DLL) files under Windows or shared object files under Linux. Users can compile their own user-defined materials, which are then linked to the software at the start of a simulation. This approach offers several advantages, including compatibility with any version of the software, provided there have been no major changes to the interface.

Users can take advantage of the latest software updates without requiring significant modifications to their custom material models. By incorporating specialized knowledge and expertise into their simulations, users can leverage proprietary material properties or complex constitutive relationships that may not be readily available in standard material models.

The ability to link up to eight user-defined material models also provides flexibility in modeling complex systems comprising diverse materials. This feature enables users to tailor the solver's behavior to meet specific simulation requirements, thereby increasing its overall utility and effectiveness.

Licence

This IMPETUS User material package and documentation are released under the MIT License.

MIT License

Copyright (c) 2024 IMPETUS

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Installation

To compile and build this framework, you need to have the necessary compilation tools installed. This includes standard compilers for the languages used in the framework.

Prerequisites

Windows

  • Visual Studio 2022 (with Visual C++)
  • NVIDIA CUDA Toolkit (CUDA 12.x)

Linux

  • GNU Make (make)
  • GNU Compiler Collection (GCC) (g++)
  • NVIDIA CUDA Toolkit (CUDA 12.x)

Structure

This framework contains the following folders and files:

user-material-models/
├── docs/
├── example-models/
├── output/
├── user-material/
│   ├── gpu_error.cpp
│   ├── gpu_error.h
│   ├── interface.cpp
│   ├── interface.h
│   ├── kernel_mat_XXXX.cu
│   ├── kernel_mat_XXXX.h
│   ├── mat_user_defines.h
│   ├── mat_user_functions.h
│   ├── mat_user_structs_cpu.h
│   ├── mat_user_structs_gpu.h
│   ├── mat_user.h
│   ├── mat_XXXX.cpp
│   ├── mat_XXXX.h
│   ├── user-material.vcxproj.filters
│   └── user-material.vcxproj
├── user-material.sln
├── CMakeLists.txt
└── Makefile

  • The docs directory is where this documentation is located.

  • The example-models directory contains example models that can be used to test the provided user material models.

  • The output directory serves as the destination for the compiled DLL / shared object file if compilation is successful.

  • The user-material directory contains C++ and CUDA code.

  • The user-material.sln file is a Visual Studio 2022 solution file. It contains the compilation and execution configurations for the framework.

  • The CMakeLists.txt file provides a set of rules for building the user material project using CMake.

  • The Makefile provides a set of rules for building the user material project under Linux. It contains the compilation and execution configurations for the framework.

Compilation

Windows

This guide explains how to compile a "Release" version in Visual Studio, producing a DLL file named user-material.dll in the output folder.

  1. Open the Solution in Visual Studio

    • Launch Visual Studio.
    • Navigate to File > Open > Project/Solution....
    • Browse and select the solution file (user-material.sln), then click Open.
  2. Select the "Release" Configuration

    • In the toolbar, find the "Solution Configurations" dropdown (usually located at the top of the window).
    • Click the dropdown and select Release. This setting ensures that the build is optimised for performance.
  3. Build the Solution

    • In the toolbar, click Build > Build Solution.
    • Visual Studio will compile the solution using the "Release" configuration.
  4. Verify the Output

    • After the build completes (if successful), navigate to the output directory output\.
    • Confirm that the user-material.dll file is present.

Linux

This guide explains how to compile your project on Linux using make. The output will be a shared library (DLL equivalent) named user-material.so located in the output directory.

  1. Navigate to the Project Directory

    • Open a terminal window.
    • Use the cd command to navigate to the directory containing your Makefile.
    cd /path/to/your/project
    
  2. Run the make Command

    • Type make in the terminal and press Enter. This command will compile your project using the settings defined in the Makefile.
    make
    
  3. Verify the Output

    • Once the build completes, open the output/ directory to confirm that the user-material.so file has been generated.
    ls output/
    
    • You should see user-material.so in the directory.

Usage

The parameter input for user-defined material models is defined using the *MAT_USER_[X] command in the input deck. See the *MAT_USER_X documentation for more information.

Start simulation with your DLL / shared object in the GUI

  1. In the IMPETUS Solver GUI, navigate to Solve, then click the Start simulation button in the Ribbon menu.
  2. Browse for your input file.
  3. Check the "Use defined material model" check box.
  4. Browse for your DLL / shared object file.
  5. Click the Start button to start the simulation.

Start simulation with your DLL / shared object in CLI

To start simulations from the CLI, use the impstart command, which is located in [installation_dir]/Solver/gui. This tool adds simulations to the job queue; their status can be seen in the Solve tab of the IMPETUS Solver GUI.

Usage

impstart [options] INPUTFILE

impstart is a command-line tool; a list of its options is printed when it is executed without arguments.

Use the --material-path option to specify where to find the material DLL / shared object file.

Example

impstart --material-path /path/to/user-material.so /path/to/input-file.k

Overview

Our user-defined material models framework is built on a C++ architecture, making it easy for users to develop their own custom material models. The framework provides a seamless interface for implementing material behavior that is thoroughly integrated into the software.

Our framework also incorporates a GPU-accelerated implementation that leverages NVIDIA's CUDA architecture. By offloading computationally intensive material model evaluations to the GPU, users can achieve substantial speedups over traditional CPU-based implementations. Our CUDA interface is easy to use and enables users to simulate complex material behavior at large scales and speeds.

MatUser class

To create your own custom material model using our framework, derive a new class from the MatUser base class, which serves as a foundation for defining user-defined materials. The MatUser class provides a basic structure through its virtual functions, enabling you to override and customize the behavior of your material model.

To get started, create a new C++ class that inherits from MatUser. For example:

#include "mat_user.h"

class MatMyModel : public MatUser {
    // Your implementation goes here...
};
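
A fuller skeleton might look like the sketch below; the override signatures follow the virtual functions listed in the next sections.

#include "mat_user.h"

class MatMyModel : public MatUser {
public:
    MatMyModel();   // set the material configuration here (see Configuration)

    void runInit(UserMatInitCPU data) const override;
    void runMat(UserMatCPU data) const override;
    void runInitGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const override;
    void runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const override;
};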

Configuration

Each material must define its material configuration. This is done by setting the class member variables in the constructor of your material model class. See the Material configuration section for more information on how to configure a material model.

Virtual functions to override

void runInit(UserMatInitCPU data) const

This function is invoked during the initialization phase of the simulation. You can use it to set up any necessary data structures or perform other initialization tasks, such as setting the initial state of the material and, if required, calculating material parameters.


void runMat(UserMatCPU data) const

This function is executed each time step of the simulation if GPU acceleration is not available / enabled. It effectively defines the behavior of the material. The constitutive response, failure criteria, and post-failure behavior are all critical components that can be tailored to capture the unique characteristics of various materials.


void runInitGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const

Currently not used; reserved for future implementation.


void runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const

This is similar to runMat but for GPU-accelerated computation. Unlike the CPU-based executions, where member functions can be directly defined within the class structure, CUDA kernel implementations must reside in separate files due to the language's inherent limitations. As a result, this function acts as an interface, invoking the relevant GPU-accelerated implementation from the external file.

Material model interface

The material model interface is a set of C functions that are used to configure, initialize and run a material model. These functions are implemented in interface.cpp.

Overview of interface functions

Each user material consists of five functions, which are all called by IMPETUS Solver:

  1. mat_user_X_config: Gets the configuration parameters required for the material model.
  2. init_mat_user_X: Initializes the material state.
  3. mat_user_X: Evaluates the material response.
  4. init_mat_user_X_gpu: Currently not being used, but reserved for future GPU implementation.
  5. mat_user_X_gpu: Evaluates the material behavior on a GPU device.

Implementing Your Own User Material

  1. Create an instance of your derived MatUser class.
  2. Choose an available slot (e.g., mat_user_1).
  3. Link the following functions with your material model's corresponding functions:
    • mat_user_X_config
    • init_mat_user_X
    • mat_user_X
    • init_mat_user_X_gpu
    • mat_user_X_gpu

Example Use Case

Suppose you want to link your user material (e.g. MatMetal). You would:

  1. Create an instance of the MatMetal class (don't forget the namespace).
  2. Choose slot mat_user_1.
  3. Link each of the functions for this material ID with its corresponding class functions:
static user_metal::MatMetal metal;

void mat_user_1_config(UserMatConfig* config, UserMatProp* properties)
{
    metal.getConfig(config, properties);
}

void init_mat_user_1(UserMatInitCPU data)
{
    metal.runInit(data);
}

void mat_user_1(UserMatCPU data)
{
    metal.runMat(data);
}

void init_mat_user_1_gpu(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    metal.runInitGPU(host, device, stream);
}

void mat_user_1_gpu(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    metal.runMatGPU(host, device, stream);
}

Namespaces

Namespaces allow us to use the same name for different functions or variables in different contexts without causing ambiguity. This is particularly useful when working with different material models, such as metal and rubber, which may share common functionality but require distinct implementations.

For instance, consider two material models: mat_metal and mat_rubber, each of which might have a function named execute. Without namespaces, defining both functions in the global scope would result in a naming conflict and compilation errors. By placing each execute function within its respective namespace, we can easily differentiate between the two implementations. This not only improves code organization and maintainability but also reduces the likelihood of bugs and errors.

namespace user_metal
{
    void execute()
    {
    }
} // namespace user_metal

namespace user_rubber
{
    void execute()
    {
    }
} // namespace user_rubber

void function()
{
    // Call metal execution
    user_metal::execute();

    // Call rubber execution
    user_rubber::execute();
}

Tensor and Matrix order

Tensor Notation

All symmetric tensors in IMPETUS Solver use an alternative ordering of the Voigt notation.

\( \begin{bmatrix} \sigma_{1} & \sigma_{6} & \sigma_{5} \\ & \sigma_{2} & \sigma_{4} \\ & & \sigma_{3} \end{bmatrix} \) Traditional Voigt notation: 11, 22, 33, 23, 13, 12.

\( \begin{bmatrix} \sigma_{1} & \sigma_{4} & \sigma_{6} \\ & \sigma_{2} & \sigma_{5} \\ & & \sigma_{3} \end{bmatrix} \) Our definition is: 11, 22, 33, 12, 23, 13.

Matrix access order

Matrices are stored in column major order.

\( \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix} \) The order of the matrix is: 11, 21, 31, 12, 22, 32, 13, 23, 33
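
As a plain C++ illustration (hypothetical helper names, not part of the framework), the two tensor orderings and the column-major matrix indexing can be expressed as:

// Map a symmetric tensor from traditional Voigt order (11, 22, 33, 23, 13, 12)
// to the IMPETUS order (11, 22, 33, 12, 23, 13).
inline void voigt_to_impetus(const double* v, double* s)
{
    s[0] = v[0];    // 11
    s[1] = v[1];    // 22
    s[2] = v[2];    // 33
    s[3] = v[5];    // 12
    s[4] = v[3];    // 23
    s[5] = v[4];    // 13
}

// Access entry (row, col) of a 3x3 matrix stored in column-major order:
// 11, 21, 31, 12, 22, 32, 13, 23, 33.
inline double matrix_entry(const double* m, int row, int col)
{
    return m[col * 3 + row];
}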

Material configuration

To properly configure each material, you must implement its properties. This is achieved through the constructor of your material model class, where you can set the relevant class member variables.

The following sections will provide detailed information on the parameters that need to be configured for each material. By following these guidelines and setting the required parameters in the constructor of your material model class, you can ensure that each material is accurately represented and prepared for simulation.

Material properties

Class member variables

bool gpu_enabled;
int num_hist;
int pos_epsp;
int pos_temp;
int pos_texture;
int pos_damage;
int erode_flag;
int pos_erode_flag;
std::vector<int> curve;
std::vector<std::pair<int, std::string>> contour;

Description

bool gpu_enabled

Set to true if GPU acceleration should be used, otherwise false.


int num_hist

The number of state variables used by each material model. The state variable array (p_history) is a container where one can store data for each integration point. These variables can change throughout the simulation.


int pos_epsp

Specifies where in the state variables array (p_history) the effective plastic strain variable is located. This is also used for damage properties.


int pos_temp

The thermal definition from, for example, *INITIAL_TEMPERATURE can be obtained through the pos_temp value. The value set specifies where in the state variables array (p_history) to set the initial temperature. This is also used for damage properties.


int pos_texture

When implementing material models using fiber directions (for example, fiber composites) with *INITIAL_MATERIAL_DIRECTION_WRAP, a texture location must be set. The location tells IMPETUS Solver where in the state variables array (p_history) to set the initial fiber direction. It requires 9 slots (3 vectors) in the state variables array (from the specified index). Make sure to allocate enough space when setting num_hist. A read-back sketch is shown below.
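
For illustration only, a sketch (assuming the three vectors are stored consecutively from pos_texture, and that we are inside a member function with access to that variable) of reading the stored directions back:

// Sketch: read the three fiber-direction vectors (3 x 3 values)
// stored from index pos_texture in the state variable array.
double dir[3][3];
for (int v = 0; v < 3; ++v) {
    for (int i = 0; i < 3; ++i) {
        dir[v][i] = data.p_history[pos_texture + 3 * v + i];
    }
}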


int pos_damage

The location in the state variable array (p_history) where damage is stored.


int erode_flag

Specifies how the material model should handle damage evolution.

  • 0: no element erosion / node splitting
  • 1: element erosion
  • 2: node splitting

int pos_erode_flag

Specifies the position of the erosion flag defined in the material parameter list (p_cmat / cmat). If pos_erode_flag is set, the value at this position in cmat will be copied to cmat[80].


std::vector<int> curve

Specifies where in the material parameter list (p_cmat / cmat) the curve / function IDs are located. Up to 50 curves can be specified per material model. Use curve.emplace_back(id) to add new entries.


std::vector<std::pair<int, std::string>> contour

Sets a specific state variable for output, which can then be visualised as a contour plot in the GUI. Use contour.emplace_back(std::make_pair(id, "Contour plot name")); to add new entries.

Stiffness properties

Note: This is optional. NO_REDUCTION is applied by default.

Setting the stiffness arrays and these options allows IMPETUS Solver to control the critical time step calculations. If set, reduction operations (per element) will be performed on the specified data array(s).

Options for each stiffness variable:

  • NO_REDUCTION: No changes are made to the original values
  • MAX_REDUCTION: The maximum value reduction is applied for each element
  • MIN_REDUCTION: The minimum value reduction is applied for each element

These reduction operations affect the following data arrays defined in UserMatDevice struct:

  • stiff_shear affects device.dp_shear
  • stiff_bulk affects device.dp_bulk
  • stiff_xi affects device.dp_xi
  • stiff_bfac affects device.dp_bfac

Class member variables

 int stiff_shear
 int stiff_bulk
 int stiff_xi
 int stiff_bfac

Description

int stiff_shear

Setting this property will apply reduction operations on the data.dp_shear array.

Options:

  • NO_REDUCTION: No changes are made to the original values
  • MAX_REDUCTION: The maximum value reduction is applied for each element
  • MIN_REDUCTION: The minimum value reduction is applied for each element

int stiff_bulk

Setting this property will apply reduction operations on the data.dp_bulk array.

Options:

  • NO_REDUCTION: No changes are made to the original values
  • MAX_REDUCTION: The maximum value reduction is applied for each element
  • MIN_REDUCTION: The minimum value reduction is applied for each element

int stiff_xi

Setting this property will apply reduction operations on the data.dp_xi array.

Options:

  • NO_REDUCTION: No changes are made to the original values
  • MAX_REDUCTION: The maximum value reduction is applied for each element
  • MIN_REDUCTION: The minimum value reduction is applied for each element

int stiff_bfac

Setting this property will apply reduction operations on the data.dp_bfac array.

Options:

  • NO_REDUCTION: No changes are made to the original values
  • MAX_REDUCTION: The maximum value reduction is applied for each element
  • MIN_REDUCTION: The minimum value reduction is applied for each element

Damage properties

User material models can be combined with the *PROP_DAMAGE_[XXXX] commands, enabling the use of the built-in damage failure criterion models. This feature is ideal if one wants to use well-known, well-defined damage models.

When a damage ID is specified, our software automatically uses the built-in implementation. The only requirement is that the positions of certain properties are defined. Some of the variables, such as pos_epsp and pos_temp, overlap with "Material properties"; they are used for multiple purposes.

Class member variables

 int pos_epsp
 int pos_depsp
 int pos_rate
 int pos_temp
 int pos_evol
 int pos_damage1
 int pos_damage2

Description

int pos_epsp

Location of epsp in the state variables array.


int pos_depsp

Location of depsp in the state variables array.


int pos_rate

Location of rate in the state variables array.


int pos_temp

Location of temp in the state variables array.


int pos_evol

Location of evol in the state variables array.


int pos_damage1

Location of damage1 in the state variables array.


int pos_damage2

Location of damage2 in the state variables array.

Example

mat_metal.cpp

MatMetal::MatMetal()
{
    // Enable GPU implementation of this material model
    gpu_enabled = true;

    // Material configuration
    num_hist       =  6;    // Number of state variables per integration point
    pos_epsp       =  0;    // For prop damage, sensor output
    pos_temp       =  4;    // For prop damage, prop thermal
    pos_texture    = -1;    // For texture, not used by this material model
    pos_damage     =  1;    // For damage, position in the state variable array
    erode_flag     =  2;    // Erosion flag (0=no erosion, 1=element erosion, 2=node splitting)
    pos_erode_flag = -1;    // Not used, since this material model does not use element erosion
    pos_depsp      =  5;    // For prop damage
    pos_rate       = -1;    // For prop damage
    pos_evol       = -1;    // For prop damage
    pos_damage1    =  1;    // For prop damage
    pos_damage2    =  2;    // For prop damage

    // Curve definitions
    curve.emplace_back(6);     // A curve ID is present at position 6 in the material parameter array (cmat)

    // Add the location in the state variable array and the output name for contour plot
    contour.emplace_back(std::make_pair(0, "Effective plastic strain"));
    contour.emplace_back(std::make_pair(1, "Damage"));
    contour.emplace_back(std::make_pair(4, "Temperature"));

    // Stiffness reduction (for GPU only)
    stiff_shear = NO_REDUCTION;
    stiff_bulk  = NO_REDUCTION;
    stiff_xi    = NO_REDUCTION;
    stiff_bfac  = NO_REDUCTION;
}

Material initialization

This function is called during the initialization phase of the simulation. You can use it to set up any necessary data structures or perform other initialization tasks, such as setting the initial state of the material and, if required, calculating material parameters.

This initialization function is called for both CPU and GPU implementations. Future implementations will enable the runInitGPU function when utilizing GPU acceleration.

runInit()

void runInit(UserMatInitCPU data) const

data

  • Type: UserMatInitCPU
  • Description: User material initialization data structure. See UserMatInitCPU for more information on the data structure.

Example

void MatMetal::runInit(UserMatInitCPU data) const
{
    double young = data.p_cmat[1];
    double pr    = data.p_cmat[2];
    double bulk  = young / (3.0 * (1.0 - 2.0 * pr));
    double G     = young / (2.0 * (1.0 + pr));

    // Set material properties
    data.p_cmat[43] = bulk;

    // Stress-strain curve
    int idlc = static_cast<int>(round(data.p_cmat[6]));

    // Yield stress
    double sigy0 = data.p_history[3];

    // Set initial yield stress (if not already defined)
    if (sigy0 == 0.0) {
        // load material properties from database
        sigy0 = mat::load_curve(data.p_curve_data, data.p_curve_val, idlc, 0.0);
        data.p_history[3] = sigy0;
    }

    // Stiffness matrix
    data.p_stiffness[0] = G;
    data.p_stiffness[1] = bulk;
}

Material response

Material response is the central mechanism for defining a material's complex behavior under varying conditions. Key aspects include the constitutive response, which governs how the material reacts to stress and strain, and the failure criteria. These components can be tailored to accurately capture the unique properties of the materials you model.

This framework offers two interfaces, one for CPU (C++) and another for GPU (CUDA).

CPU implementation

This function is called for each integration point during each time step of the simulation on the CPU. It defines the behavior of the material, such as its constitutive response, failure criteria, and post-failure behavior.

runMat()

void runMat(UserMatCPU data) const

data

  • Type: UserMatCPU
  • Description: User material data structure. See UserMatCPU for more information on the data structure.

Example

void MatRubber::runMat(UserMatCPU data) const
{
    // Load material parameters
    double C1   = data.p_cmat[6];
    double C2   = data.p_cmat[7];
    double bulk = data.p_cmat[43];

    double fmat[9], c[6], eval[3];

    // Load deformation gradient matrix
    for (int i = 0; i < 9; ++i) {
        fmat[i] = data.p_f_mat[i];
    }

    // Volumetric strain and pressure
    double evol = data.p_strain[0] + data.p_strain[1] + data.p_strain[2];
    double pressure = -bulk * evol;

    mat::cauchy_green_tensor(fmat, c);

    // Re-using fmat array as eigen vector array since fmat array is no longer needed
    double* evec = fmat;

    // Get principal stretches (squared)
    mat::calc_eigen_values(c, eval);
    mat::calc_eigen_vectors(c, eval, evec);

    double alpha = 2.0 * C1 * (eval[0] - eval[2]) - 2.0 * C2 * (1.0 / eval[0] - 1.0 / eval[2]);
    double beta = 2.0 * C1 * (eval[1] - eval[2]) - 2.0 * C2 * (1.0 / eval[1] - 1.0 / eval[2]);

    eval[0] = (2.0 * alpha - 1.0 * beta) / 3.0 - pressure;
    eval[1] = (-1.0 * alpha + 2.0 * beta) / 3.0 - pressure;
    eval[2] = (-1.0 * alpha - 1.0 * beta) / 3.0 - pressure;

    // Re-using c array as local stress array since c array is not longer needed
    double* stress = c;
    for (int i = 0; i < 6; ++i) {
        stress[i] = 0.0;
    }

    for (int i = 0; i < 3; ++i) {
        int vector_offset = i * 3;
        stress[0] += eval[i] * pow(evec[vector_offset + 0], 2);
        stress[1] += eval[i] * pow(evec[vector_offset + 1], 2);
        stress[2] += eval[i] * pow(evec[vector_offset + 2], 2);
        stress[3] += eval[i] * evec[vector_offset + 0] * evec[vector_offset + 1];
        stress[4] += eval[i] * evec[vector_offset + 1] * evec[vector_offset + 2];
        stress[5] += eval[i] * evec[vector_offset + 2] * evec[vector_offset + 0];
    }

    // Save eigen values for contour plot output
    for (int i = 0; i < 3; ++i) {
        data.p_history[i] = eval[i];
    }

    // Save new stress
    for (int i = 0; i < 6; ++i) {
        data.p_stress[i] = stress[i];
    }
}

CPU memory access

To maximize performance, consider adopting a data transfer approach that loads data from heap memory (the data arrays in UserMatCPU) into local stack variables at the beginning of a function. By doing so, you can take advantage of the much faster memory access from the stack compared to data stored on the heap. This strategy allows for efficient processing and minimizes the overhead associated with frequent heap memory accesses.

Data access pattern

In the UserMatCPU, arrays with the p_ prefix are data pointers to arrays used for calculation of material response.

Since arrays are structured per integration point in the CPU implementation, there is no need to calculate global offsets. For instance, p_strain[0] through p_strain[5] relate to the current integration point.
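
For example, loading the strain tensor of the current integration point into stack variables (a minimal sketch using the p_strain pointer documented in UserMatCPU):

// Sketch: copy per-integration-point data into local (stack) variables.
double strain[6];
for (int i = 0; i < 6; ++i) {
    strain[i] = data.p_strain[i];   // no global offset needed on the CPU
}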

GPU implementation

This function is called during each time step of the simulation when GPU acceleration is enabled. It defines the behavior of the material, such as its constitutive response, failure criteria, and post-failure behavior.

CUDA kernels (GPU code) cannot be implemented as member functions in classes. The solution here is to implement them in separate files and invoke them from the class function. To protect against naming conflicts, we define the same namespace for both the class and the kernel that belong together.

runMatGPU()

void runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const

host

  • Type: UserMatHost
  • Description: User material host data structure. See UserMatHost for more information on the data structure.

device

  • Type: UserMatDevice
  • Description: User material device data structure. See UserMatDevice for more information on the data structure.

stream

  • Type: cudaStream_t
  • Description: CUDA stream used by the material to perform its calculations.

Example

The following example calls the function mat_user(), which is part of the same namespace. We do not have to qualify the call with the namespace, since the class already belongs to it. In our provided sample, the CUDA implementation is in a file called kernel_mat_rubber.cu.

void MatRubber::runMatGPU(UserMatHost host, UserMatDevice device, cudaStream_t stream) const
{
    mat_user(host, device, stream);
}

CUDA programming model

Kernels

Functions marked with __global__ are kernels. A kernel is a special type of function that runs on the GPU and is executed in parallel by many threads. Kernels are the core mechanism by which you offload computation from the CPU (host) to the GPU (device).

Functions

A __device__ function in CUDA is a type of function that is executed on the GPU and is callable only from other GPU functions, such as __global__ or other __device__ functions. These functions cannot be called directly from the host (CPU) code, unless they are also marked with __host__. This will make them callable from both the host and the device.

Example

__device__
int sum(int x, int y) {
    return x + y;
}

__global__
void kernel_add_vectors(int* a, int* b, int *c, int num_tasks) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= num_tasks) return;

    c[idx] = sum(a[idx], b[idx]);
}

More information

For a more comprehensive understanding of CUDA, we recommend exploring the official NVIDIA CUDA documentation, which provides detailed guides, tutorials and reference materials.

CUDA kernel files

The CUDA API is an extension of the C / C++ programming language and just like C / C++, CUDA uses header files. Each material model consists of two files, both defined within the same namespace:

  • kernel_mat_XXXX.h (header)
  • kernel_mat_XXXX.cu (source)

Below is an example of our typical implementation, with logical code omitted for clarity and to focus on the interface. Further details will be provided in subsequent subsections.

kernel_mat_XXXX.h

The header file introduces a namespace and provides forward declarations of its host functions, enabling other files to call them.

namespace mat_kfc
{

void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream);

}

kernel_mat_XXXX.cu

The source file is wrapped in the same namespace as the header file. It contains the CUDA kernel function with the __global__ prefix, and a host function. The host function mat_user is used to launch the kernel function.

Additionally, the file includes a __constant__ declaration of the material parameters. By utilizing the __constant__ memory space, we effectively cache frequently accessed data (such as material parameters).

namespace mat_kfc
{
__constant__ double cmat[200];

__global__
void kernel_sanders(UserMatDevice device)
{
    /*
    ⣿⣿⣿⣿⣿⣿⣿⡿⢟⣋⣭⣥⣭⣭⣍⡉⠉⠙⠛⠻⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿
    ⣿⣿⣿⣿⣿⡏⠁⠠⠶⠛⠻⠿⣿⣿⣿⣿⣷⡄⠄⠄⠄⠄⠉⠻⢿⣿⣿⣿⣿⣿
    ⣿⣿⣿⣿⠟⠄⢀⡴⢊⣴⣶⣶⣾⣿⣿⣿⣿⢿⡄⠄⠄⠄⠄⠄⠄⠙⢿⣿⣿⣿
    ⣿⣿⡿⠁⠄⠙⡟⠁⣾⣿⣿⣿⣿⣿⣿⣿⣿⣎⠃⠄⠄⠄⠄⠄⠄⠄⠈⢻⣿⣿
    ⣿⡟⠄⠄⠄⠄⡇⠰⠟⠛⠛⠿⠿⠟⢋⢉⠍⢩⣠⡀⠄⠄⠄⠄⠄⠄⠄⠄⢹⣿
    ⣿⠁⠄⠄⠄⠄⠰⠁⣑⣬⣤⡀⣾⣦⣶⣾⣖⣼⣿⠁⠄⠄⠄⠄⠄⠄⠄⠄⠄⢿
    ⡏⠄⠄⠄⠄⠄⠄⠄⠨⣿⠟⠰⠻⠿⣣⡙⠿⣿⠋⠄⢀⡀⣀⠄⣀⣀⢀⣀⣀⢸
    ⡇⠄⠄⠄⠄⠄⠄⠄⠄⣠⠄⠚⠛⠉⠭⣉⢁⣿⠄⢀⡿⢾⣅⢸⡗⠂⢿⣀⡀⢸
    ⡇⠄⠄⠄⠄⠄⠄⠄⠄⠘⢧⣄⠄⣻⣿⣿⣾⠟⣀⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⢸
    ⣿⠄⠄⠄⠄⠄⠄⠄⠄⢠⡀⠄⠄⣿⣿⠟⢁⣴⣿⢸⡄⠄⢦⣤⣤⣤⣤⣄⡀⣼
    ⣿⣧⠄⠄⠄⠄⠄⠄⢠⡸⣿⠒⠄⠈⠛⠄⠁⢹⡟⣾⡇⠄⠈⢿⣿⣿⣿⣿⣿⣿
    ⣿⣿⣧⣠⣴⣦⠄⠄⢸⣷⡹⣧⣖⡔⠄⠱⣮⣻⣷⣿⣿⠄⠄⠘⣿⣿⣿⣿⣿⣿
    ⣿⣿⣿⣿⣿⡇⠄⠄⠸⠿⠿⠚⠛⠁⠂⠄⠉⠉⡅⢰⡆⢰⡄⠄⠘⣿⣿⣿⣿⣿
    ⣿⣿⣿⣿⣿⣷⣤⡀⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⣿⠄⣷⠘⣧⣠⣾⣿⣿⣿⣿⣿
    ⣿⣿⣿⣿⣿⣿⣿⣿⣷⣦⣤⣄⣀⣀⡀⠄⣀⣀⣹⣦⣽⣾⣿⣿⣿⣿⣿⣿⣿⣿
    */
}

void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    const unsigned int block_size = 128;
    const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);

    cudaMemcpyToSymbolAsync(cmat, host.p_material, sizeof(double) * 200, 0, cudaMemcpyHostToDevice, stream);
    kernel_sanders<<<num_blocks, block_size, 0, stream>>>(device);
}
} // namespace mat_kfc

CUDA kernel launch

mat_user()

__constant__ MatParams cmat;

void mat_user(UserMatHost host, UserMatDevice device, cudaStream_t stream)
{
    cudaMemcpyToSymbolAsync(cmat, host.p_material, sizeof(MatParams), 0, cudaMemcpyHostToDevice, stream);

    const unsigned int block_size = 128;
    const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);

    kernel_material<<<num_blocks, block_size, 0, stream>>>(device);
    kernel_error_check(stream, __FILE__, __LINE__);
}

Copy material parameters array to constant memory space

To use the material parameters in the kernel, we copy the data from host to device, more specifically into the __constant__ memory space. This is done using the cudaMemcpyToSymbolAsync function (see the CUDA Runtime API documentation for details).

Launching the kernel

Number of blocks and threads per block

Note: When launching a CUDA kernel, the total number of threads launched is determined by the product of the block size and the number of blocks. As a result, more threads are usually launched than necessary to process the data, since execution occurs at the block level rather than individual threads.

const unsigned int block_size = 128;
const unsigned int num_blocks = calcNumBlocks(device.num_tasks, block_size);

This code executes a GPU kernel to perform parallel computations on a large dataset. In CUDA, thread launching is achieved by executing multiple blocks, each with a specified size. The block size should be a multiple of 32, with a maximum of 1024 threads per block. Here, we use 128 threads per block and utilize the calcNumBlocks macro to calculate the number of blocks required based on the block size and the number of tasks to be processed.
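
calcNumBlocks is provided by the framework; a minimal equivalent (a sketch, not the framework's definition) is the usual ceiling division, so that num_blocks * block_size >= num_tasks:

// Hypothetical equivalent of calcNumBlocks: round up to whole blocks.
inline unsigned int calc_num_blocks_sketch(size_t num_tasks, unsigned int block_size)
{
    return static_cast<unsigned int>((num_tasks + block_size - 1) / block_size);
}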

Kernel launch

Note: CUDA kernels are launched asynchronously, meaning that the host does not wait for the kernel to finish before continuing. To ensure that the kernel has finished before continuing with the application, synchronization is required. This can be done by calling the cudaStreamSynchronize function (see the CUDA Runtime API documentation).

kernel_material<<<num_blocks, block_size, 0, stream>>>(device);

kernel_material<<< num_blocks, block_size, 0, stream >>>(device); is the kernel launch statement. The syntax <<<...>>> is called the execution configuration and it specifies how the kernel should be executed on the GPU. Any call to a __global__ function must specify the execution configuration.

To specify the execution configuration, you insert an expression of the form <<< gridDim, blockDim, sharedMem, stream >>> immediately after the function name and before the opening parenthesis of the argument list, where:

  • gridDim: The number of blocks in the grid (calculated by the calcNumBlocks macro);
  • blockDim: The number of threads in each block (128 in this example);
  • sharedMem: The number of bytes in shared memory that is dynamically allocated per block;
  • stream: A CUDA stream object that manages the execution of the kernel. stream is an optional argument which defaults to 0.

Error checking and synchronisation

kernel_error_check(stream, __FILE__, __LINE__);

Finally, we perform error checking using the kernel_error_check wrapper function. It checks for CUDA error codes, printing an error message with file and line number if one occurs. This function also invokes stream synchronization (it waits until the kernel has finished before continuing).

Thread identifier

CUDA threads are organized hierarchically into blocks, and these blocks are further organized into a grid. To efficiently map threads to data elements in parallel programming, CUDA provides a way to identify each thread uniquely using three built-in variables:

  1. threadIdx.x: This variable identifies the thread's index within a block. In a 1D block, threadIdx.x ranges from 0 to blockDim.x - 1.
  2. blockDim.x: This variable represents the total number of threads in each block along the x-axis. It is a constant value for all blocks within the grid.
  3. blockIdx.x: This variable identifies the block's index within the grid along the x-axis. In a 1D grid, blockIdx.x ranges from 0 to gridDim.x - 1, where gridDim.x is the total number of blocks in the grid along the x-axis.

Compute a unique index

Each thread in the grid can be uniquely identified by a combination of these three variables. The most common way to compute a unique index for a thread in a 1D execution is by combining these values as follows:

int idx = threadIdx.x + blockIdx.x * blockDim.x;

Preventing out-of-bounds access

The number of threads you launch might not perfectly match the number of tasks or data elements you want to process. For example, if you have a small number of tasks (num_tasks), but you launch a large number of threads, some of those threads would end up with no valid data to process.

By checking if idx is greater than or equal to num_tasks, we can ensure that only the threads with valid work continue processing. This check effectively "turns off" threads that don't have anything to do, allowing the program to run safely and correctly.

if (idx >= num_tasks) return;

Memory access

General memory access guidelines

The preferred strategy is to load data from global memory at the beginning of a kernel and store it locally, then write it back at the end. This is mainly due to the significant difference in access speed between the different types of memory.

Global memory latency

Global memory is the main memory of the GPU, and it has a high latency, typically around 400 to 800 clock cycles. If every thread in a CUDA kernel were to access global memory frequently, it would lead to a significant slowdown due to this latency.

Local and shared memory speed

Local memory and shared memory are much faster than global memory. Shared memory, in particular, is on-chip and has a latency comparable to registers (typically around 1 clock cycle). By loading data from global memory into shared memory or registers at the beginning of a kernel, threads can then operate on this data using the much faster local or shared memory.

By working with data locally (in registers or shared memory) during most of the kernel's execution, we minimize the need to repeatedly access global memory. This not only reduces latency but also reduces contention and traffic on the memory bus. Writing back the results to global memory at the end of the kernel also allows for a more structured approach.
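
A minimal kernel sketch following this load-compute-store pattern (assuming the per-thread tensor layout described under Data access pattern below; the kernel name and scaling operation are illustrative):

__global__
void kernel_scale_stress(double* dp_stress, double factor, int num_tasks)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= num_tasks) return;

    // Load from global memory into registers once
    double stress[6];
    for (int i = 0; i < 6; ++i) {
        stress[i] = dp_stress[idx * 6 + i];
    }

    // Operate on the local copy
    for (int i = 0; i < 6; ++i) {
        stress[i] *= factor;
    }

    // Write the result back to global memory once at the end
    for (int i = 0; i < 6; ++i) {
        dp_stress[idx * 6 + i] = stress[i];
    }
}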

Further reading

For more information on memory access, see section 7.2, Variable Memory Space Specifiers, of the CUDA C++ Programming Guide.

Data access pattern

In the UserMatDevice, arrays with the dp_ prefix are device pointers, referring to memory allocated on the GPU. Device pointers are passed as arguments to the kernel.

Per-thread arrays

Each thread is responsible for one element of the array, meaning that each thread operates on a unique index within the array.

// Read
int fail = dp_internal_fail[idx];

// Write
dp_internal_fail[idx] = fail;

Per-thread tensor arrays

In contrast to individual threads operating on single data points, threads process fixed-size blocks of data. Specifically, each thread handles six data points that collectively represent a tensor's components.

double stress[6];

// Read
for (int i = 0; i < 6; ++i) {
    stress[i] = dp_stress[idx * 6 + i];
}

// Write
for (int i = 0; i < 6; ++i) {
    dp_stress[idx * 6 + i] = stress[i];
}

Per-thread state variable arrays

Similar to per-thread tensor arrays, but with a fixed-size block length of num_history. This value is obtained from the UserMatDevice properties struct.

// Read
double epsp   = dp_history[idx * num_history + 0];
double damage = dp_history[idx * num_history + 1];

// Write
dp_history[idx * num_history + 0] = epsp;
dp_history[idx * num_history + 1] = damage;

Per-thread 3x3 matrix arrays

In this case, we employ a stride pattern. Stride access is a memory access pattern where threads access memory locations at uniform intervals, rather than sequential locations. Use the stride variable defined in the UserMatDevice properties struct.

double fmat[9];

// Read
for (int i = 0; i < 9; ++i) {
    fmat[i] = dp_f_mat[stride * i + idx];
}

// Write
for (int i = 0; i < 9; ++i) {
    dp_f_mat[stride * i + idx] = fmat[i];
}

Subgroup-shared array (element level)

All integration points of an element share a single entry per element; the element index is obtained by dividing the thread index by the number of integration points (num_ip).

// Read
int eroded = dp_eroded[idx / num_ip];

// Write
dp_eroded[idx / num_ip] = eroded;

Important: Accessing subgroup-shared arrays may be subject to race conditions when reading or writing.
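
One way to hedge against write races (a sketch, not part of the framework API) is to write through an atomic operation, so concurrent writers cannot interleave partial updates; since all threads of the element write the same value, the result is well defined:

// Sketch: mark the parent element as eroded without a read-modify-write race.
atomicExch(&dp_eroded[idx / num_ip], 1);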

Curve arrays

Curve arrays are used with the load_curve function to evaluate a curve from the curve arrays. See load_curve for more information.

double sigy1 = mat::load_curve(dp_curve_data, dp_curve_val, cmat.idlc, epsp);

Property coupling

This section explains how to integrate user material models with damage, thermal and equation-of-state properties.

Damage properties

User material models can be combined with the *PROP_DAMAGE_[XXXX] commands, enabling the use of the built-in damage failure criterion models. This feature is ideal if one wants to use well-known, well-defined damage models.

When a damage ID is specified, our software automatically uses the built-in implementation. The only requirement is that the positions of certain properties are defined. The class member variables involved (pos_epsp, pos_depsp, pos_rate, pos_temp, pos_evol, pos_damage1 and pos_damage2) are described under Damage properties in the Material configuration section.

Use of thermal property

When a thermal command ID is detected in the input data, the solver stores the relevant thermal properties in the material parameters (host) array, at positions 70-74.

C++

double hexp  = data.p_cmat[70];    // Heat expansion coefficient
double Cp    = data.p_cmat[71];    // Thermal heat capacity
double hcond = data.p_cmat[72];    // Thermal conductivity
double k     = data.p_cmat[73];    // Taylor-Quinney coefficient
double Tref  = data.p_cmat[74];    // Reference temperature for heat expansion
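
As an illustration (a sketch; the read position and usage are assumptions, not framework requirements), a linear thermal expansion strain could be formed from these parameters:

// Sketch: thermal expansion strain from the coupled properties.
// pos_temp is the configured temperature position in the state variables.
double temperature = data.p_history[pos_temp];
double eps_thermal = hexp * (temperature - Tref);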

CUDA

Since we're using __constant__ memory that resides in file scope, we can access the array directly from the kernel.

double hexp  = cmat[70];    // Heat expansion coefficient
double Cp    = cmat[71];    // Thermal heat capacity
double hcond = cmat[72];    // Thermal conductivity
double k     = cmat[73];    // Taylor-Quinney coefficient
double Tref  = cmat[74];    // Reference temperature for heat expansion

Use of equation-of-state

The equation-of-state (EOS) ID determines the EOS used for the material model. Currently, only *EOS_GRUNEISEN is supported.

GPU

The EOS calculates the pressure at a given volume. This pressure is needed by the material model. You must handle this pressure manually in your material model, replacing the pressure that's normally part of the stress tensor.

To check if EOS is enabled, use the eos parameter and the dp_eos_pressure array in the UserMatDevice struct.

    // Pressure based on volumetric strain or EOS
    if (eos == 0) {
        pressure = -cmat.bulk * (dp_strain[offset + 0] + dp_strain[offset + 1] + dp_strain[offset + 2]);
    } else {
        pressure = dp_eos_pressure[idx];
    }

CPU

The calculated pressure from EOS is applied automatically to the stress tensor and requires no actions in the implementation.

Examples

We provide four examples of material models, each with its own implementation.

Metal 🤘

The metal material model is designed to capture the elastic-plastic behavior of metallic materials. This model is well-suited for simulating materials such as steel, aluminium, and other common metallic alloys.

Command structure

*MAT_USER_1
mid, dens, young, pr, did, tid, eos
idlc, epsf

Source code

  • user-material/kernel_mat_metal.cu
  • user-material/kernel_mat_metal.h
  • user-material/mat_metal.cpp
  • user-material/mat_metal.h

Concrete

The concrete material model is a simplified version of the Holmquist-Johnson-Cook concrete model. The model includes a pressure-dependent yield surface and inelastic compaction of the material.

Command structure

*MAT_USER_2
mid, dens, G
A, B, n, f_c, T, epsf_min, p_c, p_l
epsv_c_l, D1, D2, K

Source code

  • user-material/kernel_mat_concrete.cu
  • user-material/kernel_mat_concrete.h
  • user-material/mat_concrete.cpp
  • user-material/mat_concrete.h

Rubber

The rubber material model is a hyperelastic implementation that describes the non-linear elastic behavior of rubber-like materials. This is essentially the classical Mooney-Rivlin model.

Command structure

*MAT_USER_3
mid, dens, bulk
C1, C2

Source code

  • user-material/kernel_mat_rubber.cu
  • user-material/kernel_mat_rubber.h
  • user-material/mat_rubber.cpp
  • user-material/mat_rubber.h

Orthotropic

The orthotropic material model is specialised for materials that have different properties in three mutually perpendicular directions. It can be used for composite materials like wood and laminated composites, where directional dependency of mechanical properties is significant.

Command structure

*MAT_USER_4
mid, dens
E1, E2, G12, pr12, pr23
c, cdec, Xt, Xc, Yt, Yc, beta, S
erode, residual

Source code

  • user-material/kernel_mat_orthotropic.cu
  • user-material/kernel_mat_orthotropic.h
  • user-material/mat_orthotropic.cpp
  • user-material/mat_orthotropic.h

C++ API

These are the built-in API calls and data structures that are provided for the C++ interface.

Error check

This function performs error checking on CUDA kernels, printing an error message with file and line number if an error occurs. It also waits until the kernel has finished before continuing with the program.

void kernel_error_check(cudaStream_t stream, const char* file, int line)

stream

  • CUDA stream.

file

  • File name of source file.

line

  • Line number.

Data structures

These data structures are mainly used in the interface between IMPETUS Solver and the user material models.

UserMatCPU

Structure

   int* p_internal_fail
   int* p_face_update
   int* p_curve_data
double* p_curve_val
double* p_strain
double* p_dstrain
double* p_stress
double* p_f_mat
double* p_u_mat
double* p_history
double* p_cmat
double* p_stiffness
 double elem_volume0
 double elem_volume
 double ip_volume0
 double ip_volume
    int elem_id
    int ip_id
    int type
 double dt1
 double tt
    int part_id
   int* p_face_ids
double* p_face_area
double* p_face_normal

Variables

int* p_internal_fail

Integration point failure flag.


int* p_face_update

Force update of faces (used with erosion).


int* p_curve_data

Curve data, used together with load_curve.


double* p_curve_val

Curve data, used together with load_curve.


double* p_strain

Strain tensor.


double* p_dstrain

Strain increment tensor.


double* p_stress

Stress tensor.


double* p_f_mat

Deformation gradient matrix.


double* p_u_mat

Right stretch tensor.


double* p_history

History state variable array.


double* p_cmat

Material parameters array.


double* p_stiffness

Used for time step calculation.

  • 0 = shear
  • 1 = bulk
  • 2 = xi
  • 3 = bfac

double elem_volume0

Initial volume of the element.


double elem_volume

Current volume of the element.


double ip_volume0

Initial volume of the integration point.


double ip_volume

Current volume of the integration point.


int elem_id

Element ID.


int ip_id

Integration point ID.


int type

Element type.


double dt1

Current time step size.


double tt

Current time.


int part_id

Part ID.


int* p_face_ids

Array of (maximum 3) face IDs tied to the current integration point. Non-zero value for each external face.


double* p_face_area

Area of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.


double* p_face_normal

Normal vector of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.
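
A sketch of how the face arrays could be traversed (the 3-components-per-normal layout is an assumption for illustration):

// Sketch: visit each external face tied to the current integration point.
for (int i = 0; i < 3; ++i) {
    if (data.p_face_ids[i] != 0) {
        double area = data.p_face_area[i];
        const double* normal = &data.p_face_normal[3 * i];
        // ... use area and normal ...
    }
}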

UserMatInitCPU

Structure

   int* p_curve_data
double* p_curve_val
double* p_strain
double* p_stress
double* p_coord
double* p_history
double* p_cmat
double* p_stiffness
    int elem_id
    int ip_id
    int type
    int part_id
   int* p_face_ids
double* p_face_area
double* p_face_normal

Variables

int* p_curve_data

Curve data, used together with load_curve.


double* p_curve_val

Curve data, used together with load_curve.


double* p_strain

Strain tensor.


double* p_stress

Stress tensor.


double* p_coord

Node coordinate.


double* p_history

History state variable array.


double* p_cmat

Material parameters array.


double* p_stiffness

Used for time step calculation.

  • 0 = shear
  • 1 = bulk
  • 2 = xi
  • 3 = bfac

int elem_id

Element ID.


int ip_id

Integration point ID.


int type

Element type.


int part_id

Part ID.


int* p_face_ids

Array of (maximum 3) face IDs tied to the current integration point. Non-zero value for each external face.


double* p_face_area

Area of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.


double* p_face_normal

Normal vector of the corresponding external face from p_face_ids. Use the index of p_face_ids where the value is not 0.

UserMatDevice

Structure

   int* dp_internal_fail
   int* dp_eroded
   int* dp_curve_data
double* dp_curve_val
double* dp_strain
double* dp_dstrain
double* dp_stress
double* dp_f_mat
double* dp_u_mat
double* dp_init_volume
double* dp_volume
double* dp_history
double* dp_eos_pressure
double* dp_cmat
double* dp_shear
double* dp_bulk
double* dp_xi
double* dp_bfac
    int eos
    int num_history
 size_t num_ip
 size_t num_tasks
 size_t stride
 double dt0
 double dt
 double tt

Variables

int* dp_internal_fail

Integration point failure flag.

Access pattern: Per-thread arrays


int* dp_eroded

Element eroded flag.

Access pattern: Subgroup-shared array (element level)


int* dp_curve_data

Curve data.

Access pattern: Curve arrays


double* dp_curve_val

Curve data.

Access pattern: Curve arrays


double* dp_strain

Strain tensor.

Access pattern: Per-thread tensor arrays


double* dp_dstrain

Strain increment tensor.

Access pattern: Per-thread tensor arrays


double* dp_stress

Stress tensor.

Access pattern: Per-thread tensor arrays


double* dp_f_mat

Deformation gradient matrix. This array is in stride format.

Access pattern: Per-thread 3x3 matrix arrays


double* dp_u_mat

Right stretch tensor.

Access pattern: Per-thread tensor arrays


double* dp_init_volume

Initial volume of integration point.

Access pattern: Per-thread arrays


double* dp_volume

Current volume of integration point.

Access pattern: Per-thread arrays


double* dp_history

State variable array.

Access pattern: Per-thread state variable arrays


double* dp_eos_pressure

If the Grüneisen equation of state is active, this array holds the pressure to use.

Access pattern: Per-thread arrays


double* dp_cmat

Not used; currently a nullptr.

Note: Reserved for future implementation.


double* dp_shear

Shear array used for critical time step calculation.

Access pattern: Per-thread arrays


double* dp_bulk

Bulk array used for critical time step calculation.

Access pattern: Per-thread arrays


double* dp_xi

Xi array used for critical time step calculation.

Access pattern: Per-thread arrays


double* dp_bfac

Bfac array used for critical time step calculation.

Access pattern: Per-thread arrays


int eos

Flag to indicate whether Grüneisen equation-of-state is active or not.

Use this flag to determine whether to use pressure from dp_eos_pressure or not.


int num_history

Number of state variables for the current material model.


size_t num_ip

Number of integration points of the current element.


size_t num_tasks

Number of threads to be launched for the current material model.


size_t stride

Stride used for arrays in stride format.


double dt0

Previous time step.


double dt

Current time step.


double tt

Current time.

UserMatHost

Structure

double* p_material

Variables

double* p_material

Material parameters (host array).

C++ / CUDA API

These built-in APIs provide pre-defined functions for performing mathematical and material-related tasks. All functions are declared as both __host__ and __device__, which means they work in both C++ and CUDA code. The functions are implemented in mat_user_functions.h and are part of the mat namespace. To use them, prepend mat:: to each function call.

double sig_eff = mat::mat_effective_stress(stress);

Cauchy-Green tensor

The Cauchy-Green tensor describes the deformation of a material body, playing a key role in nonlinear elasticity and finite deformations.

template <typename T>
inline __host__ __device__ void cauchy_green_tensor(T* mat, T* c)

Parameters

mat

  • Input: 3x3 matrix

c

  • Output: C (symmetric tensor)

Cross product

This function returns the cross product of a1 and a2 in a3.

template <typename T>
inline __host__ __device__ void cross_product(T* a1, T* a2, T* a3)

Parameters

a1

  • Input: vector 1

a2

  • Input: vector 2

a3

  • Output: vector 3

Effective strain rate

This function calculates and returns the effective strain rate.

template <typename T>
inline __host__ __device__ T mat_effective_strain_rate(T* strain_rate)

Parameters

strain_rate

  • Input: Strain rate tensor

Returns

Effective strain rate

Effective stress

This function calculates and returns the effective stress.

template <typename T>
inline __host__ __device__ T mat_effective_stress(T* stress)

Parameters

stress

  • Input: Stress tensor

Returns

Effective stress

Eigenvalues

This function calculates and returns the eigenvalues eval of the input tensor s.

template <typename T>
inline __host__ __device__ void calc_eigen_values(T* s, T* eval)

Parameters

s

  • Input: Tensor to be decomposed into eigenvalues

eval

  • Output: Eigenvalues of the input tensor

Eigenvectors

This function calculates and returns the eigenvectors evec of the input tensor s and its eigenvalues eval.

template <typename T>
inline __host__ __device__ void calc_eigen_vectors(T* s, T* eval, T* evec)

Parameters

s

  • Input: Tensor to be decomposed into eigenvalues

eval

  • Input: Eigenvalues of the input tensor

evec

  • Output: Eigenvectors of the input tensor

Invert 3x3 matrix

Invert a 3x3 matrix.

template <typename T>
inline __host__ __device__ double invert_3x3(T* mat, T* mat_inv)

Parameters

mat

  • Input: 9-component 1D array (3x3 matrix) in column-major order

mat_inv

  • Output: 9-component 1D array (3x3 matrix), the inverted matrix in column-major order

Returns

Determinant of the input matrix.

Load Curve / Function

Returns the ordinate value at x of the curve or function with the given id.

template <typename T1, typename T2>
inline __host__ __device__ double load_curve(T1* curve_data, T2* curve_val, int id, double x)

Parameters

curve_data

  • Input: Array of curve data

curve_val

  • Input: Array of curve values

id

  • Input: Function or Curve ID

x

  • Input: Value of x in function f(x)

Returns

The ordinate value of x.
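
Usage sketch (mirroring the runInit example earlier in this guide):

// Evaluate curve/function 'idlc' at abscissa 0.0.
double sigy0 = mat::load_curve(data.p_curve_data, data.p_curve_val, idlc, 0.0);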

Transform tensor

Transform tensor to a fiber direction.

template <typename T, int transpose>
inline __host__ __device__ void transform_tensor(T* s, T* r, T* b)

Note: This function requires specification of the template parameters typename T and int transpose.

Parameters

s

  • Input: Tensor to be transformed

r

  • Input: fiber direction

b

  • Output: Transformed tensor

von Mises (step 1)

von Mises yield criterion (step 1)

template <typename T>
inline __host__ __device__ void mat_yield_von_mises_1(T shear, T sigy0, T* sig_eff, T* stress,
                                             T* deps, int* yield)

Parameters

shear

  • Input: Shear

sigy0

  • Input: Current yield stress

sig_eff

  • Output: Effective stress

stress

  • Input: Stress tensor

deps

  • Output: Strain increment

yield

  • Output: Flag to indicate if we have plasticity or not

von Mises (step 2)

von Mises yield criterion (step 2)

template <typename T>
inline __host__ __device__ void mat_yield_von_mises_2(T shear, T sigy0, T sigy1, T* sig_eff, T* stress,
                                             T* deps, T* epsp, T* depsp)

Parameters

shear

  • Input: Shear

sigy0

  • Input: Current yield stress

sigy1

  • Input: Yield stress at strain increment

sig_eff

  • Output: Effective stress

stress

  • Input: Stress tensor

deps

  • Output: Strain increment

epsp

  • Output: Effective plastic strain

depsp

  • Output: Effective plastic strain increment
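
A hedged sketch of how the two steps might be combined (the wiring is illustrative; consult the provided metal example for the authoritative usage):

// Sketch: trial check (step 1) followed by plastic update (step 2).
// Assumes shear, sigy0, stress[6], epsp, idlc and the curve arrays
// are already loaded as shown in earlier sections.
double sig_eff = 0.0, deps = 0.0;
int yield = 0;

mat::mat_yield_von_mises_1(shear, sigy0, &sig_eff, stress, &deps, &yield);

if (yield) {
    // Yield stress from the hardening curve at the updated plastic strain
    double sigy1 = mat::load_curve(dp_curve_data, dp_curve_val, idlc, epsp + deps);
    double depsp = 0.0;
    mat::mat_yield_von_mises_2(shear, sigy0, sigy1, &sig_eff, stress,
                               &deps, &epsp, &depsp);
}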