accelerInt v0.1
The van der Pol problem is a commonly used stiff test problem for ODE solvers. The stiffness is 'tunable' via modification of the parameter \(\mu\). A complete reference can be found in:
Hairer, Nørsett, Wanner: Solving Ordinary Differential Equations II, p. 157. Springer-Verlag, 1991.
or, for a practical example in MATLAB, see here.
The van der Pol problem can be written as a system of two first-order non-linear ODEs:
\begin{align} y_1^\prime &= y_2, & y_1(0) &= y_1^\circ \nonumber \\ y_2^\prime &= \mu \left(1 - y_1^2\right)y_2 - y_1, & y_2(0) &= y_2^\circ \nonumber \end{align}
For our purposes, we shall use:
\begin{align} y_1^\circ &= 2 \nonumber \\ y_2^\circ &= 0 \nonumber \\ \mu &= 1000 \nonumber \end{align}
The format of arrays expected by accelerInt is of note.
The expected Jacobian ordering for the C-solvers follows pyJac v1's Jacobian pattern, that is:
\begin{align} \mathcal{J}_{i, j} &= \frac{ \partial y_i^\prime } { \partial y_j } \end{align}
Practically speaking, for the van der Pol problem, this translates to the Jacobian:
\begin{align} \mathcal{J} & = \left[ \begin{array}{cc} \frac {\partial y_1^\prime } {\partial y_1} & \frac {\partial y_1^\prime } {\partial y_2} \\ \frac {\partial y_2^\prime } {\partial y_1} & \frac {\partial y_2^\prime } {\partial y_2} \end{array} \right] \end{align}
This is then flattened in column-major (F) order, such that the flattened Jacobian array reads:
\begin{align} \vec{\mathcal{J}} & = \left\{ \frac {\partial y_1^\prime } {\partial y_1}, \frac {\partial y_2^\prime } {\partial y_1}, \frac {\partial y_1^\prime } {\partial y_2}, \frac {\partial y_2^\prime } {\partial y_2} \right\} \end{align}
This ordering can be seen in action in van_der_pol.eval_jacob().
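As a rough sketch of what such an evaluation might look like (the exact prototype should be taken from the example source; the macro name `NSP` for the number of equations is assumed here):

```c
/* Sketch of an analytical Jacobian for the van der Pol problem.
 * The prototype mirrors the style used by accelerInt's C solvers, but the
 * exact signature should be taken from van_der_pol.eval_jacob().
 * jac is flattened in column-major order: jac[i + NSP * j] = d(dy[i])/d(y[j]).
 */
#define NSP 2  /* number of equations; assumed macro name */

void eval_jacob(const double t, const double mu, const double *y, double *jac)
{
    (void)t;  /* the van der Pol system is autonomous */
    /* column 0: derivatives with respect to y[0] */
    jac[0 + NSP * 0] = 0.0;                            /* d(y1')/d(y1) */
    jac[1 + NSP * 0] = -2.0 * mu * y[0] * y[1] - 1.0;  /* d(y2')/d(y1) */
    /* column 1: derivatives with respect to y[1] */
    jac[0 + NSP * 1] = 1.0;                            /* d(y1')/d(y2) */
    jac[1 + NSP * 1] = mu * (1.0 - y[0] * y[0]);       /* d(y2')/d(y2) */
}
```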
For CUDA, the state-vector and Jacobian arrays remain in global memory by necessity, due to limitations on the size of per-thread local memory. Here, the IVP index does factor into indexing concerns, and can be accessed using the built-in CUDA thread indexing parameters (e.g. threadIdx.x, etc.).
Let's say we are solving \(N_{p}\) separate problems (e.g., differing initial conditions, stiffness parameter, etc.). Our Jacobian is now transformed to:
\begin{align} \mathcal{J}_{n, i, j} &= \frac{ \partial y_{n, i}^\prime } { \partial y_{n, j} } \end{align}
where \(y_{n, i}\) is the i-th entry in the state vector of the n-th problem. Our Jacobian is again flattened in column-major order, such that the flat vector is:
\begin{align} \vec{\mathcal{J}} & = \left\{ \frac {\partial y_{1, 1}^\prime } {\partial y_{1, 1}}, \frac {\partial y_{2, 1}^\prime } {\partial y_{2, 1}}, \ldots, \frac {\partial y_{N_{p}, 1}^\prime } {\partial y_{N_{p}, 1}}, \frac {\partial y_{1, 2}^\prime } {\partial y_{1, 1}}, \frac {\partial y_{2, 2}^\prime } {\partial y_{2, 1}}, \ldots, \frac {\partial y_{N_{p}, 2}^\prime } {\partial y_{N_{p}, 1}}, \frac {\partial y_{1, 1}^\prime } {\partial y_{1, 2}}, \ldots, \frac {\partial y_{N_{p}, 2}^\prime } {\partial y_{N_{p}, 2}} \right\} \end{align}
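In other words, the IVP index varies fastest, then the row, then the column. A minimal sketch of the resulting index arithmetic, assuming zero-based indices and a hypothetical helper name (the actual indexing macros used by accelerInt may differ), is:

```c
/* Illustrative only: position of J[n][i][j] in the flattened global Jacobian,
 * where the IVP index n varies fastest, then the row i, then the column j.
 *   n : IVP (problem) index,  0 <= n < num_odes
 *   i : row (equation) index, 0 <= i < neq
 *   j : column (state) index, 0 <= j < neq
 */
static inline int global_jac_index(int n, int i, int j, int num_odes, int neq)
{
    return n + num_odes * (i + neq * j);
}
```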
This section details the implementation files required for the C and CUDA solvers.
First, both languages need a dydt() function implementation. Both a header and implementation file are required.
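For the van der Pol system above, a dydt() implementation might look like the following sketch; the exact prototype, in particular how the parameter \(\mu\) is passed, should be taken from the actual header:

```c
/* Sketch of the van der Pol right-hand side.  The second argument is assumed
 * to carry the stiffness parameter mu; the real accelerInt prototype may name
 * it differently (e.g. a generic parameter slot).
 */
void dydt(const double t, const double mu, const double *y, double *dy)
{
    (void)t;  /* the system is autonomous */
    dy[0] = y[1];                                    /* y1' = y2                     */
    dy[1] = mu * (1.0 - y[0] * y[0]) * y[1] - y[0];  /* y2' = mu(1 - y1^2) y2 - y1   */
}
```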
If FINITE_DIFFERENCE is not specified in the SCons options, an analytical Jacobian implementation must be specified. Both a header and implementation file are required.
The ODE header.h file contains several important macros and definitions.
The header.h file also contains declarations for several functions dealing with initial conditions: set_same_initial_conditions(), apply_mask(), and apply_reverse_mask().
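As a purely hypothetical sketch of how these pieces might appear (macro and argument names are assumptions; consult the actual header.h):

```c
/* Hypothetical sketch of header.h contents -- names are illustrative only. */
#ifndef HEADER_H
#define HEADER_H

#define NSP 2    /* number of ODEs in the state vector (assumed macro name) */

/* Set every IVP to the same initial state (used with the SAME_IC option). */
void set_same_initial_conditions(int num_odes, double **y_host, double **var_host);

/* Mask helpers; no-ops for the van der Pol problem. */
void apply_mask(double *y);
void apply_reverse_mask(double *y);

#endif
```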
The CUDA header.cuh file is similar to the C-version, with a few additional methods regarding GPU Memory Initialization. It contains the mechanism_memory struct, which defines memory needed for RHS and Jacobian evaluation, as well as the initialize_gpu_memory(), free_gpu_memory(), and required_mechanism_size() methods, which initialize, free and calculate the required GPU memory respectively.
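A hypothetical sketch of these declarations is shown below; the member and argument names are assumptions for illustration only, and the real header.cuh should be consulted:

```c
/* Hypothetical sketch of header.cuh declarations -- names are illustrative. */
typedef struct {
    double *y;    /* device state vectors          */
    double *dy;   /* device RHS evaluations        */
    double *var;  /* device parameter (mu) storage */
    double *jac;  /* device Jacobian storage       */
} mechanism_memory;

/* Allocate the per-IVP working memory on the GPU. */
void initialize_gpu_memory(int num_odes, mechanism_memory **h_mem, mechanism_memory **d_mem);

/* Release the memory allocated by initialize_gpu_memory(). */
void free_gpu_memory(mechanism_memory **h_mem, mechanism_memory **d_mem);

/* Bytes of GPU memory required per IVP, used to size the global allocation. */
size_t required_mechanism_size(void);
```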
In order to operate the CUDA solvers, the user must implement some additional functions designed to properly initialize the required GPU memory. accelerInt pre-allocates the required memory for the solver and ODE model in the GPU's global memory, avoiding per-thread local (stack) memory allocation, which becomes limiting for large ODE systems.
The launch_bounds.cuh file contains definitions for a few variables that control CUDA kernel launches.
The initial condition reader was designed with pyJac in mind; it is currently quite cumbersome to use with other data formats.
Additionally, the initial conditions reader expects the following data format:
\begin{align} \nonumber &\text{time}, y_1, \text{parameter}, y_2, y_3, \ldots, y_{NSP} && \text{(State 1)}\\ \nonumber &\text{time}, y_1, \text{parameter}, y_2, y_3, \ldots, y_{NSP} && \text{(State 2)}\\ \nonumber &\quad\vdots && \end{align}
An example of how to write a file in this format can be found in generate_ics.py.
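For illustration only, a C routine writing a single record in this layout might look like the sketch below; it assumes the file is a flat binary stream of double-precision values, while the actual encoding is defined by generate_ics.py:

```c
#include <stdio.h>

/* Hypothetical sketch: write one state record in the order
 * time, y1, parameter, y2, ..., y_NSP (see the format above). */
int write_state(FILE *f, double time, double param, const double *y, int nsp)
{
    if (fwrite(&time,  sizeof(double), 1, f) != 1) return -1;
    if (fwrite(&y[0],  sizeof(double), 1, f) != 1) return -1;
    if (fwrite(&param, sizeof(double), 1, f) != 1) return -1;
    /* remaining state-vector entries y2 ... y_NSP */
    if ((int)fwrite(&y[1], sizeof(double), nsp - 1, f) != nsp - 1) return -1;
    return 0;
}
```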
The set_same_initial_conditions() function, defined in ics.c and ics.cu, sets the initial conditions of the state vector when the SCons option SAME_IC is used.
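For the values chosen in the problem definition above, a sketch of such a function might look as follows; the prototype, the assumption of pre-allocated host arrays, and the IVP-fastest storage layout are illustrative, and ics.c should be consulted for the real implementation:

```c
/* Sketch: set every IVP to the same initial state y = (2, 0) with mu = 1000.
 * Assumes *y_host and *var_host already point to allocated host arrays and
 * that the state vector is stored with the IVP index varying fastest. */
void set_same_initial_conditions(int num_odes, double **y_host, double **var_host)
{
    for (int n = 0; n < num_odes; ++n) {
        (*y_host)[n]            = 2.0;     /* y1(0) */
        (*y_host)[n + num_odes] = 0.0;     /* y2(0) */
        (*var_host)[n]          = 1000.0;  /* stiffness parameter mu */
    }
}
```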
Additionally, ics.c / ics.cu defines the dummy methods apply_mask() and apply_reverse_mask(), which are not used for the van der Pol problem.
Finally, the files sparse_multiplier.c, sparse_multiplier.cu, sparse_multiplier.h, and sparse_multiplier.cuh provide headers and definitions for a "sparse" (unrolled) matrix multiplier. This is used by the exponential solvers (and may provide speedups if a sparse Jacobian is used).
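As an illustration, for the 2×2 van der Pol Jacobian such an unrolled Jacobian-vector product reduces to the following sketch (the real prototype should be taken from sparse_multiplier.h):

```c
/* Sketch of an unrolled product w = J * v for the 2x2 van der Pol Jacobian
 * stored in column-major order (see the flattening described above). */
void sparse_multiplier(const double *jac, const double *v, double *w)
{
    w[0] = jac[0] * v[0] + jac[2] * v[1];
    w[1] = jac[1] * v[0] + jac[3] * v[1];
}
```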
We compile the problem with the SCons call:
scons SAME_IC=True LOG_OUTPUT=True LOG_END_ONLY=False t_end=2000.0 t_step=1.0 mechanism_dir=examples/van_der_pol/ -j 2
This turns on the initial conditions discussed in Problem Definition.
The -j 2 parameter allows scons to use 2 threads during compilation (vary as necessary), while the mechanism_dir option tells scons to look in the van_der_pol example directory for the implementation of the required functions described above.
With logging enabled, each solver writes its output to a binary file (accelerInt-root/log/solvername.bin) for plotting.
Initial conditions can instead be read from a data file, e.g. by copying the generated ign_data.bin file to the root accelerInt folder and compiling without the SAME_IC option.
The CPU solvers can be called using:
./solver_name [num_threads] [num_odes]
While the GPU solvers are called via:
./solver_name [num_odes]
For example, we call:
./exprb43-int-gpu 1
./cvodes-int 1 1
./radau2a-int 1 1
Next, we load the data and plot it with plotter.py to produce the solution figure.