rsLQR  0.1
The rsLQR Algorithm

Algorithm Overview

The rsLQR algorithm works by recursively solving 3x3 Schur compliments of the form:

\[ \begin{bmatrix} A & D \\ D^T & B & E^T \\ & E & C \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]

When solving this system, we need to solve equations of the form:

\[ A \bar{a} = a \]

\[ C \bar{c} = c \]

For a banded system, these solves also take the same form, so we can recursively solve these 3x3 Schur compliments, stopping the recursion when the \( A \) and \( C \) are small or dense enough to solve using e.g. a dense Cholesky decomposition. The full details of the algorithm are provided in the paper, which provides some of the details on "flattening" the algorithm to make it more amenable to parallelization on a many-core CPU. The following sections provide some details on the structure of the actual implementation.

Memory layout

Most of the matrix information stored by the algorithm is stored in arrays of the following form:

\[\begin{bmatrix} \Lambda_1^{(1)} & \Lambda_1^{(2)} & \dots & \Lambda_1^{(K)} \\ X_1^{(1)} & X_1^{(2)} & \dots & X_1^{(K)} \\ U_1^{(1)} & U_1^{(2)} & \dots & U_1^{(K)} \\ \Lambda_2^{(1)} & \Lambda_2^{(2)} & \dots & \Lambda_2^{(K)} \\ X_2^{(1)} & X_2^{(2)} & \dots & X_2^{(K)} \\ U_2^{(1)} & U_2^{(2)} & \dots & U_2^{(K)} \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda_N^{(1)} & \Lambda_N^{(2)} & \dots & \Lambda_N^{(K)} \\ X_N^{(1)} & X_N^{(2)} & \dots & X_N^{(K)} \\ U_N^{(1)} & U_N^{(2)} & \dots & U_N^{(K)} \\ \end{bmatrix} \]

where \( N \) is the horizon length, \( K = \log_2{N} \) is the depth of the tree, and \( \Lambda_k^{(j)} \in \mathbb{R}^{n \times w}, X_k^{(j)} \in \mathbb{R}^{n \times w} U_k^{(j)} \in \mathbb{R}^{m \times w} \) are the blocks associated with dual variables, state variables, and control variables for knot point \( k \) and level \( j \), respectively. If a block has bar, e.g. \( \bar{X}_i^{(j)} \), it represents a piece of the factorized matrix information, whereas any block without is part of the original matrix data.

These blocks make up the \( D_i^{(j)}, E_i^{(j)}, a_i^{(j,p)}, \) and \( c_i^{(j,p)} \) blocks referenced in the original paper. At higher levels, each of these blocks get larger, and are therefore made up of more of these basic blocks indexed by knot point index instead of leaf index. This allows the entire algorithm to work on small blocks of similar size, decreasing the likelihood of cache misses. It also simplifies many of the parallel loops and resulting computational kernels.

High level overview

Pseudocode for the algorithm, invoked by ndlqr_Solve(), is included below:

int K = tree.depth;
// In parallel
for (int index = 0; index < N; ++index) {
ndlqr_SolveLeaf(solver, index)
}
// Sync threads
// Compute matrix factorization
for (int level = 0; level < K; ++level) {
int L = NumLeaves(level, K);
// In parallel
for (int i = 0; i < L; ++i) {
for (int p = level; p < K; ++p) {
int k = ndlqr_GetIndexFromLeaf(tree, leaf, level);
}
}
// Sync threads
// In parallel
for (int i = 0; i < L; ++i) {
FactorizeBbar(i, level); // not an actually implemented method
}
// Sync threads
// In parallel
for (int i = 0; i < L; ++i) {
for (int p = level + 1; level < K; ++p) {
int k = ndlqr_GetIndexFromLeaf(tree, i, level);
ndlqr_SolveCholeskyFactor(solver, k, level, p)
}
}
// Sync threads
// In parallel
for (int k = 0; k < N; ++k) {
for (int p = level; p < K; ++p) {
ndlqr_UpdateShurFactor(solver, k, level, p);
}
}
// Sync threads
}
// Solve for solution vector
for (int level = 0; level < K; ++level) {
int L = NumLeaves(level, K);
// In Parallel
for (int i = 0; i < L; ++i) {
int k = ndlqr_GetIndexFromLeaf(i, level);
ndlqr_FactorInnerProduct(k, level, K+1);
}
// Sync threads
// In Parallel
for (int i = 0; i < L; ++i) {
int k = ndlqr_GetIndexFromLeaf(i, level);
ndlqr_SolveCholeskyFactor(k, level, K+1);
}
// Sync threads
// In parallel
for (int k = 0; k < N; ++k) {
ndlqr_UpdateShurFactor(k, level, K+1);
}
// Sync threads
}
ndlqr_UpdateShurFactor
int ndlqr_UpdateShurFactor(NdData *fact, NdData *soln, int index, int i, int level, int upper_level, bool calc_lambda)
Calculates and to complete the factorization at the current level.
Definition: nested_dissection.c:154
ndlqr_GetIndexFromLeaf
int ndlqr_GetIndexFromLeaf(const OrderedBinaryTree *tree, int leaf, int level)
Get the knot point index given the leaf index at a given level.
Definition: binary_tree.c:65
ndlqr_FactorInnerProduct
int ndlqr_FactorInnerProduct(NdData *data, NdData *fact, int index, int data_level, int fact_level)
Calculates one of the inner products needed at level data_level.
Definition: nested_dissection.c:114
ndlqr_SolveCholeskyFactor
int ndlqr_SolveCholeskyFactor(NdData *fact, CholeskyInfo *cholinfo, int index, int level, int upper_level)
Use the precomputed Cholesky factorization to solve for y at each parent level.
Definition: nested_dissection.c:136
ndlqr_SolveLeaf
int ndlqr_SolveLeaf(NdLqrSolver *solver, int index)
Solve all the equations for the lowest-level diagonal blocks, by timestep.
Definition: nested_dissection.c:10