Towards modular Hierarchical Poincaré-Steklov solvers

$$ \newcommand\scalemath[2]{\scalebox{#1}{\mbox{\ensuremath{\displaystyle#2}}}} $$

Abstract

We revisit the Hierarchical Poincaré–Steklov (HPS) method for the Poisson equation using standard Q1 finite elements, building on the original in [1]. While corner degrees of freedom were implicitly handled in that work, subsequent spectral-element implementations have typically avoided them. In Q1-FEM, however, corner coupling cannot be factored out, and we show how the HPS merge procedure naturally accommodates it when corners are enclosed by elements. This clarification bridges a conceptual gap between algebraic Schur-complement methods and operator-based formulations, providing a consistent path for the FEM community to adopt HPS to retain the Poincaré–Steklov interpretation at both continuous and discrete levels.

Introduction

Hierarchical Poincaré--Steklov (HPS) solvers are a class of hierarchical direct solvers designed for elliptic PDEs; the name was coined in [2, 3] but some of the ideas are in [4, 5, 6, 7]. Starting with many subdomains, the goal is to recursively merge local boundary operators -- typically Dirichlet-to-Neumann (DtN) or Impedance-to-Impedance (ItI) maps -- constructing a global one that we apply to the problem. The HPS approach achieves high accuracy and near-optimal complexity, combining ideas present in hierarchical matrix computations ($\mathcal{H}$), domain decomposition (DD) and direct solvers, i.e., it is poised to be the keystone connecting several communities, e.g., BDDC or FETI; in words of one of the authors after reading [7]: "For the love of God, they need to start talking to each other!". In our opinion this communication has been limited also due to the strong spectral element methods (SEM) background of the HPS community; the formulation, discretization and computation in HPS are often entangled together, making it difficult to relate pros and cons of the "package" to its parts. We want to provide a modular alternative, approachable for readers across multiple communities and for the sake of space we focus on the corner points ¹, which are routinely considered an obstacle in the HPS context [2, 3, 11, 12, 13, 14]. Many aspects of what follows can be found somewhere in the literature, sometimes with limited references to the other fields but, to the best of our knowledge, a modular HPS exposition is nowhere to be found in the HPS literature.

As mentioned, the HPS community is using predominantly SEM on tensor product grids -- it offers high (possible) accuracy and lets us avoid the corner points, e.g., with the Gauss-Legendre points. If the corner points appear, in SEM they usually come decoupled from the interior nodes or can be avoided altogether by modifying the spectral discretization, see, e.g., [2, 11, 12, 13]. HPS using finite differences or finite volumes, [7, 15], also rely on avoiding the corner-coupling issue that arises in, e.g., FEM. The rationale is both analytical but also practical: the used Poincaré-Steklov (PS) operators need not be well-defined in the presence of corners and the tensor-product basis naturally isolates corner DoFs [16, 17, 18, 19]. Hence, for many new readers, the HPS methods are intrinsically connected with such discretization schemes.

However, essentially the same problems have been studied also from the algebraic perspective, e.g., nested dissection, hierarchical semi-separable and hierarchical multifrontal techniques, e.g., [20, 21, 22 23], are purely algebraic: they operate directly on the discrete system, exploiting observed numerical blockwise low-rankness for compression and factorization. Although the foundational work on hierarchical matrices, see e.g., [24 25], is built on the continuous operators, to the best of our knowledge, it does not include PS operators, nor incorporate static condensation or skeletonization. The recursive skeletonization can be viewed within the multilevel DD or multigrid framework -- in [26], HPS has been identified with a specific multigrid V-cycle.

Our primary goal below is to separate the discretization method and the way in which the method treats the corner points, thereby helping to build the modular view of HPS. For that reason we choose the standard Poisson problem on a rectangle and use the Q1-FEM discretization on a tensor product grid, where the basis functions firmly couple the corner point DoFs with others. In the HPS community, this would be considered a major issue as it prevents a straightforward definition of the local DtN. However, having discretized we show this can be resolved with little extra effort. We are not aware of the HPS and Q1-FEM coupling (or other simple low-order FEM) anywhere in the literature; this set-up should also provide a simple entry point into HPS methods for broader audience and FEM enthusiasts will notice that we do not rely on the Q1 elements in any way. We again highlight that in different communities and in different context similar ideas already exists, see, e.g. [8], where the authors consider mixed-order curl-conforming FEM discretization for time-harmonic Maxwell equations in $\mathbb{R}^3$ -- an involved setting in which HPS is not mentioned but corner points and edge points are considered.

The HPS method with corners

As noted above, we consider the simplest model problem \begin{equation}\label{eqn_secHPS_PoissPrblm} \Delta u =f \quad \text{ in } \Omega:=(\alpha,\beta)\times(\gamma,\delta) \quad \text{ and } \quad u=g \quad \text{ on } \partial \Omega, \end{equation} and start by outlining the structure of a general HPS method:

Partition — partition the domain Ω into subdomains.
Discretization & Assembly — formulate, discretize, and assemble the subdomain solution boundary operators for the subdomains.
Merge — merge the neighboring solution boundary operators and store the result.
Recursion — recurse and continue merging until we reach the entire domain Ω.
Application — given data, apply the global solution boundary operator and calculate the solution on the boundaries of the subdomains.
Reconstruction — reconstruct the solution in the subdomains from the boundaries.

As per the partition stage, we the standard, grid-like set-up \begin{equation*} \Omega_{e} = [a^{(x_1)}_{e}, b^{(x_1)}_{e}] \times [a^{(x_2)}_{e}, b^{(x_2)}_{e}] \subset \Omega, \end{equation*} see Figure 1 -right, forming a non-overlapping decomposition of $\Omega$ with corner points; other decompositions can be treated identically [8,Figure 2].

Figure 1 Left: two neighbouring subdomains $\Omega_1,\Omega_2$ with the index sets of the grids. We have $\iota_1^I\equiv \iota_1^R$ and $\iota_2^I\equiv \iota_2^L$ and also see the corner index sets $\iota_1^C,\iota_2^C$, although not separately highlighted. Finally, we set $\iota^{\partial}_{e} := \iota^{L}_{e} \cup \iota^{R}_{e} \cup \iota^{T}_{e} \cup \iota^{B}_{e} \cup \iota^{C}_{e}$ so that $\iota_{e} = \iota^{\text{int}}_{e} \cup \iota^{\partial}_{e}$ for any $e$. Right: (incomplete) illustration of the nested dissection merge hierarchy ordering, see [1,Appendix A].

The discretization & assembly

The analytical background. We are interested in constructing the subdomain solution boundary operators -- dealing with the Poisson problem, those are the subdomain DtNs. Let $u_{e}$ denote the solution on the subdomain $\Omega_{e}$, i.e., \begin{equation}\label{eqn_secAssmbly_LocalPoissonPrblm} -\Delta u_{e} = f \quad \text{ in }\; \Omega_e, \quad \text{ and } \quad u_{e} = g_e \quad \text{ on }\; \partial\Omega_e. \end{equation} We can split $u_{e}$ into the sum of the harmonic lift of the boundary data $g_e$, denoted by $u^{(g)}_e$ and the particular solution of the interior load $f$, denoted by $u^{(f)}_e$, obtaining \begin{equation*} -\Delta u^{(g)}_e = 0 \text{ in } \Omega_{e} \quad \& \quad u^{(g)}_e = g_e \text{ on } \partial\Omega_{e} \quad \text{ and } \quad -\Delta u^{(f)}_e = f \text{ in } \Omega_{e} \quad \& \quad u^{(f)}_e = 0 \text{ on } \partial\Omega_{e}. \end{equation*} Analogously, we also split the Neumann trace of the solution, denoted by $\partial_n u_{e}$, \begin{equation}\label{eqn_secAssmbly_LocalFlxsAsDtN_ContFromultn} \partial_n u_{e} = \partial_n u^{(g)}_e + \partial_n u^{(f)}_e =: \Lambda_{e} g_e + q_{e} \end{equation} featuring homogeneous DtN $\Lambda_{e}$ and the particular Neumann trace $q_{e}$ on $\Omega_{e}$.

Discretization. We first introduce grid nodes in $\Omega^{e}$ in a tensor-product manner along the $x_1$ and $x_2$ axis. On this grid we consider the $Q_1$ finite element discretization of \eqref{eqn_secAssmbly_LocalPoissonPrblm} and index the local DoFs by integer pairs $\iota_{e}=(i,j)$; see Figure 1 for the details. Applying integration by parts to the continuous weak form of \eqref{eqn_secAssmbly_LocalPoissonPrblm} gives \begin{equation*} \int_{\Omega_e} \nabla u_{e} \cdot \nabla \phi_m\,\text{d}\mathbf{x} = \int_{\Omega_e} f\,\phi_m\,\text{d}\mathbf{x} + \int_{\partial\Omega_e} (\partial_n u_{e})\,\phi_m\,\text{d}s, \quad m\in \iota_{e} \end{equation*} and then, after approximating $u_{e}$ in the Q1-FEM basis and reordering the DoFs, we get the discretized system for the unknown coefficients $\mathbf{u}_e$ \begin{equation}\label{eqn_secAssmbly_DiscrtzdIntgrtnByParts} \begin{bmatrix} A_{e} & B_{e}\\[3pt] C_{e} & D_{e} \end{bmatrix} \begin{bmatrix} \mathbf{u}_{e}^{\text{int}}\\[3pt] \mathbf{u}_{e}^{\partial} \end{bmatrix} = \begin{bmatrix} \mathbf{f}_{e}^{\text{int}}\\[3pt] \mathbf{f}_{e}^{\partial} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\[3pt] \partial_n \mathbf{u}_{e} \end{bmatrix}, \end{equation} where $\partial_n \mathbf{u}_{e}$ is the Q1-FEM discretization of the Neumann trace of $\partial_n u_{e}$ at $\iota_{e}^{\text{int}}$.

Assembly. Equation \eqref{eqn_secAssmbly_DiscrtzdIntgrtnByParts} shows that the negative residual along the interface} is the finite element representation of the approximate normal fluxes along the boundary. As these fluxes are unknown, the equations in the second block-row give the formula, \begin{equation*} \partial_n \mathbf{u}_{e} = C_e \mathbf{u}_{e}^{\text{int}} + D_e \mathbf{u}_{e}^{\partial} - \mathbf{f}_{e}^{\partial}, \end{equation*} which, after elimination of the interior DoFs, becomes \begin{equation}\label{eqn_secAssmbly_LocalFlxsAsDtN_DiscrtMirroring} \partial_n \mathbf{u}_{e} \equiv \mathbf{r}_{e}^{\partial} = \left( D_{e} - C_{e} A_{e}^{-1} B_{e} \right) \mathbf{u}_{e}^{\partial} + C_{e}A_{e}^{-1} \mathbf{f}_{e}^{\text{int}} - \mathbf{f}_{e}^{\partial} =: S_{e} \mathbf{u}_{e}^{\partial} + \mathbf{h}_{e}. \end{equation} This relation mirrors the continuous decomposition \eqref{eqn_secAssmbly_LocalFlxsAsDtN_ContFromultn}: the Schur complement $S_{e}$ acts as the discrete homogeneous DtN operator mapping boundary values $\mathbf{u}_{e}^{\partial}$ to their induced boundary fluxes, while $\mathbf{h}_{e}$ represents the discrete flux produced by the interior load under homogeneous Dirichlet conditions. The assembly stage of HPS methods consists of computing (or approximating) the matrices $S_{e}$ so that the right-hand side of \eqref{eqn_secAssmbly_LocalFlxsAsDtN_DiscrtMirroring} can be evaluated rapidly in the application stage.

The merge stage

Having two subdomains, say $\Omega_1,\Omega_2$, with finished assembly stage that share an interface, we want to assemble the solution boundary operator for $\Omega_1 \cup\Omega_2$. The ordering in which we will pick the subdomain pairs matters as it highly influences the parallelizability of the resulting solver; we follow the nested dissection ordering as illustrated in Figure 1-right; first we merge horizontally and then vertically.

Horizontal merge (left $\Omega_1$ and right $\Omega_2$). The true solution of \eqref{eqn_secHPS_PoissPrblm} is continuous and has balanced fluxes (i.e., residuals) across the interface, i.e., \begin{equation}\label{eqn_secMerge_Hor_ContOfSolBalancOfFlxs} \mathbf{u}_1(\iota_1^{I}) = \mathbf{u}_2(\iota_2^{I}) \quad \text{ and } \quad \mathbf{r}_1(\iota_1^{I}) + \mathbf{r}_2(\iota_2^{I}) = 0. \end{equation}

Recalling \eqref{eqn_secAssmbly_LocalFlxsAsDtN_DiscrtMirroring} and blocking it according to $\iota_e^{\partial} = \iota_e^{I} \cup \iota_e^{\partial\backslash I}$ for $e=1,2$ gives \begin{equation}\label{eqn_secMerge_Hor_GnrlResdlEqnAlongIntrfc} \begin{bmatrix} \mathbf{r}_e(\iota_e^{\partial\setminus I})\\ \mathbf{r}_e(\iota_e^{I}) \end{bmatrix} = \begin{bmatrix} S_e(\iota_e^{\partial\setminus I},\iota_e^{\partial\setminus I}) & S_e(\iota_e^{\partial\setminus I},\iota_e^{I})\\ S_e(\iota_e^{I},\iota_e^{\partial\setminus I}) & S_e(\iota_e^{I},\iota_e^{I}) \end{bmatrix} \begin{bmatrix} \mathbf{u}_e(\iota_e^{\partial\setminus I})\\ \mathbf{u}_e(\iota_e^{I}) \end{bmatrix} + \begin{bmatrix} \mathbf{h}_e(\iota_e^{\partial\setminus I})\\ \mathbf{h}_e(\iota_e^{I}) \end{bmatrix}. \end{equation} Summing the second block-rows for $e=1,2$, using \eqref{eqn_secMerge_Hor_ContOfSolBalancOfFlxs} and reordering gives \begin{equation}\label{eqn_secMerge_Hor_SolOnIntrfcIndsFrml} \left( S_1(\iota_1^{I},\iota_1^{I}) + S_2(\iota_2^{I},\iota_2^{I}) \right) \mathbf{u}_1(\iota_1^{I}) = - \sum\limits_{e=1,2} \mathbf{h}_e(\iota_e^{I}) + S_e(\iota_e^{I},\iota_e^{\partial\setminus I})\mathbf{u}_e(\iota_e^{\partial\setminus I}). \end{equation} Returning to \eqref{eqn_secMerge_Hor_GnrlResdlEqnAlongIntrfc}, we concatenate the equations for the residuals on the "merged boundary" $r_e(\iota_e^{\partial\setminus I}), e=1,2$, use \eqref{eqn_secMerge_Hor_SolOnIntrfcIndsFrml} in both and reorder so as to obtain \begin{equation}\label{eqn_secMerge_Hor_ResOnExteriorIndsAsDtN} \begin{bmatrix} \mathbf{r}_1(\iota_1^{\partial\setminus I})\\[2pt] \mathbf{r}_2(\iota_2^{\partial\setminus I}) \end{bmatrix} = S^{\text{H}} \begin{bmatrix} \mathbf{u}_1(\iota_1^{\partial\setminus I})\\[2pt] \mathbf{u}_2(\iota_2^{\partial\setminus I}) \end{bmatrix} + \mathbf{h}^{\text{H}}, \end{equation} i.e., the horizontally merged boundary solution operators as in \eqref{eqn_secAssmbly_LocalFlxsAsDtN_DiscrtMirroring} with \begin{equation*} \begin{aligned} S^{\text{H}} =& \begin{bmatrix} S_1(\iota_1^{\partial\setminus I},\iota_1^{\partial\setminus I}) & 0\\ 0 & S_2(\iota_2^{\partial\setminus I},\iota_2^{\partial\setminus I}) \end{bmatrix} - \\ &\begin{bmatrix} S_1(\iota_1^{\partial\setminus I},\iota_1^{I})\\ S_2(\iota_2^{\partial\setminus I},\iota_2^{I}) \end{bmatrix} \big(S_1(\iota_1^{I},\iota_1^{I}) + S_2(\iota_2^{I},\iota_2^{I})\big)^{-1} \begin{bmatrix} S_1(\iota_1^{I},\iota_1^{\partial\setminus I}) & S_2(\iota_2^{I},\iota_2^{\partial\setminus I}) \end{bmatrix},\\[1ex] \mathbf{h}^{\text{H}} =& \begin{bmatrix} \mathbf{h}_1(\iota_1^{\partial\setminus I})\\[2pt] \mathbf{h}_2(\iota_2^{\partial\setminus I}) \end{bmatrix} - \begin{bmatrix} S_1(\iota_1^{\partial\setminus I},\iota_1^{I})\\[2pt] S_2(\iota_2^{\partial\setminus I},\iota_2^{I}) \end{bmatrix} \big(S_1(\iota_1^{I},\iota_1^{I}) + S_2(\iota_2^{I},\iota_2^{I})\big)^{-1} \big(\mathbf{h}_1(\iota_1^{I}) + \mathbf{h}_2(\iota_2^{I})\big). \end{aligned} \end{equation*}

Vertical merge (bottom $\Omega_1$ and top $\Omega_2$). Say we have "horizontally" merged the boundary solution operators for two couples of subdomains $\Omega_{1L},\Omega_{1R}$ and $\Omega_{2L}, \Omega_{2R}$, e.g. the merges $\enclose{circle}{1}$ and $\enclose{circle}{3}$ in Figure 1, and we are ready to merge along the vertical interface -- labeled $\fbox{9}$ and $\fbox{10}$ -- andthen also at the corner point enclosed between the already merged interfaces}. First, keeping the enclosed corner DoF, indexed² by $\iota^{C}_{\Gamma}\equiv c$, uneliminated, the steps in merges $\fbox{9}$ and $\fbox{10}$ carry through identically to the horizontal merges $\enclose{circle}{1}$ or $\enclose{circle}{3}$, only now the index set $\iota_e^{I}$ is disjoint, e.g., $\iota_1^{I} = \iota_{1L}^{T} \cup \iota_{1R}^{T}$, and the index sets $\iota_1^{\partial\backslash I},\iota_2^{\partial\backslash I}$ contain the enclosed corner index $\iota^{C}_{\Gamma}$. That is, we have \begin{equation}\label{eqn_secMerge_Ver_ResOnExteriorIndsAsDtN} \begin{bmatrix} \mathbf{r}_1(\iota_1^{\partial\setminus I})\\[2pt] \mathbf{r}_2(\iota_2^{\partial\setminus I}) \end{bmatrix} = S^{\text{V}} \begin{bmatrix} \mathbf{u}_1(\iota_1^{\partial\setminus I})\\[2pt] \mathbf{u}_2(\iota_2^{\partial\setminus I}) \end{bmatrix} + \mathbf{h}^{\text{V}}, \end{equation} with the vertically merged boundary solution operators as in \eqref{eqn_secAssmbly_LocalFlxsAsDtN_DiscrtMirroring}, i.e., \begin{equation*} \begin{aligned} S^{\text{V}} =& \begin{bmatrix} S_1(\iota_1^{\partial\setminus I},\iota_1^{\partial\setminus I}) & 0\\ 0 & S_2(\iota_2^{\partial\setminus I},\iota_2^{\partial\setminus I}) \end{bmatrix} - \\ &\begin{bmatrix} S_1(\iota_1^{\partial\setminus I},\iota_1^{I})\\ S_2(\iota_2^{\partial\setminus I},\iota_2^{I}) \end{bmatrix} \big(S_1(\iota_1^{I},\iota_1^{I}) + S_2(\iota_2^{I},\iota_2^{I})\big)^{-1} \begin{bmatrix} S_1(\iota_1^{I},\iota_1^{\partial\setminus I}) & S_2(\iota_2^{I},\iota_2^{\partial\setminus I}) \end{bmatrix},\\[1ex] \mathbf{h}^{\text{V}} =& \begin{bmatrix} \mathbf{h}_1(\iota_1^{\partial\setminus I})\\[2pt] \mathbf{h}_2(\iota_2^{\partial\setminus I}) \end{bmatrix} - \begin{bmatrix} S_1(\iota_1^{\partial\setminus I},\iota_1^{I})\\[2pt] S_2(\iota_2^{\partial\setminus I},\iota_2^{I}) \end{bmatrix} \big(S_1(\iota_1^{I},\iota_1^{I}) + S_2(\iota_2^{I},\iota_2^{I})\big)^{-1} \big(\mathbf{h}_1(\iota_1^{I}) + \mathbf{h}_2(\iota_2^{I})\big). \end{aligned} \end{equation*}

Corner merge (corner of $\Omega_{1L}, \Omega_{1R}, \Omega_{2L}, \Omega_{2R}$). Analogously to \eqref{eqn_secMerge_Hor_ContOfSolBalancOfFlxs}, we have \begin{equation}\label{eqn_secMerge_Cor_ContOfSolAndBalancOfFlxs} \mathbf{u}_{1}(c)=\mathbf{u}_{2}(c)=\mathbf{u}(c) \quad \text{ and } \quad \sum\limits_{e=1,2} \mathbf{r}_e(c) = 0. \end{equation} Collecting the $\iota^{C}_{\Gamma}\equiv c$ equations from \eqref{eqn_secMerge_Ver_ResOnExteriorIndsAsDtN} and inserting them into \eqref{eqn_secMerge_Cor_ContOfSolAndBalancOfFlxs} gives \begin{equation*} \sum_{e=1,2} S_e(c,c)\,\mathbf{u}(c) = - \sum_{e=1,2} S_e(c,E_e)\mathbf{u}_e(E_e) + \mathbf{h}_e(c), \end{equation*} where $E_e$ are the indices of points on $\partial\Omega_e \cap \partial(\Omega_1\cup \Omega_2)$. Solving for $\mathbf{u}(c)$ and inserting back in \eqref{eqn_secMerge_Ver_ResOnExteriorIndsAsDtN} gives a system on the exterior index set $E:= E_1\cup E_2$ for fluxes \begin{equation*} \begin{gathered} \mathbf{r}(E) = S^{\text{corner}}\,\mathbf{u}(E) + \mathbf{h}^{\text{corner}} \; \text{with}\\ \begin{aligned} S^{\text{corner}} &= S^{\text{V}}(E,E) - S^{\text{V}}(E,c) \Big(S^{\text{V}}(c,c)\Big)^{-1} S^{\text{V}}(c,E), \\ \mathbf{h}^{\text{corner}} &= \mathbf{h}^{\text{V}}(E) - S^{\text{V}}(E,c) \Big(S^{\text{V}}(c,c)\Big)^{-1} \mathbf{h}^{\text{V}}(c), \end{aligned} \end{gathered} \end{equation*} with identical structure of the resulting boundary solution operator as in \eqref{eqn_secMerge_Ver_ResOnExteriorIndsAsDtN} or \eqref{eqn_secMerge_Hor_ResOnExteriorIndsAsDtN}.

What is the point? First, this treatment of the corner points is fundamentally different to the "change of basis" approach used, e.g., in [3] -- no retabulation, rather following the same ground ideas behind HPS. Second, it is also fundamentally different from the "ignore" approach used, e.g., in [16], as that is simply not an option due to the corner point DoFs coupling. Third, this is in fact very similar to [1], but outlined only within the SEM context with the aforementioned benefits. Our point is that the corner points should be merged once the surrounding interfaces have been merged to maximize the efficiency and doing that follows analogous steps used before, even when using fully coupled corner DoFs of FEM.

The recursion

Having successfully eliminated the interface and enclosed corner DoFs, we recurse and continue until reaching a problem on $\partial \Omega$, where the Dirichlet trace is known. Let $\Omega = \bigcup_e \Omega_e$ be the decomposition into subdomains, each with its local DtN $(S_e,\mathbf{h}_e)$. The global system on the skeleton -- i.e., on $\bigcup_e \partial\Omega_e$ -- reads \begin{equation*} S\, \mathbf{u}^{(\partial)} = \mathbf{h}^{(\partial)}, \end{equation*} where $S$ is built from $S_e$ based on the interface continuity and flux balances. We order the boundary indices $\iota^{(\partial)} = \bigcup_e \iota_e^{\partial}$ by the merging hierarchy: first domain boundaries, then merged interfaces and enclosed corners (following the nested dissection ordering). Hence the to internalize indices come after the active exterior indices, \begin{equation*} \iota^{(\partial)} = \big( \iota^{(1)}_{\text{ext}},\ \iota^{(1)}_{\text{merge}},\ \iota^{(2)}_{\text{merge}},\ \dots,\ \iota^{(L)}_{\text{merge}} \big), \end{equation*} where $\iota^{(\ell)}_{\text{merge}}$ are the indices eliminated at recursion level $\ell$. Then $S$ has the structure \begin{equation*} S = \begin{bmatrix} S_{EE} & S_{EM}\\ S_{ME} & S_{MM} \end{bmatrix}, \end{equation*} where $E$ and $M$ representing the exterior and to-be-merged blocks. The merge step corresponds precisely to eliminating the $S_{MM}$ block via its Schur complement: \begin{equation*} \widehat{S}_{EE} = S_{EE} - S_{EM}\,S_{MM}^{-1}\,S_{ME} \quad \text{ and } \quad \widehat{\mathbf{h}}_E = \mathbf{h}_E - S_{EM}\,S_{MM}^{-1}\,\mathbf{h}_M, \end{equation*} where $(\widehat{S}_{EE},\widehat{\mathbf{h}}_E)$ defines the reduced DtN operator and right-hand side of the updated skeleton after that merge. Proceeding recursively the calculation always follows the same two-domain pattern, possibly extended by enclosed-corner junctions. That is the HPS skeleton solver can be interpreted as a single recursive Schur complement elimination applied to the global skeleton matrix $S$. Each recursion step in HPS corresponds to eliminating the block $(\iota^{(\ell)}_{\text{merge}},\iota^{(\ell)}_{\text{merge}})$ corresponding to indices merged at that level of the hierarchy. At the top of the recursion we get the final reduced operator $S^{(L)}$ on $\partial\Omega$, whose equilibrium equation represents the DtN map of $\Omega$.

Numerical illustration

Table 1. Average speedup (left) and break-even solves (right) relative to MATLAB's backslash.

	Speedup				Break-even solves
# subdomains # elements per subdomain	4×4	8×8	16×16	32×32	4×4	8×8	16×16	32×32
4×4	1	3	5	12	N/A	4	2	2
8×8	2	6	9	17	5	4	3	2
16×16	3	7	21	28	5	5	4	3
32×32	4	11	26	35	7	5	4	4

We conclude by showcasing the performance of HPS, implemented in MATLAB with very few optimizations. The build stage is computationally costly so the method is useful when we have several different right-hand sides. We run our tests on a laptop with 32GB RAM and a i7-12700H Intel microprocessor with six 4.7 GHz performance cores, eight 3.5GHz efficient cores and twenty total threads with performance-core hyperthreading. We take the MATLAB's backslash for the skeleton problem as the benchmark (ignoring the reconstruction). This comparison is stricter than a full solution comparison, where backslash would process a significantly larger operator.

^{1. We note that in the DD community we
usually use the term cross points,
e.g.,
[9,,
10].

↩}

^{2. We use $\iota^{C}_{\Gamma}\equiv c$ as an
absolute index across the indexing in the four subdomains
$\Omega_{1L}, \Omega_{1R}, \Omega_{2L}, \Omega_{2R}$, even though
the point has likely a different index in each of them.

↩}

Bibliography

[1] P.G. Martinsson, J. of Comp. Phy. 242, 460 (2013)

[2] A. Gillman, P.G. Martinsson, SIAM J. on Sci. Comp. 36(4), A2023 (2014)

[3] P.G. Martinsson, arXiv:1506.01308 (2015)

[4] Y. Chen, Adv. in Comp. Math. 16(2–3), 175 (2002)

[5] P.G. Martinsson, V. Rokhlin, J. of Comp. Phy. 205(1), 1 (2005)

[6] L. Greengard, D. Gueyffier, P.G. Martinsson, V. Rokhlin, Acta Num. 18, 243–275 (2009)

[7] A. Gillman, Fast direct solvers for elliptic partial differential equations. Ph.D. Thesis, University of Colorado Boulder (2011)

[8] J. Lu, J.F. Lee, Journal of Computational Physics 503, 112824 (2024)

[9] M.J. Gander, K. Santugini-Repiquet, ETNA 45, 219 (2016)

[10] X. Claeys, E. Parolin, Numer. Math. 151, 405 (2022)

[11] A. Gillman, A.H. Barnett, P.G. Martinsson, BIT Num. Math. 55(1), 141 (2015)

[12] T. Babb, A. Gillman, S. Hao, P.G. Martinsson, BIT Num. Math. 58(4), 851 (2018)

[13] N.N. Beams, A. Gillman, R.J. Hewett, Comp. & Math. w. Appl. 79(4), 996 (2020)

[14] J.P. Lucero Lorca, N. Beams, D. Beecroft, A. Gillman, SIAM J. Sci. Comp. 46(1), A80 (2024)

[15] D. Chipman, D. Calhoun, C. Burstedde, arXiv:2402.14936 (2024)

[16] P. Geldermans, A. Gillman, SIAM J. on Sci. Comp. 41(1), A292 (2019)

[17] D. Fortunato, SIAM J. on Sci. Comp. 46(4), A2582 (2024)

[18] D. Melia, D. Fortunato, A. Gillman, M. O’Neil, arXiv:2503.17535 (2025)

[19] Y. Kump, A. Yesypenko, P.G. Martinsson, arXiv:2503.04033 (2025)

[20] A. George, SIAM J. on Num. Anal. 10(2), 345 (1973)

[21] L. Grasedyck, R. Kriemann, S. Le Borne, Numerische Math. 112(4), 565 (2009)

[22] J. Xia, S. Chandrasekaran, Y. Gu, X.S. Li, SIAM J. on Matrix Anal. Appl. 31(3), 1382 (2009)

[23] P.G. Schmitz, L. Ying, J. of Comp. Phy. 231(4), 1314 (2012)

[24] S. Börm, L. Grasedyck, W. Hackbusch, Hierarchical Matrices: Lecture notes no. 21 (2006)

[25] W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis (Springer Berlin, 2015)

[26] J.P. Lucero Lorca, Submitted in the proceedings of DD29 (2025)