Towards a Multigrid Preconditioner Interpretation of Hierarchical Poincaré-Steklov Solvers


Abstract

We revisit the Hierarchical Poincaré--Steklov (HPS) method within a preconditioned iterative framework. Originally introduced as a direct solver for elliptic boundary-value problems, the HPS method combines nested dissection with tensor-product spectral element discretizations; its modular structure has also been analyzed in other contexts [1]. Building on the iterative variant proposed in [2], we reinterpret the hierarchical merge structure of HPS as a natural multigrid preconditioner. This perspective unifies the direct and iterative formulations of HPS, connecting them to multigrid and domain decomposition methods. The resulting formulation preserves the high accuracy of spectral discretizations while enabling flexible iterative solution strategies. Numerical experiments in two dimensions demonstrate the performance and convergence behavior of the proposed approach.

Introduction

The Hierarchical Poincaré–Steklov (HPS) method was introduced by Martinsson [3, 4] as a direct solver for elliptic boundary-value problems, combining nested dissection with spectral element discretizations on tensor-product grids. Subsequent extensions adapted this framework to variable-coefficient Helmholtz equations, demonstrating high accuracy and computational efficiency for large-scale problems [5, 6, 7]. In the impedance-to-impedance (ItI) formulation, based on the discretization introduced by Després [8], Dirichlet and Neumann traces are replaced by local impedance maps, providing a closed interface representation suitable for high-frequency and heterogeneous media.

Building on the iterative variant introduced in [2], the present work reformulates the HPS framework within a preconditioned iterative setting. The hierarchical merging structure of HPS naturally provides the multilevel organization required for such solvers.

The iterative variant presented in [2] employed GMRES with a block-Jacobi preconditioner that exploited the tensor-product structure of the local spectral operators to enable a fast application of the preconditioner. That study focused on three-dimensional Helmholtz problems, demonstrating how the local separability of the discretization could be leveraged for efficiency.

The present work builds on the observation, already noted in the literature, that the Hierarchical Poincaré–Steklov method can be viewed as a nested-dissection solver for a spectral element discretization. The main contribution lies in recognizing that this structure naturally defines a multilevel preconditioner, linking the direct and iterative viewpoints within a unified framework. Numerical experiments in two dimensions illustrate the resulting formulation and assess its convergence behavior.

While the present work focuses on the iterative reformulation of HPS as a multilevel preconditioner, a complementary study [1] analyzes the modular structure of HPS, bridging finite-element and domain-decomposition perspectives.

Model problem

We consider the variable-coefficient Helmholtz equation with impedance boundary conditions

\begin{align} -\Delta u - \underbrace{\kappa^2 \left(1-b(\boldsymbol{x})\right)}_{:=c(\boldsymbol{x})}u = s,\quad \boldsymbol{x}\in\Omega \qquad \text{ and } \qquad \frac{\partial u}{\partial n} + i\eta u = t, \quad \boldsymbol{x}\in\partial\Omega, \label{eqn:diffhelm} \end{align}

where $\Omega=(0,1)^2\subset\mathbb{R}^2$, $u:\Omega\to\mathbb{C}$ is the unknown field, $\kappa\in\mathbb{R}$ is the wavenumber, $\eta\in\mathbb{R}$ is the impedance parameter (chosen equal to $\kappa$), $b(\boldsymbol{x})$ is a smooth coefficient, and $s,t$ are smooth source and boundary data. Impedance boundary conditions of this form are widely used in diffraction, acoustics, and electromagnetic scattering [9, 10, 11, 12]; see also [13, §1.1–§1.2] for an overview.

Discretization

Consider a structured spectral element mesh: $\Omega=(0,1)^2$ is divided into a square grid of square elements, each equipped with a tensor-product Gauss–Legendre–Lobatto (GLL) grid of order $N$. This construction yields high-order local operators as tensor products of 1D differentiation and mass matrices while preserving continuity of impedance data on shared edges (see [8]). Local ItI maps are assembled element by element and coupled through interface conditions as described in the following sections.

Local discretization

Each element problem is represented by \begin{align} \tilde{L} =& K_x\otimes M_y + M_x\otimes K_y - \operatorname{diag}\!\bigl(c(x_i,y_j)\bigr)(M_x\otimes M_y), \end{align} consistent with the sign convention of \eqref{eqn:diffhelm}, where $M_x=\operatorname{diag}(w_i)$ and $M_y=\operatorname{diag}(w_j)$ are 1D GLL mass matrices, $K_x=D_x^{\top}M_xD_x$ and $K_y=D_y^{\top}M_yD_y$ are stiffness matrices, and $D_x,D_y$ are the 1D differentiation matrices. The diagonal operator contains the coefficient $c$ evaluated on the tensor grid $\{(x_i,y_j)\}_{i,j=1}^{N+1}$.
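For concreteness, the following minimal NumPy sketch assembles these local operators on the reference element $[-1,1]^2$; the paper's experiments were run in MATLAB, and Python is used here purely for illustration. The GLL nodes, weights, and differentiation matrix follow the standard formulas; the mapping to mesh elements, which rescales $D$ and $w$, is omitted.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def gll(N):
    """Nodes, weights, and differentiation matrix of the degree-N
    Gauss-Legendre-Lobatto (GLL) rule on [-1, 1]."""
    PN = leg.Legendre.basis(N)
    # Interior GLL nodes are the roots of P_N'; the endpoints are -1 and 1.
    x = np.concatenate(([-1.0], PN.deriv().roots().real, [1.0]))
    PNx = PN(x)
    w = 2.0 / (N * (N + 1) * PNx ** 2)        # GLL quadrature weights
    D = np.zeros((N + 1, N + 1))              # spectral differentiation matrix
    for i in range(N + 1):
        for j in range(N + 1):
            if i != j:
                D[i, j] = PNx[i] / (PNx[j] * (x[i] - x[j]))
    D[0, 0] = -N * (N + 1) / 4.0
    D[N, N] = N * (N + 1) / 4.0
    return x, w, D

def local_operator(N, c):
    """Assemble Kx (x) My + Mx (x) Ky - diag(c(x_i, y_j)) (Mx (x) My)
    on the reference element (element mappings omitted in this sketch)."""
    x, w, D = gll(N)
    M = np.diag(w)                            # 1D GLL mass matrix
    K = D.T @ M @ D                           # 1D stiffness matrix
    X, Y = np.meshgrid(x, x, indexing="ij")   # first kron factor acts on x
    return (np.kron(K, M) + np.kron(M, K)
            - np.diag(c(X, Y).ravel()) @ np.kron(M, M))
```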

Following [2], the corner nodes are removed from the discretization, since they can be recovered in post-processing, a property of tensor-product spectral methods. The boundary index sets are denoted $\iota_{l},\iota_{r},\iota_{b},\iota_{t}$ for the left, right, bottom, and top edges, and their union is $\iota_{\Gamma}=\iota_l\cup\iota_r\cup\iota_b\cup\iota_t$. The inner index set, denoted $\iota_{i}$, contains all remaining nodes strictly inside the element. The outgoing and incoming impedance operators are \begin{align} \mathcal{I}_o =& \left[\begin{smallmatrix} -D_x \otimes I\\ D_x \otimes I\\ -I \otimes D_y\\ I \otimes D_y \end{smallmatrix}\right](\iota_{\Gamma},:) -i\eta\,I(\iota_{\Gamma},:), & \mathcal{I}_i =& \left[\begin{smallmatrix} -D_x \otimes I\\ D_x \otimes I\\ -I \otimes D_y\\ I \otimes D_y \end{smallmatrix}\right](\iota_{\Gamma},:) +i\eta\,I(\iota_{\Gamma},:), \end{align} where $I$ is the identity of appropriate size, the four stacked blocks supply the outward normal derivative on the left, right, bottom, and top edges, and the signs $\pm i\eta$ follow the boundary condition in \eqref{eqn:diffhelm}.
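A matching sketch of the boundary index sets and impedance operators, using `gll` from the previous block. As a simplification of this sketch only, corner nodes are assigned to the left and right sides, whereas the paper removes them from the discretization entirely.

```python
def iti_operators(N, eta):
    """Outgoing/incoming impedance operators on the reference element.
    Sketch: corners are folded into the l/r sides instead of removed."""
    n = N + 1
    _, _, D = gll(N)
    idx = np.arange(n * n).reshape(n, n)      # idx[i, j]: i ~ x, j ~ y
    sides = {"l": idx[0, :], "r": idx[-1, :],
             "b": idx[1:-1, 0], "t": idx[1:-1, -1]}
    Dx, Dy = np.kron(D, np.eye(n)), np.kron(np.eye(n), D)
    normal = {"l": -Dx, "r": Dx, "b": -Dy, "t": Dy}
    dn = np.zeros((n * n, n * n))
    for s, rows in sides.items():             # outward normal derivative rows
        dn[rows, :] = normal[s][rows, :]
    gamma = np.concatenate([sides[s] for s in ("l", "r", "b", "t")])
    I = np.eye(n * n)
    Io = dn[gamma, :] - 1j * eta * I[gamma, :]   # outgoing impedance rows
    Ii = dn[gamma, :] + 1j * eta * I[gamma, :]   # incoming impedance rows
    return gamma, Io, Ii
```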

To apply incoming impedance conditions, the boundary rows of $\tilde L$ are replaced by $\mathcal I_i$, \begin{align} L(\iota_{\Gamma},:) =& \mathcal I_i , & L(\iota_i,:) =& \tilde L(\iota_i,:) . \end{align} The local Impedance-to-Impedance operator and interior contribution are \begin{align} T =& \mathcal I_o L^{-1} I(:,\iota_{\Gamma}) , & H =& \mathcal I_o L^{-1} I(:,\iota_i)\,\tilde b(\iota_i) , \end{align} where $\tilde b$ contains the local right-hand-side values.
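With these pieces, $T$ and $H$ follow from dense solves against identity columns, mirroring the formulas above; this sketch assumes `b_tilde` already holds the (mass-weighted) local right-hand-side values.

```python
def local_T_H(N, c, eta, b_tilde):
    """Local ItI map T = Io L^{-1} I(:,gamma) and interior contribution
    H = Io L^{-1} I(:,iota_i) b_tilde(iota_i), as in the text."""
    n2 = (N + 1) ** 2
    L = local_operator(N, c).astype(complex)
    gamma, Io, Ii = iti_operators(N, eta)
    L[gamma, :] = Ii                          # impose incoming-impedance rows
    interior = np.setdiff1d(np.arange(n2), gamma)
    T = Io @ np.linalg.solve(L, np.eye(n2)[:, gamma])
    rhs = np.zeros(n2, dtype=complex)
    rhs[interior] = b_tilde[interior]
    H = Io @ np.linalg.solve(L, rhs)
    return T, H
```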

The operators $T$ and $H$ yield the closed impedance relation \begin{align} \mathcal I_o u(\iota_{\Gamma}) =& T\,\mathcal I_i u(\iota_{\Gamma}) + H , \end{align} from which the full element solution follows by \begin{align} L u =& b, \end{align} where $b(\iota_{\Gamma}) = \mathcal I_i u(\iota_{\Gamma})$ and $b(\iota_i) = \tilde b(\iota_i)$.

Global discretization

For each element $e$, let $\alpha,\beta\in\{l,r,b,t\}$ denote its sides. The local relation between outgoing and incoming impedance data is \begin{align}\label{eqn:localiti} (\mathcal I_o u)^{(e)}_{\alpha} =& \sum_{\beta\in\{l,r,b,t\}} T^{(e)}_{\alpha\beta}(\mathcal I_i u)^{(e)}_{\beta} + H^{(e)}_{\alpha} , \end{align} where $T^{(e)}_{\alpha\beta}\in\mathbb C^{(N-1)\times(N-1)}$ and $H^{(e)}_{\alpha}\in\mathbb C^{(N-1)}$.

Transmission conditions enforce continuity of impedance data across shared faces: \begin{align} (\mathcal I_i u)^{(e_2)}_{\beta} =& (\mathcal I_o u)^{(e_1)}_{\alpha}, & (\mathcal I_i u)^{(e_1)}_{\alpha} =& (\mathcal I_o u)^{(e_2)}_{\beta}. \end{align} Combining these with the local ItI maps gives the face system \begin{align} \left[\begin{smallmatrix} I & -T^{(e_1)}_{\alpha\alpha}\\ -T^{(e_2)}_{\beta\beta} & I \end{smallmatrix}\right] \left[\begin{smallmatrix} (\mathcal I_i u)^{(e_2)}_{\beta}\\ (\mathcal I_i u)^{(e_1)}_{\alpha} \end{smallmatrix}\right] - \left[\begin{smallmatrix} \sum_{\gamma\ne\alpha} T^{(e_1)}_{\alpha\gamma}(\mathcal I_i u)^{(e_1)}_{\gamma}\\ \sum_{\gamma\ne\beta} T^{(e_2)}_{\beta\gamma}(\mathcal I_i u)^{(e_2)}_{\gamma} \end{smallmatrix}\right] =& \left[\begin{smallmatrix} H^{(e_1)}_{\alpha}\\ H^{(e_2)}_{\beta} \end{smallmatrix}\right]. \label{eqn:faceequations} \end{align}

Assembling all face equations yields the sparse global skeleton system \begin{align} M g =& \mathrm{RHS} , \end{align} where $g$ collects all interior incoming impedances and $\mathrm{RHS}$ stacks the local $H^{(e)}_{\alpha}$ contributions. Physical boundary sides contribute directly to the right-hand side.
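As an illustration, a sketch of this assembly loop under a hypothetical data layout (block maps stored per element and side pair); the layout and all names are assumptions of this sketch, not the paper's implementation, and physical-boundary couplings are assumed to be already folded into $H$.

```python
def assemble_skeleton(T, H, faces, index, sides=("l", "r", "b", "t")):
    """Assemble M g = RHS from the face equations. T[e][(s, g)] is the
    block T^(e)_{sg}, H[e][s] the vector H^(e)_s, `faces` lists interior
    faces as (e1, s1, e2, s2), and index[(e, s)] gives the dof range of
    the unknown (Ii u)^(e)_s within g (interior sides only)."""
    ndof = sum(len(r) for r in index.values())
    M = np.zeros((ndof, ndof), dtype=complex)
    rhs = np.zeros(ndof, dtype=complex)
    for e1, s1, e2, s2 in faces:
        # Two equations per interior face, one written from each element.
        for (e, s), unknown in [((e1, s1), (e2, s2)), ((e2, s2), (e1, s1))]:
            rows = index[unknown]             # identity on the neighbor's data
            M[np.ix_(rows, rows)] += np.eye(len(rows))
            for g in sides:                   # -T^(e)_{sg} couplings
                cols = index.get((e, g))
                if cols is not None:          # interior side of e
                    M[np.ix_(rows, cols)] -= T[e][(s, g)]
            rhs[rows] += H[e][s]
    return M, rhs
```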

Solver

The HPS solver applies the nested-dissection procedure to the spectral element system described above. This section details the face ordering that enables its direct solution, which is later recast as a relaxation scheme.

Nested dissection

Local scheme

Let two elements $e_1$ and $e_2$ share an interior face $f$ through sides $\alpha$ of $e_1$ and $\beta$ of $e_2$. Their face equations (from \eqref{eqn:faceequations}) are \begin{align} (\mathcal I_i u)^{(e_2)}_{\beta} - \sum_{\gamma\in\{l,r,b,t\}} T^{(e_1)}_{\alpha\gamma}(\mathcal I_i u)^{(e_1)}_{\gamma} =& H^{(e_1)}_{\alpha},\\ (\mathcal I_i u)^{(e_1)}_{\alpha} - \sum_{\gamma\in\{l,r,b,t\}} T^{(e_2)}_{\beta\gamma}(\mathcal I_i u)^{(e_2)}_{\gamma} =& H^{(e_2)}_{\beta}. \end{align} We now group the internal and external incoming impedances as $x = \left[\begin{smallmatrix} (\mathcal I_i u)^{(e_2)}_{\beta} & (\mathcal I_i u)^{(e_1)}_{\alpha} \end{smallmatrix}\right]^\top$ and $y = \left[\begin{smallmatrix} (\mathcal I_i u)^{(e_1)}_{E_1} & (\mathcal I_i u)^{(e_2)}_{E_2} \end{smallmatrix}\right]^\top$, where $E_1=\{l,r,b,t\}\!\setminus\!\{\alpha\}$ and $E_2=\{l,r,b,t\}\!\setminus\!\{\beta\}$. With this notation the system becomes \begin{align} \underbrace{ \left[\begin{smallmatrix} I & -T^{(e_1)}_{\alpha\alpha}\\ -T^{(e_2)}_{\beta\beta} & I \end{smallmatrix}\right]}_{\scriptstyle A} x =& \underbrace{ \left[\begin{smallmatrix} T^{(e_1)}_{\alpha E_1} & 0\\ 0 & T^{(e_2)}_{\beta E_2} \end{smallmatrix}\right]}_{\scriptstyle B} y + \underbrace{ \left[\begin{smallmatrix} H^{(e_1)}_{\alpha}\\ H^{(e_2)}_{\beta} \end{smallmatrix}\right]}_{\scriptstyle h} . \label{eqn:localfacesystem} \end{align} Eliminating $x$ gives $x = A^{-1}B y + A^{-1}h$; substituting into the outgoing relations \eqref{eqn:localiti} for the exterior sides produces the fused pair operator \begin{align} T_{\mathrm{pair}} =& \underbrace{D}_{\scriptstyle \left[\begin{smallmatrix} T^{(e_1)}_{E_1E_1} & 0\\ 0 & T^{(e_2)}_{E_2E_2} \end{smallmatrix}\right]} + \underbrace{C}_{\scriptstyle \left[\begin{smallmatrix} 0 & T^{(e_1)}_{E_1\alpha}\\ T^{(e_2)}_{E_2\beta} & 0 \end{smallmatrix}\right]} A^{-1} B , & H_{\mathrm{pair}} =& \left[\begin{smallmatrix} H^{(e_1)}_{E_1}\\ H^{(e_2)}_{E_2} \end{smallmatrix}\right] + C A^{-1} \left[\begin{smallmatrix} H^{(e_1)}_{\alpha}\\ H^{(e_2)}_{\beta} \end{smallmatrix}\right] , \end{align} where the anti-diagonal blocks of $C$ reflect the ordering of $x$, and $T_{\mathrm{pair}}$ is a Schur complement of the coupled pair system with respect to the $A$ block.
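Numerically, the merge amounts to a few dense solves. A sketch following the pair formulas above, assuming index arrays `a`, `b`, `E1`, `E2` into each element's boundary ordering:

```python
def merge_pair(T1, H1, T2, H2, a, b, E1, E2):
    """Fuse two elements across a shared face. T1, T2: local ItI maps;
    H1, H2: interior contributions; a, b: shared-side indices in each
    element's boundary ordering; E1, E2: the remaining (exterior) sides."""
    m = len(a)
    A = np.block([[np.eye(m), -T1[np.ix_(a, a)]],
                  [-T2[np.ix_(b, b)], np.eye(m)]])
    B = np.block([[T1[np.ix_(a, E1)], np.zeros((m, len(E2)))],
                  [np.zeros((m, len(E1))), T2[np.ix_(b, E2)]]])
    Ct = np.block([[np.zeros((len(E1), m)), T1[np.ix_(E1, a)]],
                   [T2[np.ix_(E2, b)], np.zeros((len(E2), m))]])
    D = np.block([[T1[np.ix_(E1, E1)], np.zeros((len(E1), len(E2)))],
                  [np.zeros((len(E2), len(E1))), T2[np.ix_(E2, E2)]]])
    h = np.concatenate([H1[a], H2[b]])
    T_pair = D + Ct @ np.linalg.solve(A, B)       # fused ItI map
    H_pair = np.concatenate([H1[E1], H2[E2]]) + Ct @ np.linalg.solve(A, h)
    return T_pair, H_pair
```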

Global scheme

Figure 1: Face merging and sparsity patterns for a $4\times 4$ element mesh.
  1. Grid 1. Faces 1 to 8 are eliminated, merging pairs of elements. These faces' dofs form the top-left $1\times 1$-face-block diagonal part of $M_1$, since they are not linked directly to one another but only through other faces: e.g., face 1 couples to face 2 through face 17.
  2. Grid 2. Faces 1 to 8 are eliminated in pairs, merging $1\times 2$ subdomains along their longer sides. These faces' dofs form the top-left $2\times 2$-face-block diagonal matrix.
  3. Grid 3. Faces 1 to 4 are eliminated in pairs, merging $2\times 2$ subdomains. These faces' dofs form the top-left $2\times 2$-face-block diagonal matrix.
  4. Grid 4. Faces 1 to 4 are now fully coupled; $M_4$ is dense.


Figure 1 illustrates the merging procedure of the global nested dissection scheme. The interior faces are ordered so that the first batch $\mathcal F$ forms an independent set under this ordering, a property essential for eliminating them simultaneously. With this structure, the assembled face system takes the block form \begin{align} M_1 =& \begin{bmatrix} A_1 & B_1\\ C_1 & D_1 \end{bmatrix}, \end{align} where $A_1=\operatorname{blkdiag}\{A_f:\,f\in\mathcal F\}$ collects the local matrices corresponding to $\mathcal F$. Eliminating this batch yields the Schur complement \begin{align} M_2 =& D_1 - C_1 A_1^{-1} B_1 . \end{align} For any fused pair $p$ corresponding to a face $f\in\mathcal F$, let $\iota^{(p)}$ denote the ordered index block in $M_2$ corresponding to the merged element $(e_1,e_2)$. Since $A_1$ is block-diagonal and $\iota^{(p)}$ involves only the face $f$ shared by the pair, the principal block of $M_2$ restricted to these indices is \begin{align} M_2\left(\iota^{(p)},\iota^{(p)}\right) =& D_f - C_f A_f^{-1} B_f = T_{\mathrm{pair}} . \end{align} Therefore, the intra-subdomain blocks of $M_2$ are precisely the fused ItI operators produced by the HPS merge at level 1, while the off-diagonal couplings in $M_2$ connect these merged pairs across the level-1 separators. Because the ordering groups the faces of each fused pair into contiguous index ranges that are graph-separated from one another, the resulting matrix admits the block partition \begin{align} M_2 =& \begin{bmatrix} A_2 & B_2\\ C_2 & D_2 \end{bmatrix}, \end{align} where $A_2=\operatorname{blkdiag}\{T_{\mathrm{pair}}(f): f\in\mathcal F\}$. Repeating this construction on successive independent sets, and using the associativity of Schur complements [14], recursively yields the nested-dissection factorization of the spectral-element face system.

Solver recast as a multigrid relaxation scheme

The block-inverse relation introduced in [15] takes the form \begin{align} M^{-1} = \begin{bmatrix} A & B\\ C & D \end{bmatrix}^{-1} = \underbrace{\begin{bmatrix} A^{-1} & 0\\ 0 & 0 \end{bmatrix}}_{F} + \underbrace{\begin{bmatrix} -A^{-1}B\\ I \end{bmatrix}}_{P} \underbrace{\left(D - C A^{-1} B\right)^{-1}}_{S^{-1}} \underbrace{\begin{bmatrix} 0 & I \end{bmatrix}}_{R} \left( I - M \underbrace{\begin{bmatrix} A^{-1} & 0\\ 0 & 0 \end{bmatrix}}_{F} \right). \end{align} This identity motivates a recursive multigrid algorithm without post-smoothing, rather than a single relaxation step: the local inversion $A^{-1}$ acts as a smoother, and the reduced system $S$ defines the next level. The recursive iteration reads \begin{align} \mathrm{MG}(M) =& F + P S^{-1} R (I - M F), \end{align} where $S^{-1}$ is approximated by applying the same procedure to $S$. A single coarse call yields a V-cycle; multiple coarse calls define a $\gamma$-cycle, both fully consistent with the hierarchical merging in the HPS method. We employ MG as a preconditioner for flexible GMRES, with the coarse solve performed by a fixed number of unpreconditioned GMRES iterations.
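A minimal sketch of one cycle of this recursion, assuming the face system has been reordered level by level as in Figure 1 and that a list `splits` of independent-set block sizes is available; a dense direct solve stands in for the paper's fixed-iteration coarse GMRES, and the dense Schur complement formed here would be replaced by the HPS merge in practice.

```python
def mg_apply(M, r, splits, gamma=1):
    """One cycle of MG(M) r = F r + P S^{-1} R (r - M F r), without
    post-smoothing; `gamma` coarse visits give a V-cycle (gamma=1)
    or a gamma-cycle (gamma>1)."""
    if not splits:                               # coarsest level
        return np.linalg.solve(M, r)
    k = splits[0]
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    xF = np.zeros_like(r)
    xF[:k] = np.linalg.solve(A, r[:k])           # smoother: F r
    rc = (r - M @ xF)[k:]                        # restricted residual R(I - MF) r
    S = D - C @ np.linalg.solve(A, B)            # Schur complement = next level
    e = np.zeros_like(rc)
    for _ in range(gamma):                       # approximate S^{-1} rc
        e = e + mg_apply(S, rc - S @ e, splits[1:], gamma)
    x = xF
    x[:k] -= np.linalg.solve(A, B @ e)           # prolongation P e
    x[k:] += e
    return x
```

With exact solves at every level the cycle reproduces the nested-dissection direct solver; truncating the hierarchy and iterating yields the preconditioner used with flexible GMRES.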

Numerical experiments

We consider one of the problems from [2], with $b(\mathbf{x}) = 1.5e^{-160[(x-0.5)^2+(y-0.5)^2]}$ and $s(\mathbf{x}) = -\kappa^2 b(\mathbf{x}) e^{i\kappa x}$, representing scattering by a Gaussian bump. We use polynomial degree 16, a residual tolerance of $10^{-8}$, and a frequency giving 9.6 points per wavelength, yielding about $10^{-7}$ accuracy for roughly one million degrees of freedom before skeletonization.
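A hypothetical single-element smoke test of the sketches above, kept on the reference element $[-1,1]^2$: the geometric mapping, Jacobian scalings, and the actual mesh of the experiment are omitted, and the value of $\kappa$ is illustrative rather than the one used in the paper.

```python
import numpy as np

kappa = 40.0                                    # illustrative wavenumber
b = lambda X, Y: 1.5 * np.exp(-160 * ((X - 0.5) ** 2 + (Y - 0.5) ** 2))
c = lambda X, Y: kappa ** 2 * (1 - b(X, Y))
s = lambda X, Y: -kappa ** 2 * b(X, Y) * np.exp(1j * kappa * X)

N = 16                                          # polynomial degree, as in the text
x, w, D = gll(N)
X, Y = np.meshgrid(x, x, indexing="ij")
b_tilde = (s(X, Y) * np.outer(w, w)).ravel()    # mass-weighted source (assumed)
T, H = local_T_H(N, c, kappa, b_tilde)          # eta = kappa, as in the text
print(T.shape, abs(H).max())
```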

Table 1: Solution of the variable-coefficient Helmholtz problem: results for $10^6$ dofs at $9.6$ points per wavelength. PMem: preconditioner memory footprint [MB]; It: flexible GMRES iterations with restart at 60; Bt: build time [s]; St: solve time [s]; c.i.: coarse GMRES iterations. Entries are given as It / St; dashes mark configurations without reported results.

| Case | PMem | It | Bt | St |
| --- | --- | --- | --- | --- |
| Unpreconditioned | 0 | 669 | 0 | 85 |
| Exact coarse space | 3108 | 1 | 75 | 4 |

| #levels | PMem | Bt | $\gamma=1$, 4 c.i. | $\gamma=1$, 5 c.i. | $\gamma=1$, 6 c.i. | $\gamma=2$, 2 c.i. | $\gamma=2$, 3 c.i. | $\gamma=2$, 4 c.i. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 46 | 6 | 37 / 53 | 22 / 44 | 16 / 45 | 83 / 71 | 32 / 44 | 18 / 39 |
| 3 | 460 | 15 | 23 / 42 | 15 / 40 | 11 / 40 | 24 / 55 | 11 / 42 | 7 / 42 |
| 4 | 805 | 20 | 18 / 30 | 12 / 27 | 9 / 27 | 11 / 48 | 6 / 41 | 4 / 43 |
| 5 | 1202 | 27 | 13 / 36 | 9 / 34 | 7 / 36 | 5 / 55 | 3 / 71 | 1 / 52 |
| 6 | 1527 | 31 | 11 / 22 | 7 / 18 | 5 / 16 | 2 / 47 | 1 / 46 | – |
| 7 | 1897 | 38 | 9 / 28 | 6 / 26 | 4 / 23 | 1 / 90 | – | – |
| 8 | 2185 | 43 | 8 / 19 | 5 / 15 | 4 / 15 | – | – | – |
| 9 | 2502 | 45 | 8 / 28 | 5 / 23 | 4 / 24 | – | – | – |
| 10 | 2724 | 52 | 7 / 19 | 5 / 17 | 4 / 15 | – | – | – |
| 11 | 2946 | 63 | 7 / 26 | 5 / 24 | 4 / 24 | – | – | – |
| 12 | 3051 | 67 | 3 / 11 | 2 / 8 | 1 / 6 | – | – | – |
Figure 2: Solution of the variable-coefficient Helmholtz problem.



Figure 2 shows the solution, and Table 1 reports results obtained in MATLAB while varying the number of levels. The table lists memory footprint, build time, total iterations, and solve time for different fixed coarse-iteration counts and values of $\gamma$. The problem was run on a laptop with 32 GB of RAM and a hybrid processor (6 hyper-threaded cores at 4.7 GHz and 8 cores at 3.5 GHz). Although cache effects favor certain configurations, an overall timing trend can be observed. The results demonstrate that performance can be tuned to the available memory and the number of solves required, while remaining faster than the unpreconditioned solver in many configurations.

Bibliography

[1] M. Outrata, J.P. Lucero Lorca, Submitted to the proceedings of DD29 (2025)

[2] J.P. Lucero Lorca, N. Beams, D. Beecroft, A. Gillman, SIAM Journal on Scientific Computing 46(1), A80 (2024). doi:10.1137/21M1463380

[3] P. Martinsson, Journal of Computational Physics 242, 460 (2013)

[4] P.G. Martinsson, arXiv (2015)

[5] A. Gillman, P.G. Martinsson, SIAM Journal on Scientific Computing 36(4), A2023 (2014)

[6] A. Gillman, A.H. Barnett, P.G. Martinsson, BIT Numerical Mathematics 55(1), 141 (2015)

[7] T. Babb, A. Gillman, S. Hao, P.G. Martinsson, BIT Numerical Mathematics 58(4), 851 (2018)

[8] B. Després, PhD Thesis 2015ISAM0011 (1991)

[9] A.G. Kyurkchan, N.I. Smirnova, Mathematical Modeling in Diffraction Theory (2016)

[10] L.E. Kinsler, A.R. Frey, A.B. Coppens, J.V. Sanders, Fundamentals of Acoustics (Wiley, 2000)

[11] M.J. Lighthill, Waves in Fluids (Cambridge University Press, Cambridge, England, 1978)

[12] S.N. Chandler-Wilde, Mathematical Methods in the Applied Sciences 20(10), 813 (1997)

[13] I.G. Graham, S.A. Sauter, Mathematics of Computation 89, 105 (2020)

[14] D.E. Crabtree, E.V. Haynsworth, Proc. Am. Math. Soc. 22(2), 364 (1969)

[15] J.P. Lucero Lorca, D. Rosenberg, I. Jankov, C. McCoid, M. Gander, arXiv preprint (2025)