SIAM JOURNAL ON SCIENTIFIC COMPUTING

FAST UPDATING MULTIPOLE COULOMBIC POTENTIAL CALCULATION
HÖft TA and Alpert BK
We present a numerical method to efficiently and accurately re-compute the Coulomb potential of a large ensemble of charged particles after a subset of the particles undergoes a change of position. Errors are bounded even after a large number of such shifts, making it practical for use in Monte Carlo Markov chain methods in molecular dynamics, computational astrophysics, computational chemistry, and other applications. The method uses truncated multipole expansions of the potential energy functional and a tree decomposition of the computational domain to reduce the computational complexity. Computational costs scale logarithmically in the size of the problem. Scaling, accuracy, and efficiency are confirmed with numerical experiments. The new method outperforms a direct calculation for moderate problem sizes.
NONUNIFORM FOURIER TRANSFORMS FOR RIGID-BODY AND MULTI-DIMENSIONAL ROTATIONAL CORRELATIONS
Bajaj C, Bauer B, Bettadapura R and Vollrath A
The task of evaluating correlations is central to computational structural biology. The rigid-body correlation problem seeks the rigid-body transformation (, ), ∈ SO(3), ∈ ℝ that maximizes the correlation between a pair of input scalar-valued functions representing molecular structures. Exhaustive solutions to the rigid-body correlation problem take advantage of the fast Fourier transform to achieve a speedup either with respect to the sought translation or rotation. We present PF, a new exhaustive solution, based on the non-equispaced SO(3) Fourier transform, to the rigid-body correlation problem; unlike previous solutions, ours achieves a combination of translational and rotational speedups without requiring equispaced grids. PF can be straightforwardly applied to a variety of problems in protein structure prediction and refinement that involve correlations under rigid-body motions of the protein. Additionally, we show how it applies, along with an appropriate flexibility model, to analogs of the above problems in which the flexibility of the protein is relevant.
AN EFFICIENT HIGHER-ORDER FAST MULTIPOLE BOUNDARY ELEMENT SOLUTION FOR POISSON-BOLTZMANN BASED MOLECULAR ELECTROSTATICS
Bajaj C, Chen SC and Rand A
In order to compute polarization energy of biomolecules, we describe a boundary element approach to solving the linearized Poisson-Boltzmann equation. Our approach combines several important features including the derivative boundary formulation of the problem and a smooth approximation of the molecular surface based on the algebraic spline molecular surface. State of the art software for numerical linear algebra and the kernel independent fast multipole method is used for both simplicity and efficiency of our implementation. We perform a variety of computational experiments, testing our method on a number of actual proteins involved in molecular docking and demonstrating the effectiveness of our solver for computing molecular polarization energy.
LOCAL ORTHOGONAL CUTTING METHOD FOR COMPUTING MEDIAL CURVES AND ITS BIOMEDICAL APPLICATIONS
Jiao X, Einstein DR and Dyedov V
Medial curves have a wide range of applications in geometric modeling and analysis (such as shape matching) and biomedical engineering (such as morphometry and computer assisted surgery). The computation of medial curves poses significant challenges, both in terms of theoretical analysis and practical efficiency and reliability. In this paper, we propose a definition and analysis of medial curves and also describe an efficient and robust method called local orthogonal cutting (LOC) for computing medial curves. Our approach is based on three key concepts: a local orthogonal decomposition of objects into substructures, a differential geometry concept called the interior center of curvature (ICC), and integrated stability and consistency tests. These concepts lend themselves to robust numerical techniques and result in an algorithm that is efficient and noise resistant. We illustrate the effectiveness and robustness of our approach with some highly complex, large-scale, noisy biomedical geometries derived from medical images, including lung airways and blood vessels. We also present comparisons of our method with some existing methods.
AN EFFICIENT NUMERICAL METHOD FOR THE SOLUTION OF THE L(2) OPTIMAL MASS TRANSFER PROBLEM
Haber E, Rehman T and Tannenbaum A
In this paper we present a new computationally efficient numerical scheme for the minimizing flow approach for the computation of the optimal L(2) mass transport mapping. In contrast to the integration of a time dependent partial differential equation proposed in [S. Angenent, S. Haker, and A. Tannenbaum, SIAM J. Math. Anal., 35 (2003), pp. 61-97], we employ in the present work a direct variational method. The efficacy of the approach is demonstrated on both real and synthetic data.
ALGORITHMS FOR AREA PRESERVING FLOWS
Kublik C, Esedoḡlu S and Fessler JA
We propose efficient and accurate algorithms for computing certain area preserving geometric motions of curves in the plane, such as area preserving motion by curvature. These schemes are based on a new class of diffusion generated motion algorithms using signed distance functions. In particular, they alternate two very simple and fast operations, namely convolution with the Gaussian kernel and construction of the distance function, to generate the desired geometric flow in an unconditionally stable manner. We present applications of these area preserving flows to large scale simulations of coarsening.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS
Fu Z, Kirby RM and Whitaker RT
Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically , but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains . Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers.
An Adaptive Moving Mesh Method for Forced Curve Shortening Flow
Mackenzie JA, Nolan M, Rowlatt CF and Insall RH
We propose a novel adaptive moving mesh method for the numerical solution of a forced curve shortening geometric evolution equation. Control of the mesh quality is obtained using a tangential mesh velocity derived from a mesh equidistribution principle, where a positive adaptivity measure or monitor function is approximately equidistributed along the evolving curve. Central finite differences are used to discretize in space the governing evolution equation for the position vector, and a second-order implicit scheme is used for the temporal integration. Simulations are presented indicating the generation of meshes which resolve areas of high curvature and are of second-order accuracy. Furthermore, the new method delivers improved solution accuracy compared to the use of uniform arc-length meshes.
PARAMETER AND UNCERTAINTY ESTIMATION FOR DYNAMICAL SYSTEMS USING SURROGATE STOCHASTIC PROCESSES
Chung M, Binois M, Gramacy RB, Bardsley JM, Moquin DJ, Smith AP and Smith AM
Inference on unknown quantities in dynamical systems via observational data is essential for providing meaningful insight, furnishing accurate predictions, enabling robust control, and establishing appropriate designs for future experiments. Merging mathematical theory with empirical measurements in a statistically coherent way is critical and challenges abound, e.g., ill-posedness of the parameter estimation problem, proper regularization and incorporation of prior knowledge, and computational limitations. To address these issues, we propose a new method for learning parameterized dynamical systems from data. We first customize and fit a surrogate stochastic process directly to observational data, front-loading with statistical learning to respect prior knowledge (e.g., smoothness), cope with challenging data features like heteroskedasticity, heavy tails, and censoring. Then, samples of the stochastic process are used as "surrogate data" and point estimates are computed via ordinary point estimation methods in a modular fashion. Attractive features of this two-step approach include modularity and trivial parallelizability. We demonstrate its advantages on a predator-prey simulation study and on a real-world application involving within-host influenza virus infection data paired with a viral kinetic model, with comparisons to a more conventional Markov chain Monte Carlo (MCMC) based Bayesian approach.
EXPONENTIAL TIME DIFFERENCING FOR HODGKIN-HUXLEY-LIKE ODES
Börgers C and Nectow AR
Several authors have proposed the use of exponential time differencing (ETD) for Hodgkin-Huxley-like partial and ordinary differential equations (PDEs and ODEs). For Hodgkin-Huxley-like PDEs, ETD is attractive because it can deal effectively with the stiffness issues that diffusion gives rise to. However, large neuronal networks are often simulated assuming "space-clamped" neurons, i.e., using the Hodgkin-Huxley ODEs, in which there are no diffusion terms. Our goal is to clarify whether ETD is a good idea even in that case. We present a numerical comparison of first- and second-order ETD with standard explicit time-stepping schemes (Euler's method, the midpoint method, and the classical fourth-order Runge-Kutta method). We find that in the standard schemes, the stable computation of the very rapid rising phase of the action potential often forces time steps of a small fraction of a millisecond. This can result in an expensive calculation yielding greater overall accuracy than needed. Although it is tempting at first to try to address this issue with adaptive or fully implicit time-stepping, we argue that neither is effective here. The main advantage of ETD for Hodgkin-Huxley-like systems of ODEs is that it allows underresolution of the rising phase of the action potential without causing instability, using time steps on the order of one millisecond. When high quantitative accuracy is not necessary and perhaps, because of modeling inaccuracies, not even useful, ETD allows much faster simulations than standard explicit time-stepping schemes. The second-order ETD scheme is found to be substantially more accurate than the first-order one even for large values of Δ.
ANALYSIS AND DESIGN OF JUMP COEFFICIENTS IN DISCRETE STOCHASTIC DIFFUSION MODELS
Meinecke L, Engblom S, Hellander A and Lötstedt P
In computational systems biology, the mesoscopic model of reaction-diffusion kinetics is described by a continuous time, discrete space Markov process. To simulate diffusion stochastically, the jump coefficients are obtained by a discretization of the diffusion equation. Using unstructured meshes to represent complicated geometries may lead to negative coefficients when using piecewise linear finite elements. Several methods have been proposed to modify the coefficients to enforce the nonnegativity needed in the stochastic setting. In this paper, we present a method to quantify the error introduced by that change. We interpret the modified discretization matrix as the exact finite element discretization of a perturbed equation. The forward error, the error between the analytical solutions to the original and the perturbed equations, is bounded by the backward error, the error between the diffusion of the two equations. We present a backward analysis algorithm to compute the diffusion coefficient from a given discretization matrix. The analysis suggests a new way of deriving nonnegative jump coefficients that minimizes the backward error. The theory is tested in numerical experiments indicating that the new method is superior and also minimizes the forward error.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS
Meng X, Saunders MA and Mahoney MW
We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to min ‖ - ‖, where ∈ ℝ with ≫ or ≪ , and where may be rank-deficient. Tikhonov regularization may also be included. Since is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min()⌉ × min(), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK's DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES
Fu Z, Jeong WK, Pan Y, Kirby RM and Whitaker RT
This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton-Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512-2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers.
IMAGE-DRIVEN BIOPHYSICAL TUMOR GROWTH MODEL CALIBRATION
Scheufele K, Subramanian S, Mang A, Biros G and Mehl M
We present a novel formulation for the calibration of a biophysical tumor growth model from a single-time snapshot, multiparametric magnetic resonance imaging (MRI) scan of a glioblastoma patient. Tumor growth models are typically nonlinear parabolic partial differential equations (PDEs). Thus, we have to generate a second snapshot to be able to extract significant information from a single patient snapshot. We create this two-snapshot scenario as follows. We use an atlas (an average of several scans of healthy individuals) as a substitute for an earlier, pretumor, MRI scan of the patient. Then, using the patient scan and the atlas, we combine image-registration algorithms and parameter estimation algorithms to achieve a better estimate of the healthy patient scan and the tumor growth parameters that are consistent with the data. Our scheme is based on our recent work (Scheufele et al., to appear), but we apply a different and novel scheme where the tumor growth simulation in contrast to the previous work is executed in the patient brain domain and not in the atlas domain yielding more meaningful patient-specific results. As a basis, we use a PDE-constrained optimization framework. We derive a modified Picard-iteration-type solution strategy in which we alternate between registration and tumor parameter estimation in a new way. In addition, we consider an sparsity constraint on the initial condition for the tumor and integrate it with the new joint inversion scheme. We solve the sub-problems with a reduced space, inexact Gauss-Newton-Krylov/quasi-Newton method. We present results using real brain data with synthetic tumor data that show that the new scheme reconstructs the tumor parameters in a more accurate and reliable way compared to our earlier scheme.
EXPLICIT LEAST-DEGREE BOUNDARY FILTERS FOR DISCONTINUOUS GALERKIN
Nguyen DM and Peters J
Convolving the output of Discontinuous Galerkin (DG) computations using spline filters can improve both smoothness and accuracy of the output. At domain boundaries, these filters have to be one-sided for non-periodic boundary conditions. Recently, position-dependent smoothness-increasing accuracy-preserving (PSIAC) filters were shown to be a superset of the well-known one-sided RLKV and SRV filters. Since PSIAC filters can be formulated symbolically, PSIAC filtering amounts to forming linear products with local DG output and so offers a more stable and efficient implementation. The paper introduces a new class of PSIAC filters NP that have small support and are piecewise constant. Extensive numerical experiments for the canonical hyperbolic test equation show NP filters outperform the more complex known boundary filters. NP filters typically reduce the error in the boundary region below that of the interior where optimally superconvergent symmetric filters of the same support are applied. NP filtering can be implemented as forming linear combinations of the data with short rational weights. Exact derivatives of the convolved output are easy to compute.
A LAGRANGIAN GAUSS-NEWTON-KRYLOV SOLVER FOR MASS- AND INTENSITY-PRESERVING DIFFEOMORPHIC IMAGE REGISTRATION
Mang A and Ruthotto L
We present an efficient solver for diffeomorphic image registration problems in the framework of (LDDMM). We use an optimal control formulation, in which the velocity field of a hyperbolic PDE needs to be found such that the distance between the final state of the system (the transformed/transported template image) and the observation (the reference image) is minimized. Our solver supports both stationary and non-stationary (i.e., transient or time-dependent) velocity fields. As transformation models, we consider both the transport equation (assuming intensities are preserved during the deformation) and the continuity equation (assuming mass-preservation). We consider the reduced form of the optimal control problem and solve the resulting unconstrained optimization problem using a discretize-then-optimize approach. A key contribution is the elimination of the PDE constraint using a Lagrangian hyperbolic PDE solver. Lagrangian methods rely on the concept of characteristic curves. We approximate these curves using a fourth-order Runge-Kutta method. We also present an efficient algorithm for computing the derivatives of the final state of the system with respect to the velocity field. This allows us to use fast Gauss-Newton based methods. We present quickly converging iterative linear solvers using spectral preconditioners that render the overall optimization efficient and scalable. Our method is embedded into the image registration framework FAIR and, thus, supports the most commonly used similarity measures and regularization functionals. We demonstrate the potential of our new approach using several synthetic and real world test problems with up to 14.7 million degrees of freedom.
A SEMI-LAGRANGIAN TWO-LEVEL PRECONDITIONED NEWTON-KRYLOV SOLVER FOR CONSTRAINED DIFFEOMORPHIC IMAGE REGISTRATION
Mang A and Biros G
We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20 speedup for a two dimensional, real world multi-subject medical image registration problem.
HIERARCHICAL ORTHOGONAL MATRIX GENERATION AND MATRIX-VECTOR MULTIPLICATIONS IN RIGID BODY SIMULATIONS
Fang F, Huang J, Huber G, McCammon JA and Zhang BO
In this paper, we apply the hierarchical modeling technique and study some numerical linear algebra problems arising from the Brownian dynamics simulations of biomolecular systems where molecules are modeled as ensembles of rigid bodies. Given a rigid body consisting of beads, the 6×3 transformation matrix that maps the force on each bead to 's translational and rotational forces (a 6 × 1 vector), and the row space of , we show how to explicitly construct the (3 - 6) × 3 matrix consisting of (3 - 6) orthonormal basis vectors of (orthogonal complement of ) using only operations and storage. For applications where only the matrix-vector multiplications and are needed, we introduce asymptotically optimal hierarchical algorithms without explicitly forming . Preliminary numerical results are presented to demonstrate the performance and accuracy of the numerical algorithms.
CLAIRE: A DISTRIBUTED-MEMORY SOLVER FOR CONSTRAINED LARGE DEFORMATION DIFFEOMORPHIC IMAGE REGISTRATION
Mang A, Gholami A, Davatzikos C and Biros G
With this work we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diifeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss‒Newton‒Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an eifective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We Demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against diiferent flavors of the Demons algorithm for diifeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We Demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5× with a peak performance of up to 17× compared to our former work.
MOLNs: A CLOUD PLATFORM FOR INTERACTIVE, REPRODUCIBLE, AND SCALABLE SPATIAL STOCHASTIC COMPUTATIONAL EXPERIMENTS IN SYSTEMS BIOLOGY USING PyURDME
Drawert B, Trogdon M, Toor S, Petzold L and Hellander A
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
FAST EXPANSION INTO HARMONICS ON THE DISK: A STEERABLE BASIS WITH FAST RADIAL CONVOLUTIONS
Marshall NF, Mickelin O and Singer A
We present a fast and numerically accurate method for expanding digitized images representing functions on [-1, 1] supported on the disk in the harmonics (Dirichlet Laplacian eigenfunctions) on the disk. Our method, which we refer to as the Fast Disk Harmonics Transform (FDHT), runs in operations. This basis is also known as the Fourier-Bessel basis, and it has several computational advantages: it is orthogonal, ordered by frequency, and steerable in the sense that images expanded in the basis can be rotated by applying a diagonal transform to the coefficients. Moreover, we show that convolution with radial functions can also be efficiently computed by applying a diagonal transform to the coefficients.