Time-domain event detection using single-instruction, multiple-thread gpGPU architectures in single-molecule biophysical data
Discrete amplitude levels in ordered, time-domain data often represent different underlying latent states of the system that is being interrogated. Analysis and feature extraction from these data sets generally require considering the order of each individual point; this approach cannot take advantage of contemporary general-purpose graphics processing units (gpGPU) and single-instruction multiple-data (SIMD) instruction set architectures. Two sources of such data from single-molecule biological measurements are nanopores and single-molecule field effect transistor (smFET) nanotube devices; both generate streams of time-ordered current or voltage data, typically sampled near 1 MS/s, with run times of minutes, yielding terabyte-scale datasets. Here, we present three gpGPU-based algorithms to overcome limitations associated with serial event detection in time series data, resulting in a 250× improvement in the rate at which we can detect salient features in nanopore and smFET datasets. The code is freely available.
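As a rough illustration of why event detection can be recast in a data-parallel form, the sketch below marks event boundaries with a test that reads only a sample and its immediate neighbour, so every index could in principle be handled by its own thread. It is a plain-Python sketch under assumed conventions (a fixed threshold, events defined as runs below it) and is not one of the three algorithms described in the abstract; the name `detect_events` is hypothetical.

```python
def detect_events(samples, threshold):
    """Return half-open (start, end) index ranges where samples lie below threshold.

    Assumption for illustration: an "event" is a contiguous run of samples
    below a fixed threshold (e.g. a current blockade in a nanopore trace).
    """
    n = len(samples)
    # Step 1: independent per-sample comparison; no ordering is needed.
    below = [value < threshold for value in samples]
    # Step 2: an index starts (ends) an event if it is below threshold and its
    # left (right) neighbour is not. Each test reads at most two flags.
    starts = [i for i in range(n) if below[i] and (i == 0 or not below[i - 1])]
    ends = [i + 1 for i in range(n) if below[i] and (i == n - 1 or not below[i + 1])]
    # Both lists are sorted and have equal length, so they pair up one-to-one.
    return list(zip(starts, ends))


trace = [1.0, 1.0, 0.2, 0.3, 1.0, 0.1, 1.0]
events = detect_events(trace, threshold=0.5)
print(events)  # [(2, 4), (5, 6)]
```

On a GPU, the two list comprehensions would correspond to element-wise kernels followed by a stream compaction; here they are written sequentially only to keep the example self-contained.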
MCGPU-PET: An Open-Source Real-Time Monte Carlo PET Simulator
Monte Carlo (MC) simulations are commonly used to model the emission, transmission, and/or detection of radiation in Positron Emission Tomography (PET). In this work, we introduce a new open-source MC software for PET simulation, MCGPU-PET, which has been designed to fully exploit the computing capabilities of modern GPUs to simulate the acquisition of more than 100 million coincidences per second from voxelized sources and material distributions. The new simulator is an extension of the PENELOPE-based MCGPU code previously used in cone-beam CT and mammography applications. We validated the accuracy of the accelerated code by comparing it to GATE and PeneloPET simulations, achieving agreement within approximately 10 percent. As an example application of the code for fast estimation of PET coincidences, a scan of the NEMA IQ phantom was simulated. A fully 3D sinogram with 6382 million true coincidences and 731 million scatter coincidences was generated in 54 seconds on one GPU. MCGPU-PET provides an estimation of true and scatter coincidences and spurious background (for positron-gamma emitters such as ¹²⁴I) at a rate 3 orders of magnitude faster than CPU-based MC simulators. This significant speed-up enables the use of the code for accurate scatter and prompt-gamma background estimations within an iterative image reconstruction process.
: A program for combined quantum mechanical and molecular mechanical modeling and simulations
Combined quantum mechanical and molecular mechanical (QM/MM) methods play an important role in multiscale modeling and simulations. is a general-purpose program for single-point calculations, geometry optimizations, transition state optimizations, and molecular dynamics (MD) at the QM/MM level. It calls a QM package and an MM package to perform the required single-level calculations and combines them into a QM/MM energy by a variety of schemes. supports GAMESS-US, , and ORCA as QM packages and TINKER as the MM package. Four types of treatments are available for embedding the QM subsystem in the MM environment: mechanical embedding with gas-phase calculations of the QM region, electronic embedding that allows polarization of the QM region by the MM environment, polarizable embedding for mutual polarization of the QM and MM regions, and flexible embedding for both mutual polarization and partial charge transfer between the QM and MM regions. Boundaries between QM and MM regions that pass through covalent bonds can be treated by several methods, including the redistributed charge (RC) scheme, redistributed charge and dipole (RCD) scheme, balanced-RC, balanced-RCD, screened charge scheme that takes account of charge penetration effects, and smeared charge scheme that delocalizes the MM charges near the QM-MM boundary. Geometry optimization can be done using the optimizer implemented in or the Berny optimizer in through external calls to . Molecular dynamics simulations can be performed at the pure-MM level, pure-QM level, fixed-partitioning QM/MM level, and adaptive-partitioning QM/MM level. The adaptive-partitioning treatments permit on-the-fly relocation of the QM-MM boundary by dynamically reclassifying atoms or groups into the QM or MM subsystems.
Electrical impulse characterization along actin filaments in pathological conditions
We present an interactive Mathematica notebook that characterizes the electrical impulses along actin filaments in both muscle and non-muscle cells for a wide range of physiological and pathological conditions. The simplicity of the theoretical formulation and the high performance of the Mathematica software enable the analysis of multiple conditions without computational restrictions. The program is based on a multi-scale (atomic → monomer → filament) approach capable of accounting for the atomistic details of a protein molecular structure, its biological environment, and their impact on the travel distance, velocity, and attenuation of monovalent ionic wave packets propagating along microfilaments. The interactive component allows investigators to choose the experimental conditions (intracellular vs. in vitro), nucleotide state (ATP vs. ADP), actin isoform (alpha, gamma, beta, and muscle or non-muscle cell), as well as a conformation model that covers a variety of mutants and the wild-type (control) actin filament. We used the computational tool to analyze environmental changes such as temperature effects and pH changes of the surrounding solutions, as well as structural changes to an actin monomer due to radius changes. Additionally, we investigated for the first time the electrostatic consequences of actin mutations from different disease conditions. These studies may provide an unprecedented molecular understanding of why and how age, inheritance, and disease conditions induce dysfunctions in the biophysical mechanisms underlying the propagation of electrical signals along actin filaments.
MFC: An open-source high-order multi-component, multi-phase, and multi-scale compressible flow solver
MFC is an open-source tool for solving multi-component, multi-phase, and bubbly compressible flows. It is capable of efficiently solving a wide range of flows, including droplet atomization, shock-bubble interaction, and bubble dynamics. We present the 5- and 6-equation thermodynamically-consistent diffuse-interface models we use to handle such flows, which are coupled to high-order interface-capturing methods, HLL-type Riemann solvers, and TVD time-integration schemes that are capable of simulating unsteady flows with strong shocks. The numerical methods are implemented in a flexible, modular framework that is amenable to future development. The methods we employ are validated via comparisons to experimental results for shock-bubble, shock-droplet, and shock-water-cylinder interaction problems and verified to be free of spurious oscillations for material-interface advection and gas-liquid Riemann problems. For smooth solutions, such as the advection of an isentropic vortex, the methods are verified to be high-order accurate. Illustrative examples involving shock-bubble-vessel-wall and acoustic-bubble-net interactions are used to demonstrate the full capabilities of MFC.
Mi3-GPU: MCMC-based Inverse Ising Inference on GPUs for protein covariation analysis
Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model based on observed site-covariation, which has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU ("mee-three", for MCMC Inverse Ising Inference) software for solving the inverse Ising problem for protein-sequence datasets with few analytic approximations, by parallel Markov-Chain Monte-Carlo sampling on GPUs. We also provide tools for analysis and preparation of protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is "generative" in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match the dataset MSA statistics up to the limits imposed by the effects of finite sampling. Our GPU implementation enables the construction of models which reproduce the covariation patterns of the observed MSA with a precision that is not possible with more approximate methods. The main components of our method are a GPU-optimized algorithm to greatly accelerate MCMC sampling, combined with a multi-step Quasi-Newton parameter-update scheme using a "Zwanzig reweighting" technique. We demonstrate the ability of this software to produce generative models on typical protein family datasets for sequence lengths ~ 300 with 21 residue types with tens of millions of inferred parameters in short running times.
Full 3D position reconstruction of a radioactive source based on a novel hyperbolic geometrical algorithm
A new method to locate, with millimetre uncertainty, in 3D, a γ-ray source emitting multiple γ-rays in a cascade, employing conventional LaBr₃(Ce) scintillation detectors, has been developed. Using 16 detectors in a symmetrical configuration, the detector energy and time signals, resulting from the γ-ray interactions, are fed into a new source position reconstruction algorithm. The Monte-Carlo based Geant4 framework has been used to simulate the detector array and a ⁶⁰Co source located at two positions within the spectrometer central volume. For a source located at (0,0,0) the algorithm reports X, Y, Z values of -0.3 ± 2.5, -0.4 ± 2.4, and -0.6 ± 2.5 mm, respectively. For a source located at (20,20,20) mm, with respect to the array centre, the algorithm reports X, Y, Z values of 20.2 ± 1.0, 20.2 ± 0.9, and 20.1 ± 1.2 mm. The resulting precision of the reconstruction means that this technique could find application in a number of areas including nuclear medicine, national security, radioactive waste assay and proton beam therapy.
PyLCP: A Python package for computing laser cooling physics
We present a Python object-oriented computer program for simulating various aspects of laser cooling physics. Our software is designed to be both easy to use and adaptable, allowing the user to specify the level structure, magnetic field profile, or the laser beams' geometry, detuning, and intensity. The program contains three levels of approximation for the motion of the atom, applicable in different regimes, offering cross checks for calculations and computational efficiency depending on the physical situation. We test the software by reproducing well-known phenomena, such as damped Rabi flopping, electromagnetically induced transparency, stimulated Raman adiabatic passage, and optical molasses. We also use our software package to quantitatively simulate recoil-limited magneto-optical traps, like those formed on the narrow ¹S₀ → ³P₁ transition in ⁸⁸Sr and ⁸⁷Sr.
AAfrag: Interpolation routines for Monte Carlo results on secondary production in proton-proton, proton-nucleus and nucleus-nucleus interactions
We provide a compilation of predictions of the QGSJET-II-04m model for the production of secondary species (photons, neutrinos, electrons, positrons, and antinucleons), covering a wide range of energies of the beam particles in proton-proton, proton-nucleus, nucleus-proton, and nucleus-nucleus reactions. The current version of QGSJET-II-04m has an improved treatment of the production of secondary particles at low energies: the parameters of the hadronization procedure have been fine-tuned based on a number of recent benchmark experimental data, notably from the LHCf, LHCb, and NA61 experiments. Our results for the production spectra are made publicly accessible through the interpolation routines AAfrag, which are described below. In addition, we comment on the impact of Feynman scaling violation and isospin symmetry effects on antinucleon production.
A Java Application to Characterize Biomolecules and Nanomaterials in Electrolyte Aqueous Solutions
The electrostatic, entropic and surface interactions between a macroion (nanoparticle or biomolecule), surrounding ions and water molecules play a fundamental role in the behavior and function of colloidal systems. However, the molecular mechanisms governing these phenomena are still poorly understood. One of the major limitations in procuring this understanding is the lack of appropriate computational tools. Additionally, only experts in the field with an extensive background in programming, who are trained in statistical mechanics, and have access to supercomputers are able to study these systems. To overcome these limitations, in this article, we present a free, multiplatform, portable Java software, which provides experts and non-experts in the field an easy and efficient way to obtain an accurate molecular characterization of electrical and structural properties of aqueous electrolyte mixture solutions around both cylindrical- and spherical-like rigid macroions under multiple conditions. These properties include the normalized ions and water density profile distributions, the mean electrostatic potential, the integrated charge, the zeta potential, the electrostatic potential energy, the particle crowding entropy energy, the ion-ion electrostatic direct correlation energy, and the ionic potential of mean force. The Java software does not require outstanding skills and comes with detailed user-guide documentation. The application is based on the so-called Classical Density Functional Theory Solver (CSDFTS), which was successfully applied to a variety of rod-like biopolymers, rigid-like globular proteins, nanoparticles, and nano-rods. CSDFTS implements several electrolyte and macroion models, uses different levels of approximation and takes advantage of high performance Fortran90 routines and optimized libraries. 
These features enable the software to run on single processor computers at low-to-moderate computational cost depending on the computer performance, the grid resolution, and the characterization of the macroion and the electrolyte solution, among other factors. As a unique feature, the software comes with a graphical user interface (GUI) that allows users to take advantage of the visually guided setup of the required input data to properly characterize the system and configure the solver. Several examples on nanomaterials and biomolecules are provided to illustrate the use of the GUI and the solver performance.
Efficient sampling of spreading processes on complex networks using a composition and rejection algorithm
Efficient stochastic simulation algorithms are of paramount importance to the study of spreading phenomena on complex networks. Using insights and analytical results from network science, we discuss how the structure of contacts affects the efficiency of current algorithms. We show that algorithms believed to require O(log N) or even O(1) operations per update (where N is the number of nodes) instead display a polynomial scaling for networks that are either dense or sparse and heterogeneous. This significantly affects the required computation time for simulations on large networks. To circumvent the issue, we propose a node-based method combined with a composition and rejection algorithm, a sampling scheme that has an average-case complexity of O[log(log N)] per update for general networks. This systematic approach is first set up for Markovian dynamics, but can also be adapted to a number of non-Markovian processes and can considerably enhance the study of a wide range of dynamics on networks.
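The composition and rejection idea named above can be sketched generically: events are binned into groups whose rates differ by at most a factor of two, a group is chosen in proportion to its total rate (composition), and a member is then drawn uniformly and accepted with probability rate / group upper bound (rejection), so the acceptance probability is always above one half. The class below is a minimal static sketch under those assumptions, not the node-based method of the paper; the class name and interface are hypothetical, and a real simulation would also update group totals incrementally as rates change.

```python
import math
import random


class CompositionRejectionSampler:
    """Draw items with probability proportional to their (positive) rates."""

    def __init__(self, rates, rng=None):
        if not rates or any(rate <= 0 for rate in rates.values()):
            raise ValueError("rates must be a non-empty mapping of positive numbers")
        self.rng = rng or random.Random()
        self.min_rate = min(rates.values())
        # Group g holds rates in [min_rate * 2**g, min_rate * 2**(g + 1)).
        self.groups = {}
        for item, rate in rates.items():
            g = int(math.floor(math.log2(rate / self.min_rate)))
            self.groups.setdefault(g, []).append((item, rate))
        self.group_ids = list(self.groups)
        self.group_totals = [
            sum(rate for _, rate in self.groups[g]) for g in self.group_ids
        ]

    def sample(self):
        # Composition: pick a group with probability proportional to its total rate.
        g = self.rng.choices(self.group_ids, weights=self.group_totals)[0]
        upper = self.min_rate * 2.0 ** (g + 1)
        members = self.groups[g]
        # Rejection: uniform proposal within the group, accepted with rate / upper.
        while True:
            item, rate = self.rng.choice(members)
            if self.rng.random() < rate / upper:
                return item


sampler = CompositionRejectionSampler(
    {"a": 1.0, "b": 3.0, "c": 4.0}, rng=random.Random(0)
)
draws = [sampler.sample() for _ in range(8000)]
fraction_c = draws.count("c") / len(draws)  # should be close to 4 / 8 = 0.5
```

The number of groups grows only with the logarithm of the ratio between the largest and smallest rate, which is what keeps the composition step cheap even when the number of events is large.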
A Framework for Comparing Vascular Hemodynamics at Different Points in Time
Computational simulations of blood flow contribute to our understanding of the interplay between vascular geometry and hemodynamics. With an improved understanding of this interplay from computational fluid dynamics (CFD), there is potential to improve basic research and the targeting of clinical care. One avenue for further analysis concerns the influence of time on the vascular geometries used in CFD simulations. The shape of blood vessels changes frequently, as in deformation within the cardiac cycle, and over long periods of time, such as the development of a stenotic plaque or an aneurysm. These changes in the vascular geometry will, in turn, influence flow within these blood vessels. By performing CFD simulations in geometries representing the blood vessels at different points in time, the interplay of these geometric changes with hemodynamics can be quantified. However, performing CFD simulations on different discrete grids leads to an additional challenge: how does one directly and quantitatively compare simulation results from different vascular geometries? In a previous study, we began to address this problem by proposing a method for the simplified case where the two geometries share a common centerline. In this companion paper, we generalize this method to address geometric changes which alter the vessel centerline. We demonstrate applications of this method to the study of wall shear stress in the left coronary artery. First, we compute the difference in wall shear stress between simulations using vascular geometries derived from patient imaging data at two points in the cardiac cycle. Second, we evaluate the relationship between changes in wall shear stress and the progressive development of a coronary aneurysm or stenosis.
RPYFMM: Parallel Adaptive Fast Multipole Method for Rotne-Prager-Yamakawa Tensor in Biomolecular Hydrodynamics Simulations
RPYFMM is a software package for the efficient evaluation of the potential field governed by the Rotne-Prager-Yamakawa (RPY) tensor interactions in biomolecular hydrodynamics simulations. In our algorithm, the RPY tensor is decomposed as a linear combination of four Laplace interactions, each of which is evaluated using the adaptive fast multipole method (FMM) [1] where the exponential expansions are applied to diagonalize the multipole-to-local translation operators. RPYFMM offers a unified execution on both shared and distributed memory computers by leveraging the DASHMM library [2, 3]. Preliminary numerical results show that the interactions for a molecular system of 15 million particles (beads) can be computed within one second on a Cray XC30 cluster using 12,288 cores, while achieving approximately 54% strong-scaling efficiency.
Sampling random directions within an elliptical cone
This work extends the spherical surface sampling algorithm in order to uniformly generate random directions within an elliptical cone. This has applications in Monte Carlo particle transport simulations, for example modeling asymmetric beam divergence or scattering interactions. Two methods are presented. The first obeys the strict boundary of the elliptical cone. The second relaxes this requirement, increasing the range of generated directions by up to 10% for elliptical cones of extreme eccentricity. However, the second method is able to generate directions beyond the equator.
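For orientation, one simple way to draw directions uniformly in solid angle inside an elliptical cone is to sample the enclosing circular cone with the standard spherical-cap recipe (cos θ uniform on [cos θ_max, 1], φ uniform on [0, 2π)) and reject directions that fall outside the elliptical boundary. The sketch below does exactly that; it is a rejection-based illustration, not either of the two methods presented in the abstract. The cone axis is assumed to be +z, the half-angles are assumed to be below 90°, and the function name is hypothetical.

```python
import math
import random


def sample_elliptical_cone(half_angle_x, half_angle_y, rng=random):
    """Return a unit vector (x, y, z) uniformly distributed inside an elliptical cone.

    The cone axis is +z; half_angle_x and half_angle_y (radians, in (0, pi/2))
    are the half-opening angles in the x-z and y-z planes.
    """
    for angle in (half_angle_x, half_angle_y):
        if not 0.0 < angle < math.pi / 2:
            raise ValueError("half-angles must lie strictly between 0 and pi/2")
    cos_min = math.cos(max(half_angle_x, half_angle_y))
    tan_x, tan_y = math.tan(half_angle_x), math.tan(half_angle_y)
    while True:
        # Uniform sampling on the spherical cap that encloses the cone.
        cos_t = rng.uniform(cos_min, 1.0)
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x, y, z = sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t
        # Keep the direction only if it lies inside the elliptical cross-section.
        if (x / (z * tan_x)) ** 2 + (y / (z * tan_y)) ** 2 <= 1.0:
            return (x, y, z)


cone_rng = random.Random(1)
directions = [
    sample_elliptical_cone(math.radians(30), math.radians(10), cone_rng)
    for _ in range(200)
]
```

The rejection rate grows with the eccentricity of the cone, which is one reason a dedicated sampling scheme may be preferable for strongly elongated cross-sections.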
GPU-accelerated Red Blood Cells Simulations with Transport Dissipative Particle Dynamics
Mesoscopic numerical simulations provide a unique approach for the quantification of the chemical influences on red blood cell functionalities. The transport Dissipative Particle Dynamics (tDPD) method can lead to such effective multiscale simulations due to its ability to simultaneously capture mesoscopic advection, diffusion, and reaction. In this paper, we present a GPU-accelerated red blood cell simulation package based on a tDPD adaptation of our red blood cell model, which can correctly recover the cell membrane viscosity, elasticity, bending stiffness, and cross-membrane chemical transport. The package processes essentially all computational workloads in parallel on the GPU, and it incorporates multi-stream scheduling and non-blocking MPI communications to improve inter-node scalability. Our code is validated for accuracy and compared against the CPU counterpart for speed. Strong scaling and weak scaling results are also presented to characterize scalability. We observe a speedup of 10.1 on one GPU over all 16 cores within a single node, and a weak scaling efficiency of 91% across 256 nodes. The program enables quick-turnaround and high-throughput numerical simulations for investigating chemical-driven red blood cell phenomena and disorders.
An adaptable parallel algorithm for the direct numerical simulation of incompressible turbulent flows using a Fourier spectral/hp element method and MPI virtual topologies
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform, after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
Scalability Test of Multiscale Fluid-Platelet Model for Three Top Supercomputers
We have tested the scalability of three supercomputers (Tianhe-2, Stampede, and CS-Storm) with multiscale fluid-platelet simulations, in which a highly resolved and efficient numerical model for nanoscale biophysics of platelets in microscale viscous biofluids is considered. Three experiments involving varying problem sizes were performed: Exp-S: 680,718-particle single-platelet; Exp-M: 2,722,872-particle 4-platelet; and Exp-L: 10,891,488-particle 16-platelet. Our implementations of the multiple time-stepping (MTS) algorithm improved on the performance of single time-stepping (STS) in all experiments. Using MTS, our model achieved the following simulation rates: 12.5, 25.0, 35.5 μs/day for Exp-S and 9.09, 6.25, 14.29 μs/day for Exp-M on Tianhe-2, CS-Storm 16-K80 and Stampede K20. The best rate for Exp-L was 6.25 μs/day for Stampede. With current advanced HPC resources, the simulation rates achieved by our algorithms bring complex multiscale simulations within reach for solving vexing problems at the interface of biology and engineering, such as thrombosis in blood flow, which combines millisecond-scale hematology with microscale blood flow at resolutions of micro-to-nanoscale cellular components of platelets. By testing the performance characteristics of supercomputers with advanced computational algorithms that offer an optimal trade-off for enhanced computational performance, this study demonstrates that such simulations are feasible with currently available HPC resources.
GPU-Accelerated Adjoint Algorithmic Differentiation
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives in a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mitigated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes, which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
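To make the role of the "tape" concrete, the following scalar sketch records, for every intermediate value, the partial derivatives with respect to its inputs during the forward pass and then sweeps the tape in reverse to accumulate adjoints. It is a naive, per-scalar illustration (the kind of implementation the abstract contrasts with vectorized intrinsics), not the AAD software described above; the names `Var` and `gradient` are hypothetical.

```python
class Var:
    """A scalar whose operations are recorded on a shared tape (a plain list)."""

    def __init__(self, tape, value, partials=()):
        self.tape = tape
        self.value = value
        # The tape entry stores (input_index, partial_derivative) pairs.
        self.index = len(tape)
        tape.append(tuple(partials))

    def __add__(self, other):
        return Var(self.tape, self.value + other.value,
                   ((self.index, 1.0), (other.index, 1.0)))

    def __mul__(self, other):
        return Var(self.tape, self.value * other.value,
                   ((self.index, other.value), (other.index, self.value)))


def gradient(output):
    """Backpropagate adjoints through the tape; returns one adjoint per tape entry."""
    tape = output.tape
    adjoints = [0.0] * len(tape)
    adjoints[output.index] = 1.0
    # Reverse sweep: each entry pushes its adjoint to its inputs.
    for i in reversed(range(len(tape))):
        for input_index, partial in tape[i]:
            adjoints[input_index] += partial * adjoints[i]
    return adjoints


tape = []
x = Var(tape, 3.0)
y = Var(tape, 4.0)
f = x * y + x * x          # f = 21, df/dx = y + 2x = 10, df/dy = x = 3
adjoints = gradient(f)
df_dx, df_dy = adjoints[x.index], adjoints[y.index]
```

Every scalar operation adds one tape entry, which shows why memory use grows quickly for large cost functions and why recording whole vector or matrix operations as single entries can shrink the tape.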
MPBEC, a Matlab Program for Biomolecular Electrostatic Calculations
One of the most widely used and efficient approaches to compute electrostatic properties of biological systems is to numerically solve the Poisson-Boltzmann (PB) equation. There are several software packages available that solve the PB equation for molecules in aqueous electrolyte solutions. Most of these software packages are useful for scientists with specialized training and expertise in computational biophysics. However, the user is usually required to manually make several important choices, depending on the complexity of the biological system, to successfully obtain the numerical solution of the PB equation. This may become an obstacle for researchers, experimentalists, and even students with no special training in computational methodologies. Aiming to overcome this limitation, in this article we present MPBEC, a free, cross-platform, open-source software that provides non-experts in the field an easy and efficient way to perform biomolecular electrostatic calculations on single processor computers. MPBEC is a Matlab script based on the Adaptive Poisson-Boltzmann Solver, one of the most popular approaches used to solve the PB equation. MPBEC does not require any user programming, text editing or extensive statistical skills, and comes with detailed user-guide documentation. As a unique feature, MPBEC includes a useful graphical user interface (GUI) application which helps and guides users to configure and set up the optimal parameters and approximations to successfully perform the required biomolecular electrostatic calculations. The GUI also incorporates visualization tools to facilitate users' pre- and post-analysis of structural and electrical properties of biomolecules.
Asynchronous Replica Exchange Software for Grid and Heterogeneous Computing
Parallel replica exchange sampling is an extended ensemble technique often used to accelerate the exploration of the conformational ensemble of atomistic molecular simulations of chemical systems. Inter-process communication and coordination requirements have historically discouraged the deployment of replica exchange on distributed and heterogeneous resources. Here we describe the architecture of a software package (named ASyncRE) for performing asynchronous replica exchange molecular simulations on volunteered computing grids and heterogeneous high performance clusters. The asynchronous replica exchange algorithm on which the software is based avoids centralized synchronization steps and the need for direct communication between remote processes. It allows molecular dynamics threads to progress at different rates and enables parameter exchanges among arbitrary sets of replicas independently from other replicas. ASyncRE is written in Python following a modular design conducive to extensions to various replica exchange schemes and molecular dynamics engines. Applications of the software for the modeling of association equilibria of supramolecular and macromolecular complexes on BOINC campus computational grids and on the CPU/MIC heterogeneous hardware of the XSEDE Stampede supercomputer are illustrated. They show the ability of ASyncRE to utilize large grids of desktop computers running the Windows, MacOS, and/or Linux operating systems as well as collections of high performance heterogeneous hardware devices.
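For readers unfamiliar with the exchange step itself, the sketch below shows the textbook Metropolis criterion for swapping the inverse temperatures of two replicas, applied only to replicas that happen to be idle. It assumes temperature replica exchange and a simple dictionary-based bookkeeping; it does not reproduce the ASyncRE code or its API, and all names are hypothetical.

```python
import math
import random


def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping two inverse temperatures."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(replicas, i, j, rng=random):
    """Swap the 'beta' parameters of replicas i and j if the move is accepted.

    Only the parameters are exchanged, so the two replicas involved need not
    wait for, or communicate with, any other replica.
    """
    a, b = replicas[i], replicas[j]
    p = exchange_probability(a["beta"], b["beta"], a["energy"], b["energy"])
    if rng.random() < p:
        a["beta"], b["beta"] = b["beta"], a["beta"]
        return True
    return False


replicas = [
    {"beta": 1.0, "energy": 2.0},   # colder replica that found a high energy
    {"beta": 0.5, "energy": 1.0},   # hotter replica that found a low energy
]
p_swap = exchange_probability(1.0, 0.5, 2.0, 1.0)      # delta = 0.5 -> accept
p_reverse = exchange_probability(1.0, 0.5, 1.0, 2.0)   # delta = -0.5 -> exp(-0.5)
swapped = attempt_exchange(replicas, 0, 1, random.Random(0))
```

Because the decision depends only on the two replicas involved, exchanges of this form can be attempted among whichever replicas are currently not running, which is the property an asynchronous scheme relies on.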
A comparison of numerical approaches to the solution of the time-dependent Schrödinger equation in one dimension
We present a simple, one-dimensional model of an atom exposed to a time-dependent intense, short-pulse EM field with the objective of teaching undergraduates how to apply various numerical methods to study the behavior of this system as it evolves in time using several time propagation schemes. In this model, the exact Coulomb potential is replaced by a soft-core interaction to avoid the singularity at the origin. While the model has some drawbacks, it has been shown to be a reasonable representation of what occurs in the fully three-dimensional hydrogen atom. The model can be used as a tool to train undergraduate physics majors in the art of computation and software development.
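As one example of a time propagation scheme for such a model, the sketch below advances a wavefunction on a uniform grid with the Crank-Nicolson method, using a soft-core potential V(x) = -1/sqrt(x² + a²) in atomic units. The softening parameter a = 1, the grid, the time step, the Gaussian initial state, and the absence of a laser term are all assumptions made for illustration; the abstract does not specify which schemes or parameters the paper uses. A field could be included by adding x·E(t) to the `potential` list before each step.

```python
import math


def soft_core(x, a=1.0):
    """Soft-core Coulomb potential in atomic units (a is an assumed softening length)."""
    return -1.0 / math.sqrt(x * x + a * a)


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a tridiagonal system with constant off-diagonal `off`."""
    n = len(rhs)
    c = [0j] * n
    d = [0j] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for j in range(1, n):
        denom = diag[j] - off * c[j - 1]
        c[j] = off / denom
        d[j] = (rhs[j] - off * d[j - 1]) / denom
    x = [0j] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        x[j] = d[j] - c[j] * x[j + 1]
    return x


def crank_nicolson_step(psi, potential, dx, dt):
    """Solve (1 + i dt H / 2) psi_new = (1 - i dt H / 2) psi with psi = 0 at the edges."""
    n = len(psi)
    kinetic_diag = 1.0 / dx ** 2       # diagonal of -(1/2) d^2/dx^2
    kinetic_off = -0.5 / dx ** 2       # off-diagonal of -(1/2) d^2/dx^2
    half = 0.5j * dt
    diag = [1.0 + half * (kinetic_diag + potential[j]) for j in range(n)]
    rhs = []
    for j in range(n):
        left = psi[j - 1] if j > 0 else 0.0
        right = psi[j + 1] if j < n - 1 else 0.0
        rhs.append((1.0 - half * (kinetic_diag + potential[j])) * psi[j]
                   - half * kinetic_off * (left + right))
    return solve_tridiagonal(diag, half * kinetic_off, rhs)


def norm(psi, dx):
    return sum(abs(c) ** 2 for c in psi) * dx


dx, dt = 0.2, 0.05
grid = [-20.0 + dx * k for k in range(201)]
potential = [soft_core(x) for x in grid]
psi = [complex(math.exp(-x * x / 2.0)) for x in grid]
scale = math.sqrt(norm(psi, dx))
psi = [c / scale for c in psi]
for _ in range(20):
    psi = crank_nicolson_step(psi, potential, dx, dt)
final_norm = norm(psi, dx)  # stays at 1 up to rounding, since the step is unitary
```

Checking that the norm is conserved is a convenient first test when comparing this scheme against others, such as explicit or split-operator propagators.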
