CSTG: An Effective Framework for Cost-sensitive Sparse Online Learning
Sparse online learning and cost-sensitive learning are two important areas of machine learning and data mining research. Each has been well studied, with many interesting algorithms developed. However, very little published work addresses the joint study of these two fields. In this paper, to tackle high-dimensional data streams with skewed class distributions, we introduce a framework for cost-sensitive sparse online learning. Our proposed framework is a substantial extension of the influential Truncated Gradient (TG) method, formulating a new convex optimization problem in which two mutually constraining factors, misclassification cost and sparsity, can be balanced simultaneously and favorably. We theoretically analyze the regret and cost bounds of the proposed algorithm and pinpoint its theoretical merits relative to existing related approaches. Large-scale empirical comparisons with five baseline methods on eight real-world streaming datasets demonstrate the encouraging performance of the developed method. Algorithm implementations and datasets are available upon request.
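The truncated-gradient update underlying this line of work can be sketched as follows. This is a minimal illustration only: the logistic loss, the per-example cost weighting, the function names, and the hyperparameter values are our assumptions, not the paper's actual formulation.

```python
import numpy as np

def truncate(w, alpha, theta):
    """Truncation operator from the Truncated Gradient method: coefficients
    with magnitude <= theta are shrunk toward zero by alpha; larger
    coefficients are left untouched."""
    out = w.copy()
    small = np.abs(w) <= theta
    out[small] = np.sign(w[small]) * np.maximum(np.abs(w[small]) - alpha, 0.0)
    return out

def tg_step(w, x, y, eta, g, theta, cost=1.0):
    """One cost-sensitive online step (illustrative): a logistic-loss gradient
    weighted by a per-example misclassification cost, then truncation."""
    margin = y * np.dot(w, x)                       # y in {-1, +1}
    grad = -cost * y * x / (1.0 + np.exp(margin))   # weighted logistic gradient
    return truncate(w - eta * grad, eta * g, theta)
```

In practice `cost` would be larger for the minority class, so its gradient steps dominate before truncation enforces sparsity.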
Fast Analytical Methods for Macroscopic Electrostatic Models in Biomolecular Simulations
We review recent developments of fast analytical methods for macroscopic electrostatic calculations in biological applications, including the Poisson-Boltzmann (PB) and the generalized Born models for electrostatic solvation energy. The focus is on analytical approaches for hybrid solvation models, especially the image charge method for a spherical cavity, and also the generalized Born theory as an approximation to the PB model. This review places much emphasis on the mathematical details behind these methods.
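As a pointer to the mathematics the review elaborates, the generalized Born solvation energy in its widely used Still form reads (our notation; the review's conventions may differ):

```latex
\Delta G_{\mathrm{GB}}
  = -\frac{1}{2}\left(\frac{1}{\epsilon_{\mathrm{in}}}
      - \frac{1}{\epsilon_{\mathrm{out}}}\right)
    \sum_{i,j}\frac{q_i q_j}{f_{\mathrm{GB}}(r_{ij})},
\qquad
f_{\mathrm{GB}}(r_{ij})
  = \sqrt{\,r_{ij}^{2} + R_i R_j
      \exp\!\left(-\frac{r_{ij}^{2}}{4 R_i R_j}\right)\,},
```

where the effective Born radii \(R_i\) are chosen so that the GB energy approximates the PB electrostatic solvation energy.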
ON IDENTIFIABILITY OF NONLINEAR ODE MODELS AND APPLICATIONS IN VIRAL DYNAMICS
Ordinary differential equations (ODEs) are a powerful tool for modeling dynamic processes, with wide applications in a variety of scientific fields. Over the last two decades, ODEs have also emerged as a prevailing tool in various biomedical research fields, especially in infectious disease modeling. In practice, it is important and necessary to determine the unknown parameters in ODE models from experimental data. Identifiability analysis is the first step in determining the unknown parameters in an ODE model, and such analysis techniques for nonlinear ODE models are still under development. In this article, we review identifiability analysis methodologies for nonlinear ODE models developed over the past two decades, including structural identifiability analysis, practical identifiability analysis, and sensitivity-based identifiability analysis. Some advanced topics and ongoing research are also briefly reviewed. Finally, examples from modeling the viral dynamics of HIV, influenza and hepatitis viruses illustrate how to apply these identifiability analysis methods in practice.
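Sensitivity-based identifiability analysis can be sketched on a toy viral-load model dV/dt = (p - c)V, in which only the combination p - c is identifiable from the trajectory. The model, the Euler discretization, and the rank threshold below are our illustrative choices, not taken from the article.

```python
import numpy as np

def simulate(p, c, v0=1.0, dt=0.01, n=200):
    """Euler integration of the toy viral-load model dV/dt = (p - c) V,
    where p is a production rate and c a clearance rate."""
    v = np.empty(n)
    v[0] = v0
    for k in range(1, n):
        v[k] = v[k - 1] + dt * (p - c) * v[k - 1]
    return v

def sensitivity_matrix(p, c, h=1e-6):
    """Forward-difference sensitivities of the output trajectory
    with respect to the parameters (p, c)."""
    base = simulate(p, c)
    s_p = (simulate(p + h, c) - base) / h
    s_c = (simulate(p, c + h) - base) / h
    return np.column_stack([s_p, s_c])

S = sensitivity_matrix(p=1.5, c=0.5)
sv = np.linalg.svd(S, compute_uv=False)
num_rank = int((sv > 1e-3 * sv[0]).sum())
# the two sensitivity columns are (nearly) collinear, because V depends on
# p and c only through p - c, so the numerical rank is 1 < 2 parameters
```

A rank-deficient sensitivity matrix is exactly the signal that some parameter combination cannot be estimated from the measured output.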
High-Frequency Oscillations of a Sphere in a Viscous Fluid near a Rigid Plane
High-frequency oscillations of a rigid sphere in an incompressible viscous fluid moving normal to a rigid plane are considered when the ratio of minimum clearance to sphere radius is small. Asymptotic expansions are constructed that permit an analytical estimate of the force acting on the sphere as a result of its motion. An inner expansion, valid in the neighborhood of the minimum gap, reflects the dominance of viscous effects and fluid inertia. An outer expansion, valid outside the gap, reflects the dominance of fluid inertia with a correction for an oscillating viscous boundary layer. The results are applied to the hydrodynamics of the tapping mode of an atomic force microscope and to the dynamic calibration of its cantilevers.
Two-Term Asymptotic Approximation of a Cardiac Restitution Curve
If spatial extent is neglected, ionic models of cardiac cells consist of systems of ordinary differential equations (ODEs) which have the property of excitability, i.e., a brief stimulus produces a prolonged evolution (called an action potential in the cardiac context) before the eventual return to equilibrium. Under repeated stimulation, or pacing, cardiac tissue exhibits electrical restitution: the steady-state action potential duration (APD) at a given pacing period B shortens as B is decreased. Independent of ionic models, restitution is often modeled phenomenologically by a one-dimensional mapping of the form APD(next) = f(B - APD(previous)). Under some circumstances, a restitution function f can be derived as an asymptotic approximation to the behavior of an ionic model. In this paper, extending previous work, we derive the next term in such an asymptotic approximation for a particular ionic model consisting of two ODEs. The two-term approximation exhibits excellent quantitative agreement with the actual restitution curve, whereas the leading-order approximation significantly underestimates actual APD values.
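The one-dimensional restitution map above can be iterated directly to find the steady-state APD at a given pacing period. The sketch below uses a hypothetical exponential restitution function with made-up constants in milliseconds; it is not the approximation derived from the two-ODE ionic model in the paper.

```python
import math

def f(di, a=300.0, b=120.0, tau=50.0):
    """Hypothetical exponential restitution function (all quantities in ms):
    APD as a function of the preceding diastolic interval DI."""
    return a - b * math.exp(-di / tau)

def steady_state_apd(period, apd0=200.0, n=500):
    """Iterate the restitution map APD_{n+1} = f(B - APD_n) to its fixed
    point, modeling pacing at period B until the APD settles."""
    apd = apd0
    for _ in range(n):
        apd = f(period - apd)
    return apd
```

With these constants the map is a contraction (|f'| < 1 at the fixed point), and the steady-state APD shortens as the pacing period decreases, reproducing the restitution property described above.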
Variational multiscale models for charge transport
This work presents a few variational multiscale models for charge transport in complex physical, chemical and biological systems and engineering devices, such as fuel cells, solar cells, battery cells, nanofluidics, transistors and ion channels. An essential ingredient of the present models, introduced in an earlier paper (Bulletin of Mathematical Biology, 72, 1562-1622, 2010), is the use of the differential geometry theory of surfaces as a natural means to geometrically separate the macroscopic domain from the microscopic domain while dynamically coupling discrete and continuum descriptions. Our main strategy is to construct the total energy functional of a charge transport system to encompass the polar and nonpolar free energies of solvation, and chemical potential related energy. By Euler-Lagrange variation, coupled Laplace-Beltrami and Poisson-Nernst-Planck (LB-PNP) equations are derived. The solution of the LB-PNP equations leads to the minimization of the total free energy, and explicit profiles of the electrostatic potential and the densities of charge species. To further reduce the computational complexity, the Boltzmann distribution obtained from the Poisson-Boltzmann (PB) equation is utilized to represent the densities of certain charge species so as to avoid the computationally expensive solution of some Nernst-Planck (NP) equations. Consequently, the coupled Laplace-Beltrami and Poisson-Boltzmann-Nernst-Planck (LB-PBNP) equations are proposed for charge transport in heterogeneous systems. A major emphasis of the present formulation is the consistency between the equilibrium LB-PB theory and the non-equilibrium LB-PNP theory at equilibrium. Another major emphasis is the capability of the reduced LB-PBNP model to fully recover the prediction of the LB-PNP model in non-equilibrium settings.
To account for the impact of the fluid on charge transport, we derive coupled Laplace-Beltrami, Poisson-Nernst-Planck and Navier-Stokes equations from the variational principle for chemo-electro-fluid systems. A number of computational algorithms are developed to implement the proposed new variational multiscale models in an efficient manner. A set of ten protein molecules and a realistic ion channel, Gramicidin A, are employed to confirm the consistency of the models and to verify their capability. Extensive numerical experiments are designed to validate the proposed variational multiscale models. Good quantitative agreement between our model predictions and experimental measurements of current-voltage curves is observed for Gramicidin A channel transport. This paper also provides a brief review of the field.
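For orientation, the PNP core that such models build on couples the Poisson equation for the electrostatic potential with Nernst-Planck equations for the ion concentrations (standard textbook form, our notation; the paper's LB-PNP system additionally carries the Laplace-Beltrami geometric-flow coupling):

```latex
-\nabla\cdot\bigl(\epsilon(\mathbf{r})\nabla\Phi\bigr)
  = \rho_{\mathrm{fixed}} + \sum_i q_i c_i,
\qquad
\frac{\partial c_i}{\partial t}
  = \nabla\cdot\Bigl[D_i\Bigl(\nabla c_i
      + \frac{q_i c_i}{k_B T}\,\nabla\Phi\Bigr)\Bigr],
```

where \(c_i\), \(q_i\) and \(D_i\) are the concentration, charge and diffusion coefficient of species \(i\), and \(\rho_{\mathrm{fixed}}\) is the fixed (e.g. protein) charge density.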
Linear Models Based on Noisy Data and the Frisch Scheme
We address the problem of identifying linear relations among variables based on noisy measurements. This is a central question in the search for structure in large data sets. Often a key assumption is that measurement errors in each variable are independent. This basic formulation has its roots in the work of Charles Spearman in 1904 and of Ragnar Frisch in the 1930s. Various topics such as errors-in-variables, factor analysis, and instrumental variables all refer to alternative viewpoints on this problem and on ways to account for the anticipated way that noise enters the data. In the present paper we begin by describing certain fundamental contributions by the founders of the field and provide alternative modern proofs of certain key results. We then go on to consider a modern viewpoint and novel numerical techniques for the problem. The central theme is expressed by the Frisch-Kalman dictum, which calls for identifying a noise contribution that allows a maximal number of simultaneous linear relations among the noise-free variables: a rank-minimization problem. In the years since Frisch's original formulation, there have been several insights, including trace minimization as a convenient heuristic to replace rank minimization. We discuss convex relaxations and theoretical bounds on the rank that, when met, provide guarantees for global optimality. A complementary point of view to this minimum-rank dictum is presented in which models are sought leading to a uniformly optimal quadratic estimation error for the error-free variables. Points of contact between these formalisms are discussed, and alternative regularization schemes are presented.
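Spearman's one-factor case admits a closed form that illustrates the Frisch decomposition Σ = λλᵀ + D (low-rank signal plus diagonal noise). The small numerical sketch below uses made-up loadings and noise variances.

```python
import numpy as np

lam = np.array([0.8, 1.2, 0.5])           # hypothetical factor loadings
d = np.array([0.3, 0.1, 0.2])             # hypothetical noise variances
sigma = np.outer(lam, lam) + np.diag(d)   # observed covariance: low-rank + diagonal

# Spearman's closed form: with three variables and a single common factor,
# the off-diagonal entries alone determine the loadings via
#   lam_i^2 = sigma_ij * sigma_ik / sigma_jk,
# so the noise variance of each variable can be recovered.
lam2_0 = sigma[0, 1] * sigma[0, 2] / sigma[1, 2]
d0 = sigma[0, 0] - lam2_0   # recovered noise variance of variable 0
```

Subtracting the recovered diagonal leaves a rank-one matrix, i.e., a single exact linear relation among the noise-free variables, which is precisely what the Frisch-Kalman dictum asks a candidate noise contribution to achieve.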
AN EVOLUTIONARY MODEL OF TUMOR CELL KINETICS AND THE EMERGENCE OF MOLECULAR HETEROGENEITY DRIVING GOMPERTZIAN GROWTH
We describe a cell-molecular based evolutionary mathematical model of tumor development driven by a stochastic Moran birth-death process. The cells in the tumor carry molecular information in the form of a numerical genome which we represent as a four-digit binary string used to differentiate cells into 16 molecular types. The binary string is able to undergo stochastic point mutations that are passed to a daughter cell after each birth event. The value of the binary string determines the cell fitness, with lower-fit cells (e.g., 0000) defined as healthy phenotypes, and higher-fit cells (e.g., 1111) defined as malignant phenotypes. At each step of the birth-death process, the two phenotypic sub-populations compete in a prisoner's dilemma evolutionary game, with the healthy cells playing the role of cooperators and the cancer cells playing the role of defectors. Fitness, birth-death rates of the cell populations, and overall tumor fitness are defined via the prisoner's dilemma payoff matrix. Mutation parameters include passenger mutations (mutations conferring no fitness advantage) and driver mutations (mutations which increase cell fitness). The model is used to explore key emergent features associated with tumor development, including tumor growth rates as they relate to intratumor molecular heterogeneity. The resulting tumor growth equation states that the growth rate is proportional to the logarithm of cellular diversity/heterogeneity. The Shannon entropy from information theory is used as a quantitative measure of heterogeneity and tumor complexity based on the distribution of the four-digit binary sequences produced by the cell population. To track the development of heterogeneity from an initial population of healthy cells (0000), we use dynamic phylogenetic trees which show clonal and sub-clonal expansions of cancer cell sub-populations from an initial malignant cell.
We show that tumor growth rates are not constant throughout tumor development, and are generally much higher in the subclinical range than in later stages of development, which leads to a Gompertzian growth curve. We explain the early exponential growth of the tumor and the later saturation associated with the Gompertzian curve, as they emerge from our evolutionary simulations, using simple statistical mechanics principles related to the degree of functional coupling of the cell states. We then compare dosing strategies at early-stage, mid-stage (clinical), and late-stage development of the tumor. If used early during tumor development, in the subclinical stage well before the cancer cell population is selected for growth, therapy is most effective at disrupting key emergent features of tumor development.
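A stripped-down version of such a birth-death simulation can be sketched as follows. The payoff values, the two-bit phenotype threshold, the mutation rate, and the population size are all illustrative stand-ins, not the paper's parameters.

```python
import math
import random
from collections import Counter

# Hypothetical prisoner's dilemma payoffs (T > R > P > S)
R, S, T, P = 3.0, 0.0, 5.0, 1.0
MUT = 0.05   # per-bit point-mutation probability (illustrative)

def fitness(genome, frac_cancer):
    """Mean PD payoff against the population; cells with >= 2 one-bits are
    treated as defectors (cancer), the rest as cooperators (healthy).
    The threshold is an illustrative choice."""
    if genome.count("1") >= 2:
        return T * (1 - frac_cancer) + P * frac_cancer
    return R * (1 - frac_cancer) + S * frac_cancer

def moran_step(pop, rng):
    """One Moran birth-death event: fitness-proportional parent selection,
    per-bit point mutation of the daughter, replacement of a random cell."""
    frac = sum(g.count("1") >= 2 for g in pop) / len(pop)
    weights = [fitness(g, frac) for g in pop]
    parent = rng.choices(pop, weights=weights)[0]
    child = "".join(b if rng.random() > MUT else str(1 - int(b)) for b in parent)
    pop[rng.randrange(len(pop))] = child

def shannon_entropy(pop):
    """Entropy (bits) of the genome distribution; 0 for a homogeneous
    population, up to log2(16) = 4 for the 4-bit genomes."""
    n = len(pop)
    return -sum(c / n * math.log2(c / n) for c in Counter(pop).values())

rng = random.Random(1)
pop = ["0000"] * 100                 # start from an all-healthy population
h0 = shannon_entropy(pop)            # 0: no heterogeneity yet
for _ in range(2000):
    moran_step(pop, rng)
h1 = shannon_entropy(pop)            # heterogeneity accumulated under mutation
```

Tracking the entropy over time in such a simulation is one way to relate the growth rate to the diversity measure discussed above.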
Trajectory stratification of stochastic dynamics
We present a general mathematical framework for trajectory stratification for simulating rare events. Trajectory stratification involves decomposing trajectories of the underlying process into fragments limited to restricted regions of state space (strata), computing averages over the distributions of the trajectory fragments within the strata with minimal communication between them, and combining those averages with appropriate weights to yield averages with respect to the original underlying process. Our framework reveals the full generality and flexibility of trajectory stratification, and it illuminates a common mathematical structure shared by existing algorithms for sampling rare events. We demonstrate the power of the framework by defining strata in terms of both points in time and path-dependent variables for efficiently estimating averages that were not previously tractable.
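In its simplest, non-trajectory form, the weight-and-combine idea is ordinary stratified Monte Carlo over a partition of state space. The toy sketch below uses equal-probability strata on [0, 1]; the trajectory version replaces point samples with trajectory fragments and known stratum weights with ones that must be solved for self-consistently.

```python
import math
import random

def stratified_estimate(f, n_strata=10, n_per=100, rng=None):
    """Stratified Monte Carlo estimate of the integral of f over [0, 1]:
    sample uniformly within each stratum (sub-interval), average within
    each stratum, and combine the stratum averages with weights equal to
    the stratum probabilities (here 1/n_strata each)."""
    rng = rng or random.Random(0)
    total = 0.0
    for k in range(n_strata):
        lo = k / n_strata
        avg = sum(f(lo + rng.random() / n_strata) for _ in range(n_per)) / n_per
        total += avg / n_strata   # weight = probability of the stratum
    return total

est = stratified_estimate(math.sin)   # estimates the integral of sin on [0, 1]
```

Because every stratum is sampled regardless of how improbable it is under the original process, rare regions contribute to the final average with low variance, which is the mechanism the framework generalizes to rare-event trajectories.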
Research and Education in Computational Science and Engineering
This report presents challenges, opportunities and directions for computational science and engineering (CSE) research and education for the next decade. Over the past two decades the field of CSE has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers with algorithmic inventions and software systems that transcend disciplines and scales. CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society; and the CSE community is at the core of this transformation. However, a combination of disruptive developments (the architectural complexity of extreme-scale computing, the data revolution and increased attention to data-driven discovery, and the specialization required to follow the applications to new frontiers) is redefining the scope and reach of the CSE endeavor. With these many current and expanding opportunities for the CSE field, there is a growing demand for CSE graduates and a need to expand CSE educational offerings. This need includes CSE programs at both the undergraduate and graduate levels, as well as continuing education and professional development programs, exploiting the synergy between computational science and data science. Yet, as institutions consider new and evolving educational programs, it is essential to consider the broader research challenges and opportunities that provide the context for CSE education and workforce development.
