Enhanced sampling in atomistic simulations

Standard equilibrium simulations of complex biosystems are usually done on a single solvated biomolecule in PBC due to computational bounds.^1.2.

In these conditions the only mean to measure, e.g., the free energy differences between two given protein conformations, is to record the number of times the protein molecule in the MD cell is observed in each of the two conformations. Swaps between these conformers can take, in (time) average, as long as 0.1-1 microseconds[41] even for small proteins. One then needs to do extremely long equilibrium simulations in order to have a sufficient number of swaps between conformational states so as to determine with good accuracy a stationary (equilibrium) ratio of the conformational probability and hence the free energy difference. To give just a faint idea of the computational cost involved, for the simple system of decaalanine in vacuo, 1.0 microseconds of serial simulation takes about 10 days on 2.5 MH processor. Due to this computational bounds, standard molecular dynamics simulation of even small biological systems are in general not ergodic in the accessible simulation time. Typically, the system remains trapped during the whole simulation time span in a local minimum and the rare event of escaping the trap surmounting a free energy barrier never happens.

In order to overcome such severe sampling problem, many recent MD techniques has been devised. The Replica Exchange Method (REM)[42,43,44,45] provides an elegant and simple solution to quasi-ergodic sampling. In REM, several independent trajectories, called replicas, are simultaneously generated in different thermodynamic conditions. The production of these simultaneous trajectories usually occurs on an array of parallel processors. The thermodynamics conditions of these replicas are chosen so as to span homogeneously the thermodynamic space from the ensemble of interest to a different ensemble with enhanced transition rates, where the sampling is ergodic. During the simulation, neighboring replicas are allowed to exchange their configurations, subject to specific acceptance criteria. In this fashion, a trajectory is no longer bound to an unique given equilibrium ensemble but can randomly walk in a thermodynamic space of different equilibrium conditions, visiting ensembles where an ergodic sampling is possible, and then going back to the quasi-ergodic ensemble of interest. Therefore, REM is an algorithm which employs an extended ensemble formalism in order to overcome slow relaxation. The gain in sampling efficiency with respect to a series of uncoupled parallel trajectories comes from the exchange of information between trajectories, and the replica exchange process is the tool by which ``information'' (e.g. a particular configuration) is carried, for example, from an high to a low temperature. The REM algorithm can be used in principle without prior knowledge of the ``important'' reaction coordinates of the system, i.e., in the case of biological systems, those that defines the accessible conformational space in the target thermodynamics conditions. The REM algorithm is described in detail in Chapter 5.

A class of simulation algorithms closely related to REM are the so-called serial generalized-ensemble (SGE) methods[46]. The basic difference between SGE methods and REM is that in the former no pairs of replicas are necessary to make a trajectory in temperature space and more generally in the generalized ensemble space. In SGE methods only one replica can undergo ensemble transitions which are realized on the basis of a Monte Carlo like criterion. The most known example of SGE algorithm is the simulated tempering technique[44,47], where weighted sampling is used to produce a random walk in temperature space. An important limitation of SGE approaches is that an evaluation of free energy differences between ensembles is needed as input to ensure equal visitation of the ensembles, and eventually a faster convergence of structural properties[48]. REM was just developed to eliminate the need to know a priori such free energy differences. On the other side, several studies[48,49,50] have reported that SGE in general and simulated tempering in particular consistently gives a higher rate of delivering the system between high temperature states and low temperature states, as well as a higher rate of crossing the potential energy space. Moreover SGE methods are well-suited to distributed computing environments because synchronization and communication between replicas/processors can be avoided. The potential of mean force[51,52] along a chosen collective coordinate can be computed a posteriori in REM and SGE simulations using multiple-histogram reweighting techniques[53,54]. The potential of mean force can also be determined by performing SGE and REM simulations directly in the space of the collective coordinate[55]. In the ORAC program we have implemented SGE simulations, either in a simulated-tempering like fashion or in the space of bond, bending, and torsion coordinates. These simulations exploit the adaptive method to calculate weight factors (i.e. free energies) proposed in Ref. [56]. The method is described in Chapter 6.

The a priori identification of the unknown coordinates, along with their underlying free energy surface, are actually one of the outputs of the REM and SGE approaches.However, once these important coordinates are known, one can use less expensive techniques to study the associated essential free energy surface. Canonical reweighting or Umbrella Sampling methods[57], for example, modify (bias) the interaction Hamiltonian of the system in such a way to facilitate barrier crossing between conformational basins. The canonical average of the unperturbed systems are then reconstructed by appropriately reweighting the biased averaged.

Quasi-equilibrium techniques[58,59,60,61] builds such biasing potential that favors barrier crossing by periodically adding a small perturbation to the system Hamiltonian so as to progressively flatten the free energy surfaces along selected reaction coordinates. For example, in the so-called standard ``metadynamics'' simulation method[58], a history-dependent potential, made of accumulated Gaussian functions deposited continuously at the instantaneous values of the given reaction coordinates, is imposed to the system. The history-dependent potential disfavors configurations in the space of the reaction coordinates that have already been visited, and it has been shown, by appropriately adjusting system dependent parameters, to numerically converge to the inverse of the free energy surface.[62] In the present version of ORAC the metadynamics technique has been implemented in the parallel version whereby multiple metadynamics simulation (walkers) are run in parallel cooperatively building a common history dependent potential which is passed among all processes. The history dependent potential is generally defined over a multidimensional domain involving several reaction coordinates. Metadynamics can be used, e.g., to identify the minimum free energy path between two metastable protein states defining the reactants and products of an elementary chemical reaction. Quasi-equilibrium techniques in biological systems converge rather slowly since the convergence rate depends crucially on the inherent slow diffusion along the conformational coordinates. So even if the potential is relatively flattened, the diffusion along a nearly free reaction coordinates can still be slow due to the friction of the orthogonal coordinates. The metadynamics algorithm is described in detail in Chapter 6.3.3

Non equilibrium techniques[63,64,65,66] uses an additional driving potential acting on an appropriate reaction coordinates to fast steer the system from a given equilibrium initial state to a given final state, and viceversa producing a series of forward and reverse non equilibrium trajectories. The driven coordinate is strictly mono-dimensional but can be defined as a trajectory in a multidimensional reaction coordinate space. The free energy differences between the initial and final states (the reactants and the products) is connected, through the Crooks fluctuation theorem[64], to the ratio of distribution functions of the work spent in these trajectories. Free energy reconstruction, using non equilibrium steered molecular dynamics, of the potential of mean force[67] along one arbitrary reaction coordinate is described in detail in Chapter 7.1.

procacci 2021-12-29