Multiple time step integration schemes and electrostatic interactions in complex biomolecular systems

As stated above, simulations of complex systems at the atomistic level, unlike simplified models, have the advantage of representing with realistic detail the full flexibility of the system and the potential energy surface on which the atoms move. Both of these physically significant ingredients of the atomistic approach unfortunately pose severe computational problems: on the one hand, the inclusion of full flexibility necessarily implies the selection of a small time step, thereby reducing the phase-space sampling power of an MD simulation. On the other hand, the evaluation of inter-particle long range interactions is an extremely expensive task with conventional methods, its computational cost typically scaling like $ N^{2}$ (with $ N$ being the number of particles) and quickly exceeding any reasonable limit. In this book we shall devote our attention to the methods, within the framework of classical MD simulations, that partially overcome the difficulties related to the presence of flexibility and long range interactions when simulating complex systems at the atomistic level.

Complex systems experience different kinds of motion with different time scales: intramolecular vibrations have periods not exceeding a few hundred femtoseconds, while reorientational motions and conformational changes have much longer time scales, ranging from a few picoseconds to hundreds of nanoseconds. Within the intramolecular dynamics one may further distinguish between fast stretching involving hydrogen, with periods smaller than 10 fs; stretching between heavy atoms and bending involving hydrogen, with roughly twice that period or more; slow bending and torsional motions; and so on. In a similar manner, many different contributions can be identified in the diffusional regime.

In a standard integration of the Newtonian equations of motion, all these motions, irrespective of their time scales, are advanced using the same time step, whose size is inversely proportional to the frequency of the fastest degree of freedom present in the system and is therefore on the order of one femtosecond or less. This constraint on the step size severely limits the accessible simulation time. One common way to alleviate the problem of the small step size is to freeze some supposedly irrelevant and fast degrees of freedom of the system. This procedure relies on the so-called SHAKE algorithm [10,11,12], which enforces holonomic constraints while advancing the Cartesian coordinates. Typically, bonds and/or bending angles are kept rigid, thus removing most of the high-frequency density of states and allowing a moderate increase of the step size. The SHAKE procedure changes the input potential and therefore the output density of states. Freezing degrees of freedom therefore requires, in principle, an a priori knowledge of the dynamical behavior of the system. SHAKE is in fact fully justified only when the suppressed degrees of freedom do not mix with the ``relevant'' degrees of freedom. This might be ``almost'' true for fast stretching involving hydrogen, which approximately defines an independent subspace of internal coordinates in almost all complex molecular systems [13], but may turn out to be wrong in certain cases even for fast stretching between heavy atoms. In any case, the degree of mixing of the various degrees of freedom of a complex system is not known a priori and should, on the contrary, be considered one of the targets of atomistic simulations. The SHAKE algorithm thus allows only a moderate increase of the step size while introducing, if used without caution, essentially uncontrolled approximations. In other words, indiscriminate use of constraints violates the brute-force postulate. A more fruitful approach to the multiple time scale problem is to devise a more efficient ``multiple time step'' integration of the equations of motion. Multiple time step integration in MD is a relatively old idea [14,15,16,17,18,19], but only in recent times, owing to the work of Tuckerman, Martyna, Berne and coworkers [20,21,22,23,24,25], has it found widespread application. These authors introduced a very effective formalism for devising multilevel integrators based on the symmetric factorization of the classical Liouvillean time propagator. The multiple time step approach allows integration of all degrees of freedom at an affordable computational cost. In the simulation of complex systems, for a well tuned multilevel integrator, the speed-up can be considerably larger than that obtained by imposing bond constraints. Besides its efficiency, the multiple time step approach has the advantage of not introducing any a priori assumption that may modify part of the density of states of the system.
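To make the idea concrete, for a force field split into ``fast'' and ``slow'' parts the symmetric Trotter factorization of the propagator reads $ e^{iL\Delta t} \simeq e^{iL_{\mathrm{slow}}\Delta t/2}\left[e^{iL_{\mathrm{fast}}\,\delta t}\right]^{n} e^{iL_{\mathrm{slow}}\Delta t/2}$ with $ \delta t = \Delta t/n$, where $ iL_{\mathrm{fast}}$ also contains the position update. The sketch below is a minimal two-level r-RESPA-like step written in this spirit; the force splitting and the callables fast_forces and slow_forces are illustrative assumptions, not a specific implementation from the literature cited here.

\begin{verbatim}
import numpy as np

def respa_step(x, v, m, fast_forces, slow_forces, dt, n_inner):
    """One two-level r-RESPA-style step (velocity-Verlet inner loop).

    x, v, m      : positions, velocities, masses (numpy arrays)
    fast_forces  : callable returning the rapidly varying forces
                   (e.g. intramolecular terms), evaluated every dt/n_inner
    slow_forces  : callable returning the slowly varying forces
                   (e.g. long range non-bonded terms), evaluated every dt
    """
    dt_small = dt / n_inner

    # opening half-kick with the slow forces: exp(i L_slow dt/2)
    v = v + 0.5 * dt * slow_forces(x) / m

    # n_inner velocity-Verlet steps with the fast forces:
    # [exp(i L_fast dt_small)]^n
    f_fast = fast_forces(x)
    for _ in range(n_inner):
        v = v + 0.5 * dt_small * f_fast / m
        x = x + dt_small * v
        f_fast = fast_forces(x)
        v = v + 0.5 * dt_small * f_fast / m

    # closing half-kick with the slow forces: exp(i L_slow dt/2)
    v = v + 0.5 * dt * slow_forces(x) / m
    return x, v
\end{verbatim}

With such a splitting the expensive, slowly varying forces are evaluated only once per outer step $ \Delta t$, while the cheap fast forces are evaluated $ n$ times, so the outer step can be chosen on the basis of the slow motions rather than of the fastest vibration.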

The Liouvillean approach to multiple time step integrators lends itself to a straightforward, albeit tedious, application to extended Lagrangian systems for simulations in the canonical and isobaric ensembles: once the equations of motion are known, the Liouvillean, and hence the integration scheme, is de facto available. Many efficient multilevel schemes for constant pressure or constant temperature simulations are available in the literature [26,25,27].
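Schematically, a thermostatted or barostatted multilevel integrator simply adds further factors to the Trotter splitting sketched above; one common symmetric arrangement (shown here only as an illustration, with $ iL_{T}$ denoting the thermostat/barostat part of the extended-system Liouvillean) is

$$ e^{iL\Delta t} \simeq e^{iL_{T}\Delta t/2}\, e^{iL_{\mathrm{slow}}\Delta t/2} \left[ e^{iL_{\mathrm{fast}}\,\delta t} \right]^{n} e^{iL_{\mathrm{slow}}\Delta t/2}\, e^{iL_{T}\Delta t/2}, \qquad \delta t = \Delta t/n , $$

so that, once the extended-system equations of motion are written down, the corresponding multilevel scheme follows by the same factorization procedure.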

As already outlined, long range interactions are the other stumbling block in the atomistic MD simulation of complex systems. The problem is particularly acute in biopolymers, where the presence of distributed net charges makes the local potential oscillate wildly when summing, e.g., over non-neutral spherical shells. The conditionally convergent nature of the electrostatic energy series for a periodic system, such as the MD box in periodic boundary conditions (PBC), makes any straightforward truncation method essentially unreliable [28,12,29].
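For reference, the series in question is the lattice sum over all periodic images, written here for a cubic box of side $ L$ and in units where the Coulomb constant is 1 (the prime indicates that the $ i=j$ term is omitted for $ \mathbf{n}=0$):

$$ E = \frac{1}{2} \sum_{\mathbf{n}\in\mathbb{Z}^{3}} \; {\sum_{i,j=1}^{N}}^{\prime} \frac{q_{i} q_{j}}{|\mathbf{r}_{ij} + \mathbf{n}L|} , $$

whose value depends on the order in which the image vectors $ \mathbf{n}$ are summed, i.e. on the shape of the growing summation domain; a spherical, non-neutral truncation is just one particular, and generally poor, choice.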

The reaction field [30,31] is in principle a physically appealing method that correctly accounts for long range effects and requires only limited computational effort. The technique treats the electrostatic interactions explicitly within a cutoff sphere surrounded by a dielectric medium, which exerts back on the sphere a ``polarization'' or reaction field. The dielectric medium has a dielectric constant that matches that of the inner sphere. The technique has been shown to give results identical to those obtained with the exact Ewald method in Monte Carlo simulations of dipolar spherocylinders, where the dielectric constant entering the reaction field is updated periodically according to the value found in the sphere. The reaction field method does, however, suffer from two major disadvantages that strongly limit its use in MD simulations of complex systems at the atomistic level. First, during time integration the system may experience instabilities related to the self-consistently updated dielectric constant of the reaction field and to the jumps of molecules from the sphere of explicit interactions into the dielectric. The second, perhaps more serious, problem is that the method again requires an a priori knowledge of the system, namely its dielectric constant. In pure phases this might not be a great problem, but in inhomogeneous systems such as a solvated protein the dielectric constant might not be easily available. Even with the self-consistent updating technique, an unreliable initial guess of the unknown dielectric constant can strongly affect the dielectric behavior of the system and, in turn, its dynamical and structural state.
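In its simplest form (a Barker-Watts style sketch, written in the same units as above and neglecting any ionic-strength contribution), the reaction field amounts to replacing the bare Coulomb pair interaction inside the cutoff $ R_{c}$ with

$$ u_{ij}(r) = q_{i} q_{j} \left[ \frac{1}{r} + \frac{\epsilon_{RF} - 1}{2\epsilon_{RF} + 1}\, \frac{r^{2}}{R_{c}^{3}} \right] , \qquad r < R_{c} , $$

so that the dielectric constant $ \epsilon_{RF}$ of the surrounding continuum enters every pair interaction explicitly; this is precisely the a priori input whose unavailability is discussed above.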

The electrostatic series can in principle be computed exactly using the Ewald re-summation technique [32,33]. The Ewald method rewrites the electrostatic sum for the periodic system in terms of two absolutely convergent series, one in the direct lattice and the other in the reciprocal lattice. This method, in its standard implementation, is extremely CPU demanding and scales like $ N^{2}$, with $ N$ being the number of charges, with the unfortunate consequence that even moderately large simulations of inhomogeneous biological systems are not within its reach. The rigorous Ewald method, which suffers from none of the inconveniences experienced by the reaction field approach, has however attracted renewed interest very recently after the publication by Darden, York and Pedersen [34] of the Particle Mesh technique and, later on, by Essmann, Darden et al. [35] of the variant termed Smooth Particle Mesh Ewald (SPME). SPME is based on an older idea of Hockney [36] and is essentially an interpolation technique, with a smearing of the charges onto a regular grid and an evaluation via fast Fourier transform (FFT) of the interpolated reciprocal lattice energy sums. The performance of this technique, both in accuracy and in efficiency, is astonishing. Most important, the computational cost scales like $ N\log N$, that is, essentially linearly for any practical application. Other algorithms, like the Fast Multipole Method (FMM) [37,38,39,40], scale exactly like $ N$, even better than SPME. However, FMM has a very large prefactor, and its break-even point with SPME is on the order of several hundred thousand particles, that is, as of now, beyond any reasonable practical limit.
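Since SPME is best understood as an accelerated evaluation of the standard Ewald sum, a deliberately naive $ O(N^{2})$ sketch of the latter is given below (cubic box, neutral system, tin-foil boundary conditions, same units as above; the choices of splitting parameter, real-space cutoff and reciprocal-space range are illustrative assumptions):

\begin{verbatim}
import numpy as np
from scipy.special import erfc

def ewald_energy(pos, q, L, alpha=None, r_cut=None, k_max=6):
    """Plain O(N^2) Ewald sum for a neutral set of point charges in a
    cubic box of side L with periodic (tin-foil) boundary conditions.
    pos is an (N,3) array, q an (N,) array of charges."""
    N = len(q)
    V = L**3
    if alpha is None:
        alpha = 5.0 / L          # crude choice of the splitting parameter
    if r_cut is None:
        r_cut = L / 2.0

    # direct (real-space) series, minimum image convention
    e_real = 0.0
    for i in range(N - 1):
        d = pos[i + 1:] - pos[i]
        d -= L * np.round(d / L)                 # minimum image
        r = np.linalg.norm(d, axis=1)
        mask = r < r_cut
        e_real += np.sum(q[i] * q[i + 1:][mask]
                         * erfc(alpha * r[mask]) / r[mask])

    # reciprocal-space series
    e_recip = 0.0
    n = np.arange(-k_max, k_max + 1)
    for nx in n:
        for ny in n:
            for nz in n:
                if nx == 0 and ny == 0 and nz == 0:
                    continue
                k = 2.0 * np.pi * np.array([nx, ny, nz]) / L
                k2 = np.dot(k, k)
                s_k = np.sum(q * np.exp(1j * pos @ k))   # structure factor
                e_recip += np.exp(-k2 / (4.0 * alpha**2)) / k2 * np.abs(s_k)**2
    e_recip *= 2.0 * np.pi / V

    # self-interaction correction
    e_self = -alpha / np.sqrt(np.pi) * np.sum(q**2)

    return e_real + e_recip + e_self
\end{verbatim}

SPME replaces the explicit structure-factor loop with a B-spline interpolation of the charges onto a regular grid and a three dimensional FFT, which is where the $ N\log N$ scaling quoted above comes from.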

The combination of the multiple time step algorithm and of SPME [13] makes the simulation of large complex molecular systems, such as biopolymers, polar mesogenic molecules, organic molecules in solution, etc., extremely efficient and therefore affordable even over long time spans. Further, this technique does not involve any uncontrolled approximation and is perfectly consistent with standard PBC.
