Calculating Ensemble Averages Using Configurations from All Ensembles (MBAR estimator)

As recently shown by Shirts and Chodera[107], all the configurations produced by a REM simulation of $ M$ replicas, each characterized by a distribution function $ P_m(X)$, can be effectively used to obtain equilibrium averages for any target distribution $ P_n(X)$, using the so-called Multistate Bennett Acceptance Ratio (MBAR) estimator, which is illustrated in the following.

In the ORAC REM implementation, the most general distribution function for replica $ m$ is given by Eq. 5.15. Given that for each replica $ m$ one has saved $ N_m$ configurations of the kind $ \{x_1^m,...x_k^m,...\}$, it can be easily shown that

$\displaystyle Z_n=\frac{\sum_{m=1}^M Z_mN_m^{-1}\sum_{k=1}^{N_m}\alpha_{nm}(x_k^m)P_n(x_k^m)}{N_n^{-1}\sum_{m=1}^M\sum_{k=1}^{N_n}\alpha_{nm}(x_k^n)P_m(x_k^n)}.$ (5.24)

Eq. 5.24 holds for any arbitrary bridge function $ \alpha_{nm}(X)$ , In particular, choosing[107]

$\displaystyle \alpha_{nm}(X) = \frac{N_m Z_m^{-1} }{\sum_l^M N_l Z_l^{-1} P_l(X)}$ (5.25)

Eq. 5.24 transforms as

$\displaystyle Z_n=\sum_{k=1}^N\frac{P_n(x_k)}{\sum_{l=1}^MZ_l^{-1}N_lP_l(x_k)}$ (5.26)

where we have collapsed the two indices $ k$, for the configurations, and $ m$, for the replicas in one single index $ k$ running on all the configurations $ N=\sum_m N_m$ produced by the REMD. Except for an arbitrary multiplicative factor, Eq. 5.26 can be solved iteratively for the partition function $ Z_n$. At the beginning of the process, one sets $ Z_i=1$ for all $ i$. . At the iteration $ i+1$ we have that

$\displaystyle Z_n(i+1) = \sum_k^{N} W_n(x_k,i)$ (5.27)

where the weights depend on the $ Z_l$ calculated at the previous iteration

$\displaystyle W_n(x_k,i) = \frac{e^{-\beta {\bf c}_n\cdot {\bf v}(x_k)}} {\sum_l N_l e^{-\ln Z_l(i)-\beta {\bf c}_l\cdot {\bf v}(x_k)}}$ (5.28)

and we have used the definition of 5.15 for the replica distribution in ORAC . Once the $ Z_i$ have been determined, $ Z_{*}$ for an arbitrary distribution $ P_{*}(X)$ can be calculated using the configurations sampled in the REMD simulation:

$\displaystyle Z_{*} = \sum_k^{N} \frac {P_{*}(x_k)}{\sum_l Z_l^{-1}N_l P_l(x_k)} = \sum_k^{N} W_{*}(x_k)$ (5.29)

Setting for example $ P_{*} = P_{1}(X)*A(X)$, where $ P_1(X)$ is the target distribution $ e^{-\beta V(x)}/Z$ (i.e. that of the replica 1) and $ A(x)$ is an arbitrary configurational property, we obtain

$\displaystyle <A>_1 = \frac{\int P_{1}(X)A(X) dX}{Z_1} = \frac{\int P_{*}(X) dX}{Z_1} = \frac{ Z_{*}}{Z_1}$ (5.30)

From Eq. 5.29, we have that $ Z_{*} = \sum_k^{N} W_1(x_k)
A(x_k )$ and that $ Z_{1} = \sum_k^{N} W_1 (x_k)$. Substituting these results into Eq. 5.30, we obtain

$\displaystyle <A>_1 = \frac{\sum_k^{N} W_1(x_k) A(x_k)}{\sum_k^{N} W_1(x_k)}$ (5.31)

where the weights $ W_1$ for all sampled points in the REMD simulation are given by

$\displaystyle W_1(x_k,i) = \frac{e^{-\beta V(x_k) }} {\sum_l N_l e^{-\ln Z_l(i)-\beta {\bf c}_l\cdot {\bf v}(x_k)}}$ (5.32)

In summary, using the configurational energies from all $ M$ replicas, we first solve iteratively the system 5.26 for all $ Z_n$ (except for a multiplicative factor), with $ 1\le n
\ne M$. In doing this, the weights $ W_n$ (including $ n=1$) are also determined. Finally configurational averages at, e.g., the target distribution can be determined using all the REMD configurations by means of Eq. 5.31.

procacci 2021-12-29