Calculating Ensemble Averages Using Configurations from All Ensembles (MBAR estimator)

As recently shown by Shirts and Chodera[107], all the configurations produced by a REM simulation of $M$ replicas, each characterized by a distribution function $P_m(X)$, can be effectively used to obtain equilibrium averages for any target distribution $P_n(X)$, using the so-called Multistate Bennett Acceptance Ratio (MBAR) estimator, which is illustrated in the following.

In the ORAC REM implementation, the most general distribution function for replica $m$ is given by Eq. 5.15. Given that for each replica $m$ one has saved $N_m$ configurations of the kind $\{x_1^m,...x_k^m,...\}$, it can be easily shown that

$\displaystyle Z_n=\frac{\sum_{m=1}^M Z_mN_m^{-1}\sum_{k=1}^{N_m}\alpha_{nm}(x_k^m)P_n(x_k^m)}{N_n^{-1}\sum_{m=1}^M\sum_{k=1}^{N_n}\alpha_{nm}(x_k^n)P_m(x_k^n)}.$ (5.24)

Eq. 5.24 holds for any arbitrary bridge function $\alpha_{nm}(X)$ , In particular, choosing[107]

$\displaystyle \alpha_{nm}(X) = \frac{N_m Z_m^{-1} }{\sum_l^M N_l Z_l^{-1} P_l(X)}$ (5.25)

Eq. 5.24 transforms as

$\displaystyle Z_n=\sum_{k=1}^N\frac{P_n(x_k)}{\sum_{l=1}^MZ_l^{-1}N_lP_l(x_k)}$ (5.26)

where we have collapsed the two indices $k$, for the configurations, and $m$, for the replicas in one single index $k$ running on all the configurations $N=\sum_m N_m$ produced by the REMD. Except for an arbitrary multiplicative factor, Eq. 5.26 can be solved iteratively for the partition function $Z_n$. At the beginning of the process, one sets $Z_i=1$ for all $i$. . At the iteration $i+1$ we have that

$\displaystyle Z_n(i+1) = \sum_k^{N} W_n(x_k,i)$ (5.27)

where the weights depend on the $Z_l$ calculated at the previous iteration

$\displaystyle W_n(x_k,i) = \frac{e^{-\beta {\bf c}_n\cdot {\bf v}(x_k)}}
{\sum_l N_l e^{-\ln Z_l(i)-\beta {\bf c}_l\cdot {\bf v}(x_k)}}$ (5.28)

and we have used the definition of 5.15 for the replica distribution in ORAC . Once the $Z_i$ have been determined, $Z_{*}$ for an arbitrary distribution $P_{*}(X)$ can be calculated using the configurations sampled in the REMD simulation:

$\displaystyle Z_{*} = \sum_k^{N} \frac {P_{*}(x_k)}{\sum_l Z_l^{-1}N_l P_l(x_k)} =
\sum_k^{N} W_{*}(x_k)$ (5.29)

Setting for example $P_{*} = P_{1}(X)*A(X)$, where $P_1(X)$ is the target distribution $e^{-\beta V(x)}/Z$ (i.e. that of the replica 1) and $A(x)$ is an arbitrary configurational property, we obtain

$\displaystyle <A>_1 = \frac{\int P_{1}(X)A(X) dX}{Z_1} = \frac{\int P_{*}(X)
dX}{Z_1} = \frac{ Z_{*}}{Z_1}$ (5.30)

From Eq. 5.29, we have that $Z_{*} = \sum_k^{N} W_1(x_k)
A(x_k )$ and that $Z_{1} = \sum_k^{N} W_1 (x_k)$. Substituting these results into Eq. 5.30, we obtain

$\displaystyle <A>_1 = \frac{\sum_k^{N} W_1(x_k) A(x_k)}{\sum_k^{N} W_1(x_k)}$ (5.31)

where the weights $W_1$ for all sampled points in the REMD simulation are given by

$\displaystyle W_1(x_k,i) = \frac{e^{-\beta V(x_k) }}
{\sum_l N_l e^{-\ln Z_l(i)-\beta {\bf c}_l\cdot {\bf v}(x_k)}}$ (5.32)

In summary, using the configurational energies from all $M$ replicas, we first solve iteratively the system 5.26 for all $Z_n$ (except for a multiplicative factor), with $1\le n
\ne M$. In doing this, the weights $W_n$ (including $n=1$) are also determined. Finally configurational averages at, e.g., the target distribution can be determined using all the REMD configurations by means of Eq. 5.31.