Overview
Atomistic simulations are a powerful tool for understanding material behavior at the microscopic scale, but their applicability is fundamentally limited by the timescales accessible to brute-force methods. Systems remain trapped within metastable states for long stretches of simulation time, while the rare transitions between states are what actually drive the long-term evolution—a difficulty widely known as the time-scale or rare-event problem.
This work presents an importance-sampling framework for accelerating discrete-state Markov chain Monte Carlo (MCMC) simulations, in which a neural network is trained to represent the optimal bias potential—the logarithm of the optimal importance function. By operating in log-space, the method handles the extremely small magnitudes of the importance function near energy minima that would otherwise cause numerical underflow at low temperatures. A rigorous reweighting formula recovers the unbiased transition rates of the original system, and a branching random walk (BRW) procedure controls the variance of the estimator.
This is the discrete-state precursor to my continuous Langevin dynamics framework, which extends the same ideas to atomistic systems with continuous degrees of freedom.
Problem Setup
Consider a discrete domain $\Omega$ of microscopic states laid out on a regular grid, with an energy $E(i)$ assigned to each site. The kinetics is specified by jump rates between nearest-neighbor sites:

$$r(i\to j) = \nu_0\,\mathbb{1}_{\rm nn}(i,j)\,\exp\!\left[-\frac{E(j)-E(i)}{2k_BT}\right],$$

where $\nu_0$ is a frequency prefactor and $\mathbb{1}_{\rm nn}(i,j)=1$ for nearest neighbors (and $0$ otherwise). These rates satisfy detailed balance with respect to the Boltzmann distribution. The corresponding jump probabilities upon leaving site $i$ are $K(i\to j) = r(i\to j)/r_{\rm tot}(i)$, where $r_{\rm tot}(i)=\sum_j r(i\to j)$ is the total escape rate from site $i$.
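As a concrete illustration, here is a minimal NumPy sketch of these rates and the resulting jump probabilities; the `neighbors` table and the parameter values are illustrative inputs, not taken from the paper:

```python
import numpy as np

KB = 8.617e-5  # Boltzmann constant in eV/K

def jump_probabilities(E, neighbors, i, nu0, T):
    """Rates r(i->j) and jump probabilities K(i->j) out of site i.

    E[j] is the site energy in eV; neighbors[i] lists the
    nearest-neighbor sites of i (the indicator 1_nn above).
    """
    rates = {j: nu0 * np.exp(-(E[j] - E[i]) / (2.0 * KB * T))
             for j in neighbors[i]}
    r_tot = sum(rates.values())                  # total escape rate from i
    K = {j: r / r_tot for j, r in rates.items()}
    return K, r_tot
```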
At low temperatures, a system initialized near the minimum A spends a long time in the A-basin before a rare transition carries it into the B-basin. The two practical objectives are (i) obtaining accurate transition rates between the metastable states, and (ii) resolving the relative probabilities of competing reaction channels (e.g., those that pass through the saddles $S_1$ vs. $S_2$).
Approach
Importance sampling modifies the jump probabilities through an importance function $I(i)$:

$$K'(i\to j) = \frac{K(i\to j)\,I(j)\,\big(1-\mathbb{1}_{\rm A}(j)\big)}{n'(i)\,I(i)}, \qquad n'(i) = \sum_{j}\frac{K(i\to j)\,I(j)}{I(i)}\,\big(1-\mathbb{1}_{\rm A}(j)\big),$$

where the indicator $\mathbb{1}_{\rm A}(j)$ removes returns to A and $n'(i)$ is a normalization factor. There exists a unique optimal importance function $I^{\rm opt}(i)$ for which $n'(i) = 1$ for all $i$; under this choice, every successful path is enhanced by exactly the same factor $I^{\rm opt}({\rm B})/I^{\rm opt}({\rm A})$, so relative path probabilities are preserved and the original transition rate can be recovered exactly. The catch: finding $I^{\rm opt}$ exactly is as hard as solving the rare-event problem itself.
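A minimal sketch of this biased kernel follows; `K_row` maps a neighbor $j$ to $K(i\to j)$ and `in_A` flags A-basin sites, both assumed inputs:

```python
def biased_probabilities(K_row, I, in_A, i):
    """Biased jump probabilities K'(i->j) under importance function I.

    Returns to the A-basin are removed; n'(i) is the normalization.
    At I = I_opt, n_prime equals 1 for every state.
    """
    unnorm = {j: K * I[j] / I[i]
              for j, K in K_row.items() if not in_A(j)}
    n_prime = sum(unnorm.values())
    K_biased = {j: w / n_prime for j, w in unnorm.items()}
    return K_biased, n_prime
```

With this form of the kernel, the per-step reweighting factor is $K/K' = n'(i)\,I(i)/I(j)$, so the product along a path telescopes to $\big[\prod_k n'(i_k)\big]\,I({\rm A})/I({\rm B})$; this is the path weight that the BRW procedure below keeps under control.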
The framework addresses four concrete obstacles to making this practical in high dimension and at low temperature:
- Neural-network bias potential, trained in log-space. Instead of fitting $I^{\rm opt}(i)$ directly—which can be exponentially small near energy minima and cause numerical underflow—a neural network is trained to represent the bias potential $E_b(i;\theta) \approx -k_B T \log I^{\rm opt}(i)$ (up to an additive constant). Working in log-space stabilizes both training and sampling at low temperatures.
- Adaptive sampling with simulated annealing. Training begins at a high effective temperature where successful transitions are easy to sample, and the temperature is reduced in stages down to the target. At each stage, the network is updated using the paths produced under its own biased dynamics, avoiding the chicken-and-egg problem of needing successful paths to learn the bias.
- Hybrid Gaussian + MLP for high dimensions. For the 14-dimensional system, the bias potential is parameterized as $E_b = A \exp[-\sum_i a_i (x_i - c_i)^2] + \mathrm{MLP}(x; \theta)$. The Gaussian term gives a coarse global gradient that drives early exploration; the MLP refines it into the optimal shape (a minimal code sketch follows this list).
- Unbiased rate estimator with branching random walk. When $I(i) \neq I^{\rm opt}(i)$, the path-weight reweighting still recovers the correct rate, but with high variance. The BRW procedure splits paths whose weights grow too large and prunes those whose weights become negligible, keeping the weight distribution within a controlled range and dramatically reducing the variance of the rate estimator.
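As promised above, here is a minimal PyTorch sketch of the hybrid bias parameterization; the layer sizes follow the text, while the trainable Gaussian center/width parameters and their initialization are assumptions for illustration:

```python
import torch
import torch.nn as nn

class HybridBias(nn.Module):
    """E_b(x; theta) = A * exp(-sum_i a_i (x_i - c_i)^2) + MLP(x).

    The network outputs the bias potential directly (log-space),
    so the importance function I(x) = exp(-E_b / kT) never has to
    be represented as a raw, underflow-prone number.
    """
    def __init__(self, dim, hidden=100):
        super().__init__()
        self.A = nn.Parameter(torch.tensor(1.0))     # Gaussian amplitude (learned)
        self.log_a = nn.Parameter(torch.zeros(dim))  # widths a_i = exp(log_a) > 0
        self.c = nn.Parameter(torch.zeros(dim))      # center c_i
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):                            # x: (batch, dim)
        quad = (torch.exp(self.log_a) * (x - self.c) ** 2).sum(-1)
        return self.A * torch.exp(-quad) + self.mlp(x).squeeze(-1)
```

During adaptive training, a module like this would be refit at each annealing stage on the paths generated under its own current bias, then used to drive the next, colder stage.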
Results
2D System with Two Reaction Channels
The first benchmark is the 2D landscape shown in Figure 1, with energy minima at
$(\pm 1.1, 0)$ nm and saddle points $S_1, S_2$ at $(0, \pm 1)$ nm with barriers of
$1.02$ and $0.98$ eV respectively. Because the state space is finite, the optimal
bias potential can be computed numerically and used as a ground truth. The neural
network—two hidden layers of 30 tanh units, trained with simulated
annealing from 5800 K down to 500 K—recovers the exact bias potential to high accuracy,
with the largest residual errors confined to sparsely sampled states far from the
dominant transition pathways.
Variance Control via Branching Random Walk
To stress-test the framework, the network is trained on a coarse grid ($\Delta x = 0.1$ nm) and then used to bias dynamics on a refined grid ($\Delta x = 0.025$ nm), introducing a deliberate discretization mismatch. Without BRW, the path weights span many orders of magnitude—a few outlier paths dominate the estimator. With BRW (weight range $[0.5, 1.2]$), the weights are kept within a tight band, and the success-probability estimator becomes both accurate and low-variance: $\hat P_{\rm AB}=(1.52 \pm 0.06)\times 10^{-14}$ versus the exact $1.41\times 10^{-14}$.
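A minimal sketch of one split/prune pass with this weight window follows; the Russian-roulette survivor-weight convention is a standard unbiased choice, assumed here for illustration:

```python
import random

def brw_control(walkers, w_min=0.5, w_max=1.2):
    """One branching/pruning pass over a list of (state, weight) walkers.

    Splitting and roulette pruning both preserve the expected total
    weight, so the rate estimator remains unbiased.
    """
    survivors = []
    for state, w in walkers:
        if w > w_max:                       # split into m lighter copies
            m = int(w / w_max) + 1
            survivors.extend((state, w / m) for _ in range(m))
        elif w < w_min:                     # prune with prob 1 - w/w_min
            if random.random() < w / w_min:
                survivors.append((state, w_min))
        else:
            survivors.append((state, w))
    return survivors
```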
Transition Rates and Mechanisms
The estimated transition rates converge as the grid is refined, and their temperature dependence agrees with Kramers' rate theory across more than ten orders of magnitude. Crucially, the framework also resolves the mechanism: the fraction of transitions through the upper saddle $S_1$ versus the lower saddle $S_2$ is recovered accurately, including the correction beyond the naive barrier-energy ratio.
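For scale, a back-of-the-envelope barrier-only estimate at the 500 K target temperature gives

$$\frac{f_{S_1}}{f_{S_2}} \;\approx\; \exp\!\left[-\frac{1.02-0.98\ \mathrm{eV}}{k_BT}\right] \;=\; \exp(-0.93) \;\approx\; 0.40,$$

i.e. roughly $28\%$ of transitions passing through $S_1$; the deviation of the measured fraction from this number is exactly the prefactor correction referred to above.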
Scaling to 14 Dimensions
The 14-dimensional benchmark embeds the 2D landscape into a higher-dimensional
hypercube by adding harmonic terms in the remaining twelve coordinates. The exact
optimal bias potential remains independent of the harmonic directions, which gives
a clean ground truth for the 14D-trained network when projected back to the 2D
subspace. With the hybrid Gaussian + MLP parameterization (two hidden layers of 100
ReLU units), the projected bias potential agrees with the exact 2D
solution within the regions traversed by the dominant transition pathways.
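Concretely, the embedding takes the form (a sketch; the harmonic stiffness $k$ is not specified here)

$$E_{14}(x_1,\dots,x_{14}) \;=\; E_{2}(x_1,x_2) \;+\; \sum_{i=3}^{14} \tfrac{1}{2}\,k\,x_i^2,$$

so $I^{\rm opt}$, and with it the exact bias potential, depends only on $(x_1, x_2)$.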
Connection and Future Directions
The continuous-state extension of this framework, applied to overdamped Langevin dynamics and the Lennard-Jones 7 atomistic benchmark, is described on the Rare Events page. Ongoing directions include applying the method to atomistic systems where rare events drive long-term behavior—crystal defect evolution, dislocation dynamics, and protein conformational change—and connecting it to broader machine-learning frameworks for stochastic dynamics.
Reference
M. M. Kim and W. Cai, Accelerated Markov Chain Monte Carlo Simulation via Neural Network–Driven Importance Sampling.