Accelerated Langevin Dynamics Simulation via Neural Network-Driven Importance Sampling
Michael M. Kim, Wei Cai
Department of Mechanical Engineering, Stanford University · 2026

Overview

Atomistic simulations are powerful tools for understanding mechanisms at the molecular scale, but they are fundamentally limited by the timescales accessible to brute-force integration. Most of the simulated time is spent inside metastable basins, while the rare transitions between basins—protein conformational changes, defect rearrangements, nucleation events—are precisely what governs long-term evolution.

My research develops an importance-sampling framework that accelerates Langevin dynamics simulations of these rare transitions, while still recovering the unbiased transition rates and the correct relative probabilities of competing reaction channels. The biasing is driven by a neural-network–parameterized importance function (a learned approximation of the committor), and statistical reliability is maintained by a branching random walk (BRW) algorithm that controls the variance of path weights.
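The weight-control step can be illustrated with a minimal splitting/Russian-roulette resampling rule. This is a generic sketch of one common BRW scheme, not the paper's exact algorithm; the `branch_or_kill` helper and its unit threshold are illustrative choices:

```python
import numpy as np

def branch_or_kill(paths, weights, rng):
    """One resampling step of a branching random walk.

    Heavy paths (w >= 1) split into roughly w copies that share the weight;
    light paths (w < 1) survive with probability w and are promoted to
    weight 1. Both moves preserve the expected total weight, so the
    estimator stays unbiased while the weight variance is controlled."""
    new_paths, new_weights = [], []
    for x, w in zip(paths, weights):
        if w < 1.0:
            if rng.random() < w:          # Russian roulette: keep with prob w
                new_paths.append(x)
                new_weights.append(1.0)   # survivor weight 1 => E[weight] = w
        else:
            frac = w - int(w)
            n = int(w) + (1 if rng.random() < frac else 0)  # E[n] = w
            new_paths.extend([x] * n)
            new_weights.extend([w / n] * n)   # each copy carries w/n
    return new_paths, new_weights

rng = np.random.default_rng(0)
paths, weights = branch_or_kill(["p1", "p2"], [1.0, 2.0], rng)
# the w = 2.0 path splits into two copies of weight 1.0
```

Because both moves are weight-preserving in expectation, the surviving population keeps weights near unity, which is what prevents a few enormous path weights from dominating the estimator.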

The Rare Event Problem

Consider a system with position $\mathbf{x} \in \Omega$ evolving under overdamped Langevin dynamics at temperature $T$:

$$ d\mathbf{x}(t) = -\gamma^{-1}\,\nabla U(\mathbf{x}(t))\,dt + \sqrt{2\gamma^{-1}\beta^{-1}}\, d\mathbf{W}(t), \qquad \beta = 1/(k_B T). $$

The energy landscape $U(\mathbf{x})$ partitions $\Omega$ into deep basins corresponding to metastable states $\Omega_{\rm A}$ and $\Omega_{\rm B}$, separated by saddle points. Because the equilibrium distribution $\rho_{\rm eq}(\mathbf{x}) \propto e^{-\beta U(\mathbf{x})}$ is concentrated near the minima, a trajectory initialized in $\Omega_{\rm A}$ tends to remain there for very long times before making the rare jump into $\Omega_{\rm B}$.
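For reference, the SDE above can be integrated with a plain Euler–Maruyama scheme. The `euler_maruyama` helper and the 1D double-well example are illustrative, not the paper's code:

```python
import numpy as np

def euler_maruyama(grad_U, x0, gamma, beta, dt, n_steps, rng):
    """Euler-Maruyama integration of the overdamped Langevin SDE:
    dx = -grad U(x)/gamma dt + sqrt(2/(gamma*beta)) dW."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    noise_scale = np.sqrt(2.0 * dt / (gamma * beta))
    for _ in range(n_steps):
        x = x - (dt / gamma) * grad_U(x) \
            + noise_scale * rng.standard_normal(x.shape)
        traj.append(x.copy())
    return np.asarray(traj)

# Illustration: 1D double well U(x) = (x^2 - 1)^2, minima at x = +-1
rng = np.random.default_rng(0)
traj = euler_maruyama(lambda x: 4.0 * x * (x**2 - 1.0),
                      x0=[-1.0], gamma=1.0, beta=5.0, dt=1e-3,
                      n_steps=5000, rng=rng)
```

At this temperature ($\beta \Delta U = 5$) the trajectory mostly rattles around $x = -1$, which is exactly the metastability problem described above.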

Figure 1. Schematic of the configuration space partitioned into two metastable basins. Small regions A and B sit near the local minima; the dividing surface between basins passes through the saddle points $S_1$ and $S_2$. A failure path leaves region A and returns; a success path reaches region B before returning.

Defining small regions A and B around the energy minima lets us decompose any long trajectory into short paths—either failure paths (leave A, return to A before reaching B) or success paths (leave A, reach B before returning). The transition rate is then

$$ r_{\rm AB} = \langle t_{\rm FPT}\rangle^{-1} \approx J_{\rm A}\, P_{\rm AB}, $$

where $J_{\rm A} = \langle t_{\rm AA}\rangle^{-1}$ is the flux out of region A (the Hill relation) and $P_{\rm AB}$ is the success probability—the rare quantity we want to estimate. For genuine rare events $P_{\rm AB} \ll 1$, so brute-force sampling is hopeless.
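The path decomposition translates directly into a brute-force estimator of $P_{\rm AB}$: launch short paths just outside region A and count the fraction that reach B before returning. The helper below and its double-well example are an illustrative sketch of why direct sampling fails when $P_{\rm AB} \ll 1$:

```python
import numpy as np

def estimate_P_AB(step, in_A, in_B, x_launch, n_paths, max_steps, rng):
    """Brute-force success probability: count paths that reach B before A.
    Paths exhausting max_steps are counted as failures (a truncation choice)."""
    successes = 0
    for _ in range(n_paths):
        x = np.array(x_launch, dtype=float)
        for _ in range(max_steps):
            x = step(x, rng)
            if in_A(x):        # failure path: returned to A first
                break
            if in_B(x):        # success path: reached B first
                successes += 1
                break
    return successes / n_paths

# 1D double well U(x) = (x^2 - 1)^2 with A = {x < -0.9}, B = {x > 0.9}
def em_step(x, rng, gamma=1.0, beta=2.0, dt=1e-2):
    return x - (dt / gamma) * 4.0 * x * (x**2 - 1.0) \
           + np.sqrt(2.0 * dt / (gamma * beta)) * rng.standard_normal(x.shape)

rng = np.random.default_rng(1)
p = estimate_P_AB(em_step, lambda x: x[0] < -0.9, lambda x: x[0] > 0.9,
                  [-0.85], n_paths=500, max_steps=2000, rng=rng)
```

The relative error of this estimator scales as $1/\sqrt{N P_{\rm AB}}$, so the number of paths needed grows inversely with the success probability.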

Figure 2. A long-time trajectory decomposed into many short paths. Red curves are failure paths returning to A; the blue curve is a success path reaching B. The rare-event problem is that failure paths are overwhelmingly probable while success paths are vanishingly rare.

Approach

The framework modifies the dynamics through an importance function $I(\mathbf{x})$ that biases the transition kernel toward successful trajectories. The optimal choice is the committor function—the probability that a path starting at $\mathbf{x}$ reaches B before A—but the committor is the solution of a high-dimensional PDE on a complex landscape and is generally intractable. Two ideas make the framework practical: first, the importance function is parameterized as a neural network $I(\mathbf{x};\theta)$ and trained on short biased trajectories, so only a rough approximation of the committor is needed; second, the bias introduced by an imperfect $I$ is removed exactly by reweighting each sampled path, with the path weights kept under control by a branching random walk (BRW) that splits heavy paths and kills light ones.
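One way to see how reweighting absorbs an imperfect bias is to carry the exact likelihood ratio along each biased path. The sketch below uses a Doob-transform-style tilt of the drift by $\nabla \log I$; this drift form, the `biased_path_weight` helper, and the crude constant `grad_logI` in the example are illustrative choices, not the paper's exact kernel:

```python
import numpy as np

def biased_path_weight(grad_U, grad_logI, in_A, in_B, x_start,
                       gamma, beta, dt, max_steps, rng):
    """Sample one biased path and its likelihood-ratio weight.

    The biased kernel adds a drift (2/(gamma*beta)) * grad log I, pushing
    paths toward B; the weight accumulates the ratio of unbiased to biased
    Gaussian step densities, so weight * 1[success] is unbiased for P_AB."""
    x = np.array(x_start, dtype=float)
    log_w = 0.0
    var = 2.0 * dt / (gamma * beta)          # variance of one Gaussian step
    for _ in range(max_steps):
        drift0 = -grad_U(x) / gamma                             # unbiased drift
        drift = drift0 + (2.0 / (gamma * beta)) * grad_logI(x)  # biased drift
        x_new = x + drift * dt + np.sqrt(var) * rng.standard_normal(x.shape)
        d0 = x_new - (x + drift0 * dt)       # residual under unbiased kernel
        db = x_new - (x + drift * dt)        # residual under biased kernel
        log_w += (np.sum(db**2) - np.sum(d0**2)) / (2.0 * var)
        x = x_new
        if in_A(x):
            return 0.0, np.exp(log_w)        # failure path
        if in_B(x):
            return 1.0, np.exp(log_w)        # success path
    return 0.0, np.exp(log_w)                # truncated: count as failure

# Illustration on a 1D double well with a crude constant bias toward B
rng = np.random.default_rng(2)
ind, w = biased_path_weight(
    grad_U=lambda x: 4.0 * x * (x**2 - 1.0),
    grad_logI=lambda x: np.ones_like(x),
    in_A=lambda x: x[0] < -0.9, in_B=lambda x: x[0] > 0.9,
    x_start=[-0.85], gamma=1.0, beta=2.0, dt=1e-2, max_steps=2000, rng=rng)
```

Averaging `ind * w` over many paths gives an unbiased estimate of $P_{\rm AB}$ for any bias; the closer $I$ is to the committor, the smaller the weight variance, which is where the BRW takes over.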

Results

2D Potential with Two Reaction Channels

The first benchmark is a two-dimensional potential with two saddle points and asymmetric barriers (1.02 eV and 0.98 eV). The neural network learns the importance function from short biased trajectories, and the framework recovers both the total transition rate and the relative weight of each channel in close agreement with the analytical and brute-force references. This is a clean setting to verify that errors in the learned importance function are absorbed by the path-weight reweighting.
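A potential with this qualitative structure (two wells connected by two channels of slightly different barrier heights) can be written as a sum of Gaussians. The coefficients below are invented for illustration and are not the benchmark potential from the paper:

```python
import numpy as np

def two_channel_potential(x, y):
    """Hypothetical 2D landscape: two wells near (+-1, 0) connected by an
    upper channel (y ~ +0.8, slightly lower barrier) and a lower channel
    (y ~ -0.8, slightly higher barrier)."""
    wells = -3.0 * np.exp(-((x - 1.0)**2 + y**2)) \
            - 3.0 * np.exp(-((x + 1.0)**2 + y**2))
    upper = -1.6 * np.exp(-(x**2 + (y - 0.8)**2))  # deeper pass: lower barrier
    lower = -1.5 * np.exp(-(x**2 + (y + 0.8)**2))  # shallower: higher barrier
    confine = 0.2 * (x**4 + y**4)                  # keep the system bounded
    return wells + upper + lower + confine
```

In a landscape like this, a correct method must recover not just the total rate but the asymmetric split of flux between the two channels, which is the quantity the reweighting protects.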

Figure 3. Neural-network importance function trained on the 2D two-channel system. The learned $I(\mathbf{x};\theta)$ smoothly interpolates between the two basins and captures the structure of the committor across both reaction channels.

14-Dimensional Rugged Potential

Scaling up, the framework is applied to a 14-dimensional rugged potential constructed by embedding the two-channel system into a high-dimensional space with additional rough directions. Importance sampling continues to recover accurate transition rates and channel fractions, demonstrating that the neural-network parameterization scales beyond the regime where exact committor PDE solvers are feasible.

The LJ7 Atomistic System

The most demanding benchmark is the system of seven Lennard-Jones discs (LJ7)—a 14-degree-of-freedom atomistic model whose local energy minima fall into four collections $C_0,C_1,C_2,C_3$ (after quotienting out continuous symmetries). A soft-core form of the LJ potential is used to regularize the $r_{ij}\to 0$ singularity:

$$ U_{\rm LJ7}^{\rm soft}(\{\mathbf{x}_i\}) = 4\varepsilon\sum_{i<j}\left[\left(\frac{\sigma^2}{r_{ij}^2+\Delta^2}\right)^{6} - \left(\frac{\sigma^2}{r_{ij}^2+\Delta^2}\right)^{3}\right], \qquad r_{ij} = \|\mathbf{x}_i-\mathbf{x}_j\|, $$

where $\Delta > 0$ sets the length scale below which the divergence is cut off.

Metastable states are labeled by the second and third central moments $(\mu_2, \mu_3)$ of the atomic coordination numbers. The transition of interest goes from the ground state $C_3$ (region A) to the higher-energy $C_2$ family (region B), which itself decomposes into four distinct transition modes $M_1$–$M_4$ with slightly different barriers (0.49 eV vs. 0.52 eV).
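Both ingredients are simple to implement. The sketch below assumes the $r^2 \to r^2 + \Delta^2$ soft-core form and a sharp distance cutoff for the coordination count; the `delta` and `r_cut` values are illustrative choices, and the paper may use smoother definitions:

```python
import numpy as np

def soft_lj_energy(pos, eps=1.0, sigma=1.0, delta=0.1):
    """Soft-core LJ energy with r^2 -> r^2 + delta^2, so the pair energy
    stays finite as two discs overlap."""
    E = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            s = sigma**2 / (np.sum((pos[i] - pos[j])**2) + delta**2)
            E += 4.0 * eps * (s**6 - s**3)
    return E

def coordination_moments(pos, r_cut=1.5):
    """Second and third central moments (mu2, mu3) of the per-disc
    coordination numbers, used here as collective variables."""
    n = len(pos)
    coord = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(pos[i] - pos[j]) < r_cut:
                coord[i] += 1.0
    mu = coord.mean()
    return np.mean((coord - mu)**2), np.mean((coord - mu)**3)

# Hexagonal cluster: one central disc with six unit-distance neighbors,
# giving coordination numbers [6, 3, 3, 3, 3, 3, 3]
pts = np.array([[0.0, 0.0]] +
               [[np.cos(k * np.pi / 3), np.sin(k * np.pi / 3)]
                for k in range(6)])
E = soft_lj_energy(pts)
mu2, mu3 = coordination_moments(pts)
```

The asymmetry of the coordination distribution is what makes $(\mu_2, \mu_3)$ discriminate the four families of minima.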

Figure 4. Left: soft-core LJ pair potential, regularized to avoid divergent forces. Right: the four families of LJ7 local minima projected onto the $(\mu_2,\mu_3)$ collective variable space, with representative configurations.

At $T=600$ K the importance-sampled rate is $r_{\rm AB} = (4.32 \pm 0.01)\times 10^{-4}$ fs$^{-1}$, in excellent agreement with the brute-force rate $4.41\times 10^{-4}$ fs$^{-1}$. Without the path-weight reweighting—using $I(\mathbf{x};\theta)$ as if it were the exact optimal importance function—the rate is underestimated by roughly 40%, and the relative weights of the four transition modes collapse to a near-uniform distribution rather than the correct asymmetric pattern. Reweighting is what allows the framework to distinguish the lower-barrier modes $M_1, M_2$ (~32% each) from the higher-barrier modes $M_3, M_4$ (~18% each).

Figure 5. Sampled transition paths for the four LJ7 transition modes, projected onto the $(\mu_2, \mu_3)$ collective variables and colored by the learned importance function value.

Transfer Learning Across Temperatures

The optimal importance function is temperature-dependent, so naively the network would have to be retrained at every temperature of interest. Instead, the network trained at $T=600$ K is used as a warm start for $T=500, 400, 300$ K via transfer learning. This recovers Kramers-theory-consistent rates spanning four orders of magnitude—down to $r_{\rm AB} \approx 3\times 10^{-8}$ fs$^{-1}$ at 300 K—at temperatures where brute-force Langevin dynamics cannot accumulate any transitions in feasible wall-clock time. The agreement with Kramers theory in fact improves at lower temperatures, where the harmonic approximation underlying Kramers becomes more accurate.
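The Arrhenius analysis behind this comparison amounts to a linear fit of $\ln r_{\rm AB}$ against $1/(k_B T)$. A minimal sketch: the 600 K and 300 K rates below are the values quoted in this section, while the 500 K and 400 K entries are invented placeholders consistent with an Arrhenius law, so the fitted barrier is illustrative only:

```python
import numpy as np

k_B = 8.617e-5  # Boltzmann constant, eV/K
T = np.array([600.0, 500.0, 400.0, 300.0])      # K
# 600 K and 300 K rates from the text; 500 K and 400 K are placeholders
r = np.array([4.3e-4, 6.0e-5, 3.0e-6, 3.0e-8])  # fs^-1

# ln r = ln nu - E_a / (k_B T): linear fit against 1/(k_B T)
slope, intercept = np.polyfit(1.0 / (k_B * T), np.log(r), 1)
E_a, nu = -slope, np.exp(intercept)
print(f"E_a ~ {E_a:.2f} eV, nu ~ {nu:.1f} fs^-1")
```

With these numbers the fitted barrier comes out near the 0.49 eV mode barrier quoted above, as an Arrhenius-consistent data set should.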

Figure 6. Arrhenius plot of the LJ7 transition rate $r_{\rm AB}$ across temperatures. Importance sampling agrees with Kramers theory across four orders of magnitude in rate, while the no-reweighting estimator deviates systematically and overestimates the rate at low temperature by nearly a factor of two.

Takeaway. A learned, imperfect importance function is sufficient for accurate rate and mechanism estimation, provided sampled paths are reweighted using the stopped-transition-operator formula and their weights are managed by a branching random walk. The framework gives both the right total rate and the right channel-resolved breakdown, and it generalizes across temperatures through transfer learning.

Future Directions

Ongoing and future work extends the framework toward larger atomistic systems and richer physical settings: protein conformational changes, crystal defect evolution and dislocation plasticity, and underdamped Langevin dynamics where inertia plays a role. There are also promising connections to maximum-diffusion reinforcement learning, score-based diffusion models, and quantum-walk acceleration of Fokker-Planck propagators that I am exploring as part of broader efforts to learn good importance functions automatically.

Reference

M. M. Kim and W. Cai, Accelerated Langevin Dynamics Simulation via Neural Network–Driven Importance Sampling, 2026.