Accelerated Langevin Dynamics Simulation via Neural Network-Driven Importance Sampling
Michael M. Kim, Wei Cai
Department of Mechanical Engineering, Stanford University · 2026

Overview

Atomistic simulations are powerful tools for understanding mechanisms at the molecular scale, but they are fundamentally limited by the timescales accessible to brute-force integration. Most of the simulated time is spent inside metastable basins, while the rare transitions between basins—protein conformational changes, defect rearrangements, nucleation events—are precisely what governs long-term evolution.

My research develops an importance-sampling framework that accelerates Langevin dynamics simulations of these rare transitions, while still recovering the unbiased transition rates and the correct relative probabilities of competing reaction channels. The biasing is driven by a neural-network–parameterized importance function (a learned approximation of the committor), and statistical reliability is maintained by a branching random walk (BRW) algorithm that controls the variance of path weights.
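The weight-control step can be illustrated with a minimal splitting/Russian-roulette resampling rule. This is a generic sketch of one common BRW scheme, not the paper's exact algorithm; the `branch_or_kill` helper and its unit threshold are illustrative choices:

```python
import numpy as np

def branch_or_kill(paths, weights, rng):
    """One resampling step of a branching random walk.

    Heavy paths (w >= 1) split into roughly w copies that share the weight;
    light paths (w < 1) survive with probability w and are promoted to
    weight 1. Both moves preserve the expected total weight, so the
    estimator stays unbiased while the weight variance is controlled."""
    new_paths, new_weights = [], []
    for x, w in zip(paths, weights):
        if w < 1.0:
            if rng.random() < w:          # Russian roulette: keep with prob w
                new_paths.append(x)
                new_weights.append(1.0)   # survivor weight 1 => E[weight] = w
        else:
            frac = w - int(w)
            n = int(w) + (1 if rng.random() < frac else 0)  # E[n] = w
            new_paths.extend([x] * n)
            new_weights.extend([w / n] * n)   # each copy carries w/n
    return new_paths, new_weights

rng = np.random.default_rng(0)
paths, weights = branch_or_kill(["p1", "p2"], [1.0, 2.0], rng)
# the w = 2.0 path splits into two copies of weight 1.0
```

Because both moves are weight-preserving in expectation, the surviving population keeps weights near unity, which is what prevents a few enormous path weights from dominating the estimator.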

The Rare Event Problem

Consider a system with position $\mathbf{x} \in \Omega$ evolving under overdamped Langevin dynamics at temperature $T$:

$$ d\mathbf{x}(t) = -\gamma^{-1}\,\nabla U(\mathbf{x}(t))\,dt + \sqrt{2\gamma^{-1}\beta^{-1}}\, d\mathbf{W}(t), \qquad \beta = 1/(k_B T). $$

The energy landscape $U(\mathbf{x})$ partitions $\Omega$ into deep basins corresponding to metastable states $\Omega_{\rm A}$ and $\Omega_{\rm B}$, separated by saddle points. Because the equilibrium distribution $\rho_{\rm eq}(\mathbf{x}) \propto e^{-\beta U(\mathbf{x})}$ is concentrated near the minima, a trajectory initialized in $\Omega_{\rm A}$ tends to remain there for very long times before making the rare jump into $\Omega_{\rm B}$.
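For reference, the SDE above can be integrated with a plain Euler–Maruyama scheme. The `euler_maruyama` helper and the 1D double-well example are illustrative, not the paper's code:

```python
import numpy as np

def euler_maruyama(grad_U, x0, gamma, beta, dt, n_steps, rng):
    """Euler-Maruyama integration of the overdamped Langevin SDE:
    dx = -grad U(x)/gamma dt + sqrt(2/(gamma*beta)) dW."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    noise_scale = np.sqrt(2.0 * dt / (gamma * beta))
    for _ in range(n_steps):
        x = x - (dt / gamma) * grad_U(x) \
            + noise_scale * rng.standard_normal(x.shape)
        traj.append(x.copy())
    return np.asarray(traj)

# Illustration: 1D double well U(x) = (x^2 - 1)^2, minima at x = +-1
rng = np.random.default_rng(0)
traj = euler_maruyama(lambda x: 4.0 * x * (x**2 - 1.0),
                      x0=[-1.0], gamma=1.0, beta=5.0, dt=1e-3,
                      n_steps=5000, rng=rng)
```

At this temperature ($\beta \Delta U = 5$) the trajectory mostly rattles around $x = -1$, which is exactly the metastability problem described above.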

Figure 1. Schematic of the configuration space partitioned into two metastable basins. Small regions A and B sit near the local minima; the dividing surface between basins passes through the saddle points $S_1$ and $S_2$. A failure path leaves region A and returns; a success path reaches region B before returning.

Defining small regions A and B around the energy minima lets us decompose any long trajectory into short paths—either failure paths (leave A, return to A before reaching B) or success paths (leave A, reach B before returning). The transition rate is then

$$ r_{\rm AB} = \langle t_{\rm FPT}\rangle^{-1} \approx J_{\rm A}\, P_{\rm AB}, $$

where $J_{\rm A} = \langle t_{\rm AA}\rangle^{-1}$ is the flux out of region A (the Hill relation) and $P_{\rm AB}$ is the success probability—the rare quantity we want to estimate. For genuine rare events $P_{\rm AB} \ll 1$, so brute-force sampling is hopeless.
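The path decomposition translates directly into a brute-force estimator of $P_{\rm AB}$: launch short paths just outside region A and count the fraction that reach B before returning. The helper below and its double-well example are an illustrative sketch of why direct sampling fails when $P_{\rm AB} \ll 1$:

```python
import numpy as np

def estimate_P_AB(step, in_A, in_B, x_launch, n_paths, max_steps, rng):
    """Brute-force success probability: count paths that reach B before A.
    Paths exhausting max_steps are counted as failures (a truncation choice)."""
    successes = 0
    for _ in range(n_paths):
        x = np.array(x_launch, dtype=float)
        for _ in range(max_steps):
            x = step(x, rng)
            if in_A(x):        # failure path: returned to A first
                break
            if in_B(x):        # success path: reached B first
                successes += 1
                break
    return successes / n_paths

# 1D double well U(x) = (x^2 - 1)^2 with A = {x < -0.9}, B = {x > 0.9}
def em_step(x, rng, gamma=1.0, beta=2.0, dt=1e-2):
    return x - (dt / gamma) * 4.0 * x * (x**2 - 1.0) \
           + np.sqrt(2.0 * dt / (gamma * beta)) * rng.standard_normal(x.shape)

rng = np.random.default_rng(1)
p = estimate_P_AB(em_step, lambda x: x[0] < -0.9, lambda x: x[0] > 0.9,
                  [-0.85], n_paths=500, max_steps=2000, rng=rng)
```

The relative error of this estimator scales as $1/\sqrt{N P_{\rm AB}}$, so the number of paths needed grows inversely with the success probability.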

Figure 2. A long-time trajectory decomposed into many short paths. Red curves are failure paths returning to A; the blue curve is a success path reaching B. The rare-event problem is that failure paths are overwhelmingly probable while success paths are vanishingly rare.

Approach

The framework modifies the dynamics through an importance function $I(\mathbf{x})$ that biases the transition kernel toward successful trajectories. The optimal choice is the committor function—the probability that a path starting at $\mathbf{x}$ reaches B before A—but the committor is the solution of a high-dimensional PDE on a complex landscape and is generally intractable. Two ideas make the framework practical: first, the importance function is parameterized as a neural network $I(\mathbf{x};\theta)$ and trained on short biased trajectories, so only a rough approximation of the committor is needed; second, the bias introduced by an imperfect $I$ is removed exactly by reweighting each sampled path, with the path weights kept under control by a branching random walk (BRW) that splits heavy paths and kills light ones.
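One way to see how reweighting absorbs an imperfect bias is to carry the exact likelihood ratio along each biased path. The sketch below uses a Doob-transform-style tilt of the drift by $\nabla \log I$; this drift form, the `biased_path_weight` helper, and the crude constant `grad_logI` in the example are illustrative choices, not the paper's exact kernel:

```python
import numpy as np

def biased_path_weight(grad_U, grad_logI, in_A, in_B, x_start,
                       gamma, beta, dt, max_steps, rng):
    """Sample one biased path and its likelihood-ratio weight.

    The biased kernel adds a drift (2/(gamma*beta)) * grad log I, pushing
    paths toward B; the weight accumulates the ratio of unbiased to biased
    Gaussian step densities, so weight * 1[success] is unbiased for P_AB."""
    x = np.array(x_start, dtype=float)
    log_w = 0.0
    var = 2.0 * dt / (gamma * beta)          # variance of one Gaussian step
    for _ in range(max_steps):
        drift0 = -grad_U(x) / gamma                             # unbiased drift
        drift = drift0 + (2.0 / (gamma * beta)) * grad_logI(x)  # biased drift
        x_new = x + drift * dt + np.sqrt(var) * rng.standard_normal(x.shape)
        d0 = x_new - (x + drift0 * dt)       # residual under unbiased kernel
        db = x_new - (x + drift * dt)        # residual under biased kernel
        log_w += (np.sum(db**2) - np.sum(d0**2)) / (2.0 * var)
        x = x_new
        if in_A(x):
            return 0.0, np.exp(log_w)        # failure path
        if in_B(x):
            return 1.0, np.exp(log_w)        # success path
    return 0.0, np.exp(log_w)                # truncated: count as failure

# Illustration on a 1D double well with a crude constant bias toward B
rng = np.random.default_rng(2)
ind, w = biased_path_weight(
    grad_U=lambda x: 4.0 * x * (x**2 - 1.0),
    grad_logI=lambda x: np.ones_like(x),
    in_A=lambda x: x[0] < -0.9, in_B=lambda x: x[0] > 0.9,
    x_start=[-0.85], gamma=1.0, beta=2.0, dt=1e-2, max_steps=2000, rng=rng)
```

Averaging `ind * w` over many paths gives an unbiased estimate of $P_{\rm AB}$ for any bias; the closer $I$ is to the committor, the smaller the weight variance, which is where the BRW takes over.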

Results

2D Potential with Two Reaction Channels

The first benchmark is a two-dimensional potential with two saddle points and asymmetric barriers (1.02 eV and 0.98 eV). The neural network learns the importance function from short biased trajectories, and the framework recovers both the total transition rate and the relative weight of each channel in close agreement with the analytical and brute-force references. This is a clean setting to verify that errors in the learned importance function are absorbed by the path-weight reweighting.
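A potential with this qualitative structure (two wells connected by two channels of slightly different barrier heights) can be written as a sum of Gaussians. The coefficients below are invented for illustration and are not the benchmark potential from the paper:

```python
import numpy as np

def two_channel_potential(x, y):
    """Hypothetical 2D landscape: two wells near (+-1, 0) connected by an
    upper channel (y ~ +0.8, slightly lower barrier) and a lower channel
    (y ~ -0.8, slightly higher barrier)."""
    wells = -3.0 * np.exp(-((x - 1.0)**2 + y**2)) \
            - 3.0 * np.exp(-((x + 1.0)**2 + y**2))
    upper = -1.6 * np.exp(-(x**2 + (y - 0.8)**2))  # deeper pass: lower barrier
    lower = -1.5 * np.exp(-(x**2 + (y + 0.8)**2))  # shallower: higher barrier
    confine = 0.2 * (x**4 + y**4)                  # keep the system bounded
    return wells + upper + lower + confine
```

In a landscape like this, a correct method must recover not just the total rate but the asymmetric split of flux between the two channels, which is the quantity the reweighting protects.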

Figure 3. Neural-network importance function trained on the 2D two-channel system. The learned $I(\mathbf{x};\theta)$ smoothly interpolates between the two basins and captures the structure of the committor across both reaction channels.

14-Dimensional Rugged Potential

Scaling up, the framework is applied to a 14-dimensional rugged potential constructed by embedding the two-channel system into a high-dimensional space with additional rough directions. Importance sampling continues to recover accurate transition rates and channel fractions, demonstrating that the neural-network parameterization scales beyond the regime where exact committor PDE solvers are feasible.

The LJ7 Atomistic System

The most demanding benchmark is the system of seven Lennard-Jones discs (LJ7)—a 14-degree-of-freedom atomistic model whose local energy minima fall into four collections $C_0,C_1,C_2,C_3$ (after quotienting out continuous symmetries). A soft-core form of the LJ potential is used to regularize the $r_{ij}\to 0$ singularity:

$$ U_{\rm LJ7}^{\rm soft}(\{\mathbf{x}_i\}) = 4\varepsilon\sum_{i<j}\left[\left(\frac{\sigma^2}{r_{ij}^2+\Delta^2}\right)^{6} - \left(\frac{\sigma^2}{r_{ij}^2+\Delta^2}\right)^{3}\right], \qquad r_{ij} = \|\mathbf{x}_i-\mathbf{x}_j\|, $$

where $\Delta > 0$ sets the length scale below which the divergence is cut off.

Metastable states are labeled by the second and third central moments $(\mu_2, \mu_3)$ of the atomic coordination numbers. The transition of interest goes from the ground state $C_3$ (region A) to the higher-energy $C_2$ family (region B), which itself decomposes into four distinct transition modes $M_1$–$M_4$ with slightly different barriers (0.49 eV vs. 0.52 eV).
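Both ingredients are simple to implement. The sketch below assumes the $r^2 \to r^2 + \Delta^2$ soft-core form and a sharp distance cutoff for the coordination count; the `delta` and `r_cut` values are illustrative choices, and the paper may use smoother definitions:

```python
import numpy as np

def soft_lj_energy(pos, eps=1.0, sigma=1.0, delta=0.1):
    """Soft-core LJ energy with r^2 -> r^2 + delta^2, so the pair energy
    stays finite as two discs overlap."""
    E = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            s = sigma**2 / (np.sum((pos[i] - pos[j])**2) + delta**2)
            E += 4.0 * eps * (s**6 - s**3)
    return E

def coordination_moments(pos, r_cut=1.5):
    """Second and third central moments (mu2, mu3) of the per-disc
    coordination numbers, used here as collective variables."""
    n = len(pos)
    coord = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(pos[i] - pos[j]) < r_cut:
                coord[i] += 1.0
    mu = coord.mean()
    return np.mean((coord - mu)**2), np.mean((coord - mu)**3)

# Hexagonal cluster: one central disc with six unit-distance neighbors,
# giving coordination numbers [6, 3, 3, 3, 3, 3, 3]
pts = np.array([[0.0, 0.0]] +
               [[np.cos(k * np.pi / 3), np.sin(k * np.pi / 3)]
                for k in range(6)])
E = soft_lj_energy(pts)
mu2, mu3 = coordination_moments(pts)
```

The asymmetry of the coordination distribution is what makes $(\mu_2, \mu_3)$ discriminate the four families of minima.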

Figure 4. Left: soft-core LJ pair potential, regularized to avoid divergent forces. Right: the four families of LJ7 local minima projected onto the $(\mu_2,\mu_3)$ collective variable space, with representative configurations.

At $T=600$ K the importance-sampled rate is $r_{\rm AB} = (4.32 \pm 0.01)\times 10^{-4}$ fs$^{-1}$, in excellent agreement with the brute-force rate $4.41\times 10^{-4}$ fs$^{-1}$. Without the path-weight reweighting—using $I(\mathbf{x};\theta)$ as if it were the exact optimal importance function—the rate is underestimated by roughly 40%, and the relative weights of the four transition modes collapse to a near-uniform distribution rather than the correct asymmetric pattern. Reweighting is what allows the framework to distinguish the lower-barrier modes $M_1, M_2$ (~32% each) from the higher-barrier modes $M_3, M_4$ (~18% each).

Figure 5. Sampled transition paths for the four LJ7 transition modes, projected onto the $(\mu_2, \mu_3)$ collective variables and colored by the learned importance function value.

Transfer Learning Across Temperatures

The optimal importance function is temperature-dependent, so naively the network would have to be retrained at every temperature of interest. Instead, the network trained at $T=600$ K is used as a warm start for $T=500, 400, 300$ K via transfer learning. This recovers Kramers-theory-consistent rates spanning four orders of magnitude—down to $r_{\rm AB} \approx 3\times 10^{-8}$ fs$^{-1}$ at 300 K—at temperatures where brute-force Langevin dynamics cannot accumulate any transitions in feasible wall-clock time. The agreement with Kramers theory in fact improves at lower temperatures, where the harmonic approximation underlying Kramers becomes more accurate.
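The Arrhenius analysis behind this comparison amounts to a linear fit of $\ln r_{\rm AB}$ against $1/(k_B T)$. A minimal sketch: the 600 K and 300 K rates below are the values quoted in this section, while the 500 K and 400 K entries are invented placeholders consistent with an Arrhenius law, so the fitted barrier is illustrative only:

```python
import numpy as np

k_B = 8.617e-5  # Boltzmann constant, eV/K
T = np.array([600.0, 500.0, 400.0, 300.0])      # K
# 600 K and 300 K rates from the text; 500 K and 400 K are placeholders
r = np.array([4.3e-4, 6.0e-5, 3.0e-6, 3.0e-8])  # fs^-1

# ln r = ln nu - E_a / (k_B T): linear fit against 1/(k_B T)
slope, intercept = np.polyfit(1.0 / (k_B * T), np.log(r), 1)
E_a, nu = -slope, np.exp(intercept)
print(f"E_a ~ {E_a:.2f} eV, nu ~ {nu:.1f} fs^-1")
```

With these numbers the fitted barrier comes out near the 0.49 eV mode barrier quoted above, as an Arrhenius-consistent data set should.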

Figure 6. Arrhenius plot of the LJ7 transition rate $r_{\rm AB}$ across temperatures. Importance sampling agrees with Kramers theory across four orders of magnitude in rate, while the no-reweighting estimator deviates systematically and overestimates the rate at low temperature by nearly a factor of two.

Takeaway. A learned, imperfect importance function is sufficient for accurate rate and mechanism estimation, provided sampled paths are reweighted using the stopped-transition-operator formula and their weights are managed by a branching random walk. The framework gives both the right total rate and the right channel-resolved breakdown, and it generalizes across temperatures through transfer learning.

Future Directions

Ongoing and future work extends the framework toward larger atomistic systems and richer physical settings: protein conformational changes, crystal defect evolution and dislocation plasticity, and underdamped Langevin dynamics where inertia plays a role. There are also promising connections to maximum-diffusion reinforcement learning, score-based diffusion models, and quantum-walk acceleration of Fokker-Planck propagators that I am exploring as part of broader efforts to learn good importance functions automatically.

Reference

M. M. Kim and W. Cai, Accelerated Langevin Dynamics Simulation via Neural Network–Driven Importance Sampling, 2026.