Introduction

Computational techniques are invaluable for exploring complex configurational and compositional spaces of molecular and material systems. Their accuracy and efficiency, however, depend on the chosen computational method. Ab initio molecular dynamics (MD) simulations using density-functional theory (DFT) provide accurate results but are computationally demanding. Atomistic simulations with classical force fields offer a faster alternative but often lack accuracy. Thus, developing accurate and computationally efficient interatomic potentials is a key challenge successfully addressed by machine-learned interatomic potentials (MLIPs)1,2,3,4,5. An essential component of any MLIP is the accurate encoding of the atomic system by a local representation, which depends on configurational (atomic positions) and compositional (atomic types) degrees of freedom6. Recently, a wide range of MLIPs have been introduced, comprising linear and kernel-based models7,8,9,10, Gaussian approximation11,12, and neural network (NN) interatomic potentials13,14,15,16,17, including graph NNs18,19,20,21,22,23,24, all demonstrating remarkable success in atomistic simulations.

The effectiveness of MLIPs, however, crucially relies on training data sufficiently covering configurational and compositional spaces25,26. Without such training data, MLIPs cannot faithfully reproduce the underlying physics. An open challenge, therefore, is the generation of comprehensive training data sets for MLIPs, covering relevant configurational and compositional spaces and ensuring that resulting MLIPs are uniformly accurate across these spaces. This objective must be realized while reducing the number of expensive DFT evaluations, which provide reference energies, atomic forces, and stresses. This challenge is further complicated by the limited knowledge of physical conditions, such as temperature and pressure, at which configurational changes occur. Setting temperatures and pressures excessively high can result in atomic system degradation before exploring the relevant phase space.

To address this challenge, iterative active learning (AL) algorithms are used to improve the accuracy of MLIPs by providing an augmented data set27,28,29,30,31,32,33,34; see Fig. 1(a). They select the data most informative to the model, i.e., atomic configurations with high energy and force uncertainties, as estimated by the model. This data is drawn from configurational and compositional spaces explored during, e.g., MD simulations. Reference DFT energies, atomic forces, and stresses are evaluated for the selected configurations. Furthermore, energy and force uncertainties indicate the onset of extrapolative regions—regions where unreliable predictions are made—prompting the termination of MD simulations and the evaluation of reference DFT values. In this AL setting, covering the configurational space and exploring extrapolative configurations might require running longer MD simulations and defining physical conditions for observing slow configurational changes (rare events).
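For concreteness, the loop sketched in Fig. 1(a) can be summarized in a few lines of Python. This is a schematic sketch only: all callables (`train`, `run_md_until`, `select_batch`, `label_with_dft`) are hypothetical placeholders for the components described above, not part of any published API.

```python
def active_learning(train, run_md_until, select_batch, label_with_dft,
                    initial_data, n_iter=10, n_walkers=8, u_max=1.5):
    """Schematic AL loop of Fig. 1(a); all callables are user-supplied."""
    data = list(initial_data)
    for _ in range(n_iter):
        mlip = train(data)                           # fit MLIP on current data
        pool = []
        for seed in range(n_walkers):                # parallel MD simulations,
            pool += run_md_until(mlip, u_max, seed)  # each stopped at the
                                                     # uncertainty threshold
        batch = select_batch(mlip, pool)             # informative and diverse
        data += label_with_dft(batch)                # reference DFT labels
    return train(data)                               # final MLIP
```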

Fig. 1: A schematic overview of an AL algorithm for MLIP training.

Training structures are selected from data gathered during biased or unbiased MD simulations. a An AL experiment begins with training an MLIP in the first iteration using a small set of randomly perturbed initial configurations. The current MLIP is employed in each iteration to run parallel MD simulations. Each simulation continues until it reaches a predefined uncertainty threshold. Then, a batch of configurations is selected from all trajectories. Reference energies and forces of these samples are evaluated using a DFT solver, updating the training data set. The updated data set is employed for training the MLIP in the next iteration. b Adaptive biasing strategies like metadynamics enhance the exploration of the configurational space. In metadynamics, exploration along manually defined CVs is facilitated by adding Gaussian functions to a history-dependent bias (areas filled by blue, orange, and red colors). However, even for well-defined CVs, exploring the configurational space of interest may require long simulation times due to the diffusive motion along these CVs. c Uncertainty-biased MD aims to minimize uncertainty u (grey shaded area) related to the actual error, thereby facilitating the exploration of the configurational space. In uncertainty-biased MD, we subtract the MLIP’s energy uncertainty from the predicted energy (continuous black line) and run MD simulations using the altered energy surface (dashed black line). Curved lines denote distinct MD trajectories. Unlike metadynamics, uncertainty-biased MD operates without defining CVs and drives MD simulations toward high uncertainty regions in each iteration.

Alternatively, enhanced sampling methods can significantly speed up the exploration of the configurational space by using adaptive biasing strategies such as metadynamics35,36,37,38,39,40,41; see Fig. 1(b). However, metadynamics requires manually selecting a few collective variables (CVs) that are assumed to describe the system. The limited number of CVs restricts exploration, as they might miss relevant transitions and parts of the configurational space. In contrast, MD simulations biased toward regions of high uncertainty can enhance the discovery of extrapolative configurations42,43. A related work utilizes uncertainty gradients for adversarial training of MLIPs44,45. To obtain MLIPs that are uniformly accurate across the relevant configurational space, however, simultaneous exploration of rare events and extrapolative configurations is necessary. The extent to which uncertainty-biased MD can achieve this objective remains an unexplored research area.

This work demonstrates the capability of uncertainty-biased MD to explore the configurational space, including fast exploration of rare events and extrapolative regions; see Fig. 1(c). We achieve this by exploring the CVs of alanine dipeptide—a widely used model for protein backbone structure. To assess the coverage of the CV space, we introduce a measure using a tree-based weighted recursive space partitioning. Furthermore, we extend existing uncertainty-biased MD simulations by automatic differentiation (AD) and propose a biasing technique that utilizes bias stresses obtained by differentiating the model’s uncertainty with respect to infinitesimal strain deformations. We assess the efficiency of the proposed technique by running MD simulations in the isothermal-isobaric (NpT) statistical ensemble and exploring the cell parameters of MIL-53(Al)—a flexible metal-organic framework (MOF) featuring closed- and large-pore stable states. Both benchmark systems are often used in studies assessing enhanced sampling and data generation methods36,38,41,44.

A key ingredient of AL algorithms with dynamically generated candidate pools is a sensitive metric for detecting the onset of extrapolative regions. These regions are typically associated with large errors in MLIP predictions. However, MLIP uncertainties often underestimate actual errors46,47, resulting in the exploration of unphysical regions, negatively affecting MLIP training. Thus, calibrated uncertainties are crucial for generating high-quality MLIPs with AL, which involves configurations explored during MLIP-based MD47,48,49, but might be unnecessary in AL tasks that rely on relative uncertainties50,51,52. In our setting, we demonstrate that conformal prediction (CP) helps align the largest force error with its corresponding uncertainty value. This approach effectively prevents MLIPs from underestimating force errors, which is important for keeping MD out of unphysical configurations. Thus, CP-based uncertainty calibration helps set reasonable uncertainty thresholds without limiting the exploration of the configurational space. In contrast, conventional approaches drive MD away from high-uncertainty regions, which can hinder exploration53.

Contrary to existing work42,43, which relies on ensembles of MLIPs for uncertainty quantification, we propose using ensemble-free uncertainties of NN-based MLIPs derived from gradient features50,51,52. These features can be interpreted as the sensitivity of a model’s output to parameter changes. Recent studies demonstrate that gradient-based uncertainties perform comparably to ensemble-based counterparts in AL51,52,54. Furthermore, they yield the exact posterior in the case of linear models9,10. We demonstrate that gradient features can define uncertainties of total and atom-based properties, such as energy and atomic forces. To make gradient-based uncertainties computationally efficient, we employ the sketching technique55 and reduce the dimensionality of gradient features. For many NN-based MLIPs, gradient-based approaches can significantly reduce the computational cost of uncertainty quantification and accelerate the time-consuming MD simulations compared to ensemble-based methods. However, the latter can be made computationally efficient, e.g., through parallelization or employing specific settings with non-trainable descriptors and gradient-free force uncertainties45.

We further enhance configurational space exploration and improve the computational efficiency of AL by employing batch selection algorithms51,52. These algorithms simultaneously select multiple atomic configurations from trajectories generated during parallel MD simulations. Batch selection algorithms enforce the informativeness and diversity of the selected atomic structures. Thus, they ensure the construction of maximally diverse training data sets.

Results

Overview

In the following, we first demonstrate the necessity of uncertainty calibration, using the example of MIL-53(Al), to constrain MD to physically reasonable regions of the configurational space. Then, we present two complementary analyses demonstrating the improved data efficiency of MLIPs obtained by our AL approach, developing MLIPs for alanine dipeptide and MIL-53(Al). Furthermore, we investigate how uncertainty-biased MD enhances the exploration of the configurational space, utilizing bias forces and stresses. To benchmark our results, we draw a comparison with MD run at elevated temperatures and pressures as well as metadynamics simulations. The details on the ensemble-free uncertainties (distance- and posterior-based ones) and uncertainty-biased MD can be found in Methods.

Calibrating uncertainties with conformal prediction

Total and atom-based uncertainties are typically poorly calibrated47, meaning that they often underestimate actual errors. The underestimation of atomic force errors is particularly dangerous when dynamically generating candidate pools, as it may result in exploring unphysical configurations with extremely large errors in predicted forces. These unphysical configurations often cause convergence issues in reference DFT calculations. Additionally, poor calibration complicates defining an appropriate uncertainty threshold for prompting the termination of MD simulations and the evaluation of reference DFT energies, atomic forces, and stresses. To address this issue, we utilize inductive CP, which computes a re-scaling factor based on predicted uncertainties and prediction errors on a calibration set. The confidence level 1 − α in CP is defined such that the probability of underestimating the error is at most α on data drawn from the same distribution as the calibration set. For more details, see Methods.
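As an illustration, inductive CP with the common normalized nonconformity score s_j = |error_j|/u_j reduces to computing one empirical quantile on the calibration set. The following is a minimal sketch under that assumption; the exact recipe used in this work is specified in Methods.

```python
import numpy as np

def conformal_scale(errors, uncertainties, alpha=0.05):
    """Inductive CP re-scaling factor from a calibration set.

    With confidence 1 - alpha, factor * u upper-bounds the error on data
    exchangeable with the calibration set.
    """
    scores = np.abs(errors) / np.asarray(uncertainties)  # nonconformity scores
    n = len(scores)
    # index of the ceil((n + 1) * (1 - alpha))-th smallest score
    k = min(int(np.ceil((n + 1) * (1.0 - alpha))) - 1, n - 1)
    return np.sort(scores)[k]

# usage: calibrate once, then re-scale all predicted uncertainties
rng = np.random.default_rng(0)
u_cal = rng.uniform(0.1, 1.0, size=50)            # predicted uncertainties
err_cal = u_cal * rng.uniform(0.5, 2.0, size=50)  # observed force errors
factor = conformal_scale(err_cal, u_cal, alpha=0.05)
```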

Figure 2 demonstrates the correlation of maximal atom-based uncertainties, \(\mathop{\max }\limits_{i}{u}_{i}\), with maximal atomic force RMSEs, \(\mathop{\max }\limits_{i}\sqrt{\frac{1}{3}\mathop{\sum }\nolimits_{k = 1}^{3}{(\Delta {F}_{i,k})}^{2}}\), for the MIL-53(Al) test data set from ref. 41, based on numerous first-principles MD trajectories at 600 K. We chose maximal atomic force RMSE as our target metric to identify extrapolative regions due to its high sensitivity to unphysical local atomic environments. In MLIP-based atomistic simulations, we model it using maximal atom-based uncertainty. Employing quantiles or averages of atomic force RMSE could extend simulation time by reducing sensitivity to extreme values; however, exploring these alternatives is left for future work.
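The target metric translates directly into code; a minimal NumPy version of the per-configuration maximal atomic force RMSE:

```python
import numpy as np

def max_atomic_force_rmse(f_pred, f_ref):
    """Maximal per-atom force RMSE for one configuration.

    f_pred, f_ref: arrays of shape (n_atoms, 3). For each atom, the RMSE
    runs over the three Cartesian components; the maximum over atoms is
    the metric shown in Fig. 2.
    """
    delta = np.asarray(f_pred) - np.asarray(f_ref)
    return np.sqrt(np.mean(delta ** 2, axis=1)).max()
```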

Fig. 2: Correlation of maximal atom-based uncertainties with maximal atomic force RMSEs for MIL-53(Al).

The results are presented for the test data set from ref. 41. All uncertainty quantification methods are calibrated using CP and atomic force RMSEs. The top row shows the results of MLIPs trained using 45 atomic configurations, while five are used for early stopping and uncertainty calibration. The bottom row shows the results obtained with 450 and 50 MIL-53(Al) configurations, respectively. The training and validation data are taken from ref. 41. Transparent hexbin points represent uncertainties calibrated with α = 0.5 (low confidence; see Methods), while opaque ones denote uncertainties calibrated with α = 0.05 (high confidence). Calibrating uncertainties with a high confidence level helps align the largest actual error with the corresponding uncertainty, shifting the hexbin points to or below the red diagonal line. This alignment is crucial for identifying unreliable predictions and prompting the termination of MD simulations.

In Fig. 2, transparent hexbins represent uncertainties calibrated with a lower confidence (α = 0.5; see Methods), while opaque ones depict those calibrated with a higher confidence (α = 0.05). The presented uncertainties are derived from gradient features or an ensemble of three MLIPs and calibrated using CP with atomic force RMSEs49. For posterior- and distance-based uncertainties, which are unitless, the re-scaling with CP ensures that the resulting uncertainties are provided in correct units, i.e., eV Å−1. Ensemble-based uncertainty quantification already provides correct units, which CP preserves. Equivalent results for alanine dipeptide, including the correlation between average uncertainties and average force RMSEs, can be found in the Supplementary Information.

Figure 2 (top) demonstrates results for MLIPs trained on 45 MIL-53(Al) configurations, while five samples were used for early stopping and uncertainty calibration. Figure 2 (bottom) shows the results for MLIPs trained and validated on 450 and 50 MIL-53(Al) configurations, respectively. In both experiments, the training and validation samples were selected from the data sets provided by ref. 41. The first 50 samples correspond to randomly perturbed structures, while the remaining 450 are generated using metadynamics combined with incremental learning41. The latter is an iterative algorithm that improves MLIPs by training on configurations generated sequentially over time, using the last frame of atomistic simulations.

We observe that uncertainties calibrated with a lower confidence level often underestimate actual errors. In this case, MD can explore unphysical regions before reaching the uncertainty threshold, especially in cases with a weak correlation between uncertainties and actual errors. By employing CP with higher confidence, we help align the largest prediction error with the corresponding uncertainty, thereby improving its ability to identify the onset of extrapolative regions. This alignment becomes apparent in Fig. 2, where CP shifts the hexbin points to be on or below the diagonal.

In Fig. 2 (top), we find that even training and calibrating models with a few randomly perturbed atomic configurations is sufficient for robust identification of unreliable predictions. This result is crucial as we rely on such data sets to initialize our AL experiments, eliminating the need for predefined data sets42,43. Furthermore, we observe that, for MIL-53(Al), calibrated uncertainties from model ensembles tend to overestimate the actual error to a greater extent than gradient-based approaches. While this may not be critical when exploring unphysical configurations, it can prematurely terminate MD simulations. This trend is consistent across all training and calibration data sizes. Lastly, the results provided here and in the Supplementary Information demonstrate that all uncertainty methods perform comparably regarding Pearson and Spearman correlation coefficients.

Performance of bias-forces-driven active learning

Exploring the configurational space of complex molecular systems, particularly those with multiple stable states, is essential for developing accurate and robust MLIPs. We apply bias-forces-driven MD combined with AL to develop MLIPs for alanine dipeptide in vacuum. This dipeptide exhibits two stable conformers characterized by the backbone dihedral angles ϕ and ψ (see Fig. 3): the C7eq state with ϕ ≈ − 1.5 rad and ψ ≈ 1.19 rad and the Cax state with ϕ ≈ 0.9 rad and ψ ≈ − 0.9 rad56. We use unbiased MD as the baseline for generating candidate pools in two scenarios: AL with candidates selected from unbiased MD trajectories based on their uncertainty (and diversity) and candidates sampled from them at random. The performance of MLIPs is assessed employing the test data obtained from a long MD trajectory at 1200 K; see Methods. We employ the AMBER ff19SB force field for reference energy and force calculations57, as implemented in the TorchMD package using PyTorch58,59.

Fig. 3: Comparison of AL approaches employing biased and unbiased MD simulations to generate the candidate pool of atomic configurations for alanine dipeptide.

Results are provided for the posterior-based uncertainty quantification derived from sketched gradient features. Unlike unbiased MD simulations, which rely on atom-based uncertainties to terminate MD simulations, biased MD simulations use total and atom-based uncertainties to bias MD simulations and prompt their termination, respectively. We use three metrics to assess the performance of our AL approaches: (a) Coverage of the CV space; (b) Energy RMSE; and (c) Force RMSE. All RMSEs are evaluated on the alanine dipeptide test data set; see Methods. Shaded areas denote the standard deviation across five independent runs. The alanine dipeptide molecule, including its CVs, is shown as an inset in (a). The color code of the inset molecule is C grey, O red, N blue, and H white. d Ramachandran plots demonstrating the CV spaces explored by the four AL experiments. Biased MD simulations achieve exceptional performance, close to those of MD conducted at 1200 K, without knowledge of temperatures that accelerate transitions between stable states. The CV space covered by uncertainty-biased MD simulations at 300 K matches that of unbiased simulations at 1200 K, significantly outperforming the coverage achieved by unbiased MD at 300 K and 600 K.

Figure 3 demonstrates the performance of MLIPs obtained for alanine dipeptide depending on the number of acquired configurations. Table 1 presents error metrics evaluated for MLIPs at the end of each experiment. Here, we provide results for the posterior-based uncertainty and uncertainty-biased MD at 300 K. The Supplementary Information presents equivalent results for other uncertainty methods and temperatures. Figure 3a presents the coverage of the CV space defined by ϕ and ψ, calculated using all MD trajectories up to the current AL step. We measure the coverage of the respective space by a tree-based weighted recursive space partitioning; see Methods. AL experiments combined with unbiased MD at 1200 K serve as the upper-performance limit for MLIPs in the case of alanine dipeptide, achieving the highest coverage of 0.97 after acquiring 512 configurations. Increasing temperature even further while using interatomic potentials, which allow for bond breaking and formation, may lead to the degradation of the molecule. Uncertainty-biased MD simulations at 300 K result in slightly lower coverage values, surpassing the coverages achieved by unbiased MD at 300 K and 600 K.

Table 1 CV space coverage, atomic energy (E-) and atomic force (F-) RMSEs, as well as position (Pos.) and uncertainty (Unc.) auto-correlation times (ACTs) for alanine dipeptide experiments conducted with posterior-based uncertainties

Furthermore, biased MD at 300 K outperforms unbiased dynamics at 1200 K, efficiently covering the CV space before acquiring ~ 200 configurations. This observation is attributed to the gradual increase in driving forces induced by the uncertainty bias, resulting in a more gradual distortion of the atomic structure. In contrast, high-temperature unbiased simulations perturb the system more strongly and rapidly enter extrapolative regions without exploring relevant configurational changes. Thus, high-temperature simulations may also cause the degradation of the investigated atomic systems, unlike uncertainty-biased dynamics applied at mild physical conditions.

Figure 3b, c present energy and force RMSEs evaluated on the alanine dipeptide test data set; see Methods. Consistent with the findings in Fig. 3a, AL approaches combined with biased MD at 300 K outperform their unbiased counterparts at 300 K and 600 K once they acquire ~ 100 configurations. Biased AL experiments achieve an energy RMSE of 1.97 meV atom−1, close to that observed in high-temperature MD simulations, surpassing the others by a factor of more than 13. A similar trend is observed for force RMSE. Biased AL experiments achieve an RMSE of 0.071 eV Å−1, outperforming their counterparts at 300 K and 600 K by factors of 2.1 and 1.6, respectively.

These results demonstrate the efficiency of uncertainty-biased dynamics in exploring the configurational space and developing accurate and robust MLIPs. Moreover, generating training data that sufficiently covers the configurational space by combining AL with biased MD does not significantly increase the computational demand compared to conventional AL with unbiased MD; see the Supplementary Information. Lastly, MLIPs trained with candidates selected based on their uncertainty (and diversity) from biased and unbiased MD trajectories systematically outperform MLIPs trained with candidates selected at random; see Table 1.

Biased AL experiments achieve exceptional performance without knowledge of temperatures that accelerate transitions between stable states; see Fig. 3d. Identifying these temperatures requires running MD simulations at different conditions to explore the configurational space without degrading the atomic system. In contrast, given the mild physical conditions such as temperatures of 300 K and 600 K, biased MD simulations outperform their unbiased counterparts at 300 K and 600 K and achieve comparable performance to experiments at 1200 K for τ ≥ 0.5 and 0.2 ≤ τ ≤ 0.4, respectively. The available range of biasing strength values may be more restricted at more extreme conditions. Adding uncertainty bias to MD at 1200 K results in an even stronger system perturbation than during unbiased MD without yielding any improvement. For additional details, see the Supplementary Information.

Our results offer evidence of rare event exploration (the exploration of both stable states of alanine dipeptide) through uncertainty-biased dynamics. The following section will present a detailed analysis of the exploration rates. Additionally, we have identified how to further improve our biased MD simulations by making biasing strengths species dependent; see the Supplementary Information. The results presented in this section, achieved with a biasing strength of zero for hydrogen atoms, outperform settings where all atoms are biased equally, with improvements by a factor of 1.08 in coverage and 1.15 in force RMSE; see Table 1. Thus, a more sophisticated data-driven redistribution of biasing strengths can further enhance the performance of bias-forces-driven MD simulations. However, learning species-dependent biasing strengths necessitates defining a suitable loss function that promotes the fast exploration of phase space60, which falls beyond the scope of this work.

Exploration rates for collective variables of alanine dipeptide

We have observed that uncertainty-biased MD simulations effectively explore the configurational space of alanine dipeptide, defined by its CVs. Figure 4 evaluates the extent to which the introduced bias forces in MD simulations accelerate their exploration. In Fig. 4a, we present the coverage of the CV space as a function of simulation time, i.e., of the effective number of MD steps. The figure demonstrates that uncertainty-biased AL experiments at 300 K outperform unbiased experiments at 300 K and 600 K. They achieve the same coverage in considerably shorter simulation times, thereby enhancing exploration rates by a factor larger than two. At the same time, biased MD simulations at 300 K explore the configurational space at a rate similar to that of unbiased MD simulations at 1200 K.

Fig. 4: Evaluation of CV space exploration rates for biased and unbiased MD simulations of alanine dipeptide.

Here, MD simulations generate candidate pools of atomic configurations for AL algorithms. Results are provided for the posterior-based uncertainty quantification derived from sketched gradient features. Unlike unbiased MD simulations, which rely on atom-based uncertainties to terminate MD simulations, biased MD simulations use total and atom-based uncertainties to bias MD simulations and prompt their termination, respectively. We use three metrics to assess the exploration rates: (a) Coverage of the CV space; (b) Auto-correlation functions of atomic positions; and (c) Auto-correlation functions of atom-based uncertainties. Shaded areas denote the standard deviation across five independent runs. d Time evolution of the maximal atom-based uncertainty within an AL iteration and the entire experiment. Time evolution is shown for one of the eight MD simulations. The dashed gray line represents the uncertainty threshold of 1.5 eV Å−1. The insets show configurations that reached the uncertainty threshold for uncertainty-biased MD. e Ramachandran plots illustrate the exploration of the CV space over AL iterations and the entire experiment. Ramachandran plots are presented for unbiased MD simulations at 300 K and 1200 K and biased MD simulations at 300 K. Simulation time refers to the effective number of MD steps (× 0.5 fs) required to reach the final coverage, while lag time denotes the time interval between two successive MD frames. Biased MD simulations at 300 K exhibit at least two times higher exploration rates than their unbiased counterparts at 300 K and 600 K. Their exploration rates are comparable to those of unbiased MD simulations at 1200 K, with the advantage of gradually distorting the molecule, reducing the risk of its degradation.

The exploration rates estimated from Fig. 4a provide an approximate measure of how uncertainty-biased dynamics accelerate the exploration of configurational space. To offer a more thorough assessment, we examine auto-correlation functions (ACFs) computed for both position and uncertainty spaces in Fig. 4b, c. Here, a faster decay corresponds to a faster exploration of the respective space. We compute ACFs using MD trajectories from all AL iterations. Additionally, we calculate the auto-correlation time (ACT) for each experiment. For the definition of ACF and ACT, see Methods. Table 1 presents ACTs for all AL experiments. Smaller ACTs correspond to a faster decay of ACFs, indicating a faster exploration of the respective spaces.
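For reference, a standard estimator of the ACF and of the integrated ACT is sketched below; the exact definitions used in this work are given in Methods and may differ, e.g., in the choice of summation window.

```python
import numpy as np

def autocorrelation(x):
    """Normalized ACF of a scalar time series (biased estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..n-1
    return acf / acf[0]

def integrated_act(x, window=None):
    """Integrated ACT, 1 + 2 * sum_t ACF(t), summed up to a fixed window
    because the ACF tail is dominated by noise."""
    acf = autocorrelation(x)
    window = window if window is not None else len(acf) // 10
    return 1.0 + 2.0 * np.sum(acf[1:window])
```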

ACTs demonstrate that uncertainty-biased MD at 300 K explores position and uncertainty spaces two to six times faster than unbiased MD at 300 K and 600 K. Compared to unbiased MD at 1200 K, it achieves comparable exploration rates in the position space and rates lower by a factor of two for the uncertainty space. Biasing hydrogen atoms reduces the uncertainty ACT compared to experiments with zero hydrogen biasing strength but increases the position ACT by a factor of three. Thus, stronger atomic bond distortions, resulting in fast exploration of extrapolative regions, can explain a shorter uncertainty ACT of unbiased MD at 1200 K. While this effect can be unfavorable for promoting the exploration of rare events in biased MD, incorporating small, non-zero biasing strengths for hydrogen atoms may be necessary to ensure the robustness of MD simulations at elevated temperatures. Interestingly, we observe that uncertainty-biased MD explores both stable states in alanine dipeptide, even though 27 degrees of freedom (C, N, and O atoms) were effectively biased, demonstrating its remarkable efficiency.

To gain insight into the exploration of the CV space during AL, we refer to Fig. 4d, e, which illustrate the time evolution of the maximal atom-based uncertainty and the CV space coverage for selected AL iterations. Biased MD systematically explores configurations with higher uncertainty values than unbiased MD at 300 K and 600 K. Furthermore, bias forces drive the exploration of both stable states of alanine dipeptide and promote transitions between them, similar to higher temperatures in unbiased MD. Later AL iterations in Fig. 4d, e demonstrate that MD driven by bias forces reduces the uncertainty level uniformly across the configurational space. Thus, given the correlation between uncertainties and actual errors, uncertainty-biased MD generates MLIPs uniformly accurate across the configurational space.

Performance of bias-stress-driven active learning

Generating training data for bulk material systems with large unit cells and multiple stable states poses a significant challenge in developing MLIPs. Therefore, we assess the performance of the bias-stress-driven AL applied to MIL-53(Al), a flexible MOF that undergoes reversible, large-amplitude volume changes under external stimuli, such as temperature and pressure (see Fig. 5). MIL-53(Al) features two stable phases: the closed-pore state with a unit cell volume of V ~ 830 Å³ and the large-pore state with V ~ 1419 Å³. For reference energy, force, and stress calculations, we use the CP2K simulation package (version 2023.1)61 and DFT at the PBE-D3(BJ) level62,63. Our baseline for generating candidate pools for AL involves unbiased MD and training data selected based on their uncertainty (and diversity) or at random. We also employ metadynamics41, which uses an adaptive biasing strategy for cell parameters of MIL-53(Al), as a baseline. We assess the performance of MLIPs for MIL-53(Al) using the test data set presented by ref. 41.

Fig. 5: Comparison of AL approaches employing biased and unbiased MD simulations to generate the candidate pool of atomic configurations for MIL-53(Al).

Results are provided for the posterior-based uncertainty quantification derived from sketched gradient features. Unlike unbiased MD simulations, which rely on atom-based uncertainties to terminate MD simulations, biased MD simulations use total and atom-based uncertainties to bias MD simulations and prompt their termination, respectively. We use three metrics to assess the performance of our AL approaches: (a) Energy RMSE; (b) Force RMSE; and (c) Stress RMSE. All RMSEs are evaluated on the MIL-53(Al) test data set41. Shaded areas denote the standard deviation across three independent runs, except for metadynamics. For it, shaded areas denote standard deviation across three randomly initialized MLIPs. d Volume distribution for atomic configurations acquired during MD at 600 K, along with volume-dependent energy, force, and stress RMSEs. e Volume distribution for configurations acquired during MD at 300 K, along with volume-dependent energy, force, and stress RMSEs. We employ a temperature of 300 K to reduce the probability of exploring the large-pore state of MIL-53(Al). Bias-stress-driven MD simulations outperform metadynamics-based simulations with adaptive biasing of the cell parameters. Metadynamics aims to cover the volume space uniformly. In contrast, uncertainty-biased MD generates training data sets that uniformly reduce energy, force, and stress RMSEs. Additionally, biased MD simulations enhance the exploration of closed- and large-pore states of MIL-53(Al) shown in the inset of (d).

Figure 5a–c demonstrate the performance of MLIPs developed for MIL-53(Al) depending on the number of acquired configurations. Table 2 presents error metrics evaluated for MLIPs at the end of each experiment. Here, we present results for the posterior-based uncertainty. The Supplementary Information presents equivalent results for other uncertainty methods and pressures. We observe that MLIPs trained with configurations generated using metadynamics outperform the others for data set sizes below ~ 200 samples. This difference in performance can be attributed to how perturbed configurations are generated and the differing experimental settings between incremental learning and AL applied here. Bias-stress-driven AL outperforms metadynamics-based experiments after acquiring ~ 200 configurations regarding force and stress RMSEs.

Table 2 Atomic energy (E-), atomic force (F-), and stress (S-) RMSEs, as well as position (Pos.) and uncertainty (Unc.) auto-correlation times (ACTs) for MIL-53(Al) experiments conducted with posterior-based uncertainties

Metadynamics-based experiments achieve performance on par with unbiased AL experiments conducted at 0 MPa after they reach a data set size of ~ 200 configurations. For uncertainty-biased MD, the force RMSE improves by a factor of 1.14, and the stress RMSE improves by a factor of two compared to zero-pressure unbiased MD. Furthermore, AL experiments with biased MD simulations outperform unbiased MD simulations at 250 MPa regarding stress RMSE. Thus, bias-stress-driven MD generates a data set that better represents the relevant configurational space of flexible MOFs than data generated with conventional MD and metadynamics simulations. This improvement is achieved without significantly increasing the computational cost of data generation; see the Supplementary Information. Lastly, similar to the results obtained for alanine dipeptide, AL with a more advanced selection strategy outperforms experiments where training data is picked at random; see Table 2.

Figure 5d, e show the main advantage of biased MD simulations over unbiased and metadynamics-based approaches. While exploring the large-pore state less frequently than metadynamics-based counterparts, bias-stress-driven MD spans a broader range of volumes and uniformly reduces energy, force, and stress RMSEs across the entire volume space. Compared to zero-pressure unbiased MD simulations, it promotes the exploration of the large-pore state. However, this state can be modeled using atomic environments from the closed-pore one. Thus, bias stress does not excessively favor exploring the large-pore state. Instead, it drives the dynamics more toward smaller volumes, for which all other approaches tend to predict energy, force, and stress values with larger errors. Note that, in Fig. 5e, we reduce the temperature to 300 K and initiate AL experiments with 256 configurations, each having a unit cell volume below 1200 Å³ (drawn from the training data in ref. 41). Using a lower temperature and learning the configurational space around the closed-pore state is required to decrease the probability of MD simulations exploring the large-pore stable state of MIL-53(Al). In contrast, we found that using randomly perturbed atomic configurations can lead to underestimated energy barriers by MLIPs, thus facilitating the transition between both stable phases in initial AL iterations.

These results show that uncertainty-biased MD simulations aim to uniformly reduce errors across the relevant configurational space and promote the simultaneous exploration of extrapolative regions and transitions between stable states. Also, under selected physical conditions (T = 600 K and p = 0 MPa), the performance of our uncertainty-biased MD exhibits low sensitivity to stress biasing strength values for τ ≥ 0.5; see the Supplementary Information. Metadynamics, in contrast, may require longer simulation times to generate equivalent candidate pools as it focuses on generating configurations uniformly distributed in the CV space, which is unnecessary for developing MLIPs.

Exploration rates for cell parameters of MIL-53(Al)

Figure 6 assesses the extent to which uncertainty-biased (bias stress) MD simulations enhance the exploration of the extensive volume space of MIL-53(Al). In Fig. 6a, we observe a higher frequency of transitions between stable phases for biased MD simulations than for zero-pressure counterparts. Additionally, uncertainty-biased simulations favor the exploration of smaller MIL-53(Al) volumes, in line with the results shown in Fig. 5. Figure 6b, c present ACFs for position and uncertainty spaces, with estimated ACTs provided in Table 2. Here, a faster decay of ACFs corresponds to shorter ACTs and indicates a faster exploration of the respective space. These results indicate that bias-stress-driven MD is at least as efficient as high-pressure MD simulations in exploring both spaces. Figure 6d demonstrates the time evolution of energy, force, and stress RMSEs. It reveals that local atomic environments in the large-pore state are well represented by those in the closed-pore state, explaining the stronger preference for smaller volumes by biased MD; see Figs. 6a and 5d, e. This effect is evident from the low force and stress RMSEs in the early AL iterations for the large-pore state, even though this state has not been explored yet. Furthermore, uncertainty-biased MD simulations surpass the performance of their counterparts already in the early stages by aiming to reduce errors across the test volume space uniformly.

Fig. 6: Evaluation of configurational space exploration rates for biased and unbiased MD simulations of MIL-53(Al).

Here, MD simulations generate candidate pools of atomic configurations for AL algorithms. Results are provided for the posterior-based uncertainty quantification derived from sketched gradient features. Unlike unbiased MD simulations, which rely on atom-based uncertainties to terminate MD simulations, biased MD simulations use total and atom-based uncertainties to bias MD simulations and prompt their termination, respectively. We use three metrics to assess the exploration rates: (a) Volume distribution of configurations sampled throughout the experiment; (b) Auto-correlation functions for positions; and (c) Auto-correlation functions for atom-based uncertainties. Shaded areas denote the standard deviation across three independent runs. d Time evolution of the volume distribution of configurations acquired during training and of energy, force, and stress RMSEs evaluated on the test data set41 depending on the unit cell volume. Bias-stress-driven MD simulations achieve exploration rates comparable to those of high-pressure unbiased MD simulations. They aim to reduce RMSEs uniformly across the entire volume space, even in the early stages of AL, surpassing the performance of unbiased simulations.

From these results and the findings in Fig. 5d, we conclude that bias-stress-driven MD significantly enhances the exploration of the relevant configurational space, including rare events (i.e., transitions between stable phases). However, in Table 2, we obtained longer ACTs for biased MD at 300 K compared to its unbiased counterparts, which contradicts our previous arguments. When examining the ACF shown in Fig. 7, it becomes evident that a stronger correlation in the position space results from the volume fluctuations induced in MIL-53(Al) by the bias stress. These fluctuations can be represented by a sine wave with additive random noise and a period twice the simulation’s length; see Methods. This observation implies that bias stress induces correlated motions in the MIL-53(Al) system, causing it to expand and contract alternately for half of the simulation time. This phenomenon results in periodic exploration of small and large volumes within the configurational space.

Fig. 7: Position ACF obtained by running biased and unbiased MD simulations at 300 K for MIL-53(Al).

Shaded areas denote the standard deviation across three independent runs. We employ a temperature of 300 K to reduce the probability of exploring the large-pore state of MIL-53(Al). The ACF exhibits strongly correlated motions attributed to volume fluctuations induced by the bias stress. These fluctuations can be modeled by a sine wave with a period twice the length of the simulation. The red line denotes a sine wave with a larger noise amplitude than the one denoted by the blue line.

In contrast to the conventional approaches, including the bias-forces-driven MD simulations, which aim for uncorrelated random-walk-like behavior of predetermined CVs to capture configurational changes, our method introduces correlated motion that explores the entire configurational space. Increasing the amplitude of random noise in the sine wave reduces the amplitude of these fluctuations in the ACF, similar to raising the temperature in an atomic system. This decrease in the amplitude explains why this effect is not observed in Fig. 6b.

Discussion

This work investigates an uncertainty-driven AL approach for data set generation, facilitating the development of high-quality MLIPs for chemically complex atomic systems. We employ uncertainty-biased MD simulations to generate candidate pools for AL algorithms. Our results show that applying uncertainty bias facilitates simultaneous exploration of extrapolative regions and rare events. Efficient exploration of both is crucial in constructing comprehensive training data sets, enabling the development of uniformly accurate MLIPs. In contrast, classical enhanced sampling techniques (e.g., metadynamics) or unbiased MD simulations at elevated temperatures and pressures often cannot simultaneously explore extrapolative regions and rare events. Enhanced sampling techniques were designed to ensure the reconstruction of the underlying Boltzmann distribution. However, this property is unnecessary for data set generation and may limit their effectiveness in this context.

The performance of enhanced sampling techniques depends on the manual definition of hyper-parameters, e.g., CVs for metadynamics. Setting them requires expert knowledge because the wrong choice can limit the range of explored configurations. Uncertainty-biased MD only requires an uncertainty threshold and a biasing strength. Both parameters influence the exploration rate of the configurational space without constraining the space that can be explored. Under milder conditions, uncertainty-biased MD simulations outperform their unbiased counterparts for a broad range of biasing strength values, which simplifies the choice of the biasing strength. Yet, the dependence of the performance on the biasing strength value becomes more noticeable under extreme conditions, sometimes with no improvement by adding uncertainty bias to MD. A similar behavior can also be expected for metadynamics simulations64. Additionally, employing species-dependent biasing strengths can restrict biasing in sensitive configurational regions, e.g., around hydrogen atoms.

Identifying extreme conditions like high temperatures and pressures can also accelerate phase space exploration in unbiased MD. However, a wrong choice of temperature and pressure may result in unphysical force predictions and degradation of the atomic system. In contrast, uncertainty-biased MD, conducted under milder conditions, explores relevant phase space at rates comparable to those obtained under extreme conditions and reduces the risk of degrading the atomic system. As mentioned, uncertainty-biased MD simulations outperform their unbiased counterparts for a broad range of biasing strength values in our setting. Furthermore, while evaluating uncertainty gradients increases the inference times by a factor of 1.4 to 1.7 compared to unbiased MD, applying uncertainty bias leads to, on average, shorter MD simulations. Thus, the difference in the computational cost between biased and unbiased MD is typically insignificant.

We compare uncertainty quantification methods, including the variance of an ensemble of MLIPs, and ensemble-free methods derived from sketched gradient features, focusing on configurational space exploration rates and generating uniformly accurate potentials; see the Supplementary Information. Overall, gradient-based approaches yield MLIPs with similar performance to those created using ensemble-based uncertainty while significantly reducing the computational cost of uncertainty quantification. For MIL-53(Al), we find that ensemble-based uncertainties overestimate the force error more strongly than gradient-based approaches, resulting in earlier termination of MD simulations and potentially worse configurational space exploration. For alanine dipeptide, using an ensemble of MLIPs improves their robustness during MD simulations, facilitating CV space exploration. Therefore, improving the robustness of a single MLIP during an MD simulation is a promising research direction65, combined with the proposed ensemble-free techniques.

While this study thoroughly investigates AL with uncertainty-biased MD for generating candidate pools, further research is still necessary. For example, one should analyze how well uncertainty-biased MD explores a configurational space with multiple stable states and how it identifies the respective slow modes using solely uncertainty bias. Also, assessing the uniform accuracy of resulting MLIPs and the enhanced exploration in higher-dimensional CV spaces remains challenging. Furthermore, the applicability of the proposed data generation approach to more complex molecular and material systems, such as biological polymers66 and multicomponent alloys5, is yet to be explored. Unlike MD, Monte Carlo simulations generally allow significant configurational changes, eliminating the need to explore intermediate transition paths. Combined with uncertainty bias, they might avoid exploring intermediate, low-uncertainty transition regions, improving the efficiency of uncertainty-driven data generation. Lastly, the extent to which MLIPs based on graph NNs can enhance the efficiency of the proposed data generation approach remains to be seen.

Methods

Machine-learned interatomic potentials

We define an atomic configuration, \(S={\{{{{{\bf{r}}}}}_{i},{Z}_{i}\}}_{i = 1}^{{N}_{{{{\rm{at}}}}}}\), where \({{{{\bf{r}}}}}_{i}\in {{\mathbb{R}}}^{3}\) are Cartesian coordinates and \({Z}_{i}\in {\mathbb{N}}\) is the atomic number of atom i, with a total of Nat atoms. Our focus lies on interatomic NN potentials, which map an atomic configuration to a scalar energy E. The mapping is denoted as \({f}_{{{{\boldsymbol{\theta }}}}}:S\,\mapsto\, E\in {\mathbb{R}}\), where θ denotes the trainable parameters. By assuming the locality of interatomic interactions, we decompose the total energy of the system into individual atomic contributions13

$$E\left(S,{{{\boldsymbol{\theta }}}}\right)=\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{at}}}}}}{E}_{i}\left({S}_{i},{{{\boldsymbol{\theta }}}}\right),$$
(1)

where Si is the local environment of atom i, defined by the cutoff radius rc. The trainable parameters θ are learned from atomic data sets containing atomic configurations and their energies, atomic forces, and stress tensors.
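In code, the locality ansatz of Eq. (1) amounts to summing per-atom predictions; a minimal sketch, where atomic_energy_fn and the precomputed local environments are placeholders for the per-atom model and the neighborhoods within rc:

```python
def total_energy(atomic_energy_fn, local_environments):
    """Eq. (1): the total energy as a sum of atomic contributions E_i(S_i).

    atomic_energy_fn is a placeholder for the per-atom model; each element
    of local_environments encodes the neighborhood of one atom within r_c.
    """
    return sum(atomic_energy_fn(s_i) for s_i in local_environments)
```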

Gradient-based uncertainties

We quantify the uncertainty of a trained MLIP by expanding its energy per atom Eat = E/Nat around the locally optimal parameters θ*50,51,52

$$E_{\mathrm{at}}\left(S, {\boldsymbol{\theta}}\right)\, \approx \,E_{\mathrm{at}}(S, {\boldsymbol{\theta}}^\ast) + ({\boldsymbol{\theta}} - {\boldsymbol{\theta}}^\ast)^\top\underbrace{\nabla_{{\boldsymbol{\theta}}} E_{\mathrm{at}}\left(S, {\boldsymbol{\theta}}\right)\Big|_{{\boldsymbol{\theta}} = {\boldsymbol{\theta}}^\ast}}_{=\phi\left(S\right)},$$
(2)

where S denotes an atomic configuration as defined in the previous section. Gradient features \(\phi \left(S\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{feat}}}}}}\) can be interpreted as the sensitivity of the energy to small parameter perturbations. Here, Nfeat is the number of trainable parameters of the MLIP. We employ the energy per atom Eat in Eq. (2), as it accounts for the extensive nature of the energy, whose value depends on the system size. This choice ensures that uncertainties defined using gradient features do not favor the selection of larger structures. Gradient features can also be expressed as the mean of their atomic contributions: \(\phi =\mathop{\sum }\nolimits_{i = 1}^{{N}_{{{{\rm{at}}}}}}{\phi }_{i}/{N}_{{{{\rm{at}}}}}\). For atomic gradient features ϕi, using the energy per atom in Eq. (2) is unnecessary. Here, we use \(\phi =\phi \left(S\right)\) and \({\phi }_{i}={\phi }_{i}\left({S}_{i}\right)\), with Si denoting the local environment of an atom i, to simplify the notation. Thus, gradient features can be used to quantify uncertainties in total and atom-based properties of an atomic system, such as energy and atomic forces, respectively.
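With AD, the gradient features of Eq. (2) are obtained in a few lines. Below is a PyTorch sketch for the per-structure feature \(\phi \left(S\right)\); `model` and `structure.n_atoms` are placeholders for the MLIP and the configuration size, and atomic features ϕi follow analogously from the atomic energies.

```python
import torch

def gradient_features(model, structure):
    """Gradient features phi(S) of Eq. (2): the gradient of the energy per
    atom with respect to all trainable parameters, flattened to a vector."""
    e_at = model(structure) / structure.n_atoms       # energy per atom
    grads = torch.autograd.grad(e_at, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])  # shape: (N_feat,)
```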

Particularly, we define the atom-based model’s uncertainty (atomic forces) by employing squared distances between atomic gradient features

$${u}_{i}^{2}=\mathop{\min}\limits_{{\phi}_{j}\in {\Phi}_{{{{\rm{train}}}}}}\Vert{\phi}_{i}-{\phi}_{j}\Vert_{2}^{2}.$$
(3)

Alternatively, we consider Bayesian linear regression in Eq. (2) and compute the posterior uncertainty as

$${u}_{i}^{2}={\lambda }^{2}{\phi }_{i}^{\top }{\left({\Phi }_{{{{\rm{train}}}}}^{\top }{\Phi }_{{{{\rm{train}}}}}+{\lambda }^{2}{{{\bf{I}}}}\right)}^{-1}{\phi }_{i},$$
(4)

where λ is the regularization strength. Here, we define \({\Phi }_{{{{\rm{train}}}}}={\phi }_{j}\left({{{{\mathscr{X}}}}}_{{{{\rm{train}}}}}\right)\in {{\mathbb{R}}}^{\left({N}_{{{{\rm{at}}}}}\cdot {N}_{{{{\rm{train}}}}}\right)\times {N}_{{{{\rm{feat}}}}}}\) with \({{{{\mathscr{X}}}}}_{{{{\rm{train}}}}}\) denoting the local atomic environments of configurations in the training set of size Ntrain. In this work, we refer to our uncertainties as distance- and posterior-based uncertainties. Equivalent results can be obtained for total uncertainties (energy), employing gradient features \(\phi =\mathop{\sum }\nolimits_{i = 1}^{{N}_{{{{\rm{at}}}}}}{\phi }_{i}/{N}_{{{{\rm{at}}}}}\) with \({\Phi }_{{{{\rm{train}}}}}=\phi \left({{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{train}}}}}\times {N}_{{{{\rm{feat}}}}}}\).
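Both definitions reduce to standard linear algebra on the (atomic) gradient features; a minimal PyTorch sketch, returning the squared uncertainties \({u}_{i}^{2}\) of Eqs. (3) and (4) (the regularization value is illustrative):

```python
import torch

def distance_uncertainty_sq(phi_i, phi_train):
    """Eq. (3): minimal squared distance between the atomic feature phi_i,
    shape (N_feat,), and training features phi_train, shape (M, N_feat)."""
    return torch.min(((phi_train - phi_i) ** 2).sum(dim=1))

def posterior_uncertainty_sq(phi_i, phi_train, lam=1e-2):
    """Eq. (4): posterior variance of Bayesian linear regression with
    regularization strength lam."""
    n_feat = phi_train.shape[1]
    cov = phi_train.T @ phi_train + lam ** 2 * torch.eye(n_feat)
    return lam ** 2 * phi_i @ torch.linalg.solve(cov, phi_i)
```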

Calculating uncertainties using gradient features is computationally challenging, especially for the posterior-based approach, for which a single uncertainty evaluation scales as \({{{\mathscr{O}}}}\left({N}_{{{{\rm{feat}}}}}^{2}\right)\). Therefore, we employ the sketching technique55 to reduce the dimensionality of gradient features, i.e., \({\phi }_{i}^{{{{\rm{rp}}}}}={{{\bf{U}}}}{\phi }_{i}\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}}\) with Nrp and \({{{\bf{U}}}}\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}\times {N}_{{{{\rm{feat}}}}}}\) denoting the number of random projections and a random matrix, respectively51,52. In previous work51, we have observed that uncertainties derived from sketched gradient features demonstrate a better correlation with RMSEs of related properties than those based on last-layer features50,67,68. More details on sketched gradient features can be found in the following sections. Atom-based uncertainties, defined by the distances between gradient features, scale linearly with both the system size and the number of training structures, i.e., as \({{{\mathcal{O}}}}\left({N}_{{{{\rm{at}}}}}{N}_{{{{\rm{train}}}}}\right)\). Consequently, they require an additional approximation to ensure computational efficiency. To address this, we employ a batch selection algorithm that maximizes distances within the training set, identifying the most representative subset of atomic gradient features; see the following sections.
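A dense Gaussian sketch is one common choice for the random matrix U (an assumption; the specific ensemble is detailed in refs. 51,52); the scaling by \(1/\sqrt{{N}_{{{{\rm{rp}}}}}}\) approximately preserves inner products between features. The dimensions below are illustrative.

```python
import torch

n_feat, n_rp = 10_000, 512                   # illustrative dimensions
torch.manual_seed(0)
U = torch.randn(n_rp, n_feat) / n_rp ** 0.5  # dense Gaussian sketch

def sketch(phi):
    """Reduce a gradient feature to the N_rp-dimensional sketch space."""
    return U @ phi                           # shape: (n_rp,)
```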

Uncertainty-biased molecular dynamics

Following previous work42,43, we define the biased energy as

$${E}^{{{{\rm{biased}}}}}\left(S,{{{\boldsymbol{\theta }}}}\right)=E\left(S,{{{\boldsymbol{\theta }}}}\right)-\tau u\left(S,{{{\boldsymbol{\theta }}}}\right),$$
(5)

where τ denotes the biasing strength. The negative sign ensures that negative uncertainty gradients with respect to atomic positions (bias forces) drive the system toward high uncertainty regions; see Fig. 1c. In this work, we use AD to compute bias forces acting on atom i, denoted as \(-{\nabla }_{{{{{\bf{r}}}}}_{i}}u\left(S,{{{\boldsymbol{\theta }}}}\right)\) with atomic positions ri. The total biased force on atom i reads

$${{{{\bf{F}}}}}_{i}^{{{{\rm{biased}}}}}\left(S,{{{\boldsymbol{\theta }}}}\right)={{{{\bf{F}}}}}_{i}\left(S,{{{\boldsymbol{\theta }}}}\right)+\tau {\nabla }_{{{{{\bf{r}}}}}_{i}}u\left(S,{{{\boldsymbol{\theta }}}}\right).$$
(6)

These biased forces can be used for MD simulations in, e.g., canonical (NVT) statistical ensemble to bias the exploration of the configurational space.
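With AD, Eq. (6) takes only a few lines. In the PyTorch sketch below, energy_fn and uncertainty_fn are placeholders mapping atomic positions of shape (n_atoms, 3) to the scalar MLIP energy and uncertainty.

```python
import torch

def biased_forces(energy_fn, uncertainty_fn, positions, tau):
    """Eq. (6): F_i + tau * grad_{r_i} u, computed with autograd."""
    r = positions.detach().requires_grad_(True)
    grad_e, = torch.autograd.grad(energy_fn(r), r)       # dE/dr
    grad_u, = torch.autograd.grad(uncertainty_fn(r), r)  # du/dr
    return -grad_e + tau * grad_u   # true forces plus bias forces
```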

In the case of bulk atomic systems, the configurational space often includes variations in cell parameters, which define the shape and size of the unit cell, necessitating enhanced exploration of them. For this purpose, we propose the concept of bias stress, defined by

$$\frac{1}{V}{\left.{\nabla }_{{{{\boldsymbol{\epsilon }}}}}u\left(S,{{{\boldsymbol{\theta }}}}\right)\right\vert }_{{{{\boldsymbol{\epsilon }}}} = {{{\bf{0}}}}},$$

with V denoting the volume of the periodic cell. This expression is motivated by the definition of the stress tensor69. Here, \(u\left(S,{{{\boldsymbol{\theta }}}}\right)\) denotes the uncertainty after a strain deformation of the bulk atomic system with the symmetric tensor \({{{\boldsymbol{\epsilon }}}}\in {{\mathbb{R}}}^{3\times 3}\), i.e., \(\tilde{{{{\bf{r}}}}}=\left({{{\bf{1}}}}+{{{\boldsymbol{\epsilon }}}}\right)\cdot {{{\bf{r}}}}\). The calculation of the bias stress is straightforward with AD. The total biased stress reads

$${{{{\boldsymbol{\sigma }}}}}^{{{{\rm{biased}}}}}\left(S,{{{\boldsymbol{\theta }}}}\right)={{{\boldsymbol{\sigma }}}}\left(S,{{{\boldsymbol{\theta }}}}\right)-\tau \frac{1}{V}{\left.{\nabla }_{{{{\boldsymbol{\epsilon }}}}}u\left(S,{{{\boldsymbol{\theta }}}}\right)\right\vert }_{{{{\boldsymbol{\epsilon }}}} = {{{\bf{0}}}}}.$$
(7)

The bias stress tensor in Eq. (7) effectively reduces the internal pressure in the bulk atomic system. We propose combining the bias stress tensor with MD simulations conducted in isothermal-isobaric (NpT) statistical ensemble to enhance the data-driven exploration of cell parameters and pressure-induced transitions in bulk materials.
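The bias stress is equally straightforward with AD. In the sketch below, uncertainty_fn is a placeholder taking the strained positions and cell (rows as lattice vectors) and returning the scalar uncertainty; only the bias term of Eq. (7) is computed.

```python
import torch

def bias_stress(uncertainty_fn, positions, cell, tau):
    """Bias term of Eq. (7): -(tau / V) * grad_eps u at eps = 0."""
    eps = torch.zeros(3, 3, requires_grad=True)
    sym = 0.5 * (eps + eps.T)              # symmetric strain tensor
    deform = torch.eye(3) + sym
    r = positions @ deform.T               # r_tilde = (1 + eps) . r
    h = cell @ deform.T                    # strain the cell consistently
    grad_eps, = torch.autograd.grad(uncertainty_fn(r, h), eps)
    volume = torch.linalg.det(cell).abs()
    return -tau * grad_eps / volume
```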

Uncertainty gradients exhibit different magnitudes compared to energy gradients. Thus, re-scaling uncertainty gradients is necessary to ensure consistent driving toward uncertain regions. Building upon the approach introduced in ref. 43, we implement a re-scaling technique that monitors the magnitudes of both actual and bias forces (alternatively, actual and bias stresses) over N steps and then computes the ratio between them. To re-scale bias forces, we use the following expression

$${\tau }_{t}={\tau }_{0}\times \frac{\sqrt{\mathop{\sum }\nolimits_{n = 0}^{N-1}\left\Vert{{{{\bf{F}}}}}_{t-n\Delta t}\right\Vert_{2}^{2}}}{\sqrt{\mathop{\sum }\nolimits_{n = 0}^{N-1}\left\Vert {\nabla }_{{{{{\bf{r}}}}}_{i}}{u}_{t-n\Delta t}\right\Vert_{2}^{2}}}.$$
(8)

An equivalent expression is applied for bias stresses.
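The re-scaling of Eq. (8) only requires two running buffers of squared norms; a minimal sketch, where the window length N is a free parameter:

```python
from collections import deque
import numpy as np

class BiasRescaler:
    """Running re-scaling of the biasing strength, Eq. (8)."""

    def __init__(self, tau_0, n_steps=100):
        self.tau_0 = tau_0
        self.f_sq = deque(maxlen=n_steps)   # squared norms of true forces
        self.g_sq = deque(maxlen=n_steps)   # squared norms of bias gradients

    def update(self, forces, bias_grads):
        """Record the current step and return the re-scaled tau_t."""
        self.f_sq.append(float(np.sum(np.asarray(forces) ** 2)))
        self.g_sq.append(float(np.sum(np.asarray(bias_grads) ** 2)))
        return self.tau_0 * np.sqrt(sum(self.f_sq) / sum(self.g_sq))
```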

The re-scaling of uncertainty gradients is reminiscent of the AdaGrad algorithm70, which dynamically adjusts the learning rate (analogous to the biasing strength) based on historical gradients from previous iterations. While incorporating momentum through exponential moving averages can improve the AdaGrad approach, treating all past gradients with equal weight is essential within the context of this study. Our attempts to damp learning along directions with high curvature (high-frequency oscillations), similar to the Adam optimizer71, did not yield improved performance. We further find that employing species-dependent biasing strengths for bias forces, \(\tau \to {\tau }_{{Z}_{i}}\), with a particular emphasis on damping biasing of hydrogen atoms, improves the efficiency of biased MD simulations.

We employ biased MD simulations to generate a candidate pool for AL, as depicted in Fig. 1a. To further enhance the exploration of the configurational space and improve the computational efficiency of AL, we run multiple MD simulations in parallel. We expect biased MD simulations to have relatively short auto-correlation times (ACTs) obtained from position and uncertainty auto-correlation functions (ACFs). Short ACTs imply that the generated candidates will be less correlated than those generated with unbiased MD simulations. However, we cannot guarantee the generation of uncorrelated samples with biased MD simulations throughout AL, particularly in later AL iterations when the uncertainty level is reduced. Therefore, we propose to use batch selection algorithms (see later sections) that select Nbatch > 1 samples at once. These algorithms enforce the informativeness and diversity of the selected atomic configurations and the resulting training data set.
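One simple instance of such a selection rule is greedy farthest-point sampling in feature space. The sketch below is illustrative only; the batch selection algorithms of refs. 51,52 used in this work are more elaborate.

```python
import numpy as np

def greedy_max_min_batch(pool_feats, train_feats, n_batch):
    """Select n_batch pool points maximizing the minimal distance to the
    training set and to already selected points (informative and diverse).

    pool_feats: (P, F), train_feats: (T, F); fine for moderate pool sizes.
    """
    pool = np.asarray(pool_feats)
    # distance of each candidate to its nearest reference point
    d_min = np.min(
        np.linalg.norm(pool[:, None] - np.asarray(train_feats)[None], axis=-1),
        axis=1,
    )
    selected = []
    for _ in range(n_batch):
        idx = int(np.argmax(d_min))     # most distant remaining candidate
        selected.append(idx)
        d_min = np.minimum(d_min, np.linalg.norm(pool - pool[idx], axis=-1))
    return selected
```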

Gaussian moment neural network

This work uses the Gaussian moment neural network (GM-NN) approach for modeling interatomic interactions16,17. GM-NN employs an artificial NN to map a local atomic environment Si to the atomic energy \({E}_{i}\left({S}_{i},{{{\boldsymbol{\theta }}}}\right)\); see Eq. (1). It uses a fully-connected feed-forward NN with two hidden layers16,17

$$\begin{array}{ll}{y}_{i}\,=\,0.1\cdot {{{{\bf{b}}}}}^{(3)}+\frac{1}{\sqrt{{d}_{2}}}{{{{\bf{W}}}}}^{(3)}\phi \bigg(0.1\cdot {{{{\bf{b}}}}}^{(2)}+\\ \,\qquad\frac{1}{\sqrt{{d}_{1}}}{{{{\bf{W}}}}}^{(2)}\phi \left(0.1\cdot {{{{\bf{b}}}}}^{(1)}+\frac{1}{\sqrt{{d}_{0}}}{{{{\bf{W}}}}}^{(1)}{{{{\bf{G}}}}}_{i}\right)\bigg),\end{array}$$
(9)

with \({{{{\bf{W}}}}}^{(l+1)}\in {{\mathbb{R}}}^{{d}_{l+1}\times {d}_{l}}\) and \({{{{\bf{b}}}}}^{(l+1)}\in {{\mathbb{R}}}^{{d}_{l+1}}\) representing the weights and biases of layer l + 1. In this work, we employ an NN with d0 = 910 input neurons (corresponding to the dimension of the input feature vector \({{{{\bf{G}}}}}_{i}={{{{\bf{G}}}}}_{i}\left({S}_{i}\right)\)), d1 = d2 = 512 hidden neurons, and a single output neuron, d3 = 1. The network’s weights W(l+1) are initialized by drawing entries from a normal distribution with zero mean and unit variance. The trainable bias vectors b(l+1) are initialized to zero. To improve the accuracy and convergence of the GM-NN model, we implement a neural tangent parameterization (factors of 0.1 and \(1/\sqrt{{d}_{l}}\))72. For the activation function ϕ, we use the Swish/SiLU function73,74.
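For illustration, a minimal PyTorch sketch of Eq. (9) with this parameterization could look as follows; the class name is hypothetical, and the production GM-NN code differs in implementation details.

```python
import math
import torch
import torch.nn as nn

class AtomicEnergyNet(nn.Module):
    """Sketch of the two-hidden-layer network in Eq. (9) with the
    neural tangent parameterization (factors 0.1 and 1/sqrt(d_l))."""

    def __init__(self, d0=910, d1=512, d2=512):
        super().__init__()
        # Weights ~ N(0, 1); trainable biases initialized to zero.
        self.W1 = nn.Parameter(torch.randn(d1, d0))
        self.W2 = nn.Parameter(torch.randn(d2, d1))
        self.W3 = nn.Parameter(torch.randn(1, d2))
        self.b1 = nn.Parameter(torch.zeros(d1))
        self.b2 = nn.Parameter(torch.zeros(d2))
        self.b3 = nn.Parameter(torch.zeros(1))
        self.act = nn.SiLU()  # Swish/SiLU activation
        self.d0, self.d1, self.d2 = d0, d1, d2

    def forward(self, G):  # G: (n_atoms, d0) feature vectors
        h = self.act(0.1 * self.b1 + G @ self.W1.T / math.sqrt(self.d0))
        h = self.act(0.1 * self.b2 + h @ self.W2.T / math.sqrt(self.d1))
        return 0.1 * self.b3 + h @ self.W3.T / math.sqrt(self.d2)
```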

To aid the training process, we scale and shift the output of the NN

$${E}_{i}\left({S}_{i},{{{\boldsymbol{\theta }}}}\right)=c\cdot ({\rho }_{{Z}_{i}}{y}_{i}+{\mu }_{{Z}_{i}}),$$
(10)

where the trainable shift parameters \({\mu }_{{Z}_{i}}\) are initialized by solving a linear regression problem, and the trainable scale parameters \({\rho }_{{Z}_{i}}\) are initialized to one. The per-atom RMSE of the regression solution determines the constant c17.

GM-NN models employ the Gaussian moment (GM) representation to encode the invariance of the total energy with respect to translations, rotations, and permutations of atoms of the same species16. By computing pairwise distance vectors rij = ri − rj and then splitting them into radial and angular components, denoted as \({r}_{ij}={\left\Vert {{{{\bf{r}}}}}_{ij}\right\Vert }_{2}\) and \({\hat{{{{\bf{r}}}}}}_{ij}={{{{\bf{r}}}}}_{ij}/{r}_{ij}\), respectively, we obtain GMs as follows

$${{{{\boldsymbol{\Psi }}}}}_{i,L,s}=\mathop{\sum}\limits_{j\ne i}{R}_{{Z}_{i},{Z}_{j},s}(r_{ij},{\boldsymbol{\beta}}){\hat{{{{\bf{r}}}}}}_{ij}^{\otimes L},$$
(11)

where \({\hat{{{{\bf{r}}}}}}_{ij}^{\otimes L}={\hat{{{{\bf{r}}}}}}_{ij}\otimes \cdots \otimes {\hat{{{{\bf{r}}}}}}_{ij}\) is the L-fold outer product. The nonlinear radial functions \({R}_{{Z}_{i},{Z}_{j},s}({r}_{ij},{{{\boldsymbol{\beta }}}})\) are defined as a sum of Gaussian functions \({\Phi }_{{s}^{{\prime} }}({r}_{ij})\) (NGauss = 9 for this work)17

$${R}_{{Z}_{i},{Z}_{j},s}({r}_{ij},{{{\boldsymbol{\beta }}}})=\frac{1}{\sqrt{{N}_{{{{\rm{Gauss}}}}}}}\mathop{\sum }\limits_{{s}^{{\prime} }=1}^{{N}_{{{{\rm{Gauss}}}}}}{\beta }_{{Z}_{i},{Z}_{j},s,{s}^{{\prime} }}{\Phi }_{{s}^{{\prime} }}({r}_{ij}).$$
(12)

The factor \(1/\sqrt{{N}_{{{{\rm{Gauss}}}}}}\), inspired by the neural tangent parameterization72, controls the effective learning rate. The radial functions are centered at equidistantly spaced grid points ranging from \({r}_{\min }=0.5\) Å to rc, set to 5.0 Å and 6.0 Å for alanine dipeptide and MIL-53(Al), respectively. The radial functions are re-scaled by a cosine cutoff function13 to ensure a smooth dependence on the number of atoms within the cutoff sphere. Chemical information is embedded in the GM representation through the trainable parameters \({\beta }_{{Z}_{i},{Z}_{j},s,{s}^{{\prime} }}\), with the index s iterating over the number of independent radial basis functions (Nbasis = 7 for this work).

Features invariant to rotations, Gi, are obtained by computing full tensor contractions of tensors defined in Eq. (11), e.g.16,17,

$${G}_{i,{s}_{1},{s}_{2},{s}_{3}}={({{{{\boldsymbol{\Psi }}}}}_{i,1,{s}_{1}})}_{a}{({{{{\boldsymbol{\Psi }}}}}_{i,1,{s}_{2}})}_{b}{({{{{\boldsymbol{\Psi }}}}}_{i,2,{s}_{3}})}_{a,b},$$
(13)

where we use Einstein notation, i.e., the right-hand side is summed over a, b ∈ {1, 2, 3}. Specific full tensor contractions are defined by using generating graphs75. In a practical implementation, we compute all GMs at once and reduce the number of invariant features based on the permutational symmetries of the respective graphs.
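The contraction in Eq. (13), for instance, maps directly onto an Einstein-summation call; the tensor names, shapes, and random values below are illustrative placeholders.

```python
import torch

# Full tensor contraction of Eq. (13) via Einstein summation for one
# atom; n_basis and the random tensors are illustrative placeholders.
n_basis = 7
psi1 = torch.randn(n_basis, 3)      # rank-1 moments (Psi_{i,1,s})
psi2 = torch.randn(n_basis, 3, 3)   # rank-2 moments (Psi_{i,2,s})

# Contract the Cartesian indices a, b; keep the basis indices s1, s2, s3.
G_i = torch.einsum('ua,vb,wab->uvw', psi1, psi1, psi2)  # shape (7, 7, 7)
```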

All parameters θ = {W, b, β, ρ, μ} of the NN are optimized by minimizing the combined squared loss on training data \({{{{\mathscr{D}}}}}_{{{{\rm{train}}}}}=\left({{{{\mathscr{X}}}}}_{{{{\rm{train}}}}},{{{{\mathscr{Y}}}}}_{{{{\rm{train}}}}}\right)\), with \({{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}={\{{S}^{(k)}\}}_{k = 1}^{{N}_{{{{\rm{train}}}}}}\) and \({{{{\mathcal{Y}}}}}_{{{{\rm{train}}}}}={\{{E}_{k}^{{{{\rm{ref}}}}},{\{{{{{\bf{F}}}}}_{i,k}^{{{{\rm{ref}}}}}\}}_{i = 1}^{{N}_{{{{\rm{at}}}}}},{{{{\boldsymbol{\sigma }}}}}_{k}^{{{{\rm{ref}}}}}\}}_{k = 1}^{{N}_{{{{\rm{train}}}}}}\),

$$\begin{array}{ll}{{{\mathscr{L}}}}\left({{{\boldsymbol{\theta }}}},{{{{\mathcal{D}}}}}_{{{{\rm{train}}}}}\right)=&\mathop{\sum }\limits_{k=1}^{{N}_{{{{\rm{train}}}}}}\left[{C}_{{{{\rm{e}}}}}{\left\Vert {E}_{k}^{{{{\rm{ref}}}}}-E({S}^{(k)},{{{\boldsymbol{\theta }}}})\right\Vert }_{2}^{2}\right.\\&+\, {C}_{{{{\rm{f}}}}}\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{at}}}}}^{(k)}}{\left\Vert {{{{\bf{F}}}}}_{i,k}^{{{{\rm{ref}}}}}-{{{{\bf{F}}}}}_{i}\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)\right\Vert }_{2}^{2}\\ &\left.+\,{C}_{{{{\rm{s}}}}}{\left\Vert {V}_{k}{{{{\boldsymbol{\sigma }}}}}_{k}^{{{{\rm{ref}}}}}-{V}_{k}{{{\boldsymbol{\sigma }}}}\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)\right\Vert }_{2}^{2}\right].\end{array}$$
(14)

We have chosen Ce = 1.0, Cf = 4.0 Å², and Cs = 0.01 to balance the relative contributions of energies, forces, and stresses, respectively.

Using AD, we compute atomic forces as negative gradients of total energy with respect to atomic coordinates

$${{{{\bf{F}}}}}_{i}\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)=-{\nabla }_{{{{{\bf{r}}}}}_{i}}E\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right).$$
(15)

Furthermore, we use AD to compute the stress tensor, defined by69

$${{{\boldsymbol{\sigma }}}}\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)=\frac{1}{{V}_{k}}{\left.{\nabla }_{{{{\boldsymbol{\epsilon }}}}}E\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)\right| }_{{{{\boldsymbol{\epsilon }}}} = {{{\bf{0}}}}},$$
(16)

where \(E\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)\) is total energy after a strain deformation with symmetric tensor \({{{\boldsymbol{\epsilon }}}}\in {{\mathbb{R}}}^{3\times 3}\), i.e., \(\tilde{{{{\bf{r}}}}}=\left({{{\bf{1}}}}+{{{\boldsymbol{\epsilon }}}}\right)\cdot {{{\bf{r}}}}\). As the stress tensor is symmetric, we use only its upper triangular part in the loss function. Here, Vk is the volume of the periodic cell.
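A minimal PyTorch sketch of Eqs. (15) and (16) is shown below; `energy_fn` stands for an arbitrary differentiable energy model, and periodic-cell degrees of freedom are omitted for brevity, so this is a sketch rather than a complete periodic implementation.

```python
import torch

def forces_and_stress(energy_fn, positions, volume):
    """Compute forces (Eq. 15) and the stress tensor (Eq. 16) with AD.

    energy_fn maps positions of shape (n_atoms, 3) to a scalar total
    energy; cell degrees of freedom are omitted for brevity.
    """
    positions = positions.detach().requires_grad_(True)
    # Symmetric strain, evaluated at epsilon = 0.
    eps = torch.zeros(3, 3, requires_grad=True)
    eps_sym = 0.5 * (eps + eps.T)
    strained = positions @ (torch.eye(3) + eps_sym).T  # r~ = (1 + eps) r
    energy = energy_fn(strained)
    pos_grad, eps_grad = torch.autograd.grad(energy, (positions, eps))
    forces = -pos_grad          # Eq. (15)
    stress = eps_grad / volume  # Eq. (16)
    return forces, stress
```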

We employ the Adam optimizer71 to minimize the loss function. The respective parameters of the optimizer are β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁷. Usually, we work with a mini-batch of 32 molecules; smaller mini-batches were used in the initial AL iterations, where the training data sets contained fewer than 32 structures. The layer-wise learning rates are decayed linearly. The initial values are set to 0.03 for the parameters of the fully connected layers, 0.02 for the trainable representation, and 0.05 and 0.001 for the scale and shift parameters of atomic energies, respectively. The training is performed for 1000 epochs. To prevent overfitting, we employ the early stopping technique76. All models are trained using PyTorch59.

Sketched gradient features

We obtain atomic gradient features by computing gradients of Eq. (1) with respect to the parameters of the fully connected layers in Eq. (9). In particular, we exploit the product structure of atomic gradient features. To expose this structure, we re-write the network in Eq. (9) as follows

$$\begin{array}{ll}{{{{\bf{z}}}}}_{i}^{(l+1)}={\tilde{{{{\bf{W}}}}}}^{(l+1)}{\tilde{{{{\bf{x}}}}}}_{i}^{(l)}\in {{\mathbb{R}}}^{{d}_{l+1}},\\{\tilde{{{{\bf{W}}}}}}^{(l+1)}=\left({{{{\bf{W}}}}}^{(l+1)},{{{{\bf{b}}}}}^{(l+1)}\right)\in {{\mathbb{R}}}^{{d}_{l+1}\times \left({d}_{l}+1\right)},\\ {\tilde{{{{\bf{x}}}}}}_{i}^{(l)}={\left(\frac{1}{\sqrt{{d}_{l}}}{{{{\bf{x}}}}}_{i}^{(l)},0.1\right)}^{\top }\in {{\mathbb{R}}}^{{d}_{l}+1},\end{array}$$
(17)

where z(l) and x(l) denote the pre- and post-activation vectors of layer l. Thus, atomic gradient features read

$$\begin{array}{ll}{\phi }_{i}({S}_{i})=\left(\frac{\partial {{{{\bf{z}}}}}_{i}^{(L)}}{\partial {\tilde{{{{\bf{W}}}}}}^{(1)}},\cdots \,,\frac{\partial {{{{\bf{z}}}}}_{i}^{(L)}}{\partial {\tilde{{{{\bf{W}}}}}}^{(L)}}\right)\\ \qquad\,\,\,\,\,=\left(\frac{\partial {{{{\bf{z}}}}}_{i}^{(L)}}{\partial {{{{\bf{z}}}}}_{i}^{(1)}}\otimes {\tilde{{{{\bf{x}}}}}}_{i}^{(0)},\cdots \,,\frac{\partial {{{{\bf{z}}}}}_{i}^{(L)}}{\partial {{{{\bf{z}}}}}_{i}^{(L)}}\otimes {\tilde{{{{\bf{x}}}}}}_{i}^{(L-1)}\right).\end{array}$$
(18)

To make the calculation of gradient features computationally tractable, we employ the random projections (sketching) technique55, as proposed in refs. 51,52. For atomic gradient features \({\phi }_{i}\left({S}_{i}\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{feat}}}}}}\) and a random matrix \({{{\bf{U}}}}\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}\times {N}_{{{{\rm{feat}}}}}}\) (with Nfeat and Nrp denoting the number of atomic features and random projections, respectively), we can define randomly projected atomic gradient features as

$${\phi }_{i}^{{{{\rm{rp}}}}}\left({S}_{i}\right)={{{\bf{U}}}}{\phi }_{i}\left({S}_{i}\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}}.$$
(19)

While a Gaussian sketch could be employed, where the elements of U are drawn from a standard normal distribution, we use a tensor sketching approach that is more runtime- and memory-efficient52. Specifically, denoting the element-wise (Hadamard) product as ⊙, we compute

$${\phi }_{i}^{{{{\rm{rp}}}}}({S}_{i})=\mathop{\sum }\limits_{l=1}^{L}\left({{{{\bf{U}}}}}_{{{{\rm{out}}}}}^{(l)}{\phi }_{i,{{{\rm{out}}}}}^{(l)}({S}_{i})\right)\odot \left({{{{\bf{U}}}}}_{{{{\rm{in}}}}}^{(l-1)}{\phi }_{i,{{{\rm{in}}}}}^{(l-1)}({S}_{i})\right),$$
(20)

with \({\phi }_{i,{{{\rm{out}}}}}^{(l)}({S}_{i})=\partial {{{{\bf{z}}}}}_{i}^{(L)}/\partial {{{{\bf{z}}}}}_{i}^{(l)}\) and \({\phi }_{i,{{{\rm{in}}}}}^{(l)}({S}_{i})={\tilde{{{{\bf{x}}}}}}_{i}^{(l)}\). All entries of \({{{{\bf{U}}}}}_{{{{\rm{in}}}}}^{(l)}\) and \({{{{\bf{U}}}}}_{{{{\rm{out}}}}}^{(l)}\) are sampled independently from a standard normal distribution.

For atom-based uncertainties, we can directly use the sketched atomic gradient features. For (total) uncertainties per atom, we need to work with the mean \(\phi (S)=\mathop{\sum }\nolimits_{i = 1}^{{N}_{{{{\rm{at}}}}}}{\phi }_{i}({S}_{i})/{N}_{{{{\rm{at}}}}}\). Thus, we use the fact that the individual projections (rows of Eq. (20)) are linear in the features and obtain for the (total) gradient features51

$${\phi }^{{{{\rm{rp}}}}}(S)=\frac{1}{{N}_{{{{\rm{at}}}}}}\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{at}}}}}}\mathop{\sum }\limits_{l=1}^{L}\left({{{{\bf{U}}}}}_{{{{\rm{out}}}}}^{(l)}{\phi }_{i,{{{\rm{out}}}}}^{(l)}({S}_{i})\right)\odot \left({{{{\bf{U}}}}}_{{{{\rm{in}}}}}^{(l-1)}{\phi }_{i,{{{\rm{in}}}}}^{(l-1)}({S}_{i})\right),$$
(21)

given that all of the individual random projections use the same random matrices.
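The sum in Eq. (20) amounts to a few matrix-vector products and Hadamard products per atom; the sketch below assumes a scalar network output, and the argument names and list alignment are illustrative.

```python
import torch

def sketch_gradient_features(phis_out, phis_in, Us_out, Us_in):
    """Tensor-sketched atomic gradient features, Eq. (20).

    For a scalar network output, phis_out[l] holds dz^(L)/dz^(l+1) as a
    vector and phis_in[l] holds the matching x~^(l); Us_out and Us_in
    are fixed Gaussian random matrices of shape (n_rp, feature_dim).
    """
    n_rp = Us_out[0].shape[0]
    phi_rp = torch.zeros(n_rp)
    for l in range(len(phis_out)):
        # Hadamard product of the two projections, one per factor of
        # the outer products in Eq. (18).
        phi_rp = phi_rp + (Us_out[l] @ phis_out[l]) * (Us_in[l] @ phis_in[l])
    return phi_rp
```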

Ensemble-based uncertainty quantification

The variance of the predictions of individual models in an ensemble of MLIPs can be used to quantify their uncertainty. Thus, we define the variance of predicted energy as

$${u}^{2}=\frac{1}{M}\mathop{\sum }\limits_{j=1}^{M}\Vert E_{j}-\bar{E}\Vert_{2}^{2},$$
(22)

where M is the number of models in the ensemble. The variance of atomic forces reads

$${u}_{i}^{2}=\frac{1}{3M}\mathop{\sum }\limits_{j=1}^{M}\Vert {{{{\bf{F}}}}}_{i,j}-{\bar{{{{\bf{F}}}}}}_{i}\Vert_{2}^{2}.$$
(23)

Here, \(\bar{E}\) and \({\bar{{{{\bf{F}}}}}}_{i}\) denote the arithmetic means of the predictions of the individual models. Our experiments demonstrated that M = 3 is sufficient to obtain good performance; larger ensembles would make ensemble-based uncertainty quantification even less computationally efficient compared to the gradient-based alternatives.
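Under the definitions in Eqs. (22) and (23), the ensemble uncertainties reduce to a few lines of NumPy; the function and argument names are illustrative.

```python
import numpy as np

def ensemble_uncertainties(energies, forces):
    """Ensemble-based uncertainties, Eqs. (22) and (23).

    energies: (M,) predicted total energies of the M ensemble members;
    forces: (M, n_atoms, 3) predicted atomic forces.
    """
    u2_energy = np.mean((energies - energies.mean()) ** 2)          # Eq. (22)
    f_mean = forces.mean(axis=0)                                    # (n_atoms, 3)
    u2_forces = np.mean(np.sum((forces - f_mean) ** 2, axis=2), axis=0) / 3.0
    return np.sqrt(u2_energy), np.sqrt(u2_forces)                   # Eq. (23), per atom
```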

Batch selection methods

The simplest batch selection method queries points by their uncertainty values alone. Specifically, given the already selected structures \({{{{\mathcal{X}}}}}_{{{{\rm{batch}}}}}\) from an unlabeled pool \({{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\), we select the next point by

$$S = \mathop{{\arg\max}}\limits_{S \in {\mathcal{X}}_{\mathrm{pool}}\backslash{\mathcal{X}}_{\mathrm{batch}}}\,u\left(S\right),$$
(24)

until Nbatch structures are selected. In this work, we use this selection method combined with ensemble-based uncertainties.

For the posterior-based uncertainty, we can constrain the diversity of the selected batch by using the posterior covariance between structures

$${{{\rm{Cov}}}}\left(S,{S}^{{\prime} }\right)={\lambda }^{2}\phi {\left(S\right)}^{\top }{\left({\Phi }_{{{{\rm{train}}}}}^{\top }{\Phi }_{{{{\rm{train}}}}}+{\lambda }^{2}{{{\bf{I}}}}\right)}^{-1}\phi \left({S}^{{\prime} }\right),$$
(25)

with \({\Phi }_{{{{\rm{train}}}}}=\phi \left({{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}\right)\). The corresponding method greedily selects structures, i.e., one structure per iteration, such that the determinant of the covariance matrix is maximized51,52,77

$$S = \mathop{{\arg\max}}\limits_{S \in {\mathcal{X}}_{\mathrm{pool}}\backslash{\mathcal{X}}_{\mathrm{batch}}}\,\det \left[ {\mathrm{Cov}}\left({\mathcal{X}}_{\mathrm{batch}} \cup \{S\}, {\mathcal{X}}_{\mathrm{batch}} \cup \{S\}\right)\right].$$
(26)

For the distance-based uncertainty, we ensure the diversity of the acquired batch by greedily selecting structures with a maximum distance to all previously selected and training data points. The respective selection method reads51,52,78

$$S = \mathop{\mathrm{argmax}}\limits_{S \in {\mathcal{X}}_{\mathrm{pool}}\backslash{\mathcal{X}}_{\mathrm{batch}}}\,\mathop{\mathrm{min}}\limits_{S^\prime \in {\mathcal{X}}_{\mathrm{train}} \cup {\mathcal{X}}_{\mathrm{batch}}}\, \left\Vert \phi\left(S\right) - \phi\left(S^\prime\right) \right\Vert_2^2.$$
(27)

We also applied this batch selection method to define the most representative subset of atomic gradient features when calculating atom-based uncertainty using feature space distances.
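A minimal sketch of the greedy selection in Eq. (27) follows; the feature matrices and names are illustrative, and the distance bookkeeping uses the standard farthest-point update.

```python
import numpy as np

def greedy_maxmin_selection(pool_feats, train_feats, n_batch):
    """Greedily select points with maximum distance to the training
    set and to already selected points, Eq. (27).

    pool_feats, train_feats: feature matrices of shape (n_points, n_feat).
    """
    selected = []
    # Squared distance of each pool point to its nearest reference point.
    d_min = np.min(
        ((pool_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1),
        axis=1,
    )
    for _ in range(n_batch):
        idx = int(np.argmax(d_min))
        selected.append(idx)
        # Update nearest-neighbor distances with the newly selected point.
        d_new = ((pool_feats - pool_feats[idx]) ** 2).sum(-1)
        d_min = np.minimum(d_min, d_new)
    return selected
```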

Lastly, to compare the performance of uncertainty-based data generation approaches with conventional random sampling from an ab initio MD trajectory, we employ a random selection strategy combined with posterior-based uncertainty to terminate MD simulations. We define random selection as

$$S \sim {{{\mathcal{U}}}}\left({{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\right),$$
(28)

where \({{{\mathcal{U}}}}\) is the uniform distribution over \({{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\).

Conformal prediction

Conformal prediction methods offer distribution-free uncertainty quantification with guaranteed finite sample coverage49,79,80,81,82, thus ensuring calibration. Finite sample coverage can be defined as

$${\mathbb{P}}\{{y}_{{{{\rm{test}}}}}\in C\left({x}_{{{{\rm{test}}}}}\right)\}\ge 1-\alpha .$$
(29)

Here, \(\left({x}_{{{{\rm{test}}}}},{y}_{{{{\rm{test}}}}}\right)\) are the newly observed data, while C defines the prediction set based on previous observations \({\{\left({x}_{k},{y}_{k}\right)\}}_{k = 1}^{{N}_{{{{\rm{calibr}}}}}}\). The hyper-parameter α is chosen by the user and defines the desired confidence level. CP methods guarantee that the prediction set contains the true label with a probability of at least 1 − α.

We employ inductive CP, which comprises the following steps49,79: (1) A subset of calibration data, sized Ncalibr, is selected, and the corresponding errors are computed on this subset. For atomic forces, we employ the RMSEs \(\Delta {{{{\bf{F}}}}}_{i}^{2}=\frac{1}{3}\left\Vert {{{{\bf{F}}}}}_{i}-{{{{\bf{F}}}}}_{i}^{{{{\rm{ref}}}}}\right\Vert_{2}^{2}\), while for total energies the absolute energy errors per atom, \(\Delta e=\left\vert E-{E}^{{{{\rm{ref}}}}}\right\vert /{N}_{{{{\rm{at}}}}}\), are used. (2) The uncertainty \(u\left(S\right)\) is calculated for this subset of data. (3) The ratio \(\Delta e/u\left(S\right)\) or \(\Delta {{{{\bf{F}}}}}_{i}/u\left({S}_{i}\right)\) is computed. (4) The empirical \(\left(1-\alpha \right)\left({N}_{{{{\rm{calibr}}}}}+1\right)/{N}_{{{{\rm{calibr}}}}}\)-th quantile of these ratios, denoted as s, is determined. (5) This s value is applied to new observations, resulting in the re-scaled and calibrated uncertainty, \(\tilde{u}=s\cdot u\).
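Steps (3) and (4) amount to a single quantile computation over the calibration set; the sketch below assumes pre-computed calibration errors and uncertainties, with illustrative names.

```python
import numpy as np

def conformal_scale(uncertainties, errors, alpha=0.1):
    """Inductive CP calibration factor s (steps 3 and 4).

    uncertainties, errors: arrays over the calibration set holding
    u(S) and the corresponding errors.
    """
    scores = errors / uncertainties              # step (3)
    n = len(scores)
    q = min(1.0, (1 - alpha) * (n + 1) / n)      # target quantile level
    return np.quantile(scores, q)                # step (4)

# Step (5): calibrate the uncertainties of new observations.
# u_calibrated = conformal_scale(u_cal, err_cal, alpha=0.05) * u_new
```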

Coverage of collective variable space

To measure how well different methods explore the (bounded) space of interest, we implement a tree-based weighted recursive partitioning of a d-dimensional Euclidean space, which is reminiscent of quadtrees83 and matrix-based octrees84 but allows choosing the number of subdivisions n per dimension. Thus, the arity of the tree is \(k={n}^{d}\). Each node of this complete k-ary tree encodes a generalized hypercube of d dimensions, where each side length depends on the boundaries of the original space. The root node represents the full bounded space. A tree of height L has a total number of partitions equal to \(\left({k}^{L+1}-1\right)/\left(k-1\right)\), and level \(\ell\) contains \({k}^{\ell }\) nodes. The hyper-parameters chosen in this paper are n = 2, d = 2 (for the CVs ϕ and ψ of alanine dipeptide), and L = 5, for a total of 1365 partitions of the space of interest.

Our proposed surface coverage metric uses this data structure as a proxy to capture how many space partitions a method can explore in the fewest iterations. At the same time, we need to penalize methods that get stuck in a region of the space, exploring partitions of smaller volumes, that is, those represented by nodes at deeper levels in the tree. For this reason, each node at level \(\ell\) is associated with a reward (or weight) of \(1/{k}^{\ell }\), so each level of the tree has a cumulative reward of 1. The optimal strategy would be to perform a breadth-first search of the nodes of this tree, which translates into observing the largest partitions of unobserved space first. In addition, partitions that are revisited give no additional reward, so there is no gain in getting stuck in a certain partition. We visually represent the idea of the algorithm in the Supplementary Information for the simple case of d = 2.
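A compact sketch of the coverage reward is given below, assuming visited CV values and box boundaries are supplied as arrays; the names and the clipping tolerance are illustrative.

```python
import numpy as np

def coverage_reward(points, bounds, n=2, L=5):
    """Weighted recursive-partitioning coverage reward (sketch).

    points: (n_points, d) visited CV values; bounds: (d, 2) lower and
    upper limits of the bounded space. A newly visited node at level l
    earns 1 / k**l with k = n**d, so each fully explored level
    contributes a cumulative reward of 1; revisits earn nothing.
    """
    points = np.asarray(points, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    d = points.shape[1]
    k = n ** d
    # Normalize into the half-open unit hypercube.
    x = (points - bounds[:, 0]) / (bounds[:, 1] - bounds[:, 0])
    x = np.clip(x, 0.0, 1.0 - 1e-12)
    reward, visited = 0.0, set()
    for level in range(L + 1):
        # Cell index of each point at this level: n**level bins per dim.
        cells = np.floor(x * n ** level).astype(int)
        for cell in map(tuple, cells):
            if (level, cell) not in visited:
                visited.add((level, cell))
                reward += 1.0 / k ** level
    return reward
```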

Auto-correlation analysis

We evaluate the performance of uncertainty-biased MD simulations by investigating the auto-correlation between subsequent time frames of the MD trajectory. The auto-correlation function (ACF) is defined as85

$${A}_{{{{\mathcal{O}}}}}\left(k\right)=\frac{\langle {{{{\mathcal{O}}}}}_{i}{{{{\mathcal{O}}}}}_{i+k}\rangle -{\langle {{{{\mathcal{O}}}}}_{i}\rangle }^{2}}{\langle {{{{\mathcal{O}}}}}_{i}^{2}\rangle -{\langle {{{{\mathcal{O}}}}}_{i}\rangle }^{2}},$$
(30)

where \(\langle \cdot \rangle\) denotes the thermodynamic expectation value, k is the lag time, and \({{{\mathcal{O}}}}\) is an observable, e.g., atomic positions or atom-based uncertainties. From the ACF, we can calculate the auto-correlation time (ACT) for an MD trajectory of length N

$${{{{\rm{ACT}}}}}_{{{{\mathcal{O}}}}}=\frac{1}{2}+\mathop{\sum }\limits_{k=1}^{N}{A}_{{{{\mathcal{O}}}}}\left(k\right)\left(1-\frac{k}{N}\right).$$
(31)

ACT is related to effective sample size (ESS) by

$${{{{\rm{ESS}}}}}_{{{{\mathcal{O}}}}}=\frac{N}{2\cdot {{{{\rm{ACT}}}}}_{{{{\mathcal{O}}}}}}.$$
(32)

In this work, we calculate ESS as implemented in TensorFlow86 and use it to estimate the ACT.
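The estimator in Eqs. (30)-(32) can be written directly in NumPy; the sketch below is a transparent re-implementation for a scalar observable, whereas the results in this work rely on the ESS routine shipped with TensorFlow86.

```python
import numpy as np

def autocorrelation_time(obs):
    """ACT of a scalar observable via Eqs. (30) and (31) (sketch).

    obs: 1D array of observable values sampled along an MD trajectory.
    """
    obs = np.asarray(obs, dtype=float)
    N = len(obs)
    o = obs - obs.mean()
    var = np.mean(o ** 2)
    act = 0.5
    for k in range(1, N):  # only lags up to N - 1 are estimable
        acf_k = np.mean(o[:-k] * o[k:]) / var   # Eq. (30)
        act += acf_k * (1.0 - k / N)            # Eq. (31)
    return act

def effective_sample_size(obs):
    """ESS from the ACT, Eq. (32)."""
    return len(obs) / (2.0 * autocorrelation_time(obs))
```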

Test data set for alanine dipeptide

The test data set for alanine dipeptide comprises 2000 configurations randomly selected from an MD trajectory at 1200 K. This trajectory was generated within the ASE simulation package87 by running an MD simulation in the canonical (NVT) statistical ensemble using the Langevin thermostat. We used a time step of 0.5 fs and a total simulation time of 1 ns. Forces were provided by the AMBER ff19SB force field57, as implemented in the TorchMD package using PyTorch58,59. The data set effectively covers the relevant configurational space of alanine dipeptide, representing an upper boundary in exploring its collective variables (CVs).
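The trajectory generation can be sketched with ASE as follows, assuming `atoms` is an ASE Atoms object for alanine dipeptide with the TorchMD-based force-field calculator attached; the friction coefficient is an illustrative choice not specified above.

```python
from ase import units
from ase.md.langevin import Langevin

# NVT trajectory generation (sketch): `atoms` is assumed to carry the
# alanine dipeptide configuration and an attached force-field
# calculator; the friction coefficient is an illustrative choice.
dyn = Langevin(
    atoms,
    timestep=0.5 * units.fs,
    temperature_K=1200,
    friction=0.01 / units.fs,
)
dyn.run(2_000_000)  # 1 ns at a 0.5 fs time step
```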

MLIP learning details for alanine dipeptide

Each AL experiment starts with training an MLIP on eight alanine dipeptide configurations randomly perturbed from its initial configuration in the C7eq state. Trained MLIPs are then used to run eight parallel MD simulations, initialized from the initial configuration or from configurations selected in later iterations. Each MD simulation runs until reaching an empirically defined uncertainty threshold of 1.5 eV Å⁻¹. A lower threshold value may result in slower CV space exploration, while a larger one would lead to the exploration of unphysical configurations. The maximum data set size, comprising training and validation data, is limited to 512 configurations. The Supplementary Information presents the scaling of the presented AL experiments to larger data set sizes, acquiring data sets of 1024 samples. Biased (bias-forces-driven) and unbiased MD simulations are performed in the canonical (NVT) statistical ensemble within the ASE simulation package87. Unbiased MD simulations are run with the Langevin thermostat at temperatures of 300 K, 600 K, and 1200 K, whereas biased simulations are performed at a constant temperature of 300 K. We chose an integration time step of 0.5 fs and set a maximum of 20,000 steps per MD simulation. A biasing strength of τ = 0.25 was used in biased AL experiments. In reference calculations, we employ a force threshold of 20 eV Å⁻¹ to exclude unphysical structures, potentially expected at high biasing strengths (equivalently, a smaller integration time step could be used). All AL experiments were repeated five times.

Reference DFT calculations for MIL-53(Al)

DFT calculations for MIL-53(Al) were performed using the CP2K simulation package (version 2023.1)61. To ensure consistency with incremental learning experiments41, we employed the PBE functional62 with Grimme D3 dispersion correction63. A hybrid basis set, combining TZVP Gaussian basis functions and plane waves, was employed88. GTH pseudopotentials were used to smooth the electron density near the nuclei89. To ensure the convergence of force and stress calculations, a plane-wave cutoff energy of 1000 Ry was selected.

MLIP learning details for MIL-53(Al)

In each AL experiment, we start with 32 MIL-53(Al) configurations randomly perturbed around its closed-pore state, with 90% reserved for training. Trained MLIPs are then used to perform 32 parallel MD simulations, each running until it reaches an uncertainty threshold of 1.0 eV Å⁻¹. The maximum data set size is limited to 512 configurations, comprising training and validation data. The Supplementary Information presents the scaling of the presented AL experiments to larger data set sizes, acquiring data sets of 1024 samples. Both biased (bias-stress-driven) and unbiased MD simulations use the isothermal-isobaric form of the Nosé–Hoover dynamics90,91. Unbiased MD simulations are carried out at 600 K and 0 MPa, as well as at ±250 MPa (half of the simulations each), while biased simulations are performed at 600 K and 0 MPa. The characteristic time scales of the thermostat and barostat are set to 0.1 ps and 1 ps, respectively. We chose an integration time step of 0.5 fs and set a maximum of 20,000 steps per MD simulation. A stress-biasing strength of τ = 0.5 is used in biased AL experiments. In reference calculations, we employ a force threshold of 20 eV Å⁻¹ to exclude strongly distorted structures. We use the data set from ref. 41 as a metadynamics-generated baseline and select the first 500 sequentially generated configurations. All AL experiments are repeated three times, except for metadynamics, which was run once41. For metadynamics, we train three MLIPs initialized with different random seeds.

Random perturbation of atomic configurations

We obtain randomly perturbed atomic configurations by adding atomic shifts, denoted as δi, to the original atomic positions ri

$${\tilde{{{{\bf{r}}}}}}_{i}={{{{\bf{r}}}}}_{i}+{{{{\boldsymbol{\delta }}}}}_{i}.$$
(33)

The components of δi are sampled independently from a uniform distribution: for alanine dipeptide, the range is between −0.02 Å and 0.02 Å, and for MIL-53(Al), it is between −0.08 Å and 0.08 Å. Additionally, for MIL-53(Al), we introduce random perturbations to its periodic cell B using a strain deformation \({{{\boldsymbol{\epsilon }}}}=\left({{{\bf{A}}}}+{{{{\bf{A}}}}}^{\top }\right)/2\), where the components of A are sampled independently from a uniform distribution between −0.02 and 0.02. This transformation can be expressed as

$$\tilde{{{{\bf{B}}}}}={{{\bf{B}}}}{\left({{{\bf{I}}}}+2{{{\boldsymbol{\epsilon }}}}\right)}^{1/2}.$$
(34)

The shifted atomic positions are re-scaled according to

$${\tilde{\tilde{{{{\bf{r}}}}}}}_{i}={\left({{{\bf{I}}}}+2{{{\boldsymbol{\epsilon }}}}\right)}^{1/2}{\tilde{{{{\bf{r}}}}}}_{i}.$$
(35)

Sine wave with additive random noise

We model large-amplitude volume fluctuations in MIL-53(Al) induced by the bias stress using a sine wave with period T0 and additive random noise \(N\left(t\right)\)

$$A\sin \left(\frac{2\pi t}{{T}_{0}}\right)+BN\left(t\right),$$

where A and B denote the amplitudes of the sine wave and of the random noise, respectively. In this work, \(N\left(t\right) \sim {{{\mathcal{N}}}}\left(0,1\right)\) represents random noise following a normal distribution with zero mean and unit variance. We chose A = 1.0 and B = 0.5 for the blue line in Fig. 7; for the red line, we increased the noise amplitude to B = 2.0. To represent the volume fluctuations induced in MIL-53(Al) (see Fig. 7), a sine wave with a period twice the length of the MD simulation, i.e., T0 = 3.2 ns, is required.
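For completeness, the noisy sine-wave model can be generated as follows; the trajectory length and sampling rate are illustrative choices.

```python
import numpy as np

# Noisy sine-wave model; A, B, and T0 as used for the blue line in Fig. 7.
A, B, T0 = 1.0, 0.5, 3.2         # amplitudes and period (ns)
t = np.linspace(0.0, 1.6, 3200)  # 1.6 ns trajectory, illustrative sampling
signal = A * np.sin(2.0 * np.pi * t / T0) + B * np.random.standard_normal(t.shape)
```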