Paper Title
The MD17 Datasets from the Perspective of Datasets for Gas-Phase "Small" Molecule Potentials
Paper Authors
Paper Abstract
There has been great progress in developing methods for machine-learned potential energy surfaces (PESs). There have also been important assessments of these methods by comparing so-called learning curves on datasets of electronic energies and forces, notably the MD17 database. The dataset for each molecule in this database generally consists of tens of thousands of energies and forces obtained from DFT direct dynamics at 500 K. We contrast the datasets from this database for three "small" molecules, ethanol, malonaldehyde, and glycine, with datasets we have generated with specific targets for the PESs in mind: a rigorous calculation of the zero-point energy and wavefunction, the tunneling splitting in malonaldehyde, and, in the case of glycine, a description of all eight low-lying conformers. We found that the MD17 datasets are too limited for these targets. We also examine recent datasets for several PESs that describe small-molecule but complex chemical reactions. Finally, we introduce a new database, "QM-22", which contains datasets of molecules ranging from 4 to 15 atoms that extend to high energies and a large span of configurations.
