Results: 504

Name | Description | DOI | Elements | Authors | # Configurations | # Atoms | # Elements | Methods | Software | Downloads | Source Data | Source Pub. | Other Links
The Alexandria Materials Database contains theoretical crystal structures in 1D, 2D and 3D discovered by machine learning approaches using DFT with PBE, PBEsol and SCAN methods. This dataset represents the geometry optimization paths for 3D crystal structures from Alexandria calculated using PBE methods.
10.60732/c88da7df
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Jonathan Schmidt, Noah Hoffmann, Hai-Chen Wang, Pedro Borlido, Pedro J. M. A. Carriço, Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques
106825218
1313552132
89
DFT-PBE
VASP
101401
The Alexandria Materials Database contains theoretical crystal structures in 1D, 2D and 3D discovered by machine learning approaches using DFT with PBE, PBEsol and SCAN methods. This dataset represents the geometry optimization paths for 2D crystal structures from Alexandria calculated using PBE methods.
10.60732/8781419f
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Jonathan Schmidt, Noah Hoffmann, Hai-Chen Wang, Pedro Borlido, Pedro J. M. A. Carriço, Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques
11742482
118265549
84
DFT-PBE
VASP
3709
The full-size training set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory.
10.60732/41666b82
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood
101666280
5237539207
83
DFT-ωB97M-V
ORCA
2552
This is the filtered training split of ODAC25. ODAC25 is a large-scale DFT dataset intended to advance the computational screening of Metal-Organic Framework (MOF) sorbents for direct air capture (DAC) of atmospheric CO2 from humid air. Spanning ~15,000 MOFs, including experimental, defective, synthetic, and amine-functionalized frameworks, the dataset comprises nearly 60 million single-point calculations covering four adsorbates: CO2, H2O, N2, and O2. All calculations were performed with VASP 6.3 using the PBE functional augmented with D3 dispersion corrections (Becke-Johnson damping). Spin-polarized calculations (ISPIN=2) were used throughout. Relative to ODAC23, ODAC25 adds two new adsorbates (N2 and O2), functionalized MOF variants, improved k-point convergence, and re-relaxations of bare MOF cells. Three configuration sets are provided: mof_plus_adsorbate (full DFT relaxations of adsorbate-loaded MOFs), mof (re-relaxations of empty frameworks), and gcmc (DFT single points derived from Grand Canonical Monte Carlo simulations). Structures identified as problematic by Jin et al. (2025) have been excluded (see https://zenodo.org/records/14802658).
Ag, Al, Au, B, Ba, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, H, Hf, Hg, Ho, I, In, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pd, Pr, Pt, Re, Ru, S, Sc, Se, Si, Sm, Sr, Tb, Ti, Tm, U, V, W, Y, Zn, Zr
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl
36058136
8523144395
62
DFT-PBE+D3
VASP 6.3
1142
The training split of sAlex. sAlex is a subsample of the Alexandria dataset that was used to fine-tune the OMat24 (Open Materials 2024) models. From the site: sAlex was created by removing structures matched in WBM and only sampling structure along a trajectory with an energy difference greater than 10 meV/atom.
10.60732/efbb7935
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
10345613
106888622
89
DFT-PBE+U
VASP
1075
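The trajectory subsampling rule quoted in the sAlex entry above (keep structures along a trajectory only when the energy difference exceeds 10 meV/atom) can be sketched as follows. This is a minimal illustration, not the OMat24 authors' code: the function name `subsample_trajectory` is hypothetical, and comparing each frame against the last kept frame is an assumption, since the source does not say which reference frame is used.

```python
def subsample_trajectory(energies_per_atom, threshold=0.010):
    """Return indices of frames kept from one relaxation trajectory.

    energies_per_atom: energies in eV/atom, ordered along the trajectory.
    threshold: minimum energy difference in eV/atom (10 meV/atom = 0.010 eV/atom).

    Assumption: a frame is kept when its energy differs from the most
    recently kept frame by more than the threshold; the first frame is
    always kept.
    """
    if not energies_per_atom:
        return []
    kept = [0]
    for i in range(1, len(energies_per_atom)):
        if abs(energies_per_atom[i] - energies_per_atom[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Illustrative energies (eV/atom), not taken from the dataset.
example = [-5.000, -5.004, -5.012, -5.015, -5.030]
kept_indices = subsample_trajectory(example)
```

With these illustrative numbers, frames 1 and 3 are dropped because they lie within 10 meV/atom of the previously kept frame.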
Training configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.
10.60732/722bcab6
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
7861269
633950726
57
DFT-PBE+U
VASP
771
This DFT dataset is curated in response to the growing interest in property-guided molecule generation using generative AI models. Typically, the properties of generated molecules are evaluated using machine learning (ML) property predictors trained on fully relaxed datasets. However, since generated molecules may deviate significantly from relaxed structures, these predictors can be highly unreliable for assessing their quality. This data provides DFT-evaluated properties, energy and forces for generated molecules. These structures are unrelaxed and can serve as a validation set for machine learning property predictors used in conditional molecule generation. It includes 10,773 molecules generated using PropMolFlow, a state-of-the-art conditional molecule generation model. PropMolFlow employs a flow matching process parameterized with an SE(3)-equivariant graph neural network. PropMolFlow models are trained on the QM9 dataset. Molecules are generated by conditioning on six properties (polarizability, gap, HOMO, LUMO, dipole moment and heat capacity at room temperature, 298 K) across two tasks: in-distribution and out-of-distribution generation. Full details are available in the corresponding paper.
10.60732/1f7cae3c
C, F, H, N, O
Cheng Zeng, Jirui Jin, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Mingjie Liu
10773
205304
5
DFT-B3LYP
Gaussian 16
715
The Train 4M set from OMol25 (~4 million structure training subset). From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory.
10.60732/b6f9382a
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood
3986754
218680957
83
DFT-ωB97M-V
ORCA
705
OC20_S2EF_train_20M is the 20 million structure training subset of the OC20 Structure to Energy and Forces dataset. Features include potential energy, free energy and atomic forces, along with data from the OC20 mappings file, including adsorbate id, Materials Project bulk id, Miller index, shift and others.
10.60732/9f03e9be
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
20000000
1465265878
56
DFT-rPBE
VASP
693
Training set of the Open Polymers 2026 (OPoly26) dataset. OPoly26 contains over 6.57 million density functional theory (DFT) calculations on cluster fragments of up to 360 atoms derived from polymeric systems, comprising over 1.2 billion total atoms. The dataset encompasses variations in monomer composition, polymerization degree, chain architectures, and solvation environments to improve machine learning model performance for polymer property prediction. Calculations were performed at the B97M-V/def2-SVP level of theory using ORCA.
Al, B, Br, C, Ca, Cl, Co, Cs, Cu, F, Fe, H, I, K, La, Li, Mg, N, Na, Ni, O, P, S, Sr, Zn
Daniel S. Levine, Nicholas Liesen, Lauren Chua, James Diffenderfer, Helgi I. Ingolfsson, Matthew P. Kroonblawd, Nitesh Kumar, Amitesh Maiti, Supun S. Mohottalalage, Muhammed Shuaibi, Brian Van Essen, Brandon M. Wood, C. Lawrence Zitnick, Samuel M. Blau, Evan R. Antoniuk
6104876
1125111811
25
DFT-ωB97M-V
ORCA
619
Training configurations for the structure to total energy and forces task (S2EF) of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.
10.60732/68160e50
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
8356688
668033119
57
DFT-PBE+U
VASP
591
The Train neutral set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory.
10.60732/3c2ddc75
B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood
34335828
929562799
17
DFT-ωB97M-V
ORCA
548
This is the full (unfiltered) training split of ODAC25. ODAC25 is a large-scale DFT dataset intended to advance the computational screening of Metal-Organic Framework (MOF) sorbents for direct air capture (DAC) of atmospheric CO2 from humid air. Spanning ~15,000 MOFs, including experimental, defective, synthetic, and amine-functionalized frameworks, the dataset comprises nearly 60 million single-point calculations covering four adsorbates: CO2, H2O, N2, and O2. All calculations were performed with VASP 6.3 using the PBE functional augmented with D3 dispersion corrections (Becke-Johnson damping). Spin-polarized calculations (ISPIN=2) were used throughout. Relative to ODAC23, ODAC25 adds two new adsorbates (N2 and O2), functionalized MOF variants, improved k-point convergence, and re-relaxations of bare MOF cells. Three configuration sets are provided: mof_plus_adsorbate (full DFT relaxations of adsorbate-loaded MOFs), mof (re-relaxations of empty frameworks), and gcmc (DFT single points derived from Grand Canonical Monte Carlo simulations).
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pd, Pr, Pt, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Tb, Te, Th, Ti, Tm, U, V, W, Y, Zn, Zr
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl
54620287
12829845961
71
DFT-PBE+D3
VASP 6.3
533
Out-of-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.
10.60732/15fa94f2
Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
520744
42168125
52
DFT-PBE+U
VASP
529
Configurations from the Materials Project database: an online resource with the goal of computing properties of all inorganic materials.
10.60732/4bf2e346
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson
6125462
194446050
89
DFT-R2SCAN, DFT-PBEsol, DFT-SCAN, DFT-GGA+U, DFT-GGA
VASP
529
MP-ALOE is a dataset of nearly 1 million DFT calculations computed with the r2SCAN meta-generalized gradient approximation, covering 89 elements. The dataset was constructed using active learning via Query by Committee (QBC) and downsampling via the DIRECT method, and primarily consists of off-equilibrium structures. Initial structures were generated by elemental substitution into prototype structures from the ICSD and Materials Project databases (restricted to 2-8 atoms and up to ternary compositions). QBC used an ensemble of interatomic potentials (initially MACE-MP-0, CHGNet, and M3GNet, followed by iteratively trained MACE models) to select structures with energy uncertainty exceeding 100 meV/atom, force uncertainty exceeding 100 meV/Å, or stress uncertainty exceeding 100 meV/ų. DIRECT downsampling reduced approximately 500,000 selected structures to approximately 125,000 for DFT calculation. Near-equilibrium structures from the Materials Project (up to 3 elements, up to 32 atoms, approximately 30,000 structures) were recalculated with identical DFT settings. A two-stage VASP workflow was applied: an initial static calculation using PBE, followed by r2SCAN relaxation for three ionic steps. In total, 909,792 frames from 303,264 structure relaxations are included. DFT calculations used projector-augmented wave (PAW) potentials, a 680 eV plane-wave cutoff, and KSPACING=0.2, with additional parameters from the MP24RelaxSet in pymatgen. Calculations were managed by the atomate2 workflow package.
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Matthew C. Kuner, Aaron D. Kaplan, Kristin A. Persson, Mark Asta, Daryl C. Chrzan
909789
5891262
89
DFT-R2SCAN
VASP
499
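The Query by Committee selection rule described in the MP-ALOE entry above (select a structure when the energy, force, or stress uncertainty exceeds 100 meV/atom, 100 meV/Å, or 100 meV/Å³ respectively) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' workflow: the names `committee_spread` and `is_selected` are hypothetical, the uncertainty is taken to be the population standard deviation across committee members, and each model's force and stress prediction is assumed to be already reduced to a single scalar per structure.

```python
from statistics import pstdev

# Thresholds from the entry above, converted from meV to eV.
THRESHOLDS = {
    "energy": 0.100,  # eV/atom
    "force": 0.100,   # eV/Å
    "stress": 0.100,  # eV/Å^3
}


def committee_spread(predictions):
    """Assumed uncertainty measure: population std. dev. across the committee."""
    return pstdev(predictions)


def is_selected(energy_preds, force_preds, stress_preds):
    """Return True if any committee spread exceeds its threshold.

    Each argument is a list with one scalar per committee member
    (an assumption made to keep the sketch short).
    """
    spreads = {
        "energy": committee_spread(energy_preds),
        "force": committee_spread(force_preds),
        "stress": committee_spread(stress_preds),
    }
    return any(spreads[key] > THRESHOLDS[key] for key in THRESHOLDS)


# Illustrative committee outputs from three models, not real data.
agreeing = is_selected([-3.00, -3.05, -2.95], [0.10, 0.12, 0.11], [0.00, 0.01, 0.02])
disagreeing = is_selected([-3.0, -3.3, -2.7], [0.10, 0.12, 0.11], [0.00, 0.01, 0.02])
```

In the first call the committee agrees closely, so the structure would not be sent to DFT; in the second the energy spread is well above 100 meV/atom, so it would be.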
The QE-TB dataset is part of the Joint Automated Repository for Various Integrated Simulations (JARVIS) DFT database. This subset contains configurations generated in Quantum ESPRESSO. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/9e9e5b29
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Kevin F. Garrity, Kamal Choudhary
829576
2578920
64
DFT-PBEsol
Quantum ESPRESSO
492
A dataset of 10 molecules (aspirin, azobenzene, benzene, ethanol, malonaldehyde, naphthalene, paracetamol, salicylic acid, toluene, uracil) with 100,000 structures calculated for each at the PBE/def2-SVP level of theory using ORCA. Based on the MD17 dataset, but with refined measurements.
10.60732/682fe04a
C, H, N, O
Anders S. Christensen, O. Anatole von Lilienfeld
999906
15598381
4
DFT-PBE
ORCA 4.0.1
391
OC20_IS2RES_val_id is the in-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, Materials Project bulk id and Miller index.
10.60732/b4005525
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
5024223
406465318
56
DFT-rPBE
VASP
379
Matbench v0.1 test dataset for predicting DFT formation energy from structure. Adapted from the Materials Project database. Entries having a formation energy of more than 2.5 eV and those containing noble gases are removed. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms predicting a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project.
10.60732/3cef7b09
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain
132741
3869238
84
DFT-undefined
VASP
323
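The filtering rule stated in the Matbench formation-energy entry above (drop entries with a formation energy above 2.5 eV and entries containing noble gases) can be sketched as a simple predicate. This is an illustration only, not the Matbench code: the function name `keep_entry` and its argument layout are hypothetical.

```python
NOBLE_GASES = {"He", "Ne", "Ar", "Kr", "Xe", "Rn"}


def keep_entry(elements, formation_energy_ev, max_formation_energy_ev=2.5):
    """Return True if an entry passes both filters described above.

    elements: iterable of element symbols present in the structure.
    formation_energy_ev: formation energy in eV, as quoted in the entry.
    """
    if formation_energy_ev > max_formation_energy_ev:
        return False
    # Reject any structure that shares an element with the noble-gas set.
    return NOBLE_GASES.isdisjoint(elements)


# Illustrative entries, not taken from the dataset.
kept = keep_entry(["Na", "Cl"], -2.1)
dropped_noble_gas = keep_entry(["Xe", "F"], -0.5)
dropped_high_energy = keep_entry(["Fe", "O"], 3.0)
```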
The Matbench_mp_gap dataset is a Matbench v0.1 test dataset for predicting DFT PBE band gap from structure, adapted from the Materials Project database. Entries having a formation energy (or energy above the convex hull) greater than 150 meV and those containing noble gases have been removed. Retrieved April 2, 2019. Refer to the Automatminer/Matbench publication for more details. This dataset contains band gap as calculated by PBE DFT from the Materials Project, in eV. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms predicting a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project.
10.60732/fb4d895d
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain
106105
3184639
84
DFT-PBE
VASP
311
The DFT_3D_12_12_2022 dataset is part of the Joint Automated Repository for Various Integrated Simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/e9e65ccd
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
66617
683506
89
DFT-optB88-vdW, DFT-TBmBJ
VASP
298
OC20_IS2RES_ood_ads is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, Materials Project bulk id and Miller index.
10.60732/0947596b
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
4883196
390308139
56
DFT-rPBE
VASP
297
Configurations of Mo from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/b6ece7fd
Mo
Christopher M. Andolina, Wissam A. Saidi
3663
66220
1
DFT-PBE
VASP
288
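The temperature rule given in the Andolina & Saidi entry above (MT and 0.25*MT when MT < 2000 K; MT, 0.6*MT and 0.25*MT when MT > 2000 K) can be written out as a short helper. This is a sketch, not code from the paper: the function name `md_temperatures` is hypothetical, and because the entry does not say what happens at exactly 2000 K, that case is assumed here to fall in the three-temperature branch.

```python
def md_temperatures(melting_temperature_k):
    """Return the MD temperatures (in K) implied by the rule above.

    Assumption: MT == 2000 K is treated like MT > 2000 K, since the
    source leaves the boundary unspecified.
    """
    if melting_temperature_k < 2000:
        fractions = (1.0, 0.25)
    else:
        fractions = (1.0, 0.6, 0.25)
    return [fraction * melting_temperature_k for fraction in fractions]


# Illustrative melting temperatures, not values from the paper.
low_melting = md_temperatures(1200.0)   # two temperatures
high_melting = md_temperatures(2800.0)  # three temperatures
```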
OC20_S2EF_train_2M is the 2 million structure training subset of the OC20 Structure to Energy and Forces dataset. Features include potential energy, free energy and atomic forces, along with data from the OC20 mappings file, including adsorbate id, Materials Project bulk id, Miller index, shift and others.
10.60732/672cc613
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
2000000
146496199
56
DFT-rPBE
VASP
284
The training split of the Open Catalyst 2025 (OC25) dataset for solid-liquid interfaces. OC25 consists of single-point DFT calculations of catalyst/solvent/ion/adsorbate structures, covering 88 elements, 8 solvents (water, methanol, CCl4, DMSO, benzene, hexane, THF, diethyl ether), 9 ionic species (Cs+, OH-, Li+, SO4^2-, Ca^2+, [Me4N]+, HCO3-, H+, F-), and adsorbates from the OC20 set plus reactive intermediates. Surfaces are derived from 39,821 Materials Project bulk structures with Miller indices <= 3. Structures are highly off-equilibrium, sampled from short ab initio molecular dynamics simulations (10-50 steps, 1000 K, NVT) or short DFT relaxations (5 ionic steps). The training split contains ~7.4 million structures filtered to total force drift < 1 eV/Å. All DFT calculations used VASP 6.3.2 with the non-spin-polarized RPBE functional supplemented with D3 dispersion correction (zero damping), plane wave cutoff 400 eV, EDIFF=1e-4 eV, k-point reciprocal density of 40, and a dipole correction in the z-direction.
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, He, Hf, Hg, I, In, Ir, K, Kr, La, Li, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Xe, Y, Zn, Zr
Sushree Jagriti Sahoo, Mikael Maroschin, Daniel S. Levine, Zachary Ulissi, C. Lawrence Zitnick, Joel B Varley, Joseph A. Gauthier, Nitish Govindarajan, Muhammed Shuaibi
7395509
1068208517
73
DFT-rPBE+D3
VASP 6.3.2
284
OC20_IS2RES_val_ood_cat is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen catalyst composition. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, Materials Project bulk id and Miller index.
10.60732/3c47e0d4
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
5151015
411767380
55
DFT-rPBE
VASP
279
The Alexandria Materials Database contains theoretical crystal structures in 1D, 2D and 3D discovered by machine learning approaches using DFT with PBE, PBEsol and SCAN methods. This dataset represents the geometry optimization paths for 1D crystal structures from Alexandria calculated using PBE methods.
10.60732/12246d46
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr
Jonathan Schmidt, Noah Hoffmann, Hai-Chen Wang, Pedro Borlido, Pedro J. M. A. Carriço, Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques
614833
6062475
74
DFT-PBE
VASP
265
OC20_IS2RES_ood_ads is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, Materials Project bulk id and Miller index.
10.60732/b8c9473b
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
3665193
308297930
55
DFT-rPBE
VASP
257
ANI-2x-B973c-def2mTZVP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the B973c level of theory using the def2-mTZVP basis set. Configuration sets are divided by number of atoms per structure. Force corrections and dipoles are recorded in the metadata.
10.60732/d4e67cf8
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
9642825
146644476
7
DFT-B973c
ORCA 4.2.1
254
Configurations of Zr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/efb626b6
Zr
Christopher M. Andolina, Wissam A. Saidi
4637
80393
1
DFT-PBE
VASP
252
This dataset contains data from eight AIMD simulations run in VASP to study electrochemical *CO-*CO coupling -- coupling of two *CO molecules -- at the liquid water-Cu(100) interface.
10.60732/62aed547
C, Cs, Cu, H, Li, O
Henrik H. Kristoffersen, Karen Chan
1671061
226245754
6
DFT-RPBE+D3
VASP
244
Approximately 300,000 benchmarking configurations derived partly from the MD-17 and LiPS datasets, partly from original simulated water and alanine dipeptide configurations.
10.60732/62c08514
C, H, Li, N, O, P, S
Xiang Fu, Zhenghao Wu, Wujie Wang, Tian Xie, Sinan Keten, Rafael Gomez-Bombarelli, Tommi Jaakkola
294980
23733532
7
IP-AMBER-03, DFT-PBE
DLPOLY, i-PI, VASP, GROMACS
243
This dataset contains DFT calculations of oxygen-deficient perovskites derived from the Ca2Fe2O5 brownmillerite and Ca2Mn2O5 structures, and of tunnel CaMn4O8, a derivative of CaMn2O4 marokite with Ca vacancies. The dataset was produced to investigate the effects of oxygen introduction or Ca vacancy introduction in ternary transition metal oxides, as a means to assess potential new Ca-ion battery materials.
10.60732/8dfc08c5
Ca, Fe, Mn, O
M. Elena Arroyo-de Dompablo, José Luis Casals
2919
387258
4
DFT-PBE
VASP
228
Configurations of Ag from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/7cf58a2f
Ag
Christopher M. Andolina, Wissam A. Saidi
3654
99918
1
DFT-PBE
VASP
228
OC20_S2EF_val_ood_cat is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring unseen catalyst composition. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.
10.60732/4221d2fa
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
999809
74059718
55
DFT-rPBE
VASP
216
133885 molecular structures from QM9 with revised bonds and charges in the SDF format. Bond information can be gathered from the metadata column of the parquet files, a map in which the key bonds contains the bond indices as they appear in the final rows of an SDF molecule block. If additional charges are present, these are contained under the key charge_info. rQM9 is derived from DeepChem's QM9 SDF dataset and rectifies the original dataset's net-charge discrepancies and invalid bond orders by enforcing correct valency-charge configurations. Nevertheless, a subset of molecules remains problematic, as they either fail RDKit sanitization or fragment into multiple components. The zero-based indices of these unresolved molecules are provided in a NumPy file in the original data file.
C, F, H, N, O
Cheng Zeng, Jirui Jin, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Mingjie Liu
133885
2407753
5
DFT-B3LYP
Gaussian 09
215
The JARVIS_DFT_3D_8_18_2021 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/a9dd64f6
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
47036
465994
89
DFT-optB88-vdW, DFT-TBmBJ
VASP
204
Dataset containing MD trajectories of the 42-atom tetrapeptide Ac-Ala3-NHMe from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution.
10.60732/4bc7295f
C, H, N, O
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
85099
3574158
4
DFT-PBE+MBD
FHI-aims
202
The aimd-from-PBE-3000-nvt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/105da475
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
7839846
530963613
86
DFT-PBE+U
VASP
202
Configurations of Al from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/ef6f5966
Al
Christopher M. Andolina, Wissam A. Saidi
2537
86924
1
DFT-PBE
VASP
202
The aimd-from-PBE-3000-npt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/edd12490
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
6076290
411540573
89
DFT-PBE+U
VASP
200
Test configurations with a fixed value of 150 degrees for dihedral beta in the alpha-gamma plane, from the 3BPA dataset. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/1c4b1e1c
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
2350
63450
4
DFT-ωB97X
ORCA
193
Test configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records.
10.60732/ec7bfb65
C, H, N, O
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
198983
33031178
4
DFT-M06-2X
ORCA 4.2.1
190
The training split of OMC25. Open Molecular Crystals 2025 (OMC25) is a molecular crystal dataset produced by Meta. The OE62 dataset was used as a source for sampling molecules; crystals were generated with Genarris 3.0; from these, relaxation trajectories were generated and sampled to create the final dataset. See the publication for details.
B, Br, C, Cl, F, H, I, N, O, P, S, Si
Vahe Gharakhanyan, Luis Barroso-Luque, Yi Yang, Muhammed Shuaibi, Kyle Michel, Daniel S. Levine, Misko Dzamba, Xiang Fu, Meng Gao, Xingyu Liu, Haoran Ni, Keian Noori, Brandon M. Wood, Matt Uyttendaele, Arman Boromand, C. Lawrence Zitnick, Noa Marom, Zachary W. Ulissi, Anuroop Sriram
24870226
3222851761
12
DFT-PBE
VASP 6.3
189
Training configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records.
10.60732/facc4255
C, H, N, O
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
1592662
264381892
4
DFT-M06-2X
ORCA 4.2.1
185
The rattled-relax training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/a096865d
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
9433298
78952123
87
DFT-PBE+U
VASP
182
Training dataset that captures chemical short-range order in equiatomic CrCoNi medium-entropy alloy published with our work Quantifying chemical short-range order in metallic alloys (description provided by authors)
10.60732/76208b62
Co, Cr, Ni
Yifan Cao, Killian Sheriff, Rodrigo Freitas
1257
108684
3
DFT-PBE
VASP 6.2.1
177
The neutral validation set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory.
10.60732/0d5818c5
B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood
27697
1238644
17
DFT-ωB97M-V
ORCA
176
OC20_S2EF_val_id is the ~1-million-structure in-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.
10.60732/eaea5062
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
999866
73147343
56
DFT-rPBE
VASP
175
Configurations of Pt from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/49b97320
Pt
Christopher M. Andolina, Wissam A. Saidi
2605
62053
1
DFT-PBE
VASP
175
The full (unfiltered) validation split of ODAC25. Open Direct Air Capture 2025 (ODAC25) is the largest high-quality DFT dataset for Direct Air Capture, containing over 15,000 Metal-Organic Frameworks (MOFs), including experimental, defective, synthetic, and amine-functionalized MOFs, with 4 adsorbates: CO2, H2O, N2, and O2. ODAC25 significantly improves upon ODAC23 by adding functionalized MOFs, new adsorbates (N2 and O2), higher k-point convergence, and re-relaxations of empty MOFs. The dataset contains three partitions: (1) mof_plus_adsorbate includes full DFT relaxations of different adsorbates on various MOFs; (2) mof includes re-relaxations of empty MOFs; (3) gcmc includes DFT single points of configurations derived from Grand Canonical Monte Carlo (GCMC) simulations.
Ag, Al, Bi, Br, C, Cd, Ce, Cl, Co, Cr, Cu, Eu, F, Fe, Gd, H, Hg, I, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, P, Pr, S, Sc, Se, Si, Sm, Sr, Tb, Th, U, V, Y, Zn, Zr
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl
1290240
286971239
42
DFT-PBE+D3
VASP 6.3
172
The JARVIS_CFID_OQMD dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Open Quantum Materials Database (OQMD), created to hold information about the electronic structure and stability of inorganic materials for the purpose of aiding in materials discovery. Calculations were performed at the DFT level of theory, using the PAW-PBE functional implemented by VASP. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/967596c1
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton
459943
2365987
89
DFT-PBE
VASP
172
Approximately 57,000 configurations from the evaluation datasets for NequIP graph neural network model for interatomic potentials. Trajectories have been taken from LIPS, LIPO glass melt-quench simulation, and formate decomposition on Cu datasets.
10.60732/e05d99fd
C, Cu, H, Li, O, P, S
Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, Boris Kozinsky
56822
7629463
7
DFT-PBE
CP2K, VASP
171
Training configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records.
10.60732/fac841ac
C, H, N, O
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
1592677
264384382
4
DFT-M06-2X
ORCA 4.2.1
169
This dataset is composed of configurations of the fully deuterated Gd(III) analogue d-[GdL] in a variety of solvents, including MeOH, D2O and d6-DMSO.
10.60732/cd8c58b7
C, Gd, H, N, O, S
Barak Alnami, Jon G. C. Kragskow, Jakob K. Staab, Jonathan M. Skelton, Nicholas F. Chilton
41746
28418566
6
DFT-PBE+D3
VASP 6.2.0
166
ANI-1x contains DFT calculations for approximately 5 million molecular conformations. From an initial training set, an active learning method was used to iteratively add conformations where insufficient diversity was detected. Additional conformations were sampled from existing databases of molecules, such as GDB-11 and ChEMBL. On each of these configurations, one of molecular dynamics sampling, normal mode sampling, dimer sampling, or torsion sampling was performed.
10.60732/dd0270c8
C, H, N, O
Justin S. Smith, Roman Zubatyuk, Benjamin Nebgen, Nicholas Lubbers, Kipton Barros, Adrian E. Roitberg, Olexandr Isayev, Sergei Tretiak
308645
5229919
4
DFT-ωB97X
Gaussian 09
163
The Matbench_perovskites dataset is a Matbench v0.1 test dataset for predicting formation energy from crystal structure. Adapted from an original dataset generated by Castelli et al. Refer to the Automatminer/Matbench publication for more details. This dataset contains the energy of formation of the entire 5-atom perovskite cell in eV as calculated by RPBE GGA-DFT. Note the reference state for oxygen was computed from oxygen's chemical potential in water vapor, not as oxygen molecules, to reflect the application for which these perovskites were studied. Matbench is an automated leaderboard for benchmarking state of the art ML algorithms predicting a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project.
10.60732/c2d25b5f
Ag, Al, As, Au, B, Ba, Be, Bi, Ca, Cd, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, Hf, Hg, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain
18926
94630
56
DFT-rPBE
GPAW
160
The train set of a train/test pair from the aspirin dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated by all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set CCSD/cc-pVDZ was used for aspirin. All calculations were performed with the Psi4 software suite.
10.60732/51775b8b
C, H, O
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
996
20916
3
CCSD
Psi4
159
The JARVIS_SNUMAT dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains band gap data for >10,000 materials, computed using a hybrid functional and considering the stable magnetic ordering. Structure relaxation and band edges are obtained using the PBE XC functional; band gap energy is subsequently obtained using the HSE06 hybrid functional. Optical and fundamental band gap energies are included. Some gap energies are recalculated by including spin-orbit coupling. These are noted in the band gap metadata as "SOC=true". JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/d2b06d5a
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, F, Fe, Ga, Ge, H, He, Hf, Hg, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ne, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Th, Ti, Tl, V, W, Xe, Y, Zn, Zr
Sangtae Kim, Miso Lee, Changho Hong, Youngchae Yoon, Hyungmin An, Dongheon Lee, Wonseok Jeong, Dongsun Yoo, Youngho Kang, Yong Youn, Seungwu Han
10481
216749
73
DFT-PBE, DFT-HSE06
VASP
157
COMP6v2-B973c-def2mTZVP is the portion of COMP6v2 calculated at the B973c/def2mTZVP level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory.
10.60732/2228cf4a
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
156317
3785763
7
DFT-B973c
ORCA 4.2.1
156
Test configurations with MD simulations performed at 300K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/5737de70
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
1669
45063
4
DFT-ωB97X
ORCA
155
Training configurations with MD simulation performed at 300K, 600K and 1200K from 3BPA dataset, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/1dbc6d0a
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
500
13500
4
DFT-ωB97X
ORCA
154
The water data set comprises energies and forces of 9,189 condensed-phase structures. The data was obtained in an iterative procedure described in detail in Ref. [4]. The final ANN potential was employed in Refs. [4,5] to analyze temperature-dependent Raman spectra of liquid water. The data set contains structures from four iterations: Initial structures (iteration 0) were obtained from classical and path integral AIMD simulations of bulk liquid water in a cubic box containing 64 water molecules at 300 K as reported in Ref. [6]. Distorted configurations with higher forces were added by randomly displacing the Cartesian coordinates of these configurations. Iteration 1 contains a set of 500 configurations from MD simulations with the fully flexible SPC/E flex water model [7] employing a 25 % increased water density (simulation box with 80 water molecules) and elevated temperatures (T = 500 K) in order to sample highly repulsive configurations. Structures in iteration 2 were obtained by classical MD simulations with preliminary ANN potentials at T = 300 K, 325 K, 350 K, and 370 K employing cubic boxes with 64 molecules and the corresponding experimental densities. The final iteration 3 data contains structures from preliminary ANN simulations with classical and quantum nuclei, respectively, at a wide range of temperatures (T = 258 K, 268 K, 280 K, 290 K, 300 K, 310 K, 320 K, 330 K, 340 K, 350 K, 360 K, and 370 K) using cubic boxes with 64 molecules and the corresponding experimental densities. Energies and atomic forces were calculated with the CP2K program [8,9] using the revPBE exchange-correlation functional [10,11] with D3 dispersion correction [12] following the setup reported in Ref. [4]. Atomic cores were represented using the dual-space Goedecker-Teter-Hutter pseudopotentials [13], Kohn-Sham orbitals were expanded in the TZV2P basis set within the GPW method [14], and the density was represented by an auxiliary plane-wave basis with a cutoff of 400 Ry. [1] A. Kokalj, J. Mol. Graphics Modell. 17, 176–179 (1999). [2] N. Artrith, A. Urban, Comput. Mater. Sci. 114, 135–150 (2016). [3] N. Artrith, A. Urban, G. Ceder, Phys. Rev. B 96, 014112 (2017). [4] T. Morawietz, O. Marsalek, S. R. Pattenaude, L. M. Streacker, D. Ben-Amotz, and T. E. Markland, J. Phys. Chem. Lett. 9, 851 (2018). [5] T. Morawietz, A. S. Urbina, P. K. Wise, X. Wu, W. Lu, D. Ben-Amotz, and T. E. Markland, J. Phys. Chem. Lett. 10, 6067 (2019). [6] Marsalek and T. E. Markland, J. Phys. Chem. Lett. 8, 1545 (2017). [7] X. B. Zhang, Q. L. Liu, and A. M. Zhu, Fluid Ph. Equilibria 262, 210 (2007). [8] J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, and J. Hutter, Comput. Phys. Commun. 167, 103 (2005). [9] J. Hutter, M. Iannuzzi, F. Schiffmann, and J. VandeVondele, WIRES Comput. Mol. Sci. 4, 15 (2014). [10] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). [11] Y. Zhang and W. Yang, Phys. Rev. Lett. 80, 890 (1998). [12] S. Grimme, J. Antony, S. Ehrlich, and H. Krieg, J. Chem. Phys. 132, 154104 (2010). [13] S. Goedecker, M. Teter, and J. Hutter, Phys. Rev. B 54, 1703 (1996). [14] B. G. Lippert, J. Hutter, and M. Parrinello, Mol. Phys. 92, 477 (1997).
10.60732/6ff013d4
H, O
Michael S. Chen, Tobias Morawietz, Thomas E. Markland, Nongnuch Artrith
9188
1788096
2
DFT-revPBE+D3
CP2K
154
Configurations of Co from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/15bb1dca
Co
Christopher M. Andolina, Wissam A. Saidi
3337
67026
1
DFT-PBE
VASP
152
Configurations of sma from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/bd5d6dc9
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
120028
2280532
4
DFT-PBE0
Gaussian 09
150
Training split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The training split (~83% of cleaned data) includes all monomers, dimers, and trimers to anchor low-body-order interactions. A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads.
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
180174
2630122
102
DFT-r2SCAN
FHI-aims v250806
149
In-domain validation configurations for the structure to total energy and forces (S2EF) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.
10.60732/2e72b273
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
405444
31860942
57
DFT-PBE+U
VASP
148
The n-tetradecane training split of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/0396d7de
C, H
Chen Qu, Paul L. Houston, Thomas Allison, Barry I. Schneider, Joel M. Bowman
253646
11160424
2
DFT-B3LYP
Gaussian 16
146
Approximately 2,300 configurations of Li10SiP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiSiPS datasets from this source. The other uses the PBE functional, rather than the PBEsol functional.
10.60732/8e2d8e4c
Li, P, S, Si
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
2356
313100
4
DFT-PBEsol
VASP 5.4.4
146
Verification set for magnetic Moment Tensor Potentials (mMTPs) for the bcc Fe-Al system. Contains 336 configurations of 16-atom Fe-Al supercells with collinear atomic magnetic moments, used to validate mMTPs trained on the companion training set (FeAl-mMTP-Train). Configurations generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. mMTPs predict formation energy, lattice parameters, and total magnetic moments of bcc Fe-Al at 0 K. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment data.
Al, Fe
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov
210
5376
2
DFT-PBE
ABINIT
146
The Acetaldehyde (singlet) set of the QM-22 datasets, with energies calculated at the CCSD(T)/MRCI level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/27f8a97a
C, H, O
Yong-Chang Han, Benjamin C. Shepler, Joel M. Bowman
202518
1417626
3
CCSD(T), MRCI
MOLPRO
142
Configurations of Ni from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/b184fffc
Ni
Christopher M. Andolina, Wissam A. Saidi
3778
74782
1
DFT-PBE
VASP
142
Test split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The test split (~10% of cleaned data, excluding monomers, dimers, and trimers which are fixed in the training split) uses a stratified split method consistent with the training and validation splits. Subset-resolved MAE for PET-MAD-1.5-S on this test set is 11.09 meV/atom (energy) and 36.81 meV/Angstrom (forces). A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads.
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
18314
321704
102
DFT-r2SCAN
FHI-aims v250806
141
The validation split of sAlex. sAlex is a subsample of the Alexandria dataset that was used to fine-tune the OMat24 (Open Materials 2024) models. From the site: sAlex was created by removing structures matched in WBM and only sampling structure along a trajectory with an energy difference greater than 10 meV/atom.
10.60732/1c59d4ac
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
547885
5670890
86
DFT-PBE+U
VASP
141
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and Materials Project. The v2025.1 PBE release contains 434,712 structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the PBE functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. There is a companion dataset calculated with the r2SCAN functional (MatPES-R2SCAN-2025.1).
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong
434668
3881535
89
DFT-PBE
VASP 6.4.x
140
Dataset containing MD trajectories of AT-AT-CG-CG DNA base pairs from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.
10.60732/a87c6d4c
C, H, N, O
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
10153
1198054
4
DFT-PBE+MBD
FHI-aims
139
The test split of the dataset Alex_MP-20. This dataset contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. Data has been modified as follows: Exclude structures containing the elements Tc, Pm, or any element with atomic number 84 or higher. Relax structures with DFT using a PBE functional in order to have consistent energies. For the training set, remove any structure with more than 20 atoms inside the unit cell. For the training set, remove any structure with energy above the hull higher than 0.1 eV/atom.
10.60732/4df848c7
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie
67521
647769
76
DFT-PBE
VASP
139
The validation split of the dataset Alex_MP-20. This dataset contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. Data has been modified as follows: Exclude structures containing the elements Tc, Pm, or any element with atomic number 84 or higher. Relax structures with DFT using a PBE functional in order to have consistent energies. For the training set, remove any structure with more than 20 atoms inside the unit cell. For the training set, remove any structure with energy above the hull higher than 0.1 eV/atom.
10.60732/4132ee7c
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie
67521
647222
76
DFT-PBE
VASP
137
Benzene test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/c88b64a0
C, H
Venkat Kapil, Edgar A. Engel
200
5760
2
DFT-PBE+TS
Quantum ESPRESSO v6.3
137
The n-syn-CH3CHOO set of the QM-22 datasets, with energies calculated at the CCSD(T)/MRCI level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/4647e973
C, H, O
Nathanael M. Kidwell, Hongwei Li, Xiaohong Wang, Joel M. Bowman, Marsha I. Lester
159474
1275792
3
CCSD(T)-F12b
MOLPRO, MOLCAS
135
The Glycine set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/9771a0c2
C, H, N, O
Joel M. Bowman, Jeffrey Li, Chen Qu, Riccardo Conte, Paul L. Houston
70099
700990
4
DFT-B3LYP
MOLPRO
135
The JARVIS_Open_Catalyst_All dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the Open Catalyst Project (OCP): 460,328 for training, with the rest used for validation and testing. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/198ab33a
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
485236
37726627
56
DFT-rPBE
VASP
135
Configurations of Pd from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/8d5bdb05
Pd
Christopher M. Andolina, Wissam A. Saidi
3413
137688
1
DFT-PBE
VASP
135
The JARVIS_Materials_Project_84K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains 84,000 configurations of 3D materials from the Materials Project database. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/46681ef7
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson
83416
2339728
89
DFT-undefined
VASP
132
The JARVIS_OQMD_no_CFID dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Open Quantum Materials Database (OQMD), created to hold information about the electronic structure and stability of inorganic materials for the purpose of aiding in materials discovery. Calculations were performed at the DFT level of theory, using the PAW-PBE functional implemented by VASP. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/82cb32aa
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton
811368
5015282
89
DFT-PBE
VASP
131
ANI-1 is a dataset of 20 million conformations with calculated non-equilibrium energy values. The conformations are based on a subset of the GDB-11 dataset, each molecule containing between 1 and 8 heavy atoms, with heavy-atom species limited to C, N and O. Configuration sets are included for standard and high energy (defined as energies greater than 275 kcal/mol higher than the lowest energy conformer) conformations, and, within these, number of heavy atoms per molecule.
10.60732/a57b3cb3
C, H, N, O
Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg
24389594
392138641
4
DFT-ωB97X
Gaussian 09
131
The JARVIS-Polymer-Genome dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Polymer Genome dataset, as created for the linked publication (Huan, T., Mannodi-Kanakkithodi, A., Kim, C. et al.). Structures were curated from existing sources and the original authors' works, removing redundant, identical structures before calculations, and removing redundant datapoints after calculations were performed. Band gap energies were calculated using two different DFT functionals: rPW86 and HSE06; atomization energy was calculated using rPW86. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/37f5fcea
Al, C, Ca, Cd, Cl, F, H, Hf, Mg, N, O, Pb, S, Sn, Ti, Zn, Zr
Tran Doan Huan, Arun Mannodi-Kanakkithodi, Chiho Kim, Vinit Sharma, Ghanshyam Pilania, Rampi Ramprasad
1073
34441
17
DFT-rPW86, DFT-HSE06
VASP
130
The n-tetradecane testing split of the QM-22 datasets. This split includes DFT calculated atomic forces. Metadata includes energy difference in cm^-1 between given structure and the zig-zag minimum. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/7cec33e0
C, H
Chen Qu, Paul L. Houston, Thomas Allison, Barry I. Schneider, Joel M. Bowman
89648
5375749
2
DFT-B3LYP
Gaussian 16
128
ANI-2x-wB97X-631Gd is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in Gaussian 09 at the wB97X level of theory using the 6-31G(d) basis set. Configuration sets are divided by number of atoms per structure.
10.60732/ac84253d
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
9650934
146725202
7
DFT-ωB97X
Gaussian 09
128
1090 structures uniformly selected from the MD/tfMC simulation during the training process of CGM-MLPs. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/535052eb
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
1091
362898
2
DFT-PBE+D3
CP2K
128
Configurations of Mg from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/f4d2f0b8
Mg
Christopher M. Andolina, Wissam A. Saidi
2938
57353
1
DFT-PBE
VASP
128
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ge configurations.
10.60732/4552d3fd
Ge
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
228
14072
1
DFT-PBE
VASP
126
The rattled-1000 training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/6994f9f0
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
11388475
161511768
89
DFT-PBE+U
VASP
126
The Tropolone set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/b76ce2d6
C, H, O
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu
6768
101520
3
DFT-B3LYP
Gaussian 16
125
The aimd-from-PBE-1000-npt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/25f16f85
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
21269486
179930890
89
DFT-PBE+U
VASP
125
Training set for magnetic Moment Tensor Potentials (mMTPs) that fit to magnetic forces for the bcc Fe-Al system. Contains 2632 configurations of 16-atom Fe-Al supercells with collinear atomic magnetic moments and magnetic forces (negative derivatives of energy with respect to magnetic moments, in eV/mu_B; zero for equilibrium magnetic moments). Configurations generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. Fitting to magnetic forces is demonstrated to improve reliability of the fitted mMTPs compared to fitting only to energies and forces. mMTP ensembles with 2, 3, and 4 magnetic basis functions are evaluated for predicting Fe-Al properties at 0 K and lattice parameters at 300 K. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment and magnetic force data.
Al, Fe
Alexey S. Kotykhov, Konstantin Gubaev, Vadim Sotskov, Christian Tantardini, Max Hodapp, Alexander V. Shapeev, Ivan S. Novikov
1018
42096
2
DFT-PBE
ABINIT
125
MSR-ACC/TAE25 (Microsoft Research Accurate Chemistry Collection, Total Atomization Energies 2025) provides 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level obtained with the W1-F12 composite wavefunction protocol implemented in Molpro 2024.1. This is the canonical training split comprising 71,871 molecules (99% of molecules remaining after removing overlap with the W4-17 and GMTKN55 benchmark sets). The dataset covers the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures containing up to 5 non-hydrogen atoms drawn from elements H through Ar, excluding rare gases. Molecular structures were generated by exhaustive graph enumeration and degree-sequence sampling, then optimized through a cascade of GFN2-xTB, r2SCAN-3c, and B3LYP-D3(BJ)/def2-TZVPP levels of theory (ORCA). Structures were filtered to exclude those with significant multireference character (%TAE[(T)] > 6% at CCSD(T)/6-31G*), triplet electronic ground states, or dissociated fragments. The W1-F12 protocol includes Hartree-Fock extrapolation to the complete basis set limit (cc-pVDZ-F12 and cc-pVTZ-F12, alpha=5), CCSD-F12b correlation, perturbative triples delta(T) using jul-cc-pV(D+d)Z and jul-cc-pV(T+d)Z basis sets (alpha=3.22), and a core-valence correction using cc-pwCVTZ. The dataset spans 45.1% organic and 54.9% inorganic molecules and provides broader chemical diversity than comparable datasets such as GDB-9 or VQM24/DMC. Additional data available in the source files, including DFT atomization energies at approximately 90 levels of theory, singlet-triplet gaps, %TAE[(T)] multireference diagnostics, and W1-F12 energy components, can be downloaded from ColabFit Exchange.
Al, B, Be, C, Cl, F, H, Li, Mg, N, Na, O, P, S, Si
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Klaas J.H. Giesbertz, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton
71871
532242
15
W1-F12/CCSD(T)-CBS
Molpro 2024.1
124
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Cu configurations.
10.60732/49de06ae
Cu
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
262
27416
1
DFT-PBE
VASP
124
Configurations of azobenzene featuring a cis to trans thermal inversion through three channels: inversion, rotation, and rotation assisted by inversion; and configurations of glycine as a simpler comparison molecule. All calculations were performed in FHI-aims software using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional with the Tkatchenko-Scheffler (TS) method to account for van der Waals (vdW) interactions. The azobenzene sets contain calculations from several different MD simulations, including two long simulations initialized at 300 K; short simulations (300 steps) initialized at 300 K with a shorter (0.5 fs) timestep; four simulations, two starting from each of the cis and trans isomers, at 750 K (initialized at 3000 K); and simulations at 50 K (initialized at 300 K). The glycine isomerization set was built using one MD simulation starting from each of two different minima. Initialization and simulation temperatures were 500 K.
10.60732/71f8031b
C, H, N, O
Valentin Vassilev-Galindo, Gregory Fonseca, Igor Poltavsky, Alexandre Tkatchenko
69174
1520162
4
DFT-PBE
FHI-aims
124
A reference set of configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. These configurations were used to evaluate training on a GAP model.
10.60732/54bc0a93
H, Si
Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi
114
24895
2
DFT-PBE
Quantum ESPRESSO
118
ANI-2x-wB97MV-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated at the wB97MV level of theory using the def2TZVPP basis set. Configuration sets are divided by number of atoms per structure.
10.60732/5209eb00
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
9649797
146703867
7
DFT-ωB97M-V
ORCA 4.2.1
118
COMP6v2-wB97MV-def2TZVPP is the portion of COMP6v2 calculated at the wB97MV/def2TZVPP level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory.
10.60732/e98b76e7
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
156338
3786615
7
DFT-ωB97M-V
ORCA 4.2.1
117
This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially-trained MTP model, 2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed, 3) the MTP was retrained, including the structures that were re-computed with DFT, 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset included about 375,000 binary and ternary structures, enumerating all possible unit cells with different symmetries (BCC, FCC, and HCP) and different numbers of atoms.
10.60732/7b56ca82
Al, Ni, Ti
Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev
2666
24851
3
DFT-undefined
VASP
117
Configurations of nitrophenol from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/d91bf8fd
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119995
1799925
4
DFT-PBE0
Gaussian 09
116
The aimd-from-PBE-1000-nvt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/de0e0690
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
20256650
169879539
86
DFT-PBE+U
VASP
115
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Si configurations.
10.60732/e16c3975
Si
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
25
1525
1
DFT-PBE
VASP
113
ANI-2x-wB97MD3BJ-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the wB97M level of theory with D3 and BJ energy corrections, using the def2-TZVPP basis set. Configuration sets are divided by number of atoms per structure. Uncorrected SCF energy values and dipoles are recorded in the metadata.
10.60732/5bd01ed9
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
9649788
146703426
7
DFT-ωB97M+D3(BJ)
ORCA 4.2.1
113
This dataset was created to investigate the role of surface water and hydroxyl groups in facilitating spontaneous CO₂ activation at Cu⁺ sites and the formation of monodentate formate species in the context of using CO₂ hydrogenation to produce methanol.
10.60732/cea60472
C, Cu, H, Mg, O
Estefanía Fernández Villanueva, Pablo Germán Lustemberg, Minjie Zhao, Jose Soriano, Patricia Concepción, María Verónica Ganduglia Pirovano
14955
1043206
5
DFT-PBE+D3
VASP 6.3.0
112
Approximately 7,600 configurations of Ag used as part of a training dataset for a DP-GEN-based ML model for an Ag-Au nanoalloy potential.
10.60732/93adbc95
Ag
Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang
7589
152114
1
DFT-PBE+D3
VASP
111
Configurations of Au from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/7c0dfbc8
Au
Christopher M. Andolina, Wissam A. Saidi
3585
89006
1
DFT-PBE
VASP
111
MSR-ACC/TAE25 (Microsoft Research Accurate Chemistry Collection, Total Atomization Energies 2025) provides 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level obtained with the W1-F12 composite wavefunction protocol implemented in Molpro 2024.1. This is the canonical validation split comprising 730 molecules (1% of molecules remaining after removing overlap with the W4-17 and GMTKN55 benchmark sets). The dataset covers the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures containing up to 5 non-hydrogen atoms drawn from elements H through Ar, excluding rare gases. Molecular structures were generated by exhaustive graph enumeration and degree-sequence sampling, then optimized through a cascade of GFN2-xTB, r2SCAN-3c, and B3LYP-D3(BJ)/def2-TZVPP levels of theory (ORCA). Structures were filtered to exclude those with significant multireference character (%TAE[(T)] > 6% at CCSD(T)/6-31G*), triplet electronic ground states, or dissociated fragments. The W1-F12 protocol includes Hartree-Fock extrapolation to the complete basis set limit (cc-pVDZ-F12 and cc-pVTZ-F12, alpha=5), CCSD-F12b correlation, perturbative triples delta(T) using jul-cc-pV(D+d)Z and jul-cc-pV(T+d)Z basis sets (alpha=3.22), and a core-valence correction using cc-pwCVTZ. The dataset spans 45.1% organic and 54.9% inorganic molecules and provides broader chemical diversity than comparable datasets such as GDB-9 or VQM24/DMC. Additional data available in the source files, including DFT atomization energies at approximately 90 levels of theory, singlet-triplet gaps, %TAE[(T)] multireference diagnostics, and W1-F12 energy components, can be downloaded from ColabFit Exchange.
Al, B, Be, C, Cl, F, H, Li, Mg, N, Na, O, P, S, Si
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Klaas J.H. Giesbertz, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton
730
5444
15
W1-F12/CCSD(T)-CBS
Molpro 2024.1
110
OC20_S2EF_train_200K is the 200K training split of the OC20 Structure to Energy and Forces (S2EF) task.
10.60732/6ccdeb1d
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
200000
14631937
56
DFT-rPBE
VASP
109
Configurations of Sr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/69de7b6b
Sr
Christopher M. Andolina, Wissam A. Saidi
3037
48387
1
DFT-PBE
VASP
109
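The temperature rule in the Andolina & Saidi descriptions can be written as a small helper. This is a minimal sketch: the function name is hypothetical, the input values are arbitrary rather than real melting points, and the case of MT exactly equal to 2000 K (which the description leaves unspecified) is arbitrarily assigned to the three-temperature branch.

```python
def md_temperatures(melting_temp_k: float) -> list[float]:
    """Return the MD temperatures (K) implied by the rule in the description.

    MT < 2000 K  -> MT and 0.25*MT
    MT > 2000 K  -> MT, 0.6*MT and 0.25*MT
    MT == 2000 K is not covered by the description; treating it like the
    high-MT case is an assumption made here.
    """
    if melting_temp_k < 2000.0:
        fractions = (1.0, 0.25)
    else:
        fractions = (1.0, 0.6, 0.25)
    return [fraction * melting_temp_k for fraction in fractions]


# Arbitrary example inputs, not tabulated melting points.
low_mt_schedule = md_temperatures(1000.0)
high_mt_schedule = md_temperatures(3000.0)
```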
The Acetaldehyde (triplet) set of the QM-22 datasets, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/d66c9888
C, H, O
Bina Fu, Yong-Chang Han, Joel M. Bowman, Luca Angelucci, Nadia Balucani, Francesca Leonori, Piergiorgio Casavecchia
51530
360710
3
CCSD(T)
MOLPRO
108
ANI-2x-wB97X-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the wB97X level of theory using the def2TZVPP basis set. Configuration sets are divided by number of atoms per structure. Dipoles are recorded in the metadata.
10.60732/61569e2c
C, Cl, F, H, N, O, S
Christian Devereux, Justin S. Smith, Kate K. Huddleston, Kipton Barros, Roman Zubatyuk, Olexandr Isayev, Adrian E. Roitberg
8481522
127828812
7
DFT-ωB97X
ORCA 4.2.1
107
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Cu configurations.
10.60732/7c69274d
Cu
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
31
3178
1
DFT-PBE
VASP
107
Configurations of Cu from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/e0a72dd8
Cu
Christopher M. Andolina, Wissam A. Saidi
3355
96328
1
DFT-PBE
VASP
107
Approximately 46,000 configurations of copper, including small and bulk structures, surfaces, interfaces, point defects, and randomly modified variants. Also includes structures with displaced or missing atoms.
10.60732/c712b78a
Cu
Yury Lysogorskiy, Cas van der Oord, Anton Bochkarev, Sarath Menon, Matteo Rinaldi, Thomas Hammerschmidt, Matous Mrovec, Aidan Thompson, Gábor Csányi, Christoph Ortner, Ralf Drautz
46327
307430
1
DFT-PBE
FHI-aims
107
Configurations of Pb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/bddd3245
Pb
Christopher M. Andolina, Wissam A. Saidi
5254
117186
1
DFT-PBE
VASP
106
129 molecules of composition C7O2H10 from the QM9 dataset with 5000 conformational geometries apiece. Molecular dynamics data was simulated using the Fritz-Haber Institute ab initio simulation software.
10.60732/ad0a0039
C, H, O
Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky
640791
12175029
3
DFT-PBE+TS
FHI-aims
105
AIMNet2(2025) is the extended training dataset for the AIMNet2 (second generation atoms-in-molecules network) neural network interatomic potential, curated to improve the model's description of noncovalent interactions (NCIs) including hydrogen bonding, pi-pi stacking, dispersion, sigma-hole, ionic, and electrostatic contacts. The dataset covers neutral and charged closed-shell molecular systems composed of up to 14 non-metal elements (H, B, C, N, O, F, Si, P, S, Cl, As, Se, Br, I) with up to 193 atoms per system. Structures were drawn from three complementary sources: (a) molecular geometries from SPICE v2.0.1 (solvated systems, amino acid-ligand pairs, water clusters) and the CREMP dataset (macrocyclic peptides); (b) small neutral and charged molecules from PubChem sampled via normal mode sampling and metadynamics-guided geometry exploration; (c) dimer geometries assembled from Cambridge Structural Database (CSD) monomers (up to 14 supported elements, fewer than 200 atoms) and pre-optimized with AIMNet2-wB97M-D3(2023) to remove steric clashes while preserving configurational diversity. All quantum chemical calculations used ORCA 6.0.1 with the composite B97-3c DFT functional under restricted Kohn-Sham (RKS) formalism. SCF convergence was enforced with TightSCF and SlowConv; RIJCOSX integral acceleration and DEFGRID2 integration grid were applied throughout. AIMNet2(2025) was initialized from AIMNet2(2023) weights and continually pretrained on this dataset without weight freezing or regularization, using a multi-task loss over energy (w=1.0), forces (w=0.2), and Hirshfeld partial charges (w=0.5).
As, B, Br, C, Cl, F, H, I, N, O, P, S, Se, Si
Kamal Singh Nayal, Ilkwon Cho, Runtian Nick Gao, Peikun Zheng, Olexandr Isayev
3764666
130288462
14
DFT-B97-3c
ORCA 6.0.1
105
The H2CO/HCOH set of the QM-22 datasets, representing the isomerization of formaldehyde to cis and trans-hydroxycarbene, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/c04a4e90
C, H, O
Chen Qu, Qi Yu, Brian L. Van Hoozen Jr, Joel M. Bowman, Rodrigo A. Vargas-Hernández
34750
139000
3
MRCI
MOLPRO
103
The Hydronium set of the QM-22 datasets, with energies calculated at the CCSD(T)/MRCI level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/cd74ffdf
H, O
Chen Qu, Qi Yu, Brian L. Van Hoozen Jr, Joel M. Bowman, Rodrigo A. Vargas-Hernández
32141
128564
2
CCSD(T), MRCI
MOLPRO
103
The train split of the dataset Alex_MP-20. This dataset contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. Data has been modified as follows: Exclude structures containing the elements Tc, Pm, or any element with atomic number 84 or higher. Relax structures with DFT using a PBE functional in order to have consistent energies. For the training set, remove any structure with more than 20 atoms inside the unit cell. For the training set, remove any structure with energy above the hull higher than 0.1 eV/atom.
10.60732/8d6afc67
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie
540162
5184565
76
DFT-PBE
VASP
103
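The Alex_MP-20 training-set rules listed in the description above amount to a per-structure filter. The sketch below is one way to express them; the `Structure` record and its field names are hypothetical stand-ins, not the dataset's actual schema, and the DFT re-relaxation step is not represented.

```python
from dataclasses import dataclass

EXCLUDED_SYMBOLS = {"Tc", "Pm"}   # elements excluded by name
MIN_EXCLUDED_Z = 84               # atomic number 84 or higher is excluded
MAX_ATOMS_IN_CELL = 20            # training set: at most 20 atoms per unit cell
MAX_E_ABOVE_HULL = 0.1            # training set: eV/atom


@dataclass
class Structure:
    """Hypothetical minimal record; field names are assumptions."""
    symbols: list[str]            # one chemical symbol per atom in the cell
    atomic_numbers: list[int]     # one atomic number per atom in the cell
    e_above_hull: float           # energy above the hull, eV/atom


def keep_for_training(structure: Structure) -> bool:
    """Apply the exclusion rules quoted in the description, in order."""
    if EXCLUDED_SYMBOLS & set(structure.symbols):
        return False
    if any(z >= MIN_EXCLUDED_Z for z in structure.atomic_numbers):
        return False
    if len(structure.symbols) > MAX_ATOMS_IN_CELL:
        return False
    if structure.e_above_hull > MAX_E_ABOVE_HULL:
        return False
    return True


# Made-up examples for illustration only.
rock_salt = Structure(["Na", "Cl"], [11, 17], 0.0)
polonium_cell = Structure(["Po"], [84], 0.0)
kept = [s for s in (rock_salt, polonium_cell) if keep_for_training(s)]
```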
Configurations of Ti from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/f08eba7c
Ti
Christopher M. Andolina, Wissam A. Saidi
5436
148209
1
DFT-PBE
VASP
103
The dataset consists of energies and forces for monolayer graphene, bilayer graphene, graphite, and diamond in various states, including strained static structures and configurations drawn from ab initio MD trajectories. A total of 4788 configurations were generated from DFT calculations using the Vienna Ab initio Simulation Package (VASP). The energies and forces are stored in the extended XYZ format, with one file for each configuration.
10.60732/e65112ef
C
Mingjian Wen, Ellad B. Tadmor
4769
228396
1
DFT-PBE+MBD
VASP
102
The JARVIS-C2DB dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations from the Computational 2D Database (C2DB), which contains a variety of properties for 2-dimensional materials across more than 30 different crystal structures. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/37c26dae
Ag, Al, As, Au, B, Ba, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Sten Haastrup, Mikkel Strange, Mohnish Pandey, Thorsten Deilmann, Per S Schmidt, Nicki F Hinsche, Morten N Gjerding, Daniele Torelli, Peter M Larsen, Anders C Riis-Jensen, Jakob Gath, Karsten W Jacobsen, Jens Jørgen Mortensen, Thomas Olsen, Kristian S Thygesen
3520
17990
61
DFT-PBE
GPAW
102
Training set for magnetic Moment Tensor Potentials (mMTPs) for the bcc Fe-Al system. Contains 2012 configurations of 16-atom Fe-Al supercells with collinear atomic magnetic moments. Configurations were generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. The fitted mMTPs (with 2 magnetic basis functions) predict formation energy, lattice parameters, and total magnetic moments of bcc Fe-Al at 0 K across varying Al concentrations. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment data.
Al, Fe
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov
434
32192
2
DFT-PBE
ABINIT
101
The Malonaldehyde set of the QM-22 datasets, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/e77ca63e
C, H, O
Yimin Wang, Bastiaan J. Braams, Joel M. Bowman, Stuart Carter, David P. Tew
11145
100305
3
CCSD(T)
MOLPRO
101
127,000 configurations from a dataset used to benchmark and train a modified DeePMD model called DeepPot-SE, or Deep Potential - Smooth Edition.
10.60732/d5518670
Al, C, Co, Cr, Cu, Fe, Ge, H, Mn, Mo, N, Ni, O, Pt, S, Si, Ti
Linfeng Zhang, Jiequn Han, Han Wang, Wissam A. Saidi, Roberto Car, Weinan E
126631
26210897
17
DFT-PBE
CP2K, Quantum ESPRESSO
101
The Ethanol set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/b52743ef
C, H, O
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu
11011
99099
3
DFT-B3LYP
Gaussian 16
100
Validation configurations of bulk water from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.
10.60732/142b62c8
H, O
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
2112
405504
2
DFT-revPBE+D3
VASP
100
ANI-1xBB is a dataset of approximately 13.1 million nonequilibrium conformers of small organic molecules (H, C, N, O only; up to 7 heavy atoms; up to 23 atoms total), designed to support the training of reactive machine learning interatomic potentials. Single-point quantum chemistry properties were computed at three electronic temperatures (T_el = 0, 1000, and 5000 K) using B97-3c composite DFT in ORCA 4.2.1 via finite-temperature DFT (Fermi smearing). All geometries were treated as closed-shell (charge = 0, mult = 1); Fermi smearing at T_el = 5000 K approximates the superposition of closed- and open-shell states during bond dissociation and is the primary labeling scheme used for model training in the associated publication. This dataset contains the T_el = 5000 K (b973c_etemp5000) energies and forces; data at T_el = 0 K and 1000 K are available in the original source files. Configuration sets represent: constrained geometry optimization steps (snap_source='opt', ~9% of data) and fixed-distance NVT MD snapshots (snap_source='md', ~91% of data).
C, H, N, O
Shuhao Zhang, Roman Zubatyuk, Yinuo Yang, Adrian Roitberg, Olexandr Isayev
13144877
184872744
4
DFT-B97-3c
ORCA 4.2.1
100
Training and testing configurations of bulk water from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.
10.60732/7f3ffd0b
H, O
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
700
134400
2
DFT-revPBE+D3
VASP
99
158,000 diverse atomic environments of elemental tungsten. Includes DFT-PBE energies, forces and stresses for tungsten; periodic unit cells in the range of 1-135 atoms, including bcc primitive cell, 128-atom bcc cell, vacancies, low index surfaces, gamma-surfaces, and dislocation cores.
10.60732/8d093f34
W
Wojciech J. Szlachta, Albert P. Bartók, Gábor Csányi
9471
158304
1
DFT-PBE
CASTEP 6.01
99
COMP6v2-wB97MD3BJ-def2TZVPP is the portion of COMP6v2 calculated at the wB97MD3BJ/def2TZVPP level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory.
10.60732/19db27ec
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
156353
3787055
7
DFT-ωB97M-V
ORCA 4.2.1
98
Configurations of acrolein from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/0f9d02a8
C, H, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119993
959944
3
DFT-PBE0
Gaussian 09
96
The N-methyl acetamide set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/89997b6f
C, H, N, O
Apurba Nandi, Chen Qu, Joel M. Bowman
6607
79284
4
DFT-B3LYP
MOLPRO
95
The test set of a train/test pair from the aspirin dataset from sGDML. To create the coupled cluster datasets, the training data were generated by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated by all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set CCSD/cc-pVDZ was used for aspirin. All calculations were performed with the Psi4 software suite.
10.60732/083a6253
C, H, O
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
500
10500
3
CCSD
Psi4
95
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Si configurations.
10.60732/c2471ffc
Si
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
214
13233
1
DFT-PBE
VASP
94
The JARVIS-QM9-DGL dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset, originally created as part of the datasets at quantum-machine.org, as implemented with the Deep Graph Library (DGL) Python package. Units for r2 (electronic spatial extent) are a0^2; for alpha (isotropic polarizability), a0^3; for mu (dipole moment), D; for Cv (heat capacity), cal/mol K. Units for all other properties are eV. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/403cd4f2
C, F, H, N, O
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
130831
2358210
5
DFT-B3LYP
Gaussian 09
93
The OCHCO cation set of the QM-22 datasets, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/8f92dba5
C, H, O
Chen Qu, Qi Yu, Brian L. Van Hoozen Jr, Joel M. Bowman, Rodrigo A. Vargas-Hernández
7800
39000
3
CCSD(T)
MOLPRO
92
This is the dataset from npj Comp. Mater 7, 12 (2021), 'Predicting stable crystalline compounds using chemical similarity'. Stable crystal structure compositions of up to 12 atoms were gathered from the Materials Project database. These structures were mutated by replacing all of a given element with a similar element (see publication for details).
10.60732/b9e7eedf
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Hai-Chen Wang, Silvana Botti, Miguel A. L. Marques
219310
1711271
85
DFT-PBE
VASP
92
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Li configurations.
10.60732/d8a6d50c
Li
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
29
1320
1
DFT-PBE
VASP
91
Training set from COLL. Consists of configurations taken from molecular collisions of different small organic molecules. Energies and forces for 140,000 random snapshots taken from these trajectories were recomputed with density functional theory (DFT). These calculations were performed with the revPBE functional and def2-TZVP basis, including D3 dispersion corrections.
10.60732/f95867ef
C, H, O
Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann
119965
1225234
3
DFT-revPBE+D3
ORCA
91
This dataset was originally designed to fit a GAP model for the Mo-Nb-Ta-V-W quinary system that was used to study segregation and defects in the body-centered-cubic refractory high-entropy alloy MoNbTaVW.
10.60732/00dc545a
Mo, Nb, Ta, V, W
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
2329
127913
5
DFT-PBE
VASP
91
The Methane set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/ca55415d
C, H
Apurba Nandi, Chen Qu, Joel M. Bowman
9000
45000
2
DFT-B3LYP
MOLPRO
89
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Mo configurations.
10.60732/3db3283a
Mo
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
23
1189
1
DFT-PBE
VASP
89
The JARVIS_OMDB dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Organic Materials Database (OMDB): a dataset of 12,500 crystal materials for the purpose of training models for the prediction of properties for complex and lattice-periodic organic crystals with large numbers of atoms per unit cell. Dataset covers 69 space groups, 65 elements; averages 82 atoms per unit cell. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/a375b3dc
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, U, V, W, Y, Zn, Zr
Bart Olsthoorn, R. Matthias Geilhufe, Stanislav S. Borysov, Alexander V. Balatsky
12497
1061362
65
DFT-PBE
VASP
89
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Li configurations.
10.60732/63ab9206
Li
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
241
11576
1
DFT-PBE
VASP
88
Dataset from "Surface segregation in high-entropy alloys from alchemical machine learning: dataset HEA25S". Includes 10000 bulk HEA structures (Dataset O), 2640 HEA surface slabs (Dataset A), together with 1000 bulk and 1000 surface slab snapshots from the molecular dynamics (MD) runs (Datasets B and C), and 500 MD snapshots of the 25-element Cantor-style alloy surface slabs. These splits, along with their respective train, test, and validation splits, are included as configuration sets.
10.60732/3c5c6e72
Ag, Au, Co, Cr, Cu, Fe, Hf, Ir, Lu, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, Sc, Ta, Ti, V, W, Y, Zn, Zr
Arslan Mazitov, Maximilian A. Springer, Nataliya Lopanitsyna, Guillaume Fraux, Sandip De, Michele Ceriotti
15004
633387
25
DFT-PBEsol
VASP
88
The validation set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory.
10.60732/8baea040
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood
2762021
283298012
83
DFT-ωB97M-V
ORCA
87
Hessian QM9 is the first database of equilibrium configurations and numerical Hessian matrices, consisting of 41,645 molecules from the QM9 dataset at the wB97x/6-31G* level. Molecular Hessians were calculated in vacuum, as well as in water, tetrahydrofuran, and toluene using an implicit solvation model.
10.60732/e8c8e0eb
C, F, H, N, O
Nicholas J. Williams, Lara Kabalan, Ljiljana Stojanovic, Viktor Zólyomi, Edward O. Pyzer-Knapp
166580
3063848
5
DFT-ωB97X
NWChem
86
The test set from the doped CsPbI3 energetics dataset. This dataset was created to explore the effect of Cd and Zn substitutions on the structural stability of inorganic lead halide perovskite CsPbI3. CsPbI3 undergoes a direct to indirect band-gap phase transition at room temperature. The dataset contains configurations of CsPbI3 with low levels of Cd and Zn, which were used to train a GNN model to predict the energetics of structures with higher levels of substitutions.
10.60732/e2e38c83
Cd, Cs, I, Pb, Zn
Roman A. Eremin, Innokentiy S. Humonen, Alexey A. Kazakov, Vladimir D. Lazarev, Anatoly P. Pushkarev, Semen A. Budennyy
60
9600
5
DFT-PBE
VASP
86
The full trajectories from the VASP runs used to generate the 23-Single-Element-DNPs training sets. Configuration sets are available for each element.
10.60732/a4e0fea6
Ag, Al, Au, Co, Cu, Ge, I, Kr, Li, Mg, Mo, Nb, Ni, Os, Pb, Pd, Pt, Re, Sb, Sr, Ti, Zn, Zr
Christopher M. Andolina, Wissam A. Saidi
108644
2352424
23
DFT-PBE
Quantum ESPRESSO
86
Test configurations from the 3BPA dataset with a fixed value of 120 degrees for dihedral beta in the alpha-gamma plane. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/09d00e4e
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
2347
63369
4
DFT-ωB97X
ORCA
86
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ni configurations.
10.60732/9a25df21
Ni
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
263
27420
1
DFT-PBE
VASP
85
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ge configurations.
10.60732/1a1e4a52
Ge
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
25
1568
1
DFT-PBE
VASP
84
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Mo configurations.
10.60732/3827e5e1
Mo
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
194
10087
1
DFT-PBE
VASP
84
The JARVIS_Materials_Project_2020 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains 127,000 configurations of 3D materials from the Materials Project database. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/8122ca50
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson
126335
3725727
89
DFT-undefined
VASP
84
Dataset containing MD trajectories of AT-AT DNA base pairs from the MD22 benchmark set.
10.60732/3e801453
C, H, N, O
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
19999
1199940
4
DFT-PBE+MBD
FHI-aims
84
Configurations of Os from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/1ec0df98
Os
Christopher M. Andolina, Wissam A. Saidi
4624
114840
1
DFT-PBE
VASP
83
Out-of-domain validation configurations for the structure to total energy and forces (S2EF) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.
10.60732/71142b0d
Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
457249
36937329
52
DFT-PBE+U
VASP
82
A comprehensive DFT data set was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ni configurations.
10.60732/ef83b761
Ni
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
31
3158
1
DFT-PBE
VASP
81
This data set was originally used to generate a linear SNAP potential for solid and liquid tantalum, as published in Thompson, A.P. et al., J. Comp. Phys. 285 (2015) 316-330.
10.60732/da9afef7
Ta
Aidan P. Thompson, Laura P. Swiler, Christian R. Trott, Stephen M. Foiles, Garritt J. Tucker
363
4224
1
DFT-PBE
VASP
80
Dataset containing MD trajectories of the double-walled nanotube supramolecule from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.
10.60732/fce214af
C, H
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
5032
1861840
2
DFT-PBE+MBD
FHI-aims
80
A subset of the MAD-1.5 (Massive Atomic Diversity version 1.5) structures recomputed with the PBE GGA functional, covering the MAD-1 subsets (MC3D, MC3D-rattled, MC3D-random, MC3D-surface, MC3D-cluster, MC2D, SHIFTML-molcrys, SHIFTML-molfrags) plus monomers and MC3D-random-extended from the new MAD-1.5 subsets. All DFT settings are consistent with the r2SCAN calculations: FHI-aims (version 250806) all-electron code with tight NAO basis sets (species defaults 2020), 8 Angstrom^-1 k-point density for periodic systems, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). Cross-validation splits are consistent with the r2SCAN train/val/test splits; this file contains all three splits combined. PBE targets were used in PET-MAD-1.5 model training with separate prediction heads alongside r2SCAN targets, improving force accuracy by approximately 25% relative to r2SCAN-only training. As a lower level of theory, this dataset is less carefully curated than the primary r2SCAN dataset; PBE heads are discarded from the final released models.
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
101493
2620250
102
DFT-PBE
FHI-aims v250806
80
The main training dataset for GST_GAP_22, calculated using the PBEsol functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions.
10.60732/f2d6e02c
Ge, Sb, Te
Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer
2690
341004
3
DFT-PBEsol
CASTEP
79
Training configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/495b736b
Hf, O
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
27958
2683968
2
DFT-PBE
VASP
79
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and the Materials Project. The v2025.2 PBE release contains 433,189 structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the PBE functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. v2025.2 removes a small number of duplicated structures present in v2025.1, and the original files add Bader charges and Bader magnetic moments per atom. The previous version of this dataset (MatPES-PBE-2025.1) is available from ColabFit. There is a companion dataset calculated with the r2SCAN functional (MatPES-R2SCAN-2025.2).
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong
433163
3867177
89
DFT-PBE
VASP 6.4.x
78
MSR-ACC/TAE25 (Microsoft Research Accurate Chemistry Collection, Total Atomization Energies 2025) provides 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level obtained with the W1-F12 composite wavefunction protocol implemented in Molpro 2024.1. This is the complete MSR-ACC/TAE25 dataset of 73,040 molecules, comprising all structures prior to partitioning into canonical train and validation splits. The dataset covers the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures containing up to 5 non-hydrogen atoms drawn from elements H through Ar, excluding rare gases. Molecular structures were generated by exhaustive graph enumeration and degree-sequence sampling, then optimized through a cascade of GFN2-xTB, r2SCAN-3c, and B3LYP-D3(BJ)/def2-TZVPP levels of theory (ORCA). Structures were filtered to exclude those with significant multireference character (%TAE[(T)] > 6% at CCSD(T)/6-31G*), triplet electronic ground states, or dissociated fragments. The W1-F12 protocol includes Hartree-Fock extrapolation to the complete basis set limit (cc-pVDZ-F12 and cc-pVTZ-F12, alpha=5), CCSD-F12b correlation, perturbative triples delta(T) using jul-cc-pV(D+d)Z and jul-cc-pV(T+d)Z basis sets (alpha=3.22), and a core-valence correction using cc-pwCVTZ. The dataset spans 45.1% organic and 54.9% inorganic molecules and provides broader chemical diversity than comparable datasets such as GDB-9 or VQM24/DMC. Additional data available in the source files, including DFT atomization energies at approximately 90 levels of theory, singlet-triplet gaps, %TAE[(T)] multireference diagnostics, and W1-F12 energy components, can be downloaded from ColabFit Exchange. It includes molecules overlapping with the W4-17 and GMTKN55 benchmark sets that are excluded from the train and validation splits.
Al, B, Be, C, Cl, F, H, Li, Mg, N, Na, O, P, S, Si
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Klaas J.H. Giesbertz, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton
73040
540810
15
W1-F12/CCSD(T)-CBS
Molpro 2024.1
76
The test split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets, in particular QM9 and ANI1x.
10.60732/f26a9f60
C, H, N, O
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
190261
2106595
4
DFT-ωB97X
ORCA 5.0.2
76
Test set from COLL. Consists of configurations taken from molecular collisions of different small organic molecules. Energies and forces for 140,000 random snapshots taken from these trajectories were recomputed with density functional theory (DFT). These calculations were performed with the revPBE functional and def2-TZVP basis, including D3 dispersion corrections.
10.60732/7b135132
C, H, O
Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann
9480
97886
3
DFT-revPBE+D3
ORCA
76
This is the filtered validation split of ODAC25. Open Direct Air Capture 2025 (ODAC25) is the largest high-quality DFT dataset for Direct Air Capture, containing over 15,000 Metal-Organic Frameworks (MOFs), including experimental, defective, synthetic, and amine-functionalized MOFs, with 4 adsorbates: CO2, H2O, N2, and O2. ODAC25 significantly improves upon ODAC23 by adding functionalized MOFs, new adsorbates (N2 and O2), higher k-point convergence, and re-relaxations of empty MOFs. The dataset contains three partitions: (1) mof_plus_adsorbate includes full DFT relaxations of different adsorbates on various MOFs; (2) mof includes re-relaxations of empty MOFs; (3) gcmc includes DFT single points of configurations derived from Grand Canonical Monte Carlo (GCMC) simulations. MOFs deemed problematic by Jin et al. (2025) have been excluded (see https://zenodo.org/records/14802658).
Ag, Al, C, Cd, Cl, Co, Cu, Eu, F, Fe, Gd, H, I, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, P, Pr, S, Si, Sm, Sr, Tb, Y, Zn, Zr
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl
783702
171139115
32
DFT-PBE+D3
VASP 6.3
75
A dataset consisting of the energies of supercells containing from 1 to 250 atoms. The supercells represent energy-volume relations for 8 crystal structures of Ta, 5 uniform deformation paths between pairs of structures, vacancies, interstitials, surfaces with low-index orientations, 4 symmetrical tilt grain boundaries, γ-surfaces on the (110) and (211) fault planes, a [111] screw dislocation, liquid Ta, and several isolated clusters containing from 2 to 51 atoms. Some of the supercells contain static atomic configurations. However, most are snapshots of ab initio MD simulations at different densities, and temperatures ranging from 293 K to 3300 K. The BCC structure was sampled in the greatest detail, including a wide range of isotropic and uniaxial deformations.
10.60732/7f6cac29
Ta
Yi-Shen Lin, Ganga P. Purja Pun, Yuri Mishin
3191
135706
1
DFT-PBE
VASP
75
Configurations of Kr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/6a060a77
Kr
Christopher M. Andolina, Wissam A. Saidi
2875
95033
1
DFT-PBE
VASP
75
Approximately 7,400 configurations of titanium used for training a deep potential using the DeePMD-kit molecular dynamics package and DP-GEN training scheme.
10.60732/85e47ff3
Ti
Tongqi Wen, Rui Wang, Lingyu Zhu, Linfeng Zhang, Han Wang, David J. Srolovitz, Zhaoxuan Wu
7376
143792
1
DFT-PBE
VASP
75
Structures from the SAIT_semiconductors_ACS_2023_HfO dataset, separated into crystal, out-of-domain, and random (generated by randomly distributing 32 Hf and 64 O atoms within the unit cells of the HfO2 crystals) configuration sets. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/186c10bf
Hf, O
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
191973
18429408
2
DFT-PBE
VASP
74
Test configurations with a fixed value of 180 degrees for dihedral beta in the alpha-gamma plane, from the 3BPA dataset. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/e9fb7e4a
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
2350
63450
4
DFT-ωB97X
ORCA
74
The training set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerred Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density.
10.60732/845cc1b5
Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka
19954
554986
37
DFT-PBE
VASP 5.4.4
73
Dataset from "Stress-dependence of generalized stacking fault energies": DFT calculations of generalized stacking fault energies (GSFE) for Al, Cu, and Mg.
10.60732/861da1bc
Al, Cu, Mg
Binglun Yin, Predrag Andric, W. A. Curtin
272
3264
3
DFT-PBE
VASP
72
This dataset contains structures of Cu, including Cu(111), Cu(100), Cu(110), and Cu(211). Slab settings are as follows: 3 x 3, 6-layered slabs for Cu(111), (100), and (110) surfaces; 1 x 3, 6-layered slabs for the Cu(211) surface. Includes some structures representing interaction of H2 with one of the Cu surfaces and some structures of Cu sampled at different temperatures.
10.60732/d0801836
Cu, H
Wojciech G. Stark, Julia Westermayr, Oscar A. Douglas-Gallardo, James Gardner, Scott Habershon, Reinhard J. Maurer
3413
191104
2
DFT-SRP48
FHI-aims
72
The validation split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets, in particular QM9 and ANI1x.
10.60732/8e8402d6
C, H, N, O
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
264972
3743153
4
DFT-ωB97X
ORCA 5.0.2
71
The validation split of the Open Catalyst 2025 (OC25) dataset for solid-liquid interfaces. OC25 consists of single-point DFT calculations of catalyst/solvent/ion/adsorbate structures, covering 88 elements, 8 solvents (water, methanol, CCl4, DMSO, benzene, hexane, THF, diethyl ether), 9 ionic species (Cs+, OH-, Li+, SO4^2-, Ca^2+, [Me4N]+, HCO3-, H+, F-), and adsorbates from the OC20 set plus reactive intermediates. Surfaces are derived from 39,821 Materials Project bulk structures with Miller indices <= 3. Structures are highly off-equilibrium, sampled from short ab initio molecular dynamics simulations (10-50 steps, 1000 K, NVT) or short DFT relaxations (5 ionic steps). The validation split contains 203,630 structures representing out-of-distribution (OOD) bulk-solvent combinations (approximately 2.5% of ~260,000 unique pairings held out). Validation calculations used tighter DFT convergence (EDIFF=1e-6 eV) compared to the training set to provide higher-quality force labels. All DFT calculations used VASP 6.3.2 with the non-spin-polarized RPBE functional supplemented with D3 dispersion correction (zero damping), plane wave cutoff 400 eV, k-point reciprocal density of 40, and a dipole correction in the z-direction.
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Xe, Y, Zn, Zr
Sushree Jagriti Sahoo, Mikael Maroschin, Daniel S. Levine, Zachary Ulissi, C. Lawrence Zitnick, Joel B Varley, Joseph A. Gauthier, Nitish Govindarajan, Muhammed Shuaibi
203630
29341418
70
DFT-RPBE+D3
VASP 6.3.2
71
The validation split of OMC25. Open Molecular Crystals 2025 (OMC25) is a molecular crystal dataset produced by Meta. The OE62 dataset was used as a source for sampling molecules; crystals were generated with Genarris 3.0; from these, relaxation trajectories were generated and sampled to create the final dataset. See the publication for details.
B, Br, C, Cl, F, H, I, N, O, P, S, Si
Vahe Gharakhanyan, Luis Barroso-Luque, Yi Yang, Muhammed Shuaibi, Kyle Michel, Daniel S. Levine, Misko Dzamba, Xiang Fu, Meng Gao, Xingyu Liu, Haoran Ni, Keian Noori, Brandon M. Wood, Matt Uyttendaele, Arman Boromand, C. Lawrence Zitnick, Noa Marom, Zachary W. Ulissi, Anuroop Sriram
1386816
178106924
12
DFT-PBE
VASP 6.3
71
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and the Materials Project. The v2025.2 r2SCAN release contains 386,544 structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the r2SCAN meta-GGA functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. v2025.2 removes a small number of duplicated structures present in v2025.1, and the original files add Bader charges and Bader magnetic moments per atom. The previous version of this dataset (MatPES-R2SCAN-2025.1) is available from ColabFit. There is a companion dataset calculated with the PBE functional (MatPES-PBE-2025.2).
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong
386520
3049029
89
DFT-R2SCAN
VASP 6.4.x
71
Dataset containing MD trajectories of DHA (docosahexaenoic acid) from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution.
10.60732/9d9083b8
C, H, O
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
69744
3905664
3
DFT-PBE+MBD
FHI-aims
70
Test configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records.
10.60732/e9f3507f
C, H, N, O
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
198977
33030182
4
DFT-M06-2X
ORCA 4.2.1
70
The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states.
10.60732/eeb61a0d
C
Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe
20194
5191888
1
DFT-LDA
VASP
69
Dihedral scan about one of the C-C bonds of the conjugated system. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/b03a4349
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
45
675
3
DFT-PBE+D3
ORCA 5.0
68
The test set of a train and test set pair. The combined datasets comprise approximately 275 configurations of monolayer quasi-hexagonal-phase fullerene (qHPF) membrane used to train and test an NEP model.
10.60732/f1e6e9fa
C
Penghua Ying
39
4680
1
DFT-PBE
VASP
67
Out-of-domain configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/c2179a59
N, Si
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
1234
129570
2
DFT-PBE
VASP
67
Configurations of Re from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/1e0997f8
Re
Christopher M. Andolina, Wissam A. Saidi
5011
100839
1
DFT-PBE
VASP
67
Validation split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The validation split (~10% of cleaned data) uses a stratified split method consistent with the training and test splits. A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads.
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov
18305
320218
102
DFT-r2SCAN
FHI-aims v250806
66
Validation set of the Open Polymers 2026 (OPoly26) dataset. OPoly26 contains over 6.57 million density functional theory (DFT) calculations on cluster fragments of up to 360 atoms derived from polymeric systems. The dataset encompasses variations in monomer composition, polymerization degree, chain architectures, and solvation environments to improve machine learning model performance for polymer property prediction. Calculations were performed at the B97M-V/def2-SVP level of theory using ORCA.
Al, B, Br, C, Ca, Cl, Co, Cs, Cu, F, Fe, H, I, K, La, Li, Mg, N, Na, Ni, O, P, S, Sr, Zn
Daniel S. Levine, Nicholas Liesen, Lauren Chua, James Diffenderfer, Helgi I. Ingolfsson, Matthew P. Kroonblawd, Nitesh Kumar, Amitesh Maiti, Supun S. Mohottalalage, Muhammed Shuaibi, Brian Van Essen, Brandon M. Wood, C. Lawrence Zitnick, Samuel M. Blau, Evan R. Antoniuk
210302
37298046
25
DFT-ωB97M-V
ORCA
66
The test set of a train/test pair from the toluene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite.
10.60732/52a54ab9
C, H
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
501
7515
2
CCSD(T)
Psi4
65
Test set of decorrelated geometries sampled from 600 K xTB MD. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/c83f94bc
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
650
9750
3
DFT-PBE+D3
ORCA 5.0
64
Dataset from "Modeling high-entropy transition-metal alloys with alchemical compression". Includes 25,000 structures utilized for fitting the aforementioned potential, with a focus on 25 d-block transition metals, excluding Tc, Cd, Re, Os and Hg. Each configuration includes a "class" field, indicating the crystal class of the structure. The class represents the following: 1: perfect crystals, 3-8 elements per structure; 2: shuffled positions (standard deviation 0.2 Å), 3-8 elements per structure; 3: shuffled positions (standard deviation 0.5 Å), 3-8 elements per structure; 4: shuffled positions (standard deviation 0.2 Å), 3-25 elements per structure. Configuration sets include divisions into fcc and bcc crystals, further split by class as described above.
10.60732/7766f043
Ag, Au, Co, Cr, Cu, Fe, Hf, Ir, Lu, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, Sc, Ta, Ti, V, W, Y, Zn, Zr
Nataliya Lopanitsyna, Guillaume Fraux, Maximilian A. Springer, Sandip De, Michele Ceriotti
25625
1063584
25
DFT-PBEsol
VASP
64
The JARVIS-MEGNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations with 3D materials properties from the 2018 version of Materials Project, as used in the training of the MEGNet ML model. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/b88c7676
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong
69215
2070556
89
DFT-PBE
VASP
63
Approximately 2800 configurations from a test dataset, one of a pair of train/test datasets of aluminum in crystal and melt phases, used for training and testing an ANI neural network model.
10.60732/d1e27447
Al
Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, Kipton Barros
2769
357851
1
DFT-PBE
Quantum ESPRESSO
63
Out-of-domain configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/83a90e9c
Hf, O
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
6996
671616
2
DFT-PBE
VASP
63
The amorphous LiSi data set comprises 45,169 atomic structures with compositions Li(x)Si (0.0≤x≤4.75) and the corresponding energies and interatomic forces, which were generated using an iterative approach based on an evolutionary algorithm and subsequent refinement, as described in detail in reference [15]. The data includes bulk, surface, and cluster structures with system sizes of up to 608 atoms. The energies and forces of the LiSi structures were obtained from DFT calculations using the Perdew-Burke-Ernzerhof [10] exchange-correlation functional and projector-augmented wave pseudopotentials [16], as implemented in the Vienna Ab-Initio Simulation Package (VASP) [17,18]. We employed a plane-wave basis set with an energy cutoff of 520 eV for the representation of the wavefunctions and a uniform gamma-centered k-point grid for the Brillouin zone integration, with a mesh density corresponding to a number of k points of at least 1000 divided by the number of atoms. The atomic positions and lattice parameters of all structures were optimized until residual forces were below 20 meV/Å. This dataset was also used for the construction of the ANN potential in Ref. [15] and [19]. [10] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). [15] N. Artrith, A. Urban, G. Ceder, J. Chem. Phys. 148 (2018) 241711. [16] P. E. Blöchl, Phys. Rev. B 50, 17953–17979 (1994). [17] G. Kresse, J. Furthmüller, Phys. Rev. B 54, 11169–11186 (1996). [18] G. Kresse, J. Furthmüller, Comput. Mater. Sci. 6, 15–50 (1996). [19] N. Artrith, A. Urban, Y. Wang, G. Ceder, arXiv:1901.09272, https://arxiv.org/pdf/1901.09272.pdf
10.60732/ea8fd398
Li, Si
Michael S. Chen, Tobias Morawietz, Thomas E. Markland, Nongnuch Artrith
44651
5741119
2
DFT-PBE
VASP
63
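The k-point rule quoted in the LiSi entry above (a number of k points of at least 1000 divided by the number of atoms) is simple arithmetic, sketched below. The function name `min_kpoints` is hypothetical, and how the resulting count is distributed over the three reciprocal-lattice directions is not specified in the description, so the sketch stops at the total count.

```python
import math


def min_kpoints(n_atoms, kpoints_per_atom=1000):
    """Smallest integer k-point count n_k with n_k >= kpoints_per_atom / n_atoms."""
    return max(1, math.ceil(kpoints_per_atom / n_atoms))


# Example cell sizes; 608 atoms is the largest system size named in the entry.
for n_atoms in (8, 64, 608):
    print(n_atoms, min_kpoints(n_atoms))
```

Under this reading, large cells approach Γ-point-only sampling while small cells require dense meshes.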
Configurations of o-hbdi from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/538deb26
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119988
1799820
4
DFT-PBE0
Gaussian 09
62
10,000 configurations of organosilicon compounds with energies predicted by an improved GFN-xTB Hamiltonian parameterization, using revPBE.
10.60732/029be1b1
Br, C, Cl, F, H, N, O, P, S, Si
Leonid Komissarov, Toon Verstraelen
157348
4021653
10
DFT-revPBE
ADF
62
Training configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/dbe982a6
N, Si
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
22494
1283591
2
DFT-PBE
VASP
62
This dataset is a companion dataset to Carbon-24 Unique. Carbon X contains 480 carbon structures of duplicates which have the same cell shape and same number of atoms per unit cell (N=6), with different translations (X) of the fractional coordinates. Carbon_X has been cultivated from Carbon-24 (Pickard 2020, doi: 10.24435/materialscloud:2020.0026/v1). Material IDs from the original dataset are included in the metadata as 'original_id'.
C
Maya M. Martirossyan, Thomas Egg, Philipp Hoellmer, George Karypis, Mark Transtrum, Adrian Roitberg, Mingjie Liu, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani
480
2880
1
DFT-PBE
CASTEP
62
The train set of a train/test pair from the toluene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for toluene. All calculations were performed with the Psi4 software suite.
10.60732/05ec452e
C, H
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
997
14955
2
CCSD(T)
Psi4
61
Validation configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records.
10.60732/c182723b
C, H, N, O
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
198978
33030348
4
DFT-M06-2X
ORCA 4.2.1
61
Approximately 50,000 configurations of Au, Ag and AuAg used as part of a training dataset for a DP-GEN-based ML model for a Ag-Au nanoalloy potential.
10.60732/a222a0f6
Ag, Au
Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang
51702
1186478
2
DFT-PBE+D3
VASP
61
The JARVIS_QMOF dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Quantum Metal-Organic Frameworks (QMOF) dataset, comprising quantum-chemical properties for >14,000 experimentally synthesized MOFs. QMOF contains "DFT-ready" data: filtered to remove omitted, overlapping, unbonded or deleted atoms, along with other kinds of problematic structures commented on in the literature. Data were generated via high-throughput DFT workflow, at the PBE-D3(BJ) level of theory using VASP software. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/67cd629a
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Andrew S. Rosen, Shaelyn M. Iyer, Debmalya Ray, Zhenpeng Yao, Alán Aspuru-Guzik, Laura Gagliardi, Justin M. Notestein, Randall Q. Snurr
20425
2321633
79
DFT-PBE+D3(BJ)
VASP 5.4.4
61
The train set of a train/test pair from the benzene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for benzene. All calculations were performed with the Psi4 software suite.
10.60732/a3ca9725
C, H
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
999
11988
2
CCSD(T)
Psi4
60
Approximately 115,000 configurations of carbon with 200 atoms, with simulated melt, quench, reheat, then annealing at the noted temperature. Includes a variety of carbon structures.
10.60732/8ecd90ee
C
John L. A. Gardner, Zoé Faure Beaulieu, Volker L. Deringer
115199
23039800
1
IP-C-GAP-17
LAMMPS
60
Training dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.
10.60732/3fb520e9
C, H, N, O, S
Zihan Pengmei, Yinan Shu, Junyu Liu
43393
807456
5
SA-CASSCF
OpenMolcas 22.06
60
The JARVIS_QM9_STD_JCTC dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset, originally created as part of the datasets at quantum-machine.org. Units for r2 (electronic spatial extent) are a^2; for alpha (isotropic polarizability), a^3; for mu (dipole moment), D; for Cv (heat capacity), cal/mol K. Units for all other properties are eV. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. For the first iteration of DFT calculations, Gaussian 09's default electronic and geometry thresholds have been used for all molecules. For those molecules which failed to reach SCF convergence, ultrafine grids have been invoked within a second iteration for evaluating the XC energy contributions. Within a third iteration on the remaining unconverged molecules, we identified those which had relaxed to saddle points, and further tightened the SCF criteria using the keyword scf(maxcycle=200, verytight). All those molecules which still featured imaginary frequencies entered the fourth iteration using keywords opt(calcfc, maxstep=5, maxcycles=1000). calcfc constructs a Hessian in the first step of the geometry relaxation for eigenvector following. Within the fifth and final iteration, all molecules which still failed to reach convergence have subsequently been converged using opt(calcall, maxstep=1, maxcycles=1000).
10.60732/5935fa4d
C, F, H, N, O
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
130829
2359192
5
DFT-B3LYP
Gaussian 09
60
The rattled-1000-subsampled training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/ea43e8f5
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
3879731
55648760
89
DFT-PBE+U
VASP
60
A set of validation configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. These configurations served to augment the reference set as a final benchmark for NEP model performance.
10.60732/d2f86b68
H, Si
Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi
150
23000
2
DFT-PBE
Quantum ESPRESSO
59
Validation configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records.
10.60732/28c9171d
C, H, N, O
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
198985
33031510
4
DFT-M06-2X
ORCA 4.2.1
59
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and Materials Project. The v2025.1 r2SCAN release contains structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the r2SCAN meta-GGA functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. There is a companion dataset calculated with the PBE functional (MatPES-PBE-2025.1).
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong
387856
3059679
89
DFT-R2SCAN
VASP 6.4.x
58
Approximately 145,000 configurations of alkane, aspirin, alpha-glucose and uracil, partly taken from the MD-17 dataset, used in training an 'Atomic Neural Net' model.
10.60732/82344f5c
C, H, N, O
Hao Li, Musen Zhou, Jessalyn Sebastian, Jianzhong Wu, Mengyang Gu
143756
1911045
4
DFT-PBE-vdW-TS
Q-Chem
58
The val_aimd-from-PBE-1000-npt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/cdd647d5
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
202758
1710254
85
DFT-PBE+U
VASP
57
Over 300,000 configurations in an expanded dataset of 19 hydrogen combustion reaction channels. Intrinsic reaction coordinate (IRC) calculations are combined with ab initio molecular dynamics (AIMD) simulations and normal mode displacement (NM) calculations.
10.60732/ebb9ca58
H, O
Xingyi Guan, Akshaya Das, Christopher J. Stein, Farnaz Heidar-Zadeh, Luke Bertels, Meili Liu, Mojtaba Haghighatlari, Jie Li, Oufan Zhang, Hongxia Hao, Itai Leven, Martin Head-Gordon, Teresa Head-Gordon
315943
1399037
2
DFT-ωB97X-V
Q-Chem
57
The original DFT training data for the general-purpose silicon interatomic potential described in the associated publication. The kinds of configuration that we include are chosen using intuition and past experience to guide what needs to be included to obtain good coverage pertaining to a range of properties.
10.60732/8e9bc5b0
Si
Albert P. Bartók, James Kermode, Noam Bernstein, Gábor Csányi
2231
162365
1
DFT-PW91, DFT-PBE
CASTEP
57
Structures from discrepencies_and_error_metrics_NPJ_2023 test set; these include an interstitial. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K.
10.60732/81a7ca9e
Si
Yunsheng Liu, Xingfeng He, Yifei Mo
100
6500
1
DFT-PBE
VASP 5.4.4
56
The JARVIS-MEGNet2 dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains 133K materials with formation energy from the Materials Project, as used in the training of the MEGNet ML model. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/419ba77a
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong
133407
3880004
89
DFT-PBE
VASP
56
COMP6v2-wB97X-631Gd is the portion of COMP6v2 calculated at the wB97X/631Gd level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory.
10.60732/cbced4c5
C, Cl, F, H, N, O, S
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros
157718
3897748
7
DFT-ωB97X
Gaussian 09
56
Train split from the 216-atom amorphous portion of the aC_JCP_2023 dataset. The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm³ were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states.
10.60732/ee630a62
C
Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe
13462
2907792
1
DFT-LDA
VASP
56
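The melt-quench protocol and snapshot counts in the aC_JCP_2023 entry above can be checked with a short sketch. The description gives only the start temperature, end temperature, and duration of each stage, so the linear ramps are an assumption; the names `SEGMENTS` and `target_temperature` are hypothetical. The totals assume six independent runs at each of the 11 densities, a reading that reproduces the figures quoted in the entry.

```python
# (duration in ps, start temperature in K, end temperature in K) for each stage.
SEGMENTS = [
    (2.0, 300.0, 9000.0),   # heat to melt
    (3.0, 9000.0, 9000.0),  # equilibrate the liquid
    (2.0, 9000.0, 5000.0),  # first cooling stage
    (2.0, 5000.0, 5000.0),  # equilibrate at 5000 K
    (2.0, 5000.0, 300.0),   # final quench
]


def target_temperature(t_ps):
    """Target temperature at time t_ps, assuming linear ramps within each stage."""
    elapsed = 0.0
    for duration, t_start, t_end in SEGMENTS:
        if t_ps <= elapsed + duration:
            frac = (t_ps - elapsed) / duration
            return t_start + frac * (t_end - t_start)
        elapsed += duration
    return SEGMENTS[-1][2]  # hold the final temperature after the schedule ends


# Snapshot bookkeeping from the description.
snapshots_per_run = 30 + 100 + 25 + 100      # four sampled stages
n_densities = 11
total_216 = snapshots_per_run * n_densities * 6  # six independent runs per density
total_512 = snapshots_per_run * n_densities * 1  # one run per density
diamond_total = 80 * 7                           # 80 snapshots at 7 densities

print(total_216, total_512, diamond_total)
```

The three totals match the 16,830, 2,805, and 560 data points stated in the description.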
This dataset is a companion dataset to Carbon-24 Unique, containing enantiomorph pairs discovered within the Carbon-24 dataset. Carbon-24_Unique_with_Enantiomorphs has been cultivated from Carbon-24 (Pickard 2020, doi: 10.24435/materialscloud:2020.0026/v1). Contains 4,330 entries of unique carbon structures, where enantiomorphs are treated as distinct. The metadata column indicates the index of the respective enantiomorph pair, if any, as well as the original id from Carbon-24.
C
Maya M. Martirossyan, Thomas Egg, Philipp Hoellmer, George Karypis, Mark Transtrum, Adrian Roitberg, Mingjie Liu, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani
4330
48260
1
DFT-PBE
CASTEP
56
The val_aimd-from-PBE-1000-nvt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/65323852
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
195575
1643554
85
DFT-PBE+U
VASP
55
Approximately 7,000 distinct configurations of 2D-silicene, silicon, and PbTe. Silicon data used from http://dx.doi.org/10.1103/PhysRevX.8.041048. Dataset includes predicted force, potential energy and virial values.
10.60732/7cc0df9e
Pb, Si, Te
Zheyong Fan
7077
528999
3
DFT-PW91, DFT-PBE
CASTEP, VASP, Quantum ESPRESSO
55
The JARVIS_CFID_3D_8_18_2022 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/82106853
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
55581
561509
89
DFT-optB88-vdW, DFT-TBmBJ
VASP
55
Approximately 2800 configurations from a train dataset, one of a pair of train/test datasets of aluminum in crystal and melt phases, used for training and testing an ANI neural network model.
10.60732/af254882
Al
Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, Kipton Barros
2779
363129
1
DFT-PBE
Quantum ESPRESSO
54
Approximately 5,000 configurations of GeTe used in training of a non-von Neumann multiplication-less DNN model.
10.60732/c741fb0f
Ge, Te
Pinghui Mo, Chang Li, Dan Zhao, Yujia Zhang, Mengchao Shi, Junhua Li, Jie Liu
5025
321600
2
DFT-GGA
SIESTA
54
NENCI-2021 is a database of approximately 8000 benchmark Non-Equilibrium Non-Covalent Interaction (NENCI) energies performed on molecular dimers; intermolecular complexes of biological and chemical relevance with a particular emphasis on close intermolecular contacts. Based on dimers from the S101 database.
10.60732/5d2a1ceb
Br, C, Cl, F, H, Li, N, Na, O, P, S
Zachary M. Sparrow, Brian G. Ernst, Paul T. Joo, Ka Un Lao, Robert A. DiStasio, Jr
7763
129402
11
CCSD(T), SAPT2+, MP2
Psi4
52
The test set for UNEP-v1 (version 1 of Unified NeuroEvolution Potential), a model implemented in GPUMD.
10.60732/b459b5f2
Ag, Al, Au, Cr, Cu, Mg, Mo, Ni, Pb, Pd, Pt, Ta, Ti, V, W, Zr
Keke Song, Rui Zhao, Jiahui Liu, Yanzhou Wang, Eric Lindgren, Yong Wang, Shunda Chen, Ke Xu, Ting Liang, Penghua Ying, Nan Xu, Zhiqiang Zhao, Jiuyang Shi, Junjie Wang, Shuang Lyu, Zezhu Zeng, Shirong Liang, Haikuan Dong, Ligang Sun, Yue Chen, Zhuhua Zhang, Wanlin Guo, Ping Qian, Jian Sun, Paul Erhart, Tapio Ala-Nissila, Yanjing Su, Zheyong Fan
4411
318910
16
DFT-PBE
VASP
52
The training set for UNEP-v1 (version 1 of Unified NeuroEvolution Potential), a model implemented in GPUMD.
10.60732/23c88dd7
Ag, Al, Au, Cr, Cu, Mg, Mo, Ni, Pb, Pd, Pt, Ta, Ti, V, W, Zr
Keke Song, Rui Zhao, Jiahui Liu, Yanzhou Wang, Eric Lindgren, Yong Wang, Shunda Chen, Ke Xu, Ting Liang, Penghua Ying, Nan Xu, Zhiqiang Zhao, Jiuyang Shi, Junjie Wang, Shuang Lyu, Zezhu Zeng, Shirong Liang, Haikuan Dong, Ligang Sun, Yue Chen, Zhuhua Zhang, Wanlin Guo, Ping Qian, Jian Sun, Paul Erhart, Tapio Ala-Nissila, Yanjing Su, Zheyong Fan
104799
6840534
16
DFT-PBE
VASP
51
SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting with proteins. Subsets of the dataset include the following: dipeptides: these provide comprehensive sampling of the covalent interactions found in proteins; solvated amino acids: these provide sampling of protein-water and water-water interactions; PubChem molecules: These sample a very wide variety of drug-like small molecules; monomer and dimer structures from DES370K: these provide sampling of a wide variety of non-covalent interactions; ion pairs: these provide further sampling of Coulomb interactions over a range of distances.
10.60732/a613a175
Br, C, Ca, Cl, F, H, I, K, Li, N, Na, O, P, S
Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, Thomas E. Markland
116504
3382829
14
DFT-ωB97M+D3(BJ)
Psi4 1.4.1
51
Configurations of urea from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/8f44aef0
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119992
959936
4
DFT-PBE0
Gaussian 09
51
The SN2 dataset was generated as a partner benchmark dataset, along with the 'solvated protein fragments' dataset, for measuring the performance of machine learning models, in particular PhysNet, at describing chemical reactions, long-range interactions, and condensed phase systems. SN2 probes chemical reactions of methyl halides with halide anions, i.e. X- + CH3Y -> CH3X + Y-, and contains structures for all possible combinations of X,Y = F, Cl, Br, I. The dataset also includes various structures for several smaller molecules that can be formed in fragmentation reactions, such as CH3X, HX, CHX or CH2X-, as well as geometries for H2, CH2, CH3+ and XY interhalogen compounds. In total, the dataset provides reference energies, forces, and dipole moments for 452709 structures calculated at the DSD-BLYP-D3(BJ)/def2-TZVP level of theory using ORCA 4.0.1.
10.60732/31df6835
Br, C, Cl, F, H, I
Oliver T. Unke, Markus Meuwly
394653
2194070
6
DFT-DSD-BLYP+D3(BJ)
ORCA 4.0.1
51
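The reaction space described in the SN2 entry above can be enumerated in a few lines. This sketch assumes that "all possible combinations of X,Y" means ordered pairs and includes the identity reactions with X = Y; the description does not say so explicitly. The variable names are hypothetical.

```python
from itertools import product

HALOGENS = ("F", "Cl", "Br", "I")

# Ordered (X, Y) pairs, written in the reaction notation used by the entry.
reactions = [
    f"{x}- + CH3{y} -> CH3{x} + {y}-"
    for x, y in product(HALOGENS, repeat=2)
]

for reaction in reactions:
    print(reaction)
```

Under that assumption the list has 16 entries, 4 of which are identity reactions.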
In-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces.
10.60732/ced227e5
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick
441623
35243458
57
DFT-PBE+U
VASP
51
The JARVIS_Open_Catalyst_10K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the 10K training, rest validation and test dataset from the Open Catalyst Project (OCP). JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/b10d497c
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
34938
2719837
56
DFT-rPBE
VASP
51
Benzene training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/6b905ba8
C, H
Venkat Kapil, Edgar A. Engel
54990
1601760
2
DFT-PBE+TS
Quantum ESPRESSO v6.3
51
This dataset is a companion to Carbon-24 Unique. Carbon_NXL is intended for training minimal “overfitting” test cases. It contains 353 duplicate carbon structures that differ in the number of atoms per unit cell (N=6-16), the cell shape L, and the translation X of the fractional coordinates. Carbon_NXL has been curated from Carbon-24 (Pickard 2020, doi: 10.24435/materialscloud:2020.0026/v1). Material IDs from the original dataset are included in the metadata as 'original_id'. Please cite Martirossyan et al. (https://arxiv.org/abs/2509.12178) if your work utilizes this dataset.
C
Maya M. Martirossyan, Thomas Egg, Philipp Hoellmer, George Karypis, Mark Transtrum, Adrian Roitberg, Mingjie Liu, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani
353
2540
1
DFT-PBE
CASTEP
51
Carolina Materials contains structures used to train several machine learning models for the efficient generation of hypothetical inorganic materials. The database is built using structures from OQMD, Materials Project and ICSD, as well as ML generated structures validated by DFT.
10.60732/f2f98394
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Po, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Yong Zhao, Mohammed Al-Fahdi, Ming Hu, Edirisuriya M. D. Siriwardane, Yuqi Song, Alireza Nasiri, Jianjun Hu
214267
3168298
64
DFT-PBE
VASP
50
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Cr surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/e25bae2e
C, Cr
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
1192
298114
2
DFT-PBE+D3
CP2K
50
Configurations of I from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/57c54149
I
Christopher M. Andolina, Wissam A. Saidi
4436
113623
1
DFT-PBE
VASP
50
Test configurations with MD simulations performed at 1200K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/397ba16b
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
2139
57753
4
DFT-ωB97X
ORCA
50
10,000 configurations of SiO2 used as an example for the SIMPLE-NN machine learning model. Dataset includes three types of crystals: quartz, cristobalite and tridymite; amorphous; and liquid phase SiO2. Structures with distortion from compression, monoaxial strain and shear strain were also included in the training set.
10.60732/9903bf08
O, Si
Kyuhyun Lee, Dongsun Yoo, Wonseok Jeong, Seungwu Han
9997
599820
2
DFT-PBE
VASP
50
Training set for a magnetic Moment Tensor Potential (mMTP) for paramagnetic B1-CrN, created via active learning. Contains 2423 configurations of 64-atom CrN supercells with collinear atomic magnetic moments and magnetic forces (negative derivatives of energy with respect to magnetic moments, in eV/mu_B). Configurations generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. The fitted mMTP accurately reproduces elastic constants, phonon spectrum, linear thermal expansion coefficient, and specific heat capacity of paramagnetic B1-CrN, with thermal properties (quasi-harmonic approximation) in good agreement with experimental results. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment and magnetic force data.
Cr, N
Alexey S. Kotykhov, Max Hodapp, Christian Tantardini, Konstantin Kravtsov, Ivan Kruglov, Alexander V. Shapeev, Ivan S. Novikov
1702
150080
2
DFT-PBE
ABINIT
50
588 structures selected from the AIMD simulation of the Cu(111) slab, including the C1-C18 clusters on the Cu(111) slab. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/9f0e607d
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
588
115460
2
DFT-PBE+D3
CP2K
49
Dataset containing DFT calculations of energy and forces for all configurations in the QM9 dataset, recalculated with the ωB97X functional and 6-31G(d) basis set. Recalculating the energy and forces causes a slight shift of the potential energy surface, which results in forces acting on most configurations in the dataset. The data was generated by running Nudged Elastic Band (NEB) calculations with DFT on 10k reactions while saving intermediate calculations. QM9x is used as a benchmarking and comparison dataset for the dataset Transition1x.
10.60732/1edbb6e0
C, F, H, N, O
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
133871
2407494
5
DFT-ωB97X
ORCA 5.0.2
49
Configurations of water, acetonitrile and methanol, simulated with ASE and modeled using a variety of software and methods: GAP, SchNet, GDML, ORCA and mbGDML. Forces and potential energy included; metadata includes kinetic energy and velocities.
10.60732/717087e2
C, H, N, O
Alex M. Maldonado
24509
711324
4
IP-SchNet, GFN2-xTB, IP-mbGDML, IP-GAP, MP2
ORCA
49
Validation dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.
10.60732/cea2a8c1
C, H, N, O, S
Zihan Pengmei, Yinan Shu, Junyu Liu
21616
402369
5
SA-CASSCF
OpenMolcas 22.06
49
Dataset (DFT-10B) contains structures of the 10 binary alloys AgCu, AlFe, AlMg, AlNi, AlTi, CoNi, CuFe, CuNi, FeV, and NbNi. Each alloy system includes all possible unit cells with 1-8 atoms for face-centered cubic (fcc) and body-centered cubic (bcc) crystal types, and all possible unit cells with 2-8 atoms for the hexagonal close-packed (hcp) crystal type. This results in 631 fcc, 631 bcc, and 333 hcp structures, yielding 1595 x 10 = 15,950 unrelaxed structures in total. Lattice parameters for each crystal structure were set according to Vegard's law. Total energies were computed using DFT with projector-augmented wave (PAW) potentials within the generalized gradient approximation (GGA) of Perdew, Burke, and Ernzerhof (PBE) as implemented in the Vienna Ab Initio Simulation Package (VASP). The k-point meshes for sampling the Brillouin zone were constructed using generalized regular grids.
10.60732/941b9553
Ag, Al, Co, Cu, Fe, Mg, Nb, Ni, Ti, V
Chandramouli Nyshadham, Matthias Rupp, Brayden Bekker, Alexander V. Shapeev, Tim Mueller, Conrad W. Rosenbrock, Gábor Csányi, David W. Wingate, Gus L. W. Hart
15920
116380
10
DFT-PBE
VASP
49
Lowest-energy structures with up to 4 heavy atoms from Vector-QM24 (VQM24) with properties calculated using diffusion quantum Monte Carlo (DMC) after DFT optimization. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br.
Br, C, Cl, F, H, N, O, P, S, Si
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld
10780
79933
10
DMC-PBE0-ccECP
QMCPACK
49
Configurations of Sb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/7980ece8
Sb
Christopher M. Andolina, Wissam A. Saidi
5107
115196
1
DFT-PBE
VASP
48
Configurations from a cG-SchNet trained on a subset of the QM9 dataset. Model was trained with the intention of providing molecules with specified functional groups or motifs, relying on sampling of molecular fingerprint data. Relaxation data for the generated molecules is computed using ORCA software. Configuration sets include raw data from cG-SchNet-generated configurations, with models trained on several different types of target data and DFT relaxation data as a separate configuration set. Includes approximately 80,000 configurations.
10.60732/de8af6a2
C, F, H, N, O
Niklas W.A. Gebauer, Michael Gastegger, Stefaan S.P. Hessmann, Klaus-Robert Müller, Kristof T. Schütt
23632
418729
5
IP-cgSchNet
ORCA
48
This database contains computationally generated atomic structures of glass-ceramic lithium thiophosphates (gc-LPS) with the general composition (Li2S)x(P2S5)1-x. Total energies and interatomic forces from density-functional theory (DFT) calculations are included. The DFT calculations used projector-augmented-wave (PAW) pseudopotentials and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional as implemented in the Vienna Ab Initio Simulation Package (VASP) and a kinetic energy cutoff of 520 eV. The first Brillouin zone was sampled using VASP's fully automatic k-point scheme with a length parameter Rk = 25 Å. The gc-LPS structures were generated using a combination of different sampling methods. Initial amorphous structure models were generated with ab initio molecular dynamics (AIMD) simulations of supercells at 1200 K using a Nose-Hoover thermostat with a time step of 1 fs. To obtain near-ground-state structures as reference for the machine-learning potential, 150 evenly spaced snapshots were extracted from the AIMD trajectories that were reoptimized with DFT geometry optimizations at zero Kelvin. Additional structures were generated by scaling the lattice parameters of the crystalline LPS structures (see below) by ±15% and perturbing atomic positions in AIMD simulations as described above. The resulting database was used to train a specialized ANN potential for the sampling of structures along the Li2S-P2S5 composition line with a genetic-algorithm (GA) as implemented in the atomistic evolution (ævo) package, following a previously reported protocol. Starting from supercells of the ideal crystal structures, either Li and S atoms were removed with a ratio of 2:1, or P and S atoms were removed with a ratio of 2:5, and low-energy configurations were determined with GA sampling. A population size of 32 trials and a mutation rate of 10% were employed. The ANN potential was iteratively refined by including additional sampled structures in the training. For each composition, at least 10 lowest energy structure models identified with the ANN-GA approach were selected and fully relaxed with DFT. Also included in the present database are the XSF files of the previously reported crystalline phases LiPS3, Li2PS3, Li4P2S7, Li7P3S11, α-Li3PS4, β-Li3PS4, γ-Li3PS4, and Li48P16S61. The crystal structures were obtained from the Inorganic Crystal Structure Database (ICSD), the Materials Project (MP) database, the Open Quantum Materials Database (OQMD), and the AFLOW database. The configuration names indicate the journal reference and the database.
10.60732/0a15fe72
Li, P, S
Haoyue Guo, Nongnuch Artrith
6055
264604
3
DFT-PBE
VASP
48
Validation configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/1eaf36bf
N, Si
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
2822
159951
2
DFT-PBE
VASP
48
A dataset created as part of a combination DFT-ML approach to study three alkali metals (K, Li, Na) in model carbon systems at a range of densities and degrees of disorder. The purpose of the study was to investigate the properties of alkali metals in hard (non-graphitising) and nanoporous carbons as potential anode materials for battery technology.
10.60732/441f40b7
C, K, Li, Na
Jian-Xing Huang, Gábor Csányi, Jin-Bao Zhao, Jun Cheng, Volker L. Deringer
1365
298050
4
DFT-optB88-vdW
VASP 5.4.4
48
Validation set from COLL. Consists of configurations taken from molecular collisions of different small organic molecules. Energies and forces for 140,000 random snapshots taken from these trajectories were recomputed with density functional theory (DFT). These calculations were performed with the revPBE functional and def2-TZVP basis, including D3 dispersion corrections.
10.60732/a1ccb643
C, H, O
Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann
9999
101829
3
DFT-revPBE+D3
ORCA
48
The dataset consists of energies and forces for pristine and defected monolayer graphene, bilayer graphene, and graphite in various states. The configurations in the dataset are generated in two ways: (1) crystals with distortions (compression and stretching of the simulation cell together with random perturbations of atoms), and (2) configurations drawn from ab initio molecular dynamics (AIMD) trajectories at 300, 900, and 1500 K. For monolayer graphene, the configurations include: * pristine - In-plane compressed and stretched monolayers - AIMD trajectories * defected - Configurations from the minimization of a monolayer with a single vacancy - AIMD trajectories of monolayers with a single vacancy For bilayer graphene, the configurations include: * pristine - AB-stacked bilayers with compression and stretching in the basal plane - Bilayers with different translational registry (e.g. AA, AB, and SP) at various layer separations - Twisted bilayers with different twisting angles at various layer separations - AIMD trajectories of twisted bilayers and bilayers in AB and AA stackings * defected - Configurations from the minimization of a bilayer with a single vacancy in each layer - AIMD trajectories of a bilayer with a single vacancy in one layer and the other layer pristine - AIMD trajectories of a bilayer with a single vacancy in each layer; Initial configuration without interlayer bonds - AIMD trajectories of a bilayer with a single vacancy in each layer; Initial configuration with interlayer bonds formed For graphite, the configurations include: * pristine - Graphite with compression and stretching in the basal plane - Graphite with compression and stretching along the c-axis - AIMD trajectories
10.60732/ce311990
C
Mingjian Wen, Ellad B. Tadmor
14179
656204
1
DFT-PBE
VASP 5.x.x
48
Energies of the isolated atoms evaluated at the reference DFT settings. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/1e359db4
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
3
3
3
DFT-PBE+D3
ORCA 5.0
47
The training + validation set from the doped CsPbI3 energetics dataset. This dataset was created to explore the effect of Cd and Zn substitutions on the structural stability of inorganic lead halide perovskite CsPbI3. CsPbI3 undergoes a direct to indirect band-gap phase transition at room temperature. The dataset contains configurations of CsPbI3 with low levels of Cd and Zn, which were used to train a GNN model to predict the energetics of structures with higher levels of substitutions.
10.60732/16af950e
Cd, Cs, I, Pb, Zn
Roman A. Eremin, Innokentiy S. Humonen, Alexey A. Kazakov, Vladimir D. Lazarev, Anatoly P. Pushkarev, Semen A. Budennyy
140
22400
5
DFT-PBE
VASP
47
Training configurations with MD simulations performed at 300K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/5f5bae68
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
500
13500
4
DFT-ωB97X
ORCA
47
Approximately 6,500 configurations of Sn, including Sn8, Sn16 and Sn32, used in developing a deep potential that predicts the phase diagram of Sn.
10.60732/7d8a06fe
Sn
Tao Chen, Fengbo Yuan, Jianchuan Liu, Huayun Geng, Linfeng Zhang, Han Wang, Mohan Chen
6612
111768
1
DFT-SCAN
VASP
47
Configurations of dmabn from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/ad4e82a6
C, H, N
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119994
2519874
3
DFT-PBE0
Gaussian 09
47
This dataset was used for the training of an MLIP for amorphous alumina (a-AlOx). Two configuration sets correspond to i) the actual training data and ii) additional reference data. Ab initio calculations were performed with the Vienna Ab initio Simulation Package. The projector augmented wave method was used to treat the atomic core electrons, and the Perdew-Burke-Ernzerhof functional within the generalized gradient approximation was used to describe the electron-electron interactions. The cutoff energy for the plane-wave basis set was set to 550 eV during the ab initio calculation. The obtained reference database includes the DFT energies of 41,203 structures. The supercell size of the AlOx reference structures varied from 24 to 132 atoms. K-point values are given for structures with: Al0, Al12, Al24, Al48 and Al192.
10.60732/96296d27
Al, O
Wenwen Li, Yasunobu Ando, Satoshi Watanabe
123560
4541194
2
DFT-PBE
VASP
47
The JARVIS_Open_Catalyst_100K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the 100K training, rest validation and test dataset from the Open Catalyst Project (OCP). JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/ae1c7e2f
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
124929
9719646
56
DFT-rPBE
VASP
47
TiO2 dataset that was designed to build artificial neural network (ANN) potentials by Artrith et al. using the AENET package. This dataset includes various crystalline phases of TiO2 and MD data that are extracted from ab initio calculations. The dataset includes 7815 structures with 165,229 atomic environments in the stoichiometric ratio of 66% O to 34% Ti.
10.60732/861c6a25
O, Ti
Nongnuch Artrith, Alexander Urban
7809
165080
2
DFT-PBE
Quantum ESPRESSO
47
This dataset was created for the purpose of training an MLIP for silica (SiO2). For initial DFT computations, GPAW (in combination with ASE) was used with LDA, PBE and PBEsol functionals; and VASP with the SCAN functional. All calculations used the projector augmented-wave method. After comparison, it was found that SCAN performed best, and all values were recalculated using SCAN. An energy cut-off of 900 eV and a k-spacing of 0.23 Å^-1 were used.
10.60732/c2bee5fa
O, Si
Linus C. Erhard, Jochen Rohrer, Karsten Albe, Volker L. Deringer
3074
268118
2
DFT-SCAN
VASP
47
Reference C, H, O, and N atoms from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/bfdb46b7
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
4
4
4
DFT-ωB97X
ORCA
46
The JARVIS_EPC_2D dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations sourced from the JARVIS-DFT-2D dataset, rerelaxed with Quantum ESPRESSO. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/c7d2c9cd
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cl, Co, Cr, Cu, F, Fe, Ga, Ge, H, Hf, I, In, Ir, K, La, Li, Mg, Mo, N, Na, Nb, Ni, O, P, Pb, Pd, Pt, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr
Daniel Wines, Kamal Choudhary, Adam J. Biacchi, Kevin F. Garrity, Francesca Tavazza
161
788
55
DFT-PBEsol
Quantum ESPRESSO
46
Succinic acid test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/cfea7523
C, H, O
Venkat Kapil, Edgar A. Engel
200
5600
3
DFT-PBE+TS
Quantum ESPRESSO v6.3
46
Training data generated for GAP-20. GAP-20 describes the properties of the bulk crystalline and amorphous phases, crystal surfaces, and defect structures with an accuracy approaching that of direct ab initio simulation, but at a significantly reduced cost. The final potential is fitted to reference data computed using the optB88-vdW density functional theory (DFT) functional.
10.60732/9d095830
C
Patrick Rowe, Volker L. Deringer, Piero Gasparotto, Gábor Csányi, Angelos Michaelides
6088
400275
1
DFT-optB88-vdW
VASP
46
Validation dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.
10.60732/bd646241
C, H, N, O, S
Zihan Pengmei, Yinan Shu, Junyu Liu
21605
402142
5
DFT-M06
Psi4
46
ANI-1xnr was developed to train the ANI-1xnr model, intended to model reactive chemistry. Specifically, ANI-1xnr is meant to represent carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. The dataset was generated using an active learning method in which ab initio nanoreactor simulations supplied MLIP training; the MLIP was subsequently tested and new simulations were generated based on structures tested with high uncertainty to supply the next cycle of MLIP training.
10.60732/ad56ac0a
C, H, N, O
Shuhao Zhang, Małgorzata Z. Makoś, Ryan B. Jadrich, Elfi Kraka, Kipton Barros, Benjamin T. Nebgen, Sergei Tretiak, Olexandr Isayev, Nicholas Lubbers, Richard A. Messerly, Justin S. Smith
196550
27209270
4
KS-DFT-BLYP+D3
CP2K
45
Dataset containing MD trajectories of the buckyball-catcher supramolecule from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-Pi software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution.
10.60732/3ac33c6f
C, H
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
6102
903096
2
DFT-PBE+MBD
FHI-aims
45
The rattled-300-subsampled training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/42702fb9
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
3463993
49674369
88
DFT-PBE+U
VASP
45
This dataset comprises a training dataset for magnetic multi-component machine-learning potentials for Fe-Al systems, including different concentrations of Fe and Al (Al concentrations from 0%-50%), with fully equilibrated and perturbed atomic positions, lattice vectors and magnetic moments represented.
10.60732/9d635e27
Al, Fe
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov
434
6944
2
DFT-PBE
ABINIT
45
1590 configurations of H2O/water with total energy and forces calculated using a hybrid approach at DFT/revPBE0-D3 level of theory.
10.60732/07f8deb4
H, O
Bingqing Cheng, Edgar A. Engel, Jörg Behler, Christoph Dellago, Michele Ceriotti
1588
304896
2
DFT-revPBE0+D3
CP2K
45
This dataset was originally designed to fit a GAP potential with a specific focus on properties relevant for simulations of radiation-induced collision cascades and the damage they produce, including a realistic repulsive potential for short-range many-body cascade dynamics and a good description of the liquid phase.
10.60732/6367ea51
W
Jesper Byggmästar, Ali Hamedani, Kai Nordlund, Flyura Djurabekova
3528
42068
1
DFT-PBE
VASP
45
The rattled-300 training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/444965da
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
6319089
89791992
88
DFT-PBE+U
VASP
44
Binning-binning configurations from CA-9 dataset used during validation step for NNP_BB potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/95c19122
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
4003
233034
1
DFT-PBE
VASP
44
Training dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.
10.60732/8e7f6d7c
C, H, N, O, S
Zihan Pengmei, Yinan Shu, Junyu Liu
43385
807298
5
DFT-M06
Psi4
44
Approximately 850 configurations of CoSb3 and Mg3Sb2 generated using a dual adaptive sampling (DAS) method for use with machine learning of interatomic potentials (MLIP).
10.60732/d28a2c1d
Mg, Sb
Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, Wenqing Zhang
846
247744
2
DFT-PBE
VASP
43
tmQM_wB97MV contains configurations from the tmQM dataset, with several structures from tmQM that were found to be missing hydrogens filtered out, and energies of all other structures recomputed at the wB97M-V/def2-SVPD level of DFT.
10.60732/4144e554
Ag, As, Au, B, Br, C, Cd, Cl, Co, Cr, Cu, F, Fe, H, Hf, Hg, I, Ir, La, Mn, Mo, N, Nb, Ni, O, Os, P, Pd, Pt, Re, Rh, Ru, S, Sc, Se, Si, Ta, Tc, Ti, V, W, Y, Zn, Zr
Aaron G. Garrison, Javier Heras-Domingo, John R. Kitchin, Gabriel dos Passos Gomes, Zachary W. Ulissi, Samuel M. Blau
86501
5710563
44
DFT-ωB97M-V
Q-Chem
43
Test split from the 216-atom amorphous portion of the aC_JCP_2023 dataset. The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ points were sampled in the k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states
10.60732/4ca1927e
C
Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe
3366
727056
1
DFT-LDA
VASP
43
Structures from the SAIT_semiconductors_ACS_2023_SiN dataset, separated into N-only, Si-only, SiN, and out-of-domain melt, quench and relax configuration sets. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/ef14d3da
N, Si
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
88111
5201559
2
DFT-PBE
VASP
42
The rattled-1000 validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/4623b5e6
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
117004
1657765
86
DFT-PBE+U
VASP
42
The a-AQUA dataset was generated to address the need for a training set for a water PES that includes 2-body, 3-body and 4-body interactions calculated at the CCSD(T) level of theory. Structures were selected from the existing HBB2-pol and MB-pol datasets. For each water dimer structure, CCSD(T)/aug-cc-pVTZ calculations were performed with an additional 3s3p2d1f basis set; exponents equal to (0.9, 0.3, 0.1) for sp, (0.6, 0.2) for d, and 0.3 for f. This additional basis is placed at the center of mass (COM) of each dimer configuration. The basis set superposition error (BSSE) correction was determined with the counterpoise scheme. CCSD(T)/aug-cc-pVQZ calculations were then performed with the same additional basis set and BSSE correction. Final CCSD(T)/CBS energies were obtained by extrapolation over the CCSD(T)/aug-cc-pVTZ and CCSD(T)/aug-cc-pVQZ 2-b energies. All ab initio calculations were performed using the Molpro package. Trimer structures were calculated at CCSD(T)-F12a/aug-cc-pVTZ with BSSE correction. Four-body structure calculations were performed at CCSD(T)-F12 level.
10.60732/e8b084a9
H, O
Qi Yu, Chen Qu, Paul L. Houston, Riccardo Conte, Apurba Nandi, Joel M. Bowman
120162
877128
2
CCSD(T)/CBS, CCSD(T)-F12a, CCSD(T)-F12
MOLPRO
42
The rattled-500 training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/95b8a22e
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
6922153
98860300
88
DFT-PBE+U
VASP
42
9,200 configurations of beta-Ga2O3, including two configuration sets. One contains DFT data for 8400 configurations simulated between temperatures of 50K - 600K. The second contains configurations with 0K simulation temperature.
10.60732/6fd38e1f
Ga, O
Ruiyang Li, Zeyu Liu, Andrew Rohskopf, Kiarash Gordiz, Asegun Henry, Eungkyu Lee, Tengfei Luo
9200
2944000
2
DFT-QUICKSTEP
CP2K
42
Training simulations from CGM-MLP_natcomm2023 of carbon on a Cu metal surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/76552006
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
520
122294
2
DFT-PBE+D3
CP2K
41
The rattled-500-subsampled training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/ed9a1102
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
3975399
56846329
89
DFT-PBE+U
VASP
41
Dataset for "Interplay between ferroelectricity and metallicity in BaTiO3", exploring properties of ferroelectric barium titanate (BaTiO3), including the effects of electron and hole doping. Includes configuration sets for unit cells and supercells of BaTiO3.
10.60732/9abdf618
Al, Ba, K, La, Nb, O, Sc, Ti, V
Veronica F. Michel, Tobias Esswein, Nicola A. Spaldin
1062
18715
9
DFT-PBEsol
VASP
41
Dataset generated using a committee-based active learning strategy to build a training dataset for modeling complex aqueous systems.
10.60732/07d278f0
B, C, F, H, Mo, N, O, S, Ti
Christoph Schran, Fabian L. Thiemann, Patrick Rowe, Erich A. Müller, Ondrej Marsalek, Angelos Michaelides
1786
681912
9
DFT-optB88-vdW, DFT-PBE+D3, DFT-revPBE0+D3, DFT-BLYP+D3
CP2K
41
Configurations of Nb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/2146db76
Nb
Christopher M. Andolina, Wissam A. Saidi
3114
54086
1
DFT-PBE
VASP
41
Test dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.
10.60732/690e82cc
C, H, N, O, S
Zihan Pengmei, Yinan Shu, Junyu Liu
21661
402856
5
DFT-M06
Psi4
41
53,841 structures of alpha-brass (less than 40% Zinc). Includes atomic forces and total energy. Calculated using VASP at the DFT level of theory.
10.60732/f127f7e7
Cu, Zn
Jan Weinreich, Anton Römer, Martín Leandro Paleico, Jörg Behler
53475
2951436
2
DFT-PBE
VASP
41
The training split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets -- in particular QM9 and ANI1x.
10.60732/b1104cc5
C, H, N, O
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
62988
535993
4
DFT-ωB97X
ORCA 5.0.2
41
The MAD benchmark dataset, containing a selection of MAD test, MPtrj, Alexandria, SPICE, MD22 and OC2020 datasets, computed with MAD DFT settings. Part of the MAD (Massive Atomic Diversity) dataset family. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
10.60732/b1f21e20
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti
1884
44748
81
DFT-PBEsol
VASP
40
Configurations of urocanic acid from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/4b1f8c83
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119986
1919776
4
DFT-PBE0
Gaussian 09
40
Glycine validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/b102ddfd
C, H, N, O
Venkat Kapil, Edgar A. Engel
200
7120
4
DFT-PBE+TS
Quantum ESPRESSO v6.3
39
The solvated protein fragments dataset was generated as a partner benchmark dataset, along with SN2, for measuring the performance of machine learning models, in particular PhysNet, at describing chemical reactions, long-range interactions, and condensed phase systems. The dataset contains structures for all possible "amons" (hydrogen-saturated covalently bonded fragments) of up to eight heavy atoms (C, N, O, S) that can be derived from chemical graphs of proteins containing the 20 natural amino acids connected via peptide bonds or disulfide bridges. For amino acids that can occur in different charge states due to (de)protonation (i.e., carboxylic acids that can be negatively charged or amines that can be positively charged), all possible structures with up to a total charge of ±2e are included. In total, the dataset provides reference energies, forces, and dipole moments for 2,731,180 structures calculated at the revPBE-D3(BJ)/def2-TZVP level of theory using ORCA 4.0.1.
10.60732/c4731f07
C, H, N, O, S
Oliver T. Unke, Markus Meuwly
2730942
58390211
5
DFT-revPBE+D3(BJ)
ORCA 4.0.1
39
The DFT-2D-3-12-2021 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 2D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/8a437fac
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, F, Fe, Ga, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza
887
6230
81
DFT-optB88-vdW, DFT-TBmBJ
VASP
39
Dataset for "Analysis of minerals as electrode materials for Ca-based rechargeable batteries". Includes DFT structures of pyroxenes, garnet and carbonates. Dataset was produced to pursue identification of Ca-based high specific energy cathode materials.
10.60732/134fd579
C, Ca, Cr, Mn, O, Si
M. Elena Arroyo-de Dompablo, Jose Luis Casals
4726
550074
6
DFT-PBE
VASP
39
Data from the publication "Enlisting Potential Cathode Materials for Rechargeable Ca Batteries". The development of rechargeable batteries based on a Ca metal anode demands the identification of suitable cathode materials. This work investigates the potential application of a variety of compounds, which are selected from the Inorganic Crystal Structure Database (ICSD) considering 3d-transition metal oxysulphides, pyrophosphates, silicates, nitrides, and phosphates with a maximum of four different chemical elements in their composition. Cathode performance of CaFeSO, CaCoSO, CaNiN, Ca3MnN3, Ca2Fe(Si2O7), CaM(P2O7) (M = V, Cr, Mn, Fe, Co), CaV2(P2O7)2, Ca(VO)2(PO4)2 and α-VOPO4 is evaluated through the calculation of operation voltages, volume changes associated with the redox reaction and mobility of Ca2+ ions. Some materials exhibit attractive specific capacities and intercalation voltages combined with energy barriers for Ca migration around 1 eV (CaFeSO, Ca2FeSi2O7 and CaV2(P2O7)2). Based on the DFT results, αI-VOPO4 is identified as a potential Ca-cathode with a maximum theoretical specific capacity of 312 mAh/g, an average intercalation voltage of 2.8 V and calculated energy barriers for Ca migration below 0.65 eV (GGA functional).
10.60732/49ecd7c5
Ca, Co, Fe, Mn, N, Ni, O, P, S, Si, V
M. Elena Arroyo-de Dompablo, Jose Luis Casals
10839
1034708
11
DFT-PBE
VASP
39
The rattled-300 validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/b3c0c67d
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
62451
883431
84
DFT-PBE+U
VASP
39
40 graphite structures with different lattice constants ranging from 2.0 to 3.2 Å, with a 0.03 Å increment. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/85590078
C
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
41
1968
1
DFT-PBE+D3
CP2K
38
Data from "On-the-fly assessment of diffusion barriers of disordered transition metal oxyfluorides using local descriptors". The dataset contains the result of 48 Nudged Elastic Band calculations of Li(2-x)VO2F diffusion barriers. The NEB was performed with VASP, using projector augmented-wave (PAW) method to describe electron-ion interaction. The disordered rock salt cells were created using a 3 x 4 x 4 supercell containing 96 atoms (in case of no vacancies). PBE is used as XC functional while a rotationally invariant Hubbard U correction was applied to the d orbital of V with a U value of 3.25 eV.
10.60732/ada99db2
F, Li, O, V
Jin Hyun Chang, Peter Bjørn Jørgensen, Simon Loftager, Arghya Bhowmik, Juan María García Lastra, Tejs Vegge
233
20670
4
DFT-PBE+U
VASP
38
The rattled-500 validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/906de541
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
68830
985338
85
DFT-PBE+U
VASP
38
The training dataset for GST_GAP_22, recalculated using the PBE functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions.
10.60732/164f9a70
Ge, Sb, Te
Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer
2690
341004
3
DFT-PBE
CASTEP
38
Glycine training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/358cb5ee
C, H, N, O
Venkat Kapil, Edgar A. Engel
3582
109570
4
DFT-PBE+TS
Quantum ESPRESSO v6.3
38
Configurations of Ge from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K.
10.60732/33767c4f
Ge
Christopher M. Andolina, Wissam A. Saidi
2810
188884
1
DFT-PBE
VASP
38
Dataset containing MD trajectories of the tetrasaccharide stachyose from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-Pi software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution.
10.60732/e2a66d93
C, H, O
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller
27272
2372664
3
DFT-PBE+MBD
FHI-aims
38
The JARVIS_AGRA_CO dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/3e9ff0b1
C, Co, Cu, Fe, Mo, Ni, O
Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh
194
12804
7
DFT-PBE
VASP
37
This iron nanoparticles database contains dimers; trimers; bcc, fcc, hexagonal close-packed (hcp), simple cubic, and diamond crystalline structures. A wide range of cell parameters, as well as rattled structures, bcc-fcc and bcc-hcp transitional structures, surface slabs cleaved from relaxed bulk structures, nanoparticles and liquid configurations are included. The energy, forces and virials for the atomic structures were computed at the DFT level of theory using VASP with the PBE functional and standard PAW pseudopotentials for Fe (with 8 valence electrons, 4s^2 3d^6). The kinetic energy cutoff for plane waves was set to 400 eV and the energy threshold for convergence was 10^-7 eV. All the DFT calculations were carried out with spin polarization.
10.60732/20ba88af
Fe
Richard Jana, Miguel A. Caro
198
20097
1
DFT-PBE
VASP
37
A set of training configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. Includes virial sigmas used for configurations used in the corresponding publication (virial-sigma-paper) as well as an alternate configuration defined by doubled virial sigma prefactors (from 0.025 to 0.05).
10.60732/43a0cef7
H, Si
Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi
392
65909
2
DFT-PBE
Quantum ESPRESSO
37
133,855 configurations of stable small organic molecules composed of CHONF. A subset of GDB-17, with calculations of energies, dipole moment, polarizability and enthalpy. Calculations performed at B3LYP/6-31G(2df,p) level of theory.
10.60732/b82731e4
C, F, H, N, O
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
133877
2407626
5
DFT-B3LYP
Gaussian 09
37
OC20_S2EF_val_ood_ads is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, materials project bulk id and miller index.
10.60732/d820e77c
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
999838
72858155
56
DFT-rPBE
VASP
37
This dataset contains four trajectories of amorphous zeolitic imidazolate frameworks (ZIF-4), liquids calculated at four different volumes and at temperatures of 1500K and 1750K; and three trajectories of the ZIF-4 crystal: one at 300K and two at 1500K. Data was generated at the DFT-PBE-D3 level of theory.
10.60732/a6b0da5e
C, H, N, Zn
Nicolas Castel, Dune Andre, Connor Edwards, Jack D. Evans, Francois-Xavier Coudert
1189732
323607104
4
DFT-PBE+D3
CP2K
37
A dataset of 64-atom silicon configurations in four phases: cubic-diamond, (beta)-tin, R8, and liquid. MD simulations are run at 300, 600 and 900 K for solid phases; up to 2500 K for the L phase. All relaxations performed at zero pressure. Additional configurations prepared by random distortion of crystal structures. VASP was used with a PAW pseudopotential and PBE exchange correlation. k-point mesh was optimized for energy convergence of 0.5 meV/atom and stress convergence of 0.1 kbar. The plane wave energy cutoff was set to 300 eV. To reduce the correlation between data points, MD data were thinned by using one of every 100 consecutive structures from the MD simulations at 300 K and one of every 20 structures from higher temperature MD simulations.
10.60732/68b1a5ad
Si
Ekin D. Cubuk, Brad D. Malone, Berk Onat, Amos Waterland, Efthimios Kaxiras
1110
71040
1
DFT-PBE
VASP
37
3,000 Al-Ga-In sesquioxides with energies and band gaps. Relaxed and Vegard's Law geometries with formation energy and band gaps at DFT-PBE level of theory of (Alx-Gay-Inz)2O3 oxides, x+y+z=1. Contains all structures from the NOMAD 2018 Kaggle challenge training and leaderboard data. The formation energy and bandgap energy were computed by using the PBE exchange-correlation DFT functional with the all-electron electronic structure code FHI-aims with tight setting.
10.60732/e4af85f8
Al, Ga, In, O
Christopher Sutton, Luca M. Ghiringhelli, Takenori Yamamoto, Yury Lysogorskiy, Lars Blumenthal, Thomas Hammerschmidt, Jacek R. Golebiowski, Xiangyue Liu, Angelo Ziletti, Matthias Scheffler
3000
185070
4
DFT-PBE
FHI-aims
37
GAP-20 describes the properties of the bulk crystalline and amorphous phases, crystal surfaces, and defect structures with an accuracy approaching that of direct ab initio simulation, but at a significantly reduced cost. The final potential is fitted to reference data computed using the optB88-vdW density functional theory (DFT) functional.
10.60732/fd1b78a8
C
Patrick Rowe, Volker L. Deringer, Piero Gasparotto, Gábor Csányi, Angelos Michaelides
16906
1270764
1
DFT-optB88-vdW
VASP
37
The DFT with D2 vdW corrections split of the Graphene-hBN_and_Graphene-Graphene dataset. This dataset family (see other Graphene-hBN_and_Graphene_Graphene datasets) contains data for Graphene-Graphene and Graphene-hexagonal boron nitride (hBN) ab initio calculations for structures with different interlayer distances and disregistries, calculated using DFT with D2 van der Waals corrections, DFT with D3 van der Waals corrections, and QMC methods.
B, C, N
Kittithat Krongchon, Lucas K. Wagner, Tawfiqur Rakib, Daniel Palmer, Elif Ertekin, Harley T. Johnson
368
13248
3
DFT-PBE+D2
Quantum ESPRESSO
37
Succinic acid test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/0b56af0a
C, H, O
Venkat Kapil, Edgar A. Engel
500
14000
3
DFT-PBE+TS
Quantum ESPRESSO v6.3
36
The JARVIS-2DMatPedia dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations with 2D materials from the 2DMatPedia database, generated through two methods: a top-down exfoliation approach, using structures of bulk materials from the Materials Project database; and a bottom-up approach, replacing each element in a 2D material with another from the same group (according to column number). JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/a2df077f
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Jun Zhou, Lei Shen, Miguel Dias Costa, Kristin A. Persson, Shyue Ping Ong, Patrick Huck, Yunhao Lu, Xiaoyang Ma, Yiming Chen, Hanmei Tang, Yuan Ping Feng
6351
66295
83
DFT-optB88-vdW
VASP
36
Configurations of toluene from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/c6e1f25a
C, H
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
99995
1499925
2
DFT-PBE0
Gaussian 09
36
Test configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/e7100354
Hf, O
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
3510
336960
2
DFT-PBE
VASP
36
Approximately 9,100 configurations of Li10SiP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiSiPS datasets from this source. The other uses the PBEsol functional, rather than the PBE functional.
10.60732/a82feb87
Li, P, S, Si
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
9150
2100050
4
DFT-PBE
VASP 5.4.4
36
Test configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/821cd3a8
N, Si
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
2866
165559
2
DFT-PBE
VASP
36
This data set was originally used to generate a multi-component linear SNAP potential for tungsten and beryllium as published in Wood, M. A., et al. Phys. Rev. B 99 (2019) 184305. This data set was developed for the purpose of studying plasma material interactions in fusion reactors.
10.60732/7500db4b
Be, W
Mitchell A. Wood, Mary Alice Cusentino, Brian D. Wirth, Aidan P. Thompson
25055
524332
2
DFT-PBE
VASP
36
Glycine training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/76aa925f
C, H, N, O
Venkat Kapil, Edgar A. Engel
29067
952530
4
DFT-PBE+TS
Quantum ESPRESSO v6.3
36
Random-random configurations from the CA-9 dataset used during the validation step for the NNP_RR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/1005a764
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
4001
218129
1
DFT-PBE
VASP
36
Structures from Vector-QM24 (VQM24) that converged to saddle points during relaxation, with properties calculated using DFT. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br.
Br, C, Cl, F, H, N, O, P, S, Si
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld
51072
524617
10
DFT-ωB97X+D3
Psi4
36
Test set from W_LML-retrain dataset, containing bulk tungsten calculations. The W_LML-retrain dataset contains DFT calculations used in testing a linear-in-descriptor machine learning potential that accounts for dislocation-defect interactions in tungsten. Density functional simulations were performed using VASP. The PBE generalised gradient approximation was used to describe effects of electron exchange and correlation together with a projector augmented wave (PAW) basis set with a cut-off energy of 550 eV. Occupancies were smeared with a Methfessel-Paxton scheme of order one with a 0.1 eV smearing width. The Brillouin zone was sampled with a Monkhorst-Pack k-point grid for the 2D cluster simulations periodic along the dislocation line and a single k-point was used for the calculations with 3D spherical QM regions. The values of these parameters were chosen after a series of convergence tests on forces with a tolerance of a few meV/Å.
10.60732/9d48595f
W
Berk Onat, Christoph Ortner, James R. Kermode
8
1996
1
DFT-PBE
VASP
35
The JARVIS_AGRA_CHO dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/91b60711
C, Co, Cu, Fe, H, Mo, Ni, O
Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh
216
14472
8
DFT-PBE+D3
VASP
35
Configurations of alanine from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/f74456fc
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119991
1559883
4
DFT-PBE0
Gaussian 09
35
All DFT single-point calculations for the OrbNet Denali training set were carried out in Entos Qcore version 0.8.17 at the ωB97X-D3/def2-TZVP level of theory using in-core density fitting with the neese=4 DFT integration grid.
10.60732/b6043e2b
B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si
Anders S. Christensen, Sai Krishna Sirumalla, Zhuoran Qiao, Michael B. O'Connor, Daniel G. A. Smith, Feizhi Ding, Peter J. Bygrave, Animashree Anandkumar, Matthew Welborn, Frederick R. Manby, Thomas F. Miller III
2337230
104937852
17
DFT-ωB97X+D3
ENTOS QCORE 0.8.17
35
Starting from a single reference ab initio simulation, we use active learning to expand into new state points and to describe the quantum nature of the nuclei. The final model, trained on 814 reference calculations, yields excellent results under a range of conditions, from liquid water at ambient and elevated temperatures and pressures to different phases of ice, and the air-water interface — all including nuclear quantum effects.
10.60732/d1024453
H, O
Christoph Schran, Krystof Brezina, Ondrej Marsalek
8814
2304144
2
DFT-revPBE0+D3
CP2K
35
Validation configurations of Li8Mo2Ni7Ti7O32 from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.
10.60732/40d9f4e8
Li, Mo, Ni, O, Ti
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
1792
100352
5
DFT-SCAN
VASP
35
Dataset for H2CO, with and without added noise for testing the effects of noise on quality of fit. Configuration sets are included for clean energy values with different levels of Gaussian noise added to atomic forces (including a set with no noise added), and energies perturbed at different levels (including a set with no perturbation). Configuration sets correspond to individual files found at the data link.
10.60732/76701b84
C, H, O
Sugata Goswami, Silvan Käser, Raymond J. Bemish, Markus Meuwly
28808
115232
3
MP2
Gaussian 09
35
Test dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values.
10.60732/f48ed7f0
C, H, N, O, S
Zihan Pengmei, Yinan Shu, Junyu Liu
21700
403800
5
SA-CASSCF
OpenMolcas 22.06
35
The DFT with D3 vdW corrections split of the Graphene-hBN_and_Graphene-Graphene dataset. This dataset family (see other Graphene-hBN_and_Graphene_Graphene datasets) contains data for Graphene-Graphene and Graphene-hexagonal boron nitride (hBN) ab initio calculations for structures with different interlayer distances and disregistries, calculated using DFT with D2 van der Waals corrections, DFT with D3 van der Waals corrections, and QMC methods.
B, C, N
Kittithat Krongchon, Lucas K. Wagner, Tawfiqur Rakib, Daniel Palmer, Elif Ertekin, Harley T. Johnson
368
13248
3
DFT-PBE+D3
Quantum ESPRESSO
35
The ChIMES C 2.0 Small dataset consists of initial structures of carbon calculated at the DFT level using VASP and trajectories produced using the ChIMES model. See links for the model code and ChIMES simulation evaluation library.
10.60732/ef8a9926
C
Rebecca K. Lindsey, Nir Goldman, Laurence E. Fried
601
117976
1
DFT-PBE
ChIMES
34
Glycine test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/a85ea3a9
C, H, N, O
Venkat Kapil, Edgar A. Engel
200
6880
4
DFT-PBE+TS
Quantum ESPRESSO v6.3
34
Succinic acid validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/55319022
C, H, O
Venkat Kapil, Edgar A. Engel
200
5600
3
DFT-PBE+TS
Quantum ESPRESSO v6.3
34
This dataset contains structures of materials from the N (15th), O (16th) and F (17th) columns of the periodic table used for generating a 2-body non-bonded vdW potential.
10.60732/5617dd04
As, At, Bi, O, P, Po, S, Sb, Se, Te
Peng Geng, Sergey Zybin, Saber Naserifar, William A. Goddard, III
262
1494
10
DFT-PBE
VASP 5.4.4
34
This dataset investigates the effect of defects, such as copper and oxygen vacancies, in cuprous oxide films. Structures include oxygen vacancies formed in proximity of a reconstructed Cu2O(111) surface, where the outermost unsaturated copper atoms are removed, thus forming non-stoichiometric surface layers with copper vacancies. Surface and bulk properties are addressed by modelling a thick and symmetric slab consisting of 8 atomic layers and 736 atoms. Configuration sets include bulk, slab, vacancy and oxygen gas. Version v1
10.60732/7fd4eb34
Cu, O
Nanchen Dongfang, Marcella Iannuzzi, Yasmine Al-Hamdani
855
604801
2
DFT-PBE+U+D3
CP2K
34
A training dataset of 90,000 configurations with interaction properties between H2 and Pt(111) surfaces.
10.60732/831d1c4a
H, Pt
Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky
90731
5705442
2
DFT-PBE
VASP
34
Configurations of Zn from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/dfa1e792
Zn
Christopher M. Andolina, Wissam A. Saidi
3852
102160
1
DFT-PBE
VASP
34
The validation set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerred Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density.
10.60732/5bc5a5cc
Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka
2498
69420
37
DFT-PBE
VASP 5.4.4
34
Configurations of Li from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K.
10.60732/f5cc2a19
Li
Christopher M. Andolina, Wissam A. Saidi
2531
93579
1
DFT-PBE
VASP
34
A training dataset of diverse atomic configurations of Zn, varying in aggregation states, crystal structures, defect types, and sizes. The aim was to derive a potential capable of accurately describing a broad spectrum of local atomic configurations in Zn.
10.60732/54902e18
Zn
Haojie Mei, Luyao Cheng, Liang Chen, Feifei Wang, Jinfu Li, Lingti Kong
13299
276240
1
DFT-PBE
VASP
34
Approximately 20,000 configurations of Au used as part of a training dataset for a DP-GEN-based ML model for a Ag-Au nanoalloy potential.
10.60732/c4492535
Au
Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang
9754
161580
1
DFT-PBE+D3
VASP, DP-GEN
34
The training split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
10.60732/f5b6ea1b
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Po, Pr, Pt, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti
76482
2064229
85
DFT-PBEsol
VASP
33
Training set (DFT output) for CE models and MC simulation output for the manuscript 'Phase behaviour of (Ti:Mo)S2 binary alloys arising from electron-lattice coupling'. The DFT calculations are performed using VASP 5.4.3, compiled with Intel MPI and Intel MKL support.
10.60732/864a2df0
Mo, S, Ti
Andrea Silva, Tomas Polcar, Denis Kramer
259
3996
3
DFT-SCAN+rVV10
VASP 5.4.3
33
This dataset provides DFT (as implemented in VASP) calculations for pure magnesium. Configuration sets include bulk, generalized stacking fault energies, stable stacking fault, decohesion, relaxed surfaces, dimer, corner and rod, and vacancy configurations of Mg.
10.60732/28f038f2
Mg
Binglun Yin, Markus Stricker, W. A. Curtin
405
10730
1
DFT-PBE
VASP
33
Glycine test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/d68e42bc
C, H, N, O
Venkat Kapil, Edgar A. Engel
500
17710
4
DFT-PBE+TS
Quantum ESPRESSO v6.3
33
Training configurations of Li8Mo2Ni7Ti7O32 from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.
10.60732/1c1bf708
Li, Mo, Ni, O, Ti
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
824
46144
5
DFT-SCAN
VASP
33
Carbon_GAP_20 dataset from CGM-MLP_natcomm2023. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/b996b7e0
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
6178
400485
2
DFT-PBE+D3
CP2K
33
Configurations of o-hbdi from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program.
10.60732/5dce8a9a
C, H, N, O
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti
119995
2639890
4
DFT-PBE0
Gaussian 09
33
Test configurations from the CA-9 dataset used to evaluate trained NNPs. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/5a57f6ad
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
2726
206238
1
DFT-PBE
VASP
33
The JARVIS_mlearn dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Organic Materials Database (OMDB): a dataset of 12,500 crystal materials for the purpose of training models for the prediction of properties for complex and lattice-periodic organic crystals with large numbers of atoms per unit cell. Dataset covers 69 space groups, 65 elements; averages 82 atoms per unit cell. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/f3f6ad68
Cu, Ge, Li, Mo, Ni, Si
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong
1566
115742
6
DFT-PBE
VASP 5.4.1
33
The test set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerred Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density.
10.60732/bddeac8f
Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka
2495
69572
37
DFT-PBE
VASP 5.4.4
33
Approximately 9,850 configurations of CO2 with a movable Ni(100) surface.
10.60732/b44e9fd6
C, Ni, O
Yaolong Zhang, Junfan Xia, Bin Jiang
9845
383955
3
DFT-PBE
VASP
33
The data used for training the DFT models were created running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Forces and energies were computed using all-electrons at the generalized gradient approximation level of theory with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional, treating van der Waals interactions with the Tkatchenko-Scheffler (TS) method. All calculations were performed with FHI-aims. The final training data was generated by subsampling the full trajectory under preservation of the Maxwell-Boltzmann distribution for the energies.
10.60732/18404d62
C, H
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
49862
598344
2
DFT-PBE+TS
FHI-aims
33
DFT dataset consisting of 6828 resampled Pt-Ni alloys used for training an NNP. The energy and forces of each structure in the resampled database are calculated using DFT. All reference DFT calculations for the training set of 6828 Pt-Ni alloy structures have been performed using the Vienna Ab initio Simulation Package (VASP) with the spin-polarized revised Perdew-Burke-Ernzerhof (rPBE) exchange-correlation functional.
10.60732/9d0ff0eb
Ni, Pt
Shuang Han, Giovanni Barcaro, Alessandro Fortunelli, Steen Lysgaard, Tejs Vegge, Heine Anton Hansen
6820
1072856
2
DFT-rPBE
VASP
33
Training sets from Si_Al_Ti_Seko_PRB_2019. This dataset comprises 10,000 selected structures from the ICSD, divided into training and test sets. The dataset was generated for the purpose of training a MLIP with introduced high-order linearly independent rotational invariants up to the sixth order based on spherical harmonics. DFT calculations were carried out with VASP using the PBE exchange-correlation functional and an energy cutoff of 400 eV.
10.60732/59585f0a
Al, Si, Ti
Atsuto Seko, Atsushi Togo, Isao Tanaka
3989
197628
3
DFT-PBE
VASP
33
The rattled-1000-subsampled validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/8e8871be
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
38271
549832
87
DFT-PBE+U
VASP
33
Approximately 11,500 configurations of In2Se3, including monolayer (20-atom slab) and bulk (30-atom supercell) models.
10.60732/a8e05a1b
In, Se
Jing Wu, Liyi Bai, Jiawei Huang, Liyang Ma, Jian Liu, Shi Liu
11516
248370
2
DFT-PBE
VASP
33
This dataset was designed to enable machine-learning of Nb elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.
10.60732/a90f7f6e
Nb
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
3787
45641
1
DFT-PBE
VASP
33
All structures calculated for Vector-QM24 (VQM24) with properties calculated using DFT. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br.
Br, C, Cl, F, H, N, O, P, S, Si
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld
784838
8079877
10
DFT-ωB97X+D3
Psi4
33
We establish the sign of the linear magnetoelectric (ME) coefficient, α, in chromia, Cr₂O₃. Cr₂O₃ is the prototypical linear ME material, in which an electric (magnetic) field induces a linearly proportional magnetization (polarization), and a single magnetic domain can be selected by annealing in combined magnetic (H) and electric (E) fields. Opposite antiferromagnetic domains have opposite ME responses, and which antiferromagnetic domain corresponds to which sign of response has previously been unclear. We use density functional theory (DFT) to calculate the magnetic response of a single antiferromagnetic domain of Cr₂O₃ to an applied in-plane electric field at 0 K. We find that the domain with nearest neighbor magnetic moments oriented away from (towards) each other has a negative (positive) in-plane ME coefficient, α⊥, at 0 K. We show that this sign is consistent with all other DFT calculations in the literature that specified the domain orientation, independent of the choice of DFT code or functional, the method used to apply the field, and whether the direct (magnetic field) or inverse (electric field) ME response was calculated. Next, we reanalyze our previously published spherical neutron polarimetry data to determine the antiferromagnetic domain produced by annealing in combined E and H fields oriented along the crystallographic symmetry axis at room temperature. We find that the antiferromagnetic domain with nearest-neighbor magnetic moments oriented away from (towards) each other is produced by annealing in (anti-)parallel E and H fields, corresponding to a positive (negative) axial ME coefficient, α∥, at room temperature. Since α⊥ at 0 K and α∥ at room temperature are known to be of opposite sign, our computational and experimental results are consistent. This dataset contains the input data to reproduce the calculation of the magnetoelectric effect as plotted in Fig. 3 of the manuscript, for Elk, Vasp, and Quantum Espresso.
10.60732/85b7fa44
Cr, O
Eric Bousquet, Eddy Lelièvre-Berna, Navid Qureshi, Jian-Rui Soh, Nicola Ann Spaldin, Andrea Urru, Xanthe Henderike Verbeek, Sophie Francis Weber
165
1650
2
DFT-LDA
VASP
32
DFT-optimized geometries and properties for Li-S electrolytes. These make up the Computational Database for Li-S Batteries (ComBat), calculated using Gaussian 16 at the B3LYP/6-31+G* level of theory.
10.60732/682b12b1
C, F, H, Li, N, O, P, S, Si
Rasha Atwi, Matthew Bliss, Maxim Makeev, Nav Nidhi Rajput
174
4719
9
DFT-B3LYP
Gaussian 16
32
The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-N dataset: a collection assembled to train a machine learning model for the purposes of assisting catalyst design by predicting chemical reactivity of transition-metal surfaces. The adsorption systems contained in this dataset consist of {100}-terminated Pt-based bimetallic surfaces doped with a third element. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/81faec41
Ag, Au, Cd, Co, Cr, Cu, Fe, H, Hf, Ir, Mn, Mo, N, Nb, Ni, O, Os, Pd, Pt, Re, Rh, Ru, Sc, Tc, V, W, Zn
Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin
329
6251
27
DFT-rPBE
VASP
32
Dataset for "Appraisal of calcium ferrites as cathodes for calcium rechargeable batteries: DFT, synthesis, characterization and electrochemistry of Ca4Fe9O17" created to explore Fe-based cathode materials for Ca-ion batteries. Structures include CaFe(2+n)O(4+n), where 0 < n < 3.
10.60732/c8fdee31
Ca, Fe, O
M. Elena Arroyo-de Dompablo, José Luis Casals
345
35462
3
DFT-PBE
VASP 4.6.35
32
The test set of a train/test pair from the malonaldehyde dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite.
10.60732/c459d6f4
C, H, O
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
500
4500
3
CCSD(T)
Psi4
32
The test set of a train/test pair from the ethanol dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVTZ was used for ethanol. All calculations were performed with the Psi4 software suite.
10.60732/76c53b98
C, H, O
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
1000
9000
3
CCSD(T)
Psi4
32
The JARVIS_ALIGNN_FF dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset is a subset of the JARVIS DFT dataset, filtered to contain just the first, last, middle, maximum energy and minimum energy structures. Additionally, calculation run snapshots are filtered for uniqueness, and the dataset contains only perfect structures. DFT energies, stresses and forces in this dataset were used to train an atomistic line graph neural network (ALIGNN)-based FF model. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/45deafd8
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Kamal Choudhary, Brian DeCost, Lily Major, Keith Butler, Jeyan Thiyagalingam, Francesca Tavazza
304146
3178329
89
IP-ALIGNN-FF
VASP
32
A comprehensive database generated using density functional theory simulations, encompassing a wide range of crystal structures, point defects, extended defects, and disordered structures.
10.60732/41115bd2
O, Si
Karim Zongo, Hao Sun, Claudiane Ouellet-Plamondon, Laurent Karim Beland
1061
71594
2
DFT-PBE
Quantum ESPRESSO
32
Training and simulation data from machine learning force field model applied to steps of the hydrogenation of carbon dioxide to methanol over an indium oxide catalyst, with and without platinum doping.
10.60732/d16f9667
C, H, In, O, Pt
Lars Schaaf, Edvin Fako, Sandip De, Ansgar Schafer, Gabor Csanyi
1994
163746
5
DFT-PBE
Quantum ESPRESSO
32
Test configurations with MD simulations performed at 600K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules.
10.60732/cf8c3842
C, H, N, O
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi
2138
57726
4
DFT-ωB97X
ORCA
32
This dataset contains dimer molecules of Co(II) with potential energy calculations for structures with ferromagnetic and antiferromagnetic spin configurations. Calculations were carried out in Gaussian 16 with the PBE exchange-correlation functional and 6-31+G* basis set. All molecules contain the same atomic core region, consisting of the tetrahedral and octahedral Co centers and the three PO2R2 bridging ligands. The ligand exchange provides a broad range of exchange energies (ΔEJ), from +50 to -200 meV, with 80% of the ligands yielding ΔEJ < 10 meV.
10.60732/16b96cbc
C, Cl, Co, H, N, O, P, S
Sijin Ren, Eric Fonseca, William Perry, Hai-Ping Cheng, Xiao-Guang Zhang, Richard Hennig
2158
188149
8
DFT-PBE
Gaussian 16
32
Validation configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects.
10.60732/bde379de
Hf, O
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim
3510
336960
2
DFT-PBE
VASP
32
The dataset consists of energies, forces and virials for DFT-VASP-generated Ag-Pd systems. The data was used to fit an active learned dataset which was used to compare MTP- and SOAP-GAP-generated potentials.
10.60732/b0e39006
Ag, Pd
Conrad W. Rosenbrock, Konstantin Gubaev, Alexander V. Shapeev, Livia B. Pártay, Noam Bernstein, Gábor Csányi, Gus L. W. Hart
993
7260
2
DFT-PBE
VASP
32
Succinic acid training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/45375ebf
C, H, O
Venkat Kapil, Edgar A. Engel
1800
50400
3
DFT-PBE+TS
Quantum ESPRESSO v6.3
32
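The k-point criterion quoted in the description above (a Monkhorst-Pack grid with a maximum spacing of 0.06 × 2π Å⁻¹) can be turned into grid dimensions for a given cell. The following is a minimal sketch, assuming the common convention that each grid dimension is the smallest integer n_i such that |b_i| / n_i does not exceed the spacing; the source does not state which convention the dataset authors used, and the function name `monkhorst_pack_grid` and the example cell are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def norm(u):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in u))


def monkhorst_pack_grid(cell, max_spacing=0.06):
    """Return (n1, n2, n3) for lattice vectors given in Angstrom.

    max_spacing is in units of 2*pi / Angstrom, so the reciprocal
    lengths below deliberately omit the 2*pi factor.
    Assumed convention: n_i = ceil(|b_i| / max_spacing), at least 1.
    """
    a1, a2, a3 = cell
    volume = abs(sum(x * y for x, y in zip(a1, cross(a2, a3))))
    recip_lengths = [
        norm(cross(a2, a3)) / volume,
        norm(cross(a3, a1)) / volume,
        norm(cross(a1, a2)) / volume,
    ]
    # Small tolerance so exact multiples are not rounded up by float noise.
    return tuple(
        max(1, math.ceil(length / max_spacing - 1e-9))
        for length in recip_lengths
    )


# Hypothetical orthorhombic cell, 10 x 5 x 20 Angstrom.
example_cell = [(10.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 20.0)]
grid = monkhorst_pack_grid(example_cell)
print(grid)
```

Shorter lattice vectors give longer reciprocal vectors and therefore denser sampling along that direction, which is why a spacing criterion is usually quoted instead of fixed grid dimensions when a dataset mixes cells of different sizes.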
Test sets from Si_Al_Ti_Seko_PRB_2019. This dataset comprises 10,000 structures selected from the ICSD, divided into training and test sets. The dataset was generated for the purpose of training a MLIP with introduced high-order linearly independent rotational invariants up to the sixth order based on spherical harmonics. DFT calculations were carried out with VASP using the PBE exchange-correlation functional and an energy cutoff of 400 eV.
10.60732/9b58ca47
Al, Si, Ti
Atsuto Seko, Atsushi Togo, Isao Tanaka
36152
1774526
3
DFT-PBE
VASP
32
One configuration of an enzyme: training data for a quantum-guided molecular mechanics model.
10.60732/e75f2602
C, H, N, O, S
Taylor R. Quinn, Himani N. Patel, Kevin H. Koh, Brandon E. Haines, Per-Ola Norrby, Paul Helquist, Olaf Wiest
1
117
5
DFT-RM06
Gaussian 09
31
NEB path of proton transfer reaction between the two forms of acetylacetone. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/88a37621
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
15
225
3
DFT-PBE+D3
ORCA 5.0
31
This dataset includes Mg and Mg-Zn alloy structures with solute atoms at the prism edge locations. The dataset was created to study the strengthening effect of solute atoms at the prism edge locations in Mg alloys.
10.60732/95b38454
Mg, Zn
Masoud Rahbar Niazi, W. A Curtin
94
28615
2
DFT-PBE
VASP
31
The test set of a train/test pair from the benzene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for benzene. All calculations were performed with the Psi4 software suite.
10.60732/81df086b
C, H
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
500
6000
2
CCSD(T)
Psi4
31
Test set of decorrelated geometries sampled from 300 K xTB MD. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/9ed20baf
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
650
9750
3
DFT-PBE+D3
ORCA 5.0
31
The main part of the dataset consists of structures of liquid water at 300 K from first-principles molecular dynamics (FPMD) simulations using a hybrid density functional with dispersion corrections. The dataset is expanded to include nuclear quantum effects by adding structures from path-integral molecular dynamics (PIMD) simulations. The final dataset contains 814 structures of liquid water at different temperatures and pressures, water slab, and ice Ih and ice VIII. These systems cover a wide range of structural and dynamical properties of water and ice. This dataset builds on the dataset from Schran et al. (2020) https://doi.org/10.1063/5.0016004
10.60732/39dba9fb
H, O
Zekun Chen, Margaret L. Berrens, Kam-Tung Chan, Zheyong Fan, Davide Donadio
814
216144
2
DFT-revPBE0+D3
CP2K
31
This dataset contains pristine monolayer phosphorene as well as structures with monovacancies, which were used to train an artificial neural network (ANN) for use in high-dimensional neural network potential molecular dynamics (HDNNP-MD) simulations. The publication investigates the mechanism and rates of the processes of defect diffusion, as well as monovacancy-to-divacancy defect coalescence.
10.60732/87b2341a
P
Lukáš Kývala, Andrea Angeletti, Cesare Franchini, Christoph Dellago
5085
722033
1
DFT-PBE
VASP
31
6095 isomers of C7O2H10. Energetics were calculated at the G4MP2 level of theory.
10.60732/64be4f16
C, H, O
Raghunathan Ramakrishnan, Pavlo Dral, Matthias Rupp, O. Anatole von Lilienfeld
6094
115786
3
G4MP2
Gaussian 09
31
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Ti surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/4e8857ac
C, Ti
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
1309
259636
2
DFT-PBE+D3
CP2K
31
Includes CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space.
10.60732/999055f6
C, H, O
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu
6762
101430
3
DFT-B3LYP
MOLPRO
31
Approximately 28,000 configurations split into 4 datasets, each using a different functional, used in the training of a high-dimensional neural network potential (HDNNP).
10.60732/4f9f05e5
H, O
Tobias Morawietz, Jörg Behler
14537
1523796
2
DFT-RPBE+D3, DFT-BLYP, DFT-rPBE, DFT-BLYP+D3
FHI-aims
31
Random-random configurations from CA-9 dataset used for training NNP_RR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/4096ff5c
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
20012
1099992
1
DFT-PBE
VASP
31
This dataset was designed to enable machine learning of Mo elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.
10.60732/31dbb6ee
Mo
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
3785
45667
1
DFT-PBE
VASP
31
Approximately 45,000 configurations of metal oxides of Mg, Ag, Pt, Cu and Zn, with initial training structures taken from the Materials Project database.
10.60732/009ff6b1
Ag, Cu, Mg, O, Pt, Zn
Pandu Wisesa, Christopher M. Andolina, Wissam A. Saidi
44010
1975080
6
DFT-PBE
VASP
31
16748 configurations of magnesium with gathered energy, stress and forces at the DFT level of theory.
10.60732/4b13be86
Mg
Marvin Poul
16746
78239
1
DFT-PBE
VASP 5.4.4
31
The val_aimd-from-PBE-3000-npt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/3b112bfe
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
59516
4036396
85
DFT-PBE+U
VASP
31
Approximately 20,000 configurations from a dataset of alpha-iron and hydrogen. Properties include forces and potential energy, calculated using VASP at the DFT level using the GGA-PBE functional.
10.60732/6e08b70b
Fe, H
Fan-Shun Meng, Jun-Ping Du, Shuhei Shinzato, Hideki Mori, Peijun Yu, Kazuki Matsubara, Nobuyuki Ishikawa, Shigenobu Ogata
20800
1857588
2
DFT-PBE
VASP
31
Binning-binning configurations from CA-9 dataset used for training NNP_BB potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/f3bbbd36
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
20006
1053753
1
DFT-PBE
VASP
31
The test split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
10.60732/e55c4ce1
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Po, Pr, Pt, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti
9546
259376
85
DFT-PBEsol
VASP
30
Structures from discrepencies_and_error_metrics_NPJ_2023 test set; these include a single migrating vacancy. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K.
10.60732/63c3da57
Si
Yunsheng Liu, Xingfeng He, Yifei Mo
100
6300
1
DFT-PBE
VASP 5.4.4
30
The dataset for "Origin of high strength in the CoCrFeNiPd high-entropy alloy", containing DFT-calculated values of the high-entropy alloy CoCrFeNiPd, created to explore the reasons behind experimental findings of the increased strength of CoCrFeNiPd in comparison to CoCrFeNi.
10.60732/74f33a37
Co, Cr, Fe, Ni, Pd
Binglun Yin, W. A. Curtin
102
8508
5
DFT-PBEsol
VASP
30
Structures from discrepencies_and_error_metrics_NPJ_2023 training set; includes some structures with vacancies. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K.
10.60732/5a780a3a
Si
Yunsheng Liu, Xingfeng He, Yifei Mo
218
13389
1
DFT-PBE
VASP 5.4.4
30
Dataset created for "Vanadium is an optimal element for strengthening in both fcc and bcc high-entropy alloys", to explore the effect of V in the high-entropy systems fcc Co-Cr-Fe-Mn-Ni-V and bcc Cr-Mo-Nb-Ta-V-W-Hf-Ti-Zr. Structures include pure V, misfit volumes of V in Ni, and misfit volumes of Ni2V random alloys.
10.60732/2a29960c
Ni, V
Binglun Yin, Francesco Maresca, W. A. Curtin
232
21148
2
DFT-PBE
VASP
30
The JARVIS_AGRA_COOH dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/692a415c
C, Co, Cu, Fe, H, Mo, Ni, O
Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh
280
19040
8
DFT-PBE
VASP
30
This dataset contains configurations of lithium titanate from the publication "Kinetic Pathways of ionic transport in fast-charging lithium titanate". In order to understand the origin of various EELS (electron energy-loss spectroscopy) spectra features, EELS spectra were simulated using the Vienna Ab initio Simulation Package (VASP). For a specific Li in a given configuration, this is done by calculating the DOS and integrated DOS considering a Li core-hole on the position of the specific Li and calculating the EELS based on the DOS. The minimum energy paths (MEP) and migration energy of Li were calculated in various compositions, including Li4Ti5O12 with an additional Li carrier, Li5Ti5O12 with an additional Li carrier, and Li7Ti5O12 with a Li vacancy carrier.
10.60732/03896523
Be, Li, O, Ti
Tina Chen, Dong-hwa Seo
848
149914
4
DFT-PBE
VASP
30
Benzene test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/de02dca3
C, H
Venkat Kapil, Edgar A. Engel
1000
29736
2
DFT-PBE+TS
Quantum ESPRESSO v6.3
30
~100,000 configurations of water, ethanol, malondialdehyde and uracil gathered at the PBE/def2-SVP level of theory using ORCA.
10.60732/b0a10262
C, H, N, O
Kristof T. Schütt, Michael Gastegger, Alexandre Tkatchenko, Klaus-Robert Müller, Reinhard J. Maurer
91966
887691
4
DFT-PBE
ORCA
30
Approximately 6,500 configurations of Li10GeP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiGePS datasets from this source. The other uses the PBEsol functional, rather than the PBE functional.
10.60732/5ebf5a54
Ge, Li, P, S
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
6549
1478600
4
DFT-PBE
VASP 5.4.4
30
Energies computed with LR-CCSD and hybrid DFT (B3LYP & SCAN0) for 7211 molecules in QM7b and 52 molecules in the AlphaML showcase database.
10.60732/8fb1d4c7
C, Cl, H, N, O, S
Yang Yang, Ka Un Lao, David M. Wilkins, Andrea Grisafi, Michele Ceriotti, Robert A. DiStasio Jr
7255
112218
6
CCSD, DFT-B3LYP
Psi4
30
Succinic acid training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 × 2π Å⁻¹, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/e3efb796
C, H, O
Venkat Kapil, Edgar A. Engel
29211
817908
3
DFT-PBE+TS
Quantum ESPRESSO v6.3
30
This dataset was designed to enable machine-learning of V elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.
10.60732/aad06a25
V
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
3801
46454
1
DFT-PBE
VASP
30
Binning-random configurations from CA-9 dataset used for training NNP_BR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/07b7d297
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
20013
1072779
1
DFT-PBE
VASP
30
The QMC-calculated split of the Graphene-hBN_and_Graphene-Graphene dataset. This dataset family (see other Graphene-hBN_and_Graphene_Graphene datasets) contains data for Graphene-Graphene and Graphene-hexagonal boron nitride (hBN) ab initio calculations for structures with different interlayer distances and disregistries, calculated using DFT with D2 van der Waals corrections, DFT with D3 van der Waals corrections, and QMC methods.
B, C, N
Kittithat Krongchon, Lucas K. Wagner, Tawfiqur Rakib, Daniel Palmer, Elif Ertekin, Harley T. Johnson
75
2700
3
IP-QMC
QMCPACK
30
Structures from Vector-QM24 (VQM24) that represent constitutional isomers, or the most stable conformers, with properties calculated using DFT. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br.
Br, C, Cl, F, H, N, O, P, S, Si
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld
258242
2430476
10
DFT-ωB97X+D3
Psi4
30
Structures from discrepencies_and_error_metrics_NPJ_2023 validation set, enhanced by inclusion of rare events. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K.
10.60732/9c77bb8c
Si
Yunsheng Liu, Xingfeng He, Yifei Mo
50
3198
1
DFT-PBE
VASP 5.4.4
29
Structures from discrepencies_and_error_metrics_NPJ_2023 training set, enhanced by inclusion of interstitials. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K.
10.60732/f0a44294
Si
Yunsheng Liu, Xingfeng He, Yifei Mo
218
13629
1
DFT-PBE
VASP 5.4.4
29
This is the verification dataset (see companion training dataset: datasets_for_magnetic_MTP_NatSR2024_training) used in training a magnetic multi-component machine-learning potential for Fe-Al systems. The configurations from the verification set include different levels of magnetic moment perturbation than configurations from the training set. For this reason, the authors refer to this dataset as a "verification set", rather than a "validation set".
10.60732/acd42be9
Al, Fe
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov
210
3360
2
DFT-PBE
ABINIT
29
The face-centered cubic medium-entropy alloy NiCoCr has received considerable attention for its good mechanical properties, uncertain stacking fault energy, etc., some of which have been attributed to chemical short-range order (SRO). Here, we examine the yield strength and misfit volumes of NiCoCr to determine whether SRO has measurably influenced mechanical properties. Polycrystalline strengths show no systematic trend with different processing conditions. Measured misfit volumes in NiCoCr are consistent with those in random binaries. Yield strength prediction of a random NiCoCr alloy matches well with experiments. Finally, we show that standard spin-polarized density functional theory (DFT) calculations of misfit volumes are not accurate for NiCoCr. This implies that DFT may be inaccurate for other subtle structural quantities such as atom-atom bond distance, so that caution is required in drawing conclusions about NiCoCr based on DFT. These findings all lead to the conclusion that, under typical processing conditions, SRO in NiCoCr is either negligible or has no systematic measurable effect on strength.
10.60732/aa9d7982
Co, Cr, Ni
Binglun Yin, William Curtin
428
40624
3
DFT-PBE
VASP
29
493 structures available from the GAP-20 database, excluding any structures present in the training set. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/3e23c305
C
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
494
32279
1
DFT-PBE+D3
CP2K
29
500 decorrelated geometries sampled from 300 K xTB MD run. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/e359e8ed
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
500
7500
3
DFT-PBE+D3
ORCA 5.0
29
This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially-trained MTP model, 2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed, 3) the MTP was retrained, including the structures that were re-computed with DFT, 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset included 40,000 unrelaxed configurations with BCC, FCC, and HCP lattices.
10.60732/1058e01c
Cu, Pd
Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev
522
2450
2
DFT-undefined
VASP
29
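The four-step active learning scheme described in the entry above can be sketched as a loop. This is a toy illustration under stated assumptions, not the authors' MTP workflow: structures are stand-in 1D numbers, "extrapolation" is approximated by distance to the nearest training point, the relaxation in step 1 is skipped, and `reference_energy`, `is_extrapolating`, and `active_learning` are hypothetical names.

```python
def reference_energy(x):
    """Stand-in for an expensive DFT calculation (hypothetical toy function)."""
    return (x - 1.0) ** 2


def is_extrapolating(x, training_set, threshold):
    """Assumed extrapolation test: the candidate is far from every training point."""
    return min(abs(x - t) for t in training_set) > threshold


def active_learning(candidates, initial_training, threshold, batch_size=1):
    """Repeat select -> label -> retrain until no candidate extrapolates.

    Returns the final training set (structure -> energy) and the number
    of retraining rounds that were needed.
    """
    training_set = {x: reference_energy(x) for x in initial_training}
    rounds = 0
    while True:
        # Steps 1-2: find the candidates on which the current model extrapolates.
        # (A real workflow would first relax each candidate with the model.)
        flagged = [
            x for x in candidates
            if is_extrapolating(x, training_set, threshold)
        ]
        if not flagged:
            # Step 4: stop once nothing is flagged any more.
            break
        # Step 2 (continued): recompute the flagged structures with the reference method.
        for x in flagged[:batch_size]:
            training_set[x] = reference_energy(x)
        # Step 3: "retraining" is implicit here, since the toy model is the training set.
        rounds += 1
    return training_set, rounds


candidates = [0.0, 0.1, 1.0, 1.05, 2.0]
training_set, rounds = active_learning(candidates, initial_training=[0.0], threshold=0.2)
print(sorted(training_set), rounds)
```

Labelling only a small batch per round means that nearby candidates (here 1.05, once 1.0 has been labelled) stop being flagged without a reference calculation of their own, which is the point of retraining between rounds.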
The JARVIS_AGRA_OH dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the training set for the oxygen reduction reaction (ORR) dataset from Batchelor et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. Bulk calculations were performed with k-point = 8 x 8 x 4. Training adsorption energies were calculated on slabs, k-point = 4 x 4 x 1, while testing energies used k-point = 3 x 3 x 1. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/4db2a2e7
H, Ir, O, Pd, Pt, Rh, Ru
Thomas A.A. Batchelor, Jack K. Pedersen, Simon H. Winther, Ivano E. Castelli, Karsten W. Jacobsen, Jan Rossmeisl
877
15786
7
DFT-rPBE
GPAW
29
The JARVIS_AGRA_O dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the training set for the oxygen reduction reaction (ORR) dataset from Batchelor et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. Bulk calculations were performed with k-point = 8 x 8 x 4. Training adsorption energies were calculated on slabs, k-point = 4 x 4 x 1, while testing energies used k-point = 3 x 3 x 1. JARVIS is a set of tools and datasets built to meet current materials design challenges.
10.60732/a3177807
Ir, O, Pd, Pt, Rh, Ru
Thomas A.A. Batchelor, Jack K. Pedersen, Simon H. Winther, Ivano E. Castelli, Karsten W. Jacobsen, Jan Rossmeisl
1000
17000
6
DFT-rPBE
GPAW
29
The rattled-relax validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/7a878cdf
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
91043
764266
84
DFT-PBE+U
VASP
29
OC20_S2EF_val_ood_both is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring both unseen catalyst composition and unseen adsorbate. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.
10.60732/f8398b5c
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
999944
84604635
55
DFT-rPBE
VASP
29
Configurations from the CA-9 dataset used during the validation step for the NNP_CA-9 potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/5cebd981
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
8000
436601
1
DFT-PBE
VASP
29
2558 structures selected from the GAP-20 database. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/f340d1d9
C
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
2558
168066
1
DFT-PBE+D3
CP2K
29
Dataset used to train a machine learning model to calculate density functional theory-quality formation energies of all ~2 x 10^6 pristine ABC2D6 elpasolite crystals that can be made up from main-group elements (up to bismuth).
10.60732/f87a64e4
Al, Ar, As, B, Ba, Be, Bi, Br, C, Ca, Cl, Cs, F, Ga, Ge, H, He, I, In, K, Kr, Li, Mg, N, Na, Ne, O, P, Pb, Rb, S, Sb, Se, Si, Sn, Sr, Te, Tl, Xe
Felix Faber, Alexander Lindmaa, O. Anatole von Lilienfeld, Rickard Armiento
21881
218810
39
DFT-PBE
VASP 5.2.2
29
Binning-random configurations from the CA-9 dataset used during the validation step for the NNP_BR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/0f8a1418
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
4002
214310
1
DFT-PBE
VASP
29
Partial dataset for "Accuracy evaluation of different machine learning force field features". The included data is limited to that hosted directly on the repository at the related GitHub link. From publication abstract: Predicting energies and forces using machine learning force field (MLFF) depends on accurate descriptions (features) of chemical environment. Despite the numerous features proposed, there is a lack of controlled comparison among them for their universality and accuracy. In this work, we compared several commonly used feature types for their ability to describe physical systems. These different feature types include cosine feature, Gaussian feature, moment tensor potential (MTP) feature, spectral neighbor analysis potential feature, simplified smooth deep potential with Chebyshev polynomials feature and Gaussian polynomials feature, and atomic cluster expansion feature. We evaluated the training root mean square error (RMSE) for the atomic group energy, total energy, and force using linear regression model regarding to the density functional theory results. We applied these MLFF models to an amorphous sulfur system and carbon systems, and the fitting results show that MTP feature can yield the smallest RMSE results compared with other feature types for either sulfur system or carbon system in the disordered atomic configurations. Moreover, as an extending test of other systems, the MTP feature combined with linear regression model can also reproduce similar quantities along the ab initio molecular dynamics trajectory as represented by Cu systems. Our results are helpful in selecting the proper features for the MLFF development.
10.60732/209e0c9c
C, H, Mg, Ni, O, Si
Ting Han, Jie Li, Liping Liu, Fengyu Li, Lin-Wang Wang
17255
918240
6
DFT-PBE
PWmat
29
The MAD benchmark dataset, containing a selection of MAD test, MPtrj, Alexandria, SPICE, MD22 and OC2020 datasets, computed with MPtrj DFT settings. Part of the MAD (Massive Atomic Diversity) dataset family. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
10.60732/30653c33
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti
2114
58755
85
DFT-PBEsol
VASP
28
Benzene validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/eb286cb6
C, H
Venkat Kapil, Edgar A. Engel
200
6072
2
DFT-PBE+TS
Quantum ESPRESSO v6.3
28
Example dataset for MISPR (Materials Informatics for Structure-Property Relationships) materials science simulation software, with DFT-calculated configuration properties for three different MISPR workflows: nuclear magnetic resonance (NMR) chemical shifts, electrostatic partial charges (ESP) and bond dissociation energies (BDE).
10.60732/2b830270
C, Cl, F, H, N, O, P, S, Si
Rasha Atwi, Matthew Bliss, Maxim Makeev, Nav Nidhi Rajput
503
8996
9
DFT-ωB97X, DFT-B3LYP
Gaussian 16
28
The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-O dataset: a collection assembled to train a machine learning model for the purposes of assisting catalyst design by predicting chemical reactivity of transition-metal surfaces. The adsorption systems contained in this dataset consist of {111}-terminated metal surfaces. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/9541fb8b
Ag, Al, Au, Bi, Cd, Co, Cr, Cu, Fe, Ga, Hf, In, Ir, La, Mn, Mo, Nb, Ni, O, Os, Pb, Pd, Pt, Re, Rh, Ru, Sc, Sn, Ta, Ti, Tl, V, W, Y, Zn, Zr
Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin
747
12699
36
DFT-PBE
Quantum ESPRESSO
28
Benzene validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/01b77268
C, H
Venkat Kapil, Edgar A. Engel
1000
29712
2
DFT-PBE+TS
Quantum ESPRESSO v6.3
28
The val_aimd-from-PBE-3000-nvt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/6f64849f
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
76478
5186115
84
DFT-PBE+U
VASP
28
This dataset contains structural calculations of LaMnO3 carried out in Quantum ESPRESSO at the DFT-PBEsol+U level of theory. The dataset was built to explore strained and stoichiometric and oxygen-deficient LaMnO3.
10.60732/9772459c
Ba, La, Mn, O, Ti
Chiara Ricca, Nicolas Niederhauser, Ulrich Aschauer
4513
174298
5
DFT-PBE+U
Quantum ESPRESSO
28
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Cu surface. This appears similar to CGM-MLP_natcomm2023_CU-C_deposition, as there are no O atoms present in this set. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/ae9380c5
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
1693
326182
2
DFT-PBE+D3
CP2K
28
Training simulations from CGM-MLP_natcomm2023 of carbon on an oxygen-contaminated Cu surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/215303a5
C, Cu, O
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
1717
387151
3
DFT-PBE+D3
CP2K
28
Approximately 28,500 configurations of hafnia (HfO2) used in the training of a DP model for the prediction of properties of various hafnia polymorphs, including transition barriers between different phases.
10.60732/9fbd0fcb
Hf, O
Jing Wu, Yuzhi Zhang, Linfeng Zhang, Shi Liu
28506
2736576
2
DFT-PBE
VASP
28
The chalcopyrite Cu(In,Ga)S2 has gained renewed interest in recent years due to its potential application in tandem solar cells. In this contribution, a combined theoretical and experimental approach is applied to investigate stable and metastable phases forming in sputtered CuInS2 (CIS) thin films. Ab initio calculations are performed to obtain formation energies, X-ray diffraction patterns, and Raman spectra of various CIS polytypes and related compounds. Multiple low-energy CIS structures with zinc-blende and wurtzite-derived lattices are identified and their XRD/Raman patterns are shown to contain many overlapping features, which could lead to misidentification unless the techniques are duly combined and analyzed. The results are verified against experimental XRD/Raman spectra measured on a series of CIS films with different compositions and treated at different temperatures, revealing the formation of several CIS polymorphs and secondary phases. The characteristic features and the mechanisms behind the formation of different phases are discussed with the focus on the thin-film photovoltaic application of CIS. The dataset contains structures and VASP output files used to derive the discussed trends. version 2
10.60732/bcce3f87
Cu, In, Na, S
Jes Larsen, Kostiantyn Sopiha, Clas Persson, Charlotte Platzer-Björkman, Marika Edoff
3103
117852
4
DFT-PBE
VASP
28
A dataset used to fit machine-learning interatomic potentials (moment tensor potentials) for multicomponent alloys to ab initio data in order to investigate the disordered body-centered cubic (bcc) TiZrHfTax system with varying Ta concentration.
10.60732/434db566
Hf, Ta, Ti, Zr
Konstantin Gubaev, Yuji Ikeda, Ferenc Tasnádi, Jörg Neugebauer, Alexander V. Shapeev, Blazej Grabowski, Fritz Körmann
3622
223930
4
DFT-PBE
VASP
28
The rattled-300-subsampled validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/de580670
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
34244
490880
85
DFT-PBE+U
VASP
28
This dataset was designed to enable machine-learning of Ta elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations.
10.60732/43837a12
Ta
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova
3773
45385
1
DFT-PBE
VASP
28
130,000 configurations of zeolite from the Database of Zeolite Structures. Calculations performed using Amsterdam Modeling Suite software.
10.60732/7eb1fefb
Al, Ba, Be, C, Ca, Cs, F, Ge, H, K, Li, N, Na, O, Si
Leonid Komissarov, Toon Verstraelen
12929
1841496
15
DFT-revPBE+D3(BJ)
BAND
28
This dataset contains data from density functional theory calculations of various atomic configurations of pure Zr, pure Sn, and Zr-Sn alloys with different structures, defects, and compositions. Energies, forces, and stresses are calculated at the DFT level of theory. Includes 23,956 total configurations.
10.60732/8f77465e
Sn, Zr
Haojie Mei, Liang Chen, Feifei Wang, Guisen Liu, Jing Hu, Weitong Lin, Yao Shen, Jinfu Li, Lingti Kong
23232
680289
2
DFT-PBE
VASP
28
The validation split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures.
10.60732/8ff541c9
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Po, Pr, Pt, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti
9566
257052
85
DFT-PBEsol
VASP
27
This dataset consists of graphene superlattices with tungsten adatoms with properties calculated at the DFT level of theory. The authors modeled the placement of tungsten adatoms on a graphene monolayer. The resulting superlattice structures were then used to calculate electronic band structure and phonon dispersion relations. The dataset was used to investigate the effect of adatom placement on electronic band structure and phonon dispersion relations of graphene superlattices. The creation of the dataset involved the following steps: 1. Selection of the graphene monolayer as the starting point for the superlattice construction. 2. Placement of tungsten adatoms in the center of the unit cell. 3. Calculation of the electronic structure and other properties of the resulting superlattice using DFT. 4. Generation of a set of reduced Brillouin zones representing the symmetry of the superlattice. 5. Calculation of the electronic band structure and phonon dispersion relations for each superlattice structure in the dataset.
10.60732/bdd0d26b
C, Cr, Ir, Mo, Nb, Os, Re, Rh, Ru, Ta, W
Anastasiia Skurativska, Stepan S. Tsirkin, Fabian D Natterer, Titus Neupert, Mark H Fischer
18
774
11
DFT-PBE
VASP
27
This data was assembled to investigate rare-earth-catalyzed benzylic C(sp3)-H addition of pyridines to olefins. All calculations were performed with the Gaussian 09 software package. The B3PW91 functional was used for geometric optimization without any symmetric constraints. Each optimized structure was subsequently analyzed by harmonic vibrational frequencies at the same level of theory for characterization of a minimum (NImag = 0) or a transition state (NImag = 1) to obtain the thermodynamic data. The 6-31G(d) basis set was used for C, H, and N atoms, and Stuttgart/Dresden relativistic effective core potentials (RECPs) as well as the associated valence basis sets were used for the Y atom. To obtain more accurate energies, single-point energy calculations were performed with a larger basis set. In such single-point calculations, the M06-L functional, which often shows good performance in the treatment of transition-metal systems, was used together with the CPCM solvation model for consideration of the toluene solvation effect. The same basis set together with associated pseudopotentials as in geometry optimization was used for the Y atom, and the 6-311+G(d,p) basis set was used for the remaining atoms.
10.60732/445f826b
C, H, N, Y
Guangli Zhou, Gen Luo, Xiaohui Kang, Zhaomin Hou, Yi Luo
58
3514
4
DFT-M06-L
Gaussian 09
27
192 structures were uniformly selected from the AIMD simulation, excluding any structures that are part of the training set. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/eb6e9ead
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
193
38004
2
DFT-PBE+D3
CP2K
27
The train set of a train and test set pair. The combined datasets comprise approximately 275 configurations of monolayer quasi-hexagonal-phase fullerene (qHPF) membrane used to train and test an NEP model.
10.60732/906d79f3
C
Penghua Ying
237
28440
1
DFT-PBE
VASP
27
Glycine validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/2d6ebecc
C, H, N, O
Venkat Kapil, Edgar A. Engel
500
17800
4
DFT-PBE+TS
Quantum ESPRESSO v6.3
27
The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-OH dataset: a collection assembled to train a machine learning model for the purposes of assisting catalyst design by predicting chemical reactivity of transition-metal surfaces. The adsorption systems contained in this dataset consist of {111}-terminated metal surfaces. JARVIS is a set of tools and collected datasets built to meet current materials design challenges.
10.60732/401689c1
Ag, Al, Au, Bi, Cd, Co, Cr, Cu, Fe, Ga, H, Hf, In, Ir, La, Mn, Mo, Nb, Ni, O, Os, Pb, Pd, Pt, Re, Rh, Ru, Sc, Sn, Ta, Ti, Tl, V, W, Y, Zn, Zr
Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin
748
13464
37
DFT-PBE
Quantum ESPRESSO
27
This dataset of molecular structures was extracted, using the NOMAD API, from all available structures in the NOMAD Archive that only include C, H, O, and N. This dataset consists of 50.42% H, 30.41% C, 10.36% N, and 8.81% O and includes 96 804 atomic environments in 5217 structures.
10.60732/c5e37779
C, H, N, O
Berk Onat, Christoph Ortner, James R. Kermode
3774
60197
4
DFT-PBE, DFT-HSE06, DFT-mPW1PW91, DFT-B1B95, DFT-M06-2X, DFT-B3PW91, DFT-B88-LYP, DFT-LDA-PW-PZ, DFT-LDA-PZ_MOD, DFT-LDA-C_VWN, DFT-B2PLYP, DFT-TPSSh, DFT-PBE0
Octopus, Gaussian, VASP, exciting, FHI-aims
27
This dataset is formed from two parts: single-species datasets for Al, Ni, and Cu from the NOMAD Encyclopedia and multi-species datasets that include Al, Ni and Cu from the NOMAD Archive. Duplicates have been removed from the NOMAD Encyclopedia data. For the multi-species data, only the last configuration step of each NOMAD Archive record was used, because the last configuration typically corresponds to a fully relaxed configuration. In this dataset, the NOMAD unique reference access IDs are retained along with a subset of their meta information that includes whether the supplied configuration is from a converged calculation as well as the Density Functional Theory (DFT) code, version, and type of DFT functionals with the total potential energies. This dataset consists of 39.1% Al, 30.7% Ni, and 30.2% Cu and has 27,987 atomic environments in 3337 structures.
10.60732/2744ff4e
Al, Cu, Ni
Berk Onat, Christoph Ortner, James R. Kermode
1016
4646
3
DFT-undefined
GPAW, VASP, exciting, FHI-aims
27
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Cu surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces.
10.60732/c3b4e684
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
1177
204591
2
DFT-PBE+D3
CP2K
27
The extended training dataset for GST_GAP_22, calculated using the PBEsol functional. New configurations, simulated under external electric fields, were labelled with DFT and added to the original reference database. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions.
10.60732/37c76fa8
Ge, Sb, Te
Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer
2913
398991
3
DFT-PBEsol
CASTEP
27
Approximately 15,000 configurations of copper used to demonstrate the DP-GEN data generator for PES machine learning models.
10.60732/2060021e
Cu
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, Weinan E
15269
297369
1
DFT-PBE
VASP
27
Configurations from the CA-9 dataset used for training the NNP_CA-9 potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials.
10.60732/8b765383
C
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
39993
2195024
1
DFT-PBE
VASP
27
Data from the paper 'Ferrimagnetism induced by thermal vibrations in oxygen-deficient manganite heterostructures'. Includes Quantum ESPRESSO calculations of SrCaMnO3 and SrMnO3, stoichiometric and defective cells.
Ca, Mn, O, Sr
Moloud Kaviani, Chiara Ricca, Ulrich Aschauer
11594
459546
4
DFT-PBEsol+U
Quantum ESPRESSO
27
Succinic acid validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/fa9d54f4
C, H, O
Venkat Kapil, Edgar A. Engel
500
14000
3
DFT-PBE+TS
Quantum ESPRESSO v6.3
26
500 configurations of Mg2 for MD prediction using a model fitted on Al, W, Mg and Si.
10.60732/965983d1
Mg
Connor Allen, Albert P. Bartok
500
1000
1
IP-GAP
CASTEP
26
500 decorrelated geometries sampled from 600 K xTB MD run. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package.
10.60732/7239a192
C, H, O
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi
500
7500
3
DFT-PBE+D3
ORCA 5.0
26
This dataset was used to generate a multi-element linear SNAP potential for InP, as published in Cusentino, M. A. et al., J. Chem. Phys. (2020). It is intended to produce an interatomic potential for indium phosphide capable of capturing high-energy defects that result from radiation damage cascades.
10.60732/50cc0906
In, P
Mary Alice Cusentino, Mitchell A. Wood, Aidan P. Thompson
1802
106761
2
DFT-LDA
VASP
26
Configurations of water clusters from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs.
10.60732/b633b325
H, O
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith
1847
33246
2
DFT-BLYP+D3
VASP
26
Training data only from the Co_dimer_JPCA_2022 dataset. This dataset contains dimer molecules of Co(II) with potential energy calculations for structures with ferromagnetic and antiferromagnetic spin configurations. Calculations were carried out in Gaussian 16 with the PBE exchange-correlation functional and 6-31+G* basis set. All molecules contain the same atomic core region, consisting of the tetrahedral and octahedral Co centers and the three PO2R2 bridging ligands. The ligand exchange provides a broad range of exchange energies (ΔEJ), from +50 to -200 meV, with 80% of the ligands yielding ΔEJ < 10 meV.
10.60732/07315f04
C, Cl, Co, H, N, O, P, S
Sijin Ren, Eric Fonseca, William Perry, Hai-Ping Cheng, Xiao-Guang Zhang, Richard Hennig
1794
154593
8
DFT-PBE
Gaussian 16
26
Benzene training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction.
10.60732/8d563e8a
C, H
Venkat Kapil, Edgar A. Engel
1799
49512
2
DFT-PBE+TS
Quantum ESPRESSO v6.3
26
About 2,500 configurations of alpha-Fe used in the training and testing of a ML model with the goal of building magneto-elastic machine-learning interatomic potentials for large-scale spin-lattice dynamics simulations.
10.60732/fe28ef5e
Fe
Svetoslav Nikolov, Mitchell A. Wood, Attila Cangi, Jean-Bernard Maillet, Mihai-Cosmin Marinica, Aidan P. Thompson, Michael P. Desjarlais, Julien Tranchida
2157
44480
1
DFT-PBE
VASP
26
The rattled-500-subsampled validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets.
10.60732/6f9ded6d
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
39464
564068
85
DFT-PBE+U
VASP
26
This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially-trained MTP model, 2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed, 3) the MTP was retrained, including the structures that were re-computed with DFT, 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset included about 27,000 configurations that were bcc-like and close-packed (fcc, hcp, etc.) with 8 or fewer atoms in the unit cell and different concentrations of Co, Nb, and V.
10.60732/f2c623f1
Co, Nb, V
Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev
383
2812
3
DFT-undefined
VASP
25
468 structures uniformly selected from the MD/tfMC simulation, excluding any structures that are part of the training set. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface.
10.60732/8ad1a886
C, Cu
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li
469
156312
2
DFT-PBE+D3
CP2K
25
The train set of a train/test pair from the ethanol dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVTZ was used for ethanol. All calculations were performed with the Psi4 software suite.
10.60732/c254fdb2
C, H, O
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
998
8982
3
CCSD(T)
Psi4
25
A dataset of DFT-calculated energies created to investigate the effect of hydrogen doping on the crystal structure and the electronic state in SmNiO3. Configuration sets include apically and side-bonded hydrogen configurations with 1-9 hydrogen atoms.
10.60732/b834b71f
H, Ni, O, Sm
Kunihiko Yamauchi, Ikutaro Hamada
3318
156419
4
DFT-PBE+U
VASP
25
This dataset contains DFT calculations that were carried out in conjunction with experimental investigation of a cationic phenoxyimine yttrium complex as an isoprene polymerization catalyst. Calculations were performed using the Gaussian 09 D.01 suite of programs. Electronic structure calculations were performed at the DFT level using the B3PW91 functional. The Stuttgart-Cologne small-core quasi-relativistic pseudopotential ECP28MWB and its available basis set including up to the g function were used to describe yttrium. Similarly, silicon and phosphorus were represented by a Stuttgart-Dresden-Bonn pseudopotential along with the related basis set augmented by a d function of polarization (αd(P) = 0.387 and αd(Si) = 0.284). Other atoms were described by a polarized all-electron triple-ζ 6-311G(d,p) basis set. Bulk solvent effect of toluene or THF was simulated using the SMD continuum model. The Grimme empirical correction with the original D3 damping function was used to include the dispersion correction as a single-point calculation. Transition-state optimization was followed by frequency calculations to characterize the stationary point. Intrinsic reaction coordinate calculations were performed to confirm the connectivity of the transition states. Gibbs energies were estimated within the harmonic oscillator approximation at 298 K and 1 atm.
10.60732/bd18acbe
Al, B, C, F, H, N, O, Si, Y
Alexis D. Oswald, Ludmilla Verrieux, Pierre-Alain R. Breuil, Hélène Olivier-Bourbigou, Julien Thuilliez, Florent Vaultier, Mostafa Taoufik, Lionel Perrin, Christophe Boisson
109
9074
9
DFT-B3PW91+D3(BJ)
Gaussian 09
24
The train set of a train/test pair from the malonaldehyde dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite.
10.60732/b53c02ad
C, H, O
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko
1000
9000
3
CCSD(T)
Psi4
23
6000 configurations of liquid and amorphous HfO2 generated for use with an active learning ML model.
10.60732/dcb4440a
Hf, O
Ganesh Sivaraman, Anand Narayanan Krishnamoorthy, Matthias Baur, Christian Holm, Marius Stan, Gábor Csányi, Chris Benmore, Álvaro Vázquez-Mayagoitia
5999
575904
2
DFT-PBE
VASP 5.4.4
23
Approximately 2,800 configurations of Li10GeP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiGePS datasets from this source. The other uses the PBE functional, rather than the PBEsol functional.
10.60732/03312bdd
Ge, Li, P, S
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E
2835
504350
4
DFT-PBEsol
VASP 5.4.4
23
This dataset contains all frames from the trajectories for the training configurations in the OC20 initial structure to relaxed energy (IS2RE) and initial structure to relaxed structure (IS2RS) tasks of Open Catalyst 2020 (OC20). Dataset corresponds to the "All IS2RE/S training" data split under the "Relaxation Trajectories" section of the Open Catalyst Project page.
10.60732/d63dce0c
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
92897924
7522584885
56
DFT-rPBE
VASP
13
DFT reference structures used to train neuroevolution potentials (NEP) for a study disentangling lone-pair chemistry and geometric effects in the octahedral tilting of halide double perovskites. The dataset contains the training configurations (with energies, forces, and stresses) for three representative compounds: Cs2AgAlBr6, Cs2AgBiBr6, and Cs2InBiBr6. Reference calculations used VASP with the SCAN+rVV10 meta-GGA functional (BPARAM=15.7, CPARAM=0.0093), a 520 eV plane-wave cutoff, Gaussian smearing (SIGMA=0.1 eV), and Gamma-centered Brillouin-zone sampling (KSPACING=0.25). Configuration sets group the structures by compound.
Ag, Al, Bi, Br, Cs, In
Mehmet Baskurt, Erik Fransson, Madeleine Lindvik, Paul Erhart, Julia Wiktor
1389
89760
6
DFT-SCAN+rVV10
VASP
11
Density functional theory dataset for cobalt, platinum, and CoPt bimetallic catalysts investigated for the dry reforming of methane (DRM). It comprises bulk metals and alloys (Co, Pt, CoPt L1_0 and fcc), (111) surface slab models, and minimum-energy reaction paths for CH4 and CO2 dissociation obtained with the machine-learning nudged elastic band (ML-NEB) method. All ionic relaxation/path images are included. Calculations used VASP 6.4.2 with the GGA-PBE functional, Grimme D3 dispersion with Becke-Johnson damping (IVDW=12), a plane-wave cutoff of 400 eV for slabs and 520 eV for bulk, spin polarization for cobalt-containing systems, Methfessel-Paxton smearing (ISMEAR=1), Gamma-centered k-point meshes, and dipole corrections normal to the surfaces. Transition states were located with the CatLearn ML-NEB module. Configuration sets separate bulk structures, surface slabs, and reaction-barrier images.
C, Co, H, O, Pt
David Niedbalka, Hector Prats, Estefanía Díaz López, Marcel Janák, Diana Piankova, Anna Loiudice, Raffaella Buonsanti, Aleix Comas-Vives, Christoph R. Müller, Paula M. Abdala
4028
248920
5
DFT-PBE+D3(BJ)
VASP 6.4.2
11
VASP single-point (SCF) DFT calculations underpinning a mechanistic study of A-site doping in lithium-lanthanum-titanate (LMTO/LLTO) perovskite nanorods and their interfaces with a p(MTFSI) polymer electrolyte, aimed at understanding interfacial Li-ion and Na-ion transport for composite polymer electrolyte design. The dataset includes bulk/reference systems (Li, Na, MTFSI, LiMTFSI, NaMTFSI, MTFSI dimers) and LMTO/polymer interface slabs at varying A-site compositions. Calculations used VASP with the r2SCAN meta-GGA functional (with rVV10 nonlocal correlation for surfaces) via pymatgen's MPScanRelaxSet, PBE PAW (version 54) potentials, EDIFF=1e-5 eV, and Gaussian smearing (ISMEAR=0). Energies and forces are taken from the SCF OUTCAR; structures from the paired CONTCAR. Configuration sets group calculations by reference/interface category.
C, F, H, K, La, Li, N, Na, O, S, Ti
Lauren B. Shepard, Ji-young Ock, Amit Bhattacharya, Tao Wang, Albina Borisevich, Michelle Lehmann, Sheng Dai, Raphaële Clément, Alexei P. Sokolov, X. Chelsea Chen, Susan B. Sinnott
50
16097
11
DFT-r2SCAN
VASP
10
Density functional theory reference data for constructing a machine-learning force field (MLFF) of cerium oxide (CeO2) surfaces containing an oxygen vacancy, generated with VASP on-the-fly machine-learning and stored in ML_AB training files. The dataset follows a dataset-merging strategy, combining six independently sampled surface families that vary the surface orientation (CeO2(100) Ce-terminated and CeO2(111)), slab thickness (two- vs three-layer), and oxygen-vacancy content (zero or one vacancy), for roughly 1,700 configurations carrying total energies, atomic forces, and stresses. Configuration sets group the data by surface family. Reference calculations used VASP with the spin-polarized PBE functional plus Grimme D3 dispersion and a Hubbard U correction on the Ce 4f states (DFT+U, Ueff=5.0 eV), a 520 eV plane-wave cutoff, and a 1x1x1 Gamma-centered k-point grid.
Ce, O
Kai Oshiro, Min Gao, Jun-ya Hasegawa
1746
51361
2
DFT-PBE+U+D3
VASP 6.4.2
9
DFT reference dataset for an Allegro machine-learned interatomic potential for silica (SiO2) valid up to 15000 K, spanning the high-temperature melt, melt-quench amorphization, and mechanical-deformation regimes. The configurations were selected by HYAL active learning - an Allegro/LAMMPS sampler proposing structures of alpha-quartz, beta-cristobalite, coesite, and amorphous silica across an initial set and melt, high-temperature melt, melt-quench, and mechanical shear/tension sampling stages - and each was then labeled with a single-point VASP calculation (roughly 2780 successfully labeled configurations with total energies, atomic forces, and stresses). Calculations used VASP 6.3.2 with the r2SCAN meta-GGA functional (PAW_PBE potentials), a 1000 eV plane-wave cutoff, an electronic convergence of 1e-6 eV, Gaussian smearing (ISMEAR=0, SIGMA=0.1 eV), and Gamma-point Brillouin-zone sampling; each r2SCAN calculation was preceded by a PBE pre-convergence step. The resulting Allegro potential was used to study dynamic fracture and energy dissipation in silica glass. Configuration sets group the data by silica system (quartz, cristobalite, coesite, amorphous).
O, Si
Henrik Andersen Sveinsson
2781
558432
2
DFT-r2SCAN
VASP 6.3.2
9
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering the cubic-tetragonal phase transition. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
862
34520
3
DFT-PBE+D3(BJ)
VASP
9
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a neutral iodide interstitial. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
2008
82328
3
DFT-PBE+D3(BJ)
VASP
9
Density functional theory study of the adsorption of hexachlorobenzene (C6Cl6, HCB) on a montmorillonite clay surface and the effect of partial hydration, with explicit co-adsorbed water molecules. Each configuration is a VASP geometry optimization of an HCB molecule (with water) on a montmorillonite slab; all ionic relaxation steps are included. The dataset also contains the isolated montmorillonite-slab and HCB-molecule reference calculations used to evaluate interaction energies. Calculations used VASP 6.2.0 with the PBE functional, the Tkatchenko-Scheffler dispersion correction with iterative Hirshfeld partitioning (IVDW=21), Gaussian smearing (ISMEAR=0), and Gamma-point Brillouin-zone sampling.
Al, C, Ca, Cl, F, Fe, H, Mg, Na, O, Si
Daniel Tunega, Peter Grančič, Martin H. Gerzabek, Leonard Böhm
9929
3551246
11
DFT-PBE
VASP 6.2.0
8
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a positively charged iodide interstitial. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
2176
89216
3
DFT-PBE+D3(BJ)
VASP
8
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a positively charged iodide vacancy. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
1827
71253
3
DFT-PBE+D3(BJ)
VASP
8
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a neutral iodide vacancy. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
2715
105885
3
DFT-PBE+D3(BJ)
VASP
7
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a negatively charged iodide interstitial. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
2217
90897
3
DFT-PBE+D3(BJ)
VASP
7
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a negatively charged iodide vacancy. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.)
Cs, I, Pb
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao
2371
114129
3
DFT-PBE+D3(BJ)
VASP
7
Spin-polarized density functional theory structural relaxations probing hydrogen-induced lattice strain in a chemically complex Fe-based (bcc) hybrid steel. A 55-atom supercell of composition VMoCrMnFe47NiAlSiC is relaxed with 0, 1, 2, and 5 hydrogen atoms inserted into interstitial sites; all ionic steps of each cell-and-ion optimization are included, yielding configurations with total energies, atomic forces, and stresses. Calculations used VASP 6.4.2 (within the MedeA environment) with the GGA-PBE functional, a 400 eV plane-wave cutoff, spin polarization (ISPIN=2), Methfessel-Paxton smearing (SIGMA=0.2 eV), a 2x2x2 Gamma-centered k-point mesh, and full relaxation of lattice vectors and atomic positions (IBRION=2, ISIF=3, EDIFFG=-0.02 eV/Angstrom). Configuration sets group the relaxations by hydrogen content.
Al, C, Cr, Fe, H, Mn, Mo, Ni, Si, V
Ammar Aksoy, Cem Örnek, Beste Payam, Bilgehan M. Şeşen, Çağatay Yelkarası, Steve Ooi
214
12324
10
DFT-PBE
VASP 6.4.2
7
DFT-optimized structures and total energies of graphene interacting with urea and water molecules, supporting a combined experimental and first-principles study of a graphene field-effect-transistor (FET) sensor for the detection of urea in water. The configurations span graphene with one or more urea/water molecules in various adsorption geometries. Calculations used VASP with the optB86b-vdW exchange-correlation functional (GGA=MK, LUSE_VDW, PARAM1=0.1234, PARAM2=1.0), a 900 eV plane-wave cutoff, Gaussian smearing (SIGMA=0.01 eV), and a 3x3x1 Monkhorst-Pack k-point mesh. The relaxed geometries (CONTCAR) were not archived, so each input geometry (POSCAR) is paired with the energy of the first ionic step from OSZICAR - the single-point energy of that geometry.
C, H, N, O
Ondřej Špaček, Linda Supalová, Jindřich Mach, David Nezval, Tomáš Šikola, Miroslav Bartošík
13
1764
4
DFT-optB86b-vdW
VASP
6
OC20_S2EF_train_all is the ~63 million structure full training set of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index.
10.60732/a9baab35
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
133934018
9810895377
56
DFT-rPBE
VASP
3
The test set of OMol25. OMol25 (Open Molecules 2025) is a large dataset of structures with up to 350 atoms, calculated at a high level of DFT (ωB97M-V/def2-TZVPD). This dataset is intended to provide a broad sampling of chemical complexity and structural diversity. OMol25 includes biomolecules, metal complexes, electrolytes, and community datasets that have been recalculated at this higher level of theory. Included community datasets are: ANI-2X, Transition-1X, ANI-1xBB, OrbNet Denali, SPICE2, and Solvated Protein Fragments. OMol25 also includes 30% of the GEOM dataset, with these systems optimized and a fraction of these having their initial positions randomly perturbed.
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood
2766167
342021649
83
DFT-ωB97M-V
ORCA 6.0.0
3