| Name | Description | DOI | Elements | Authors | # Configurations | # Atoms | # Elements | Methods | Software | Downloads | Source Data | Source Pub. | Other Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The Alexandria Materials Database contains theoretical crystal structures in 1D, 2D and 3D discovered by machine learning approaches using DFT with PBE, PBEsol and SCAN methods. This dataset represents the geometry optimization paths for 3D crystal structures from Alexandria calculated using PBE methods. |
10.60732/c88da7df |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Jonathan Schmidt, Noah Hoffmann, Hai-Chen Wang, Pedro Borlido, Pedro J. M. A. Carriço, Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques |
106825218 |
1313552132 |
89 |
DFT-PBE |
VASP |
101401 |
||||
The Alexandria Materials Database contains theoretical crystal structures in 1D, 2D and 3D discovered by machine learning approaches using DFT with PBE, PBEsol and SCAN methods. This dataset represents the geometry optimization paths for 2D crystal structures from Alexandria calculated using PBE methods. |
10.60732/8781419f |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Jonathan Schmidt, Noah Hoffmann, Hai-Chen Wang, Pedro Borlido, Pedro J. M. A. Carriço, Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques |
11742482 |
118265549 |
84 |
DFT-PBE |
VASP |
3709 |
||||
The full-size training set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory. |
10.60732/41666b82 |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood |
101666280 |
5237539207 |
83 |
DFT-ωB97M-V |
ORCA |
2552 |
||||
This is the filtered training split of ODAC25. ODAC25 is a large-scale DFT dataset intended to advance the computational screening of Metal-Organic Framework (MOF) sorbents for direct air capture (DAC) of atmospheric CO2 from humid air. Spanning ~15,000 MOFs, including experimental, defective, synthetic, and amine-functionalized frameworks, the dataset comprises nearly 60 million single-point calculations covering four adsorbates: CO2, H2O, N2, and O2. All calculations were performed with VASP 6.3 using the PBE functional augmented with D3 dispersion corrections (Becke-Johnson damping). Spin-polarized calculations (ISPIN=2) were used throughout. Relative to ODAC23, ODAC25 adds two new adsorbates (N2 and O2), functionalized MOF variants, improved k-point convergence, and re-relaxations of bare MOF cells. Three configuration sets are provided: mof_plus_adsorbate (full DFT relaxations of adsorbate-loaded MOFs), mof (re-relaxations of empty frameworks), and gcmc (DFT single points derived from Grand Canonical Monte Carlo simulations). Structures identified as problematic by Jin et al. (2025) have been excluded (see https://zenodo.org/records/14802658). |
Ag, Al, Au, B, Ba, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, H, Hf, Hg, Ho, I, In, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pd, Pr, Pt, Re, Ru, S, Sc, Se, Si, Sm, Sr, Tb, Ti, Tm, U, V, W, Y, Zn, Zr |
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl |
36058136 |
8523144395 |
62 |
DFT-PBE+D3 |
VASP 6.3 |
1142 |
|||||
The training split of sAlex. sAlex is a subsample of the Alexandria dataset that was used to fine-tune the OMat24 (Open Materials 2024) models. From the site: sAlex was created by removing structures matched in WBM and only sampling structure along a trajectory with an energy difference greater than 10 meV/atom. |
10.60732/efbb7935 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
10345613 |
106888622 |
89 |
DFT-PBE+U |
VASP |
1075 |
||||
Training configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces. |
10.60732/722bcab6 |
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick |
7861269 |
633950726 |
57 |
DFT-PBE+U |
VASP |
771 |
||||
This DFT dataset was curated in response to the growing interest in property-guided molecule generation using generative AI models. Typically, the properties of generated molecules are evaluated using machine learning (ML) property predictors trained on fully relaxed datasets. However, since generated molecules may deviate significantly from relaxed structures, these predictors can be highly unreliable for assessing their quality. This dataset provides DFT-evaluated properties, energies, and forces for generated molecules. The structures are unrelaxed and can serve as a validation set for ML property predictors used in conditional molecule generation. It includes 10,773 molecules generated using PropMolFlow, a state-of-the-art conditional molecule generation model. PropMolFlow employs a flow matching process parameterized with an SE(3)-equivariant graph neural network. PropMolFlow models are trained on the QM9 dataset. Molecules are generated by conditioning on six properties (polarizability, gap, HOMO, LUMO, dipole moment, and heat capacity at room temperature, 298 K) across two tasks: in-distribution and out-of-distribution generation. Full details are available in the corresponding paper. |
10.60732/1f7cae3c |
C, F, H, N, O |
Cheng Zeng, Jirui Jin, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Mingjie Liu |
10773 |
205304 |
5 |
DFT-B3LYP |
Gaussian 16 |
715 |
||||
The Train 4M set from OMol25 (~4 million structure training subset). From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory. |
10.60732/b6f9382a |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood |
3986754 |
218680957 |
83 |
DFT-ωB97M-V |
ORCA |
705 |
||||
OC20_S2EF_train_20M is the 20 million structure training subset of the OC20 Structure to Energy and Forces dataset. Features include potential energy, free energy, and atomic forces, together with data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, Miller index, shift, and others. |
10.60732/9f03e9be |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
20000000 |
1465265878 |
56 |
DFT-rPBE |
VASP |
693 |
||||
Training set of the Open Polymers 2026 (OPoly26) dataset. OPoly26 contains over 6.57 million density functional theory (DFT) calculations on cluster fragments of up to 360 atoms derived from polymeric systems, comprising over 1.2 billion total atoms. The dataset encompasses variations in monomer composition, polymerization degree, chain architectures, and solvation environments to improve machine learning model performance for polymer property prediction. Calculations were performed at the B97M-V/def2-SVP level of theory using ORCA. |
Al, B, Br, C, Ca, Cl, Co, Cs, Cu, F, Fe, H, I, K, La, Li, Mg, N, Na, Ni, O, P, S, Sr, Zn |
Daniel S. Levine, Nicholas Liesen, Lauren Chua, James Diffenderfer, Helgi I. Ingolfsson, Matthew P. Kroonblawd, Nitesh Kumar, Amitesh Maiti, Supun S. Mohottalalage, Muhammed Shuaibi, Brian Van Essen, Brandon M. Wood, C. Lawrence Zitnick, Samuel M. Blau, Evan R. Antoniuk |
6104876 |
1125111811 |
25 |
DFT-ωB97M-V |
ORCA |
619 |
|||||
Training configurations for the structure to total energy and forces task (S2EF) of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces. |
10.60732/68160e50 |
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick |
8356688 |
668033119 |
57 |
DFT-PBE+U |
VASP |
591 |
||||
The Train neutral set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory. |
10.60732/3c2ddc75 |
B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si |
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood |
34335828 |
929562799 |
17 |
DFT-ωB97M-V |
ORCA |
548 |
||||
This is the full (unfiltered) training split of ODAC25. ODAC25 is a large-scale DFT dataset intended to advance the computational screening of Metal-Organic Framework (MOF) sorbents for direct air capture (DAC) of atmospheric CO2 from humid air. Spanning ~15,000 MOFs, including experimental, defective, synthetic, and amine-functionalized frameworks, the dataset comprises nearly 60 million single-point calculations covering four adsorbates: CO2, H2O, N2, and O2. All calculations were performed with VASP 6.3 using the PBE functional augmented with D3 dispersion corrections (Becke-Johnson damping). Spin-polarized calculations (ISPIN=2) were used throughout. Relative to ODAC23, ODAC25 adds two new adsorbates (N2 and O2), functionalized MOF variants, improved k-point convergence, and re-relaxations of bare MOF cells. Three configuration sets are provided: mof_plus_adsorbate (full DFT relaxations of adsorbate-loaded MOFs), mof (re-relaxations of empty frameworks), and gcmc (DFT single points derived from Grand Canonical Monte Carlo simulations). |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pd, Pr, Pt, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Tb, Te, Th, Ti, Tm, U, V, W, Y, Zn, Zr |
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl |
54620287 |
12829845961 |
71 |
DFT-PBE+D3 |
VASP 6.3 |
533 |
|||||
Out-of-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces. |
10.60732/15fa94f2 |
Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr |
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick |
520744 |
42168125 |
52 |
DFT-PBE+U |
VASP |
529 |
||||
Configurations from the Materials Project database: an online resource with the goal of computing properties of all inorganic materials. |
10.60732/4bf2e346 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson |
6125462 |
194446050 |
89 |
DFT-R2SCAN, DFT-PBEsol, DFT-SCAN, DFT-GGA+U, DFT-GGA |
VASP |
529 |
||||
MP-ALOE is a dataset of nearly 1 million DFT calculations computed with the r2SCAN meta-generalized gradient approximation, covering 89 elements. The dataset was constructed using active learning via Query by Committee (QBC) and downsampling via the DIRECT method, and primarily consists of off-equilibrium structures. Initial structures were generated by elemental substitution into prototype structures from the ICSD and Materials Project databases (restricted to 2-8 atoms and up to ternary compositions). QBC used an ensemble of interatomic potentials (initially MACE-MP-0, CHGNet, and M3GNet, followed by iteratively trained MACE models) to select structures with energy uncertainty exceeding 100 meV/atom, force uncertainty exceeding 100 meV/Å, or stress uncertainty exceeding 100 meV/ų. DIRECT downsampling reduced approximately 500,000 selected structures to approximately 125,000 for DFT calculation. Near-equilibrium structures from the Materials Project (up to 3 elements, up to 32 atoms, approximately 30,000 structures) were recalculated with identical DFT settings. A two-stage VASP workflow was applied: an initial static calculation using PBE, followed by r2SCAN relaxation for three ionic steps. In total, 909,792 frames from 303,264 structure relaxations are included. DFT calculations used projector-augmented wave (PAW) potentials, a 680 eV plane-wave cutoff, and KSPACING=0.2, with additional parameters from the MP24RelaxSet in pymatgen. Calculations were managed by the atomate2 workflow package. |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Matthew C. Kuner, Aaron D. Kaplan, Kristin A. Persson, Mark Asta, Daryl C. Chrzan |
909789 |
5891262 |
89 |
DFT-R2SCAN |
VASP |
499 |
|||||
The QE-TB dataset is part of the Joint Automated Repository for Various Integrated Simulations (JARVIS) DFT database. This subset contains configurations generated in Quantum ESPRESSO. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/9e9e5b29 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Kevin F. Garrity, Kamal Choudhary |
829576 |
2578920 |
64 |
DFT-PBEsol |
Quantum ESPRESSO |
492 |
||||
A dataset of 10 molecules (aspirin, azobenzene, benzene, ethanol, malonaldehyde, naphthalene, paracetamol, salicylic acid, toluene, uracil) with 100,000 structures calculated for each at the PBE/def2-SVP level of theory using ORCA. Based on the MD17 dataset, but with refined measurements. |
10.60732/682fe04a |
C, H, N, O |
Anders S. Christensen, O. Anatole von Lilienfeld |
999906 |
15598381 |
4 |
DFT-PBE |
ORCA 4.0.1 |
391 |
||||
OC20_IS2RES_val_id is the in-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks. Features include energy, atomic forces, and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, and Miller index. |
10.60732/b4005525 |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
5024223 |
406465318 |
56 |
DFT-rPBE |
VASP |
379 |
||||
Matbench v0.1 test dataset for predicting DFT formation energy from structure, adapted from the Materials Project database. Entries with a formation energy above 2.5 eV and those containing noble gases have been removed. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms predicting a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project. |
10.60732/3cef7b09 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain |
132741 |
3869238 |
84 |
DFT-undefined |
VASP |
323 |
||||
The Matbench_mp_gap dataset is a Matbench v0.1 test dataset for predicting DFT PBE band gap from structure, adapted from the Materials Project database. Entries having a formation energy (or energy above the convex hull) greater than 150 meV and those containing noble gases have been removed. Retrieved April 2, 2019. Refer to the Automatminer/Matbench publication for more details. This dataset contains band gap as calculated by PBE DFT from the Materials Project, in eV. Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms predicting a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project. |
10.60732/fb4d895d |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain |
106105 |
3184639 |
84 |
DFT-PBE |
VASP |
311 |
||||
The DFT_3D_12_12_2022 dataset is part of the Joint Automated Repository for Various Integrated Simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/e9e65ccd |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza |
66617 |
683506 |
89 |
DFT-optB88-vdW, DFT-TBmBJ |
VASP |
298 |
||||
OC20_IS2RES_ood_ads is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen adsorbates. Features include energy, atomic forces, and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, and Miller index. |
10.60732/0947596b |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
4883196 |
390308139 |
56 |
DFT-rPBE |
VASP |
297 |
||||
Configurations of Mo from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/b6ece7fd |
Mo |
Christopher M. Andolina, Wissam A. Saidi |
3663 |
66220 |
1 |
DFT-PBE |
VASP |
288 |
||||
OC20_S2EF_train_2M is the 2 million structure training subset of the OC20 Structure to Energy and Forces dataset. Features include potential energy, free energy, and atomic forces, together with data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, Miller index, shift, and others. |
10.60732/672cc613 |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
2000000 |
146496199 |
56 |
DFT-rPBE |
VASP |
284 |
||||
The training split of the Open Catalyst 2025 (OC25) dataset for solid-liquid interfaces. OC25 consists of single-point DFT calculations of catalyst/solvent/ion/adsorbate structures, covering 88 elements, 8 solvents (water, methanol, CCl4, DMSO, benzene, hexane, THF, diethyl ether), 9 ionic species (Cs+, OH-, Li+, SO4^2-, Ca^2+, [Me4N]+, HCO3-, H+, F-), and adsorbates from the OC20 set plus reactive intermediates. Surfaces are derived from 39,821 Materials Project bulk structures with Miller indices <= 3. Structures are highly off-equilibrium, sampled from short ab initio molecular dynamics simulations (10-50 steps, 1000 K, NVT) or short DFT relaxations (5 ionic steps). The training split contains ~7.4 million structures filtered to total force drift < 1 eV/Å. All DFT calculations used VASP 6.3.2 with the non-spin-polarized RPBE functional supplemented with D3 dispersion correction (zero damping), plane wave cutoff 400 eV, EDIFF=1e-4 eV, k-point reciprocal density of 40, and a dipole correction in the z-direction. |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, He, Hf, Hg, I, In, Ir, K, Kr, La, Li, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Xe, Y, Zn, Zr |
Sushree Jagriti Sahoo, Mikael Maroschin, Daniel S. Levine, Zachary Ulissi, C. Lawrence Zitnick, Joel B Varley, Joseph A. Gauthier, Nitish Govindarajan, Muhammed Shuaibi |
7395509 |
1068208517 |
73 |
DFT-rPBE+D3 |
VASP 6.3.2 |
284 |
|||||
OC20_IS2RES_val_ood_cat is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen catalyst composition. Features include energy, atomic forces, and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, and Miller index. |
10.60732/3c47e0d4 |
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
5151015 |
411767380 |
55 |
DFT-rPBE |
VASP |
279 |
||||
The Alexandria Materials Database contains theoretical crystal structures in 1D, 2D and 3D discovered by machine learning approaches using DFT with PBE, PBEsol and SCAN methods. This dataset represents the geometry optimization paths for 1D crystal structures from Alexandria calculated using PBE methods. |
10.60732/12246d46 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr |
Jonathan Schmidt, Noah Hoffmann, Hai-Chen Wang, Pedro Borlido, Pedro J. M. A. Carriço, Tiago F. T. Cerqueira, Silvana Botti, Miguel A. L. Marques |
614833 |
6062475 |
74 |
DFT-PBE |
VASP |
265 |
||||
OC20_IS2RES_ood_ads is the out-of-domain validation set for the OC20 Initial Structure to Relaxed Structure (IS2RS) and Initial Structure to Relaxed Energy (IS2RE) tasks with unseen adsorbates. Features include energy, atomic forces, and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, and Miller index. |
10.60732/b8c9473b |
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
3665193 |
308297930 |
55 |
DFT-rPBE |
VASP |
257 |
||||
ANI-2x-B973c-def2mTZVP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the B97-3c level of theory using the def2-mTZVP basis set. Configuration sets are divided by number of atoms per structure. Force corrections and dipoles are recorded in the metadata. |
10.60732/d4e67cf8 |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
9642825 |
146644476 |
7 |
DFT-B973c |
ORCA 4.2.1 |
254 |
||||
Configurations of Zr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/efb626b6 |
Zr |
Christopher M. Andolina, Wissam A. Saidi |
4637 |
80393 |
1 |
DFT-PBE |
VASP |
252 |
||||
This dataset contains data from eight AIMD simulations run in VASP to study electrochemical *CO-*CO coupling (the coupling of two *CO molecules) at the liquid water-Cu(100) interface. |
10.60732/62aed547 |
C, Cs, Cu, H, Li, O |
Henrik H. Kristoffersen, Karen Chan |
1671061 |
226245754 |
6 |
DFT-RPBE+D3 |
VASP |
244 |
||||
Approximately 300,000 benchmarking configurations derived partly from the MD-17 and LiPS datasets, partly from original simulated water and alanine dipeptide configurations. |
10.60732/62c08514 |
C, H, Li, N, O, P, S |
Xiang Fu, Zhenghao Wu, Wujie Wang, Tian Xie, Sinan Keten, Rafael Gomez-Bombarelli, Tommi Jaakkola |
294980 |
23733532 |
7 |
IP-AMBER-03, DFT-PBE |
DLPOLY, i-PI, VASP, GROMACS |
243 |
||||
This dataset contains DFT calculations of oxygen-deficient perovskites from the Ca2Fe2O5-brownmillerite and Ca2Mn2O5 structures, and of tunnel CaMn4O8, a derivative of the CaMn2O4-marokite with Ca vacancies. The dataset was produced to investigate the effects of oxygen introduction or Ca vacancy introduction in ternary transition metal oxides, as a means to assess potential new Ca-ion battery materials. |
10.60732/8dfc08c5 |
Ca, Fe, Mn, O |
M. Elena Arroyo-de Dompablo, José Luis Casals |
2919 |
387258 |
4 |
DFT-PBE |
VASP |
228 |
||||
Configurations of Ag from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/7cf58a2f |
Ag |
Christopher M. Andolina, Wissam A. Saidi |
3654 |
99918 |
1 |
DFT-PBE |
VASP |
228 |
||||
OC20_S2EF_val_ood_cat is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring unseen catalyst composition. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, materials project bulk id and miller index. |
10.60732/4221d2fa |
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
999809 |
74059718 |
55 |
DFT-rPBE |
VASP |
216 |
||||
133885 molecular structures from QM9 with revised bonds and charges in SDF format. Bond information can be gathered from the metadata column of the parquet files, a map where the key bonds contains the bond indices as they appear in the final rows of an SDF molecule block. If additional charges are present, these are contained under the key charge_info. rQM9 is derived from DeepChem's QM9 SDF dataset and rectifies the original dataset's net-charge discrepancies and invalid bond orders by enforcing correct valency-charge configurations. Nevertheless, a subset of molecules remains problematic, as they either fail RDKit sanitization or fragment into multiple components. The zero-based indices of these unresolved molecules are provided in a NumPy file in the original data file. |
C, F, H, N, O |
Cheng Zeng, Jirui Jin, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Mingjie Liu |
133885 |
2407753 |
5 |
DFT-B3LYP |
Gaussian 09 |
215 |
|||||
The JARVIS_DFT_3D_8_18_2021 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/a9dd64f6 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza |
47036 |
465994 |
89 |
DFT-optB88-vdW, DFT-TBmBJ |
VASP |
204 |
||||
Dataset containing MD trajectories of the 42-atom tetrapeptide Ac-Ala3-NHMe from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution. |
10.60732/4bc7295f |
C, H, N, O |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
85099 |
3574158 |
4 |
DFT-PBE+MBD |
FHI-aims |
202 |
||||
The aimd-from-PBE-3000-nvt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/105da475 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
7839846 |
530963613 |
86 |
DFT-PBE+U |
VASP |
202 |
||||
Configurations of Al from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/ef6f5966 |
Al |
Christopher M. Andolina, Wissam A. Saidi |
2537 |
86924 |
1 |
DFT-PBE |
VASP |
202 |
||||
The aimd-from-PBE-3000-npt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/edd12490 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
6076290 |
411540573 |
89 |
DFT-PBE+U |
VASP |
200 |
||||
Test configurations from the 3BPA dataset with the dihedral beta in the alpha-gamma plane fixed at 150 degrees. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/1c4b1e1c |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
2350 |
63450 |
4 |
DFT-ωB97X |
ORCA |
193 |
||||
Test configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records. |
10.60732/ec7bfb65 |
C, H, N, O |
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu |
198983 |
33031178 |
4 |
DFT-M06-2X |
ORCA 4.2.1 |
190 |
||||
The training split of OMC25. Open Molecular Crystals 2025 (OMC25) is a molecular crystal dataset produced by Meta. The OE62 dataset was used as a source for sampling molecules; crystals were generated with Genarris 3.0; from these, relaxation trajectories were generated and sampled to create the final dataset. See the publication for details. |
B, Br, C, Cl, F, H, I, N, O, P, S, Si |
Vahe Gharakhanyan, Luis Barroso-Luque, Yi Yang, Muhammed Shuaibi, Kyle Michel, Daniel S. Levine, Misko Dzamba, Xiang Fu, Meng Gao, Xingyu Liu, Haoran Ni, Keian Noori, Brandon M. Wood, Matt Uyttendaele, Arman Boromand, C. Lawrence Zitnick, Noa Marom, Zachary W. Ulissi, Anuroop Sriram |
24870226 |
3222851761 |
12 |
DFT-PBE |
VASP 6.3 |
189 |
|||||
Training configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records. |
10.60732/facc4255 |
C, H, N, O |
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu |
1592662 |
264381892 |
4 |
DFT-M06-2X |
ORCA 4.2.1 |
185 |
||||
The rattled-relax training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/a096865d |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
9433298 |
78952123 |
87 |
DFT-PBE+U |
VASP |
182 |
||||
Training dataset that captures chemical short-range order in equiatomic CrCoNi medium-entropy alloy published with our work Quantifying chemical short-range order in metallic alloys (description provided by authors) |
10.60732/76208b62 |
Co, Cr, Ni |
Yifan Cao, Killian Sheriff, Rodrigo Freitas |
1257 |
108684 |
3 |
DFT-PBE |
VASP 6.2.1 |
177 |
||||
The neutral validation set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ω B97M-V/def2-TZVPD level of theory. |
10.60732/0d5818c5 |
B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si |
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood |
27697 |
1238644 |
17 |
DFT-ωB97M-V |
ORCA |
176 |
||||
OC20_S2EF_val_id is the ~1-million structure in-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, materials project bulk id and miller index. |
10.60732/eaea5062 |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
999866 |
73147343 |
56 |
DFT-rPBE |
VASP |
175 |
||||
Configurations of Pt from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/49b97320 |
Pt |
Christopher M. Andolina, Wissam A. Saidi |
2605 |
62053 |
1 |
DFT-PBE |
VASP |
175 |
||||
The full (unfiltered) validation split of ODAC25. Open Direct Air Capture 2025 (ODAC25) is the largest high-quality DFT dataset for Direct Air Capture, containing over 15,000 Metal-Organic Frameworks (MOFs), including experimental, defective, synthetic, and amine-functionalized MOFs, with 4 adsorbates: CO2, H2O, N2, and O2. ODAC25 significantly improves upon ODAC23 by adding functionalized MOFs, new adsorbates (N2 and O2), higher k-point convergence, and re-relaxations of empty MOFs. The dataset contains three partitions: (1) mof_plus_adsorbate includes full DFT relaxations of different adsorbates on various MOFs; (2) mof includes re-relaxations of empty MOFs; (3) gcmc includes DFT single points of configurations derived from Grand Canonical Monte Carlo (GCMC) simulations. |
Ag, Al, Bi, Br, C, Cd, Ce, Cl, Co, Cr, Cu, Eu, F, Fe, Gd, H, Hg, I, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, P, Pr, S, Sc, Se, Si, Sm, Sr, Tb, Th, U, V, Y, Zn, Zr |
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl |
1290240 |
286971239 |
42 |
DFT-PBE+D3 |
VASP 6.3 |
172 |
|||||
The JARVIS_CFID_OQMD dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Open Quantum Materials Database (OQMD), created to hold information about the electronic structure and stability of inorganic materials for the purpose of aiding in materials discovery. Calculations were performed at the DFT level of theory, using the PAW-PBE functional implemented by VASP. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/967596c1 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton |
459943 |
2365987 |
89 |
DFT-PBE |
VASP |
172 |
||||
Approximately 57,000 configurations from the evaluation datasets for NequIP graph neural network model for interatomic potentials. Trajectories have been taken from LIPS, LIPO glass melt-quench simulation, and formate decomposition on Cu datasets. |
10.60732/e05d99fd |
C, Cu, H, Li, O, P, S |
Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, Boris Kozinsky |
56822 |
7629463 |
7 |
DFT-PBE |
CP2K, VASP |
171 |
||||
Training configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records. |
10.60732/fac841ac |
C, H, N, O |
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu |
1592677 |
264384382 |
4 |
DFT-M06-2X |
ORCA 4.2.1 |
169 |
||||
This dataset is composed of fully-deuterated Gd(III) analogue d-[GdL] in a variety of solvent materials, including MeOH, D2O and d6-DMSO. |
10.60732/cd8c58b7 |
C, Gd, H, N, O, S |
Barak Alnami, Jon G. C. Kragskow, Jakob K. Staab, Jonathan M. Skelton, Nicholas F. Chilton |
41746 |
28418566 |
6 |
DFT-PBE+D3 |
VASP 6.2.0 |
166 |
||||
ANI-1x contains DFT calculations for approximately 5 million molecular conformations. From an initial training set, an active learning method was used to iteratively add conformations where insufficient diversity was detected. Additional conformations were sampled from existing databases of molecules, such as GDB-11 and ChEMBL. On each of these configurations, one of molecular dynamics sampling, normal mode sampling, dimer sampling, or torsion sampling was performed. |
10.60732/dd0270c8 |
C, H, N, O |
Justin S. Smith, Roman Zubatyuk, Benjamin Nebgen, Nicholas Lubbers, Kipton Barros, Adrian E. Roitberg, Olexandr Isayev, Sergei Tretiak |
308645 |
5229919 |
4 |
DFT-ωB97X |
Gaussian 09 |
163 |
||||
The Matbench_perovskites dataset is a Matbench v0.1 test dataset for predicting formation energy from crystal structure. Adapted from an original dataset generated by Castelli et al. Refer to the Automatminer/Matbench publication for more details. This dataset contains the energy of formation of the entire 5-atom perovskite cell in eV as calculated by RPBE GGA-DFT. Note the reference state for oxygen was computed from oxygen's chemical potential in water vapor, not as oxygen molecules, to reflect the application for which these perovskites were studied. Matbench is an automated leaderboard for benchmarking state of the art ML algorithms predicting a diverse range of solid materials' properties. It is hosted and maintained by the Materials Project. |
10.60732/c2d25b5f |
Ag, Al, As, Au, B, Ba, Be, Bi, Ca, Cd, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, Hf, Hg, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain |
18926 |
94630 |
56 |
DFT-rPBE |
GPAW |
160 |
||||
The train set of a train/test pair from the aspirin dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated by all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set CCSD/cc-pVDZ was used for aspirin. All calculations were performed with the Psi4 software suite. |
10.60732/51775b8b |
C, H, O |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
996 |
20916 |
3 |
CCSD |
Psi4 |
159 |
||||
The JARVIS_SNUMAT dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains band gap data for >10,000 materials, computed using a hybrid functional and considering the stable magnetic ordering. Structure relaxation and band edges are obtained using the PBE XC functional; band gap energy is subsequently obtained using the HSE06 hybrid functional. Optical and fundamental band gap energies are included. Some gap energies are recalculated by including spin-orbit coupling. These are noted in the band gap metadata as "SOC=true". JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/d2b06d5a |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, F, Fe, Ga, Ge, H, He, Hf, Hg, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ne, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Th, Ti, Tl, V, W, Xe, Y, Zn, Zr |
Sangtae Kim, Miso Lee, Changho Hong, Youngchae Yoon, Hyungmin An, Dongheon Lee, Wonseok Jeong, Dongsun Yoo, Youngho Kang, Yong Youn, Seungwu Han |
10481 |
216749 |
73 |
DFT-PBE, DFT-HSE06 |
VASP |
157 |
||||
COMP6v2-B973c-def2mTZVP is the portion of COMP6v2 calculated at the B973c/def2mTZVP level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory. |
10.60732/2228cf4a |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
156317 |
3785763 |
7 |
DFT-B973c |
ORCA 4.2.1 |
156 |
||||
Test configurations with MD simulations performed at 300K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/5737de70 |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
1669 |
45063 |
4 |
DFT-ωB97X |
ORCA |
155 |
||||
Training configurations with MD simulation performed at 300K, 600K and 1200K from 3BPA dataset, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/1dbc6d0a |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
500 |
13500 |
4 |
DFT-ωB97X |
ORCA |
154 |
||||
The water data set comprises energies and forces of 9,189 condensed-phase structures. The data was obtained in an iterative procedure described in detail in Ref. [4]. The final ANN potential was employed in Refs. [4,5] to analyze temperature-dependent Raman spectra of liquid water. The data set contains structures from four iterations: Initial structures (iteration 0) were obtained from classical and path integral AIMD simulations of bulk liquid water in a cubic box containing 64 water molecules at 300 K as reported in Ref. [6]. Distorted configurations with higher forces were added by randomly displacing the Cartesian coordinates of these configurations. Iteration 1 contains a set of 500 configurations from MD simulations with the fully flexible SPC/E flex water model [7] employing a 25 % increased water density (simulation box with 80 water molecules) and elevated temperatures (T = 500 K) in order to sample highly repulsive configurations. Structures in iteration 2 were obtained by classical MD simulations with preliminary ANN potentials at T = 300 K, 325 K, 350 K, and 370 K employing cubic boxes with 64 molecules and the corresponding experimental densities. The final iteration 3 data contains structures from preliminary ANN simulations with classical and quantum nuclei, respectively, at a wide range of temperatures (T = 258 K, 268 K, 280 K, 290 K, 300 K, 310 K, 320 K, 330 K, 340 K, 350 K, 360 K, and 370 K) using cubic boxes with 64 molecules and the corresponding experimental densities. Energies and atomic forces were calculated with the CP2K program [8,9] using the revPBE exchange-correlation functional [10,11] with D3 dispersion correction [12] following the setup reported in Ref. [4]. Atomic cores were represented using the dual-space Goedecker-Teter-Hutter pseudopotentials [13], Kohn-Sham orbitals were expanded in the TZV2P basis set within the GPW method [14], and the density was represented by an auxiliary plane-wave basis with a cutoff of 400 Ry. [1] A. Kokalj, J. Mol. Graphics Modell. 17, 176–179 (1999). [2] N. Artrith, A. Urban, Comput. Mater. Sci. 114, 135–150 (2016). [3] N. Artrith, A. Urban, G. Ceder, Phys. Rev. B 96, 014112 (2017). [4] T. Morawietz, O. Marsalek, S. R. Pattenaude, L. M. Streacker, D. Ben-Amotz, and T. E. Markland, J. Phys. Chem. Lett. 9, 851 (2018). [5] T. Morawietz, A. S. Urbina, P. K. Wise, X. Wu, W. Lu, D. Ben-Amotz, and T. E. Markland, J. Phys. Chem. Lett. 10, 6067 (2019). [6] Marsalek and T. E. Markland, J. Phys. Chem. Lett. 8, 1545 (2017). [7] X. B. Zhang, Q. L. Liu, and A. M. Zhu, Fluid Ph. Equilibria 262, 210 (2007). [8] J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, and J. Hutter, Comput. Phys. Commun. 167, 103 (2005). [9] J. Hutter, M. Iannuzzi, F. Schiffmann, and J. VandeVondele, WIRES Comput. Mol. Sci. 4, 15 (2014). [10] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). [11] Y. Zhang and W. Yang, Phys. Rev. Lett. 80, 890 (1998). [12] S. Grimme, J. Antony, S. Ehrlich, and H. Krieg, J. Chem. Phys. 132, 154104 (2010). [13] S. Goedecker, M. Teter, and J. Hutter, Phys. Rev. B 54, 1703 (1996). [14] B. G. Lippert, J. Hutter, and M. Parrinello, Mol. Phys. 92, 477 (1997). |
10.60732/6ff013d4 |
H, O |
Michael S. Chen, Tobias Morawietz, Thomas E. Markland, Nongnuch Artrith |
9188 |
1788096 |
2 |
DFT-revPBE+D3 |
CP2K |
154 |
||||
Configurations of Co from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/15bb1dca |
Co |
Christopher M. Andolina, Wissam A. Saidi |
3337 |
67026 |
1 |
DFT-PBE |
VASP |
152 |
||||
Configurations of sma from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/bd5d6dc9 |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
120028 |
2280532 |
4 |
DFT-PBE0 |
Gaussian 09 |
150 |
||||
Training split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The training split (~83% of cleaned data) includes all monomers, dimers, and trimers to anchor low-body-order interactions. A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads. |
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov |
180174 |
2630122 |
102 |
DFT-r2SCAN |
FHI-aims v250806 |
149 |
|||||
In-domain validation configurations for the structure to total energy and forces (S2EF) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces. |
10.60732/2e72b273 |
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick |
405444 |
31860942 |
57 |
DFT-PBE+U |
VASP |
148 |
||||
The n-tetradecane training split of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/0396d7de |
C, H |
Chen Qu, Paul L. Houston, Thomas Allison, Barry I. Schneider, Joel M. Bowman |
253646 |
11160424 |
2 |
DFT-B3LYP |
Gaussian 16 |
146 |
||||
Approximately 2,300 configurations of Li10SiP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiSiPS datasets from this source. The other uses the PBE functional, rather than the PBEsol functional. |
10.60732/8e2d8e4c |
Li, P, S, Si |
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E |
2356 |
313100 |
4 |
DFT-PBEsol |
VASP 5.4.4 |
146 |
||||
Verification set for magnetic Moment Tensor Potentials (mMTPs) for the bcc Fe-Al system. Contains 336 configurations of 16-atom Fe-Al supercells with collinear atomic magnetic moments, used to validate mMTPs trained on the companion training set (FeAl-mMTP-Train). Configurations generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. mMTPs predict formation energy, lattice parameters, and total magnetic moments of bcc Fe-Al at 0 K. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment data. |
Al, Fe |
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov |
210 |
5376 |
2 |
DFT-PBE |
ABINIT |
146 |
|||||
The Acetaldehyde (singlet) set of the QM-22 datasets, with energies calculated at the CCSD(T)/MRCI level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/27f8a97a |
C, H, O |
Yong-Chang Han, Benjamin C. Shepler, Joel M. Bowman |
202518 |
1417626 |
3 |
CCSD(T), MRCI |
MOLPRO |
142 |
||||
Configurations of Ni from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/b184fffc |
Ni |
Christopher M. Andolina, Wissam A. Saidi |
3778 |
74782 |
1 |
DFT-PBE |
VASP |
142 |
||||
Test split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The test split (~10% of cleaned data, excluding monomers, dimers, and trimers which are fixed in the training split) uses a stratified split method consistent with the training and validation splits. Subset-resolved MAE for PET-MAD-1.5-S on this test set is 11.09 meV/atom (energy) and 36.81 meV/Angstrom (forces). A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads. |
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov |
18314 |
321704 |
102 |
DFT-r2SCAN |
FHI-aims v250806 |
141 |
|||||
The validation split of sAlex. sAlex is a subsample of the Alexandria dataset that was used to fine tune the OMat24 (Open Materials 2024) models. From the site: sAlex was created by removing structures matched in WBM and only sampling structure along a trajectory with an energy difference greater than 10 meV/atom. |
10.60732/1c59d4ac |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
547885 |
5670890 |
86 |
DFT-PBE+U |
VASP |
141 |
||||
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and Materials Project. The v2025.1 PBE release contains 434,712 structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the PBE functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. There is a companion dataset calculated with the r2SCAN functional (MatPES-R2SCAN-2025.1). |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong |
434668 |
3881535 |
89 |
DFT-PBE |
VASP 6.4.x |
140 |
|||||
Dataset containing MD trajectories of AT-AT-CG-CG DNA base pairs from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400 and 500 K at 1 fs resolution. |
10.60732/a87c6d4c |
C, H, N, O |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
10153 |
1198054 |
4 |
DFT-PBE+MBD |
FHI-aims |
139 |
||||
The test split of the dataset Alex_MP-20. This dataset contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. Data has been modified as follows: Exclude structures containing the elements Tc, Pm, or any element with atomic number 84 or higher. Relax structures with DFT using a PBE functional in order to have consistent energies. For the training set, remove any structure with more than 20 atoms inside the unit cell. For the training set, remove any structure with energy above the hull higher than 0.1 eV/atom. |
10.60732/4df848c7 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr |
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie |
67521 |
647769 |
76 |
DFT-PBE |
VASP |
139 |
||||
The validation split of the dataset Alex_MP-20. This dataset contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. Data has been modified as follows: Exclude structures containing the elements Tc, Pm, or any element with atomic number 84 or higher. Relax structures with DFT using a PBE functional in order to have consistent energies. For the training set, remove any structure with more than 20 atoms inside the unit cell. For the training set, remove any structure with energy above the hull higher than 0.1 eV/atom. |
10.60732/4132ee7c |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr |
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie |
67521 |
647222 |
76 |
DFT-PBE |
VASP |
137 |
||||
Benzene test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/c88b64a0 |
C, H |
Venkat Kapil, Edgar A. Engel |
200 |
5760 |
2 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
137 |
||||
The n-syn-CH3CHOO set of the QM-22 datasets, with energies calculated at the CCSD(T)/MRCI level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/4647e973 |
C, H, O |
Nathanael M. Kidwell, Hongwei Li, Xiaohong Wang, Joel M. Bowman, Marsha I. Lester |
159474 |
1275792 |
3 |
CCSD(T)-F12b |
MOLPRO, MOLCAS |
135 |
||||
The Glycine set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/9771a0c2 |
C, H, N, O |
Joel M. Bowman, Jeffrey Li, Chen Qu, Riccardo Conte, Paul L. Houston |
70099 |
700990 |
4 |
DFT-B3LYP |
MOLPRO |
135 |
||||
The JARVIS_Open_Catalyst_All dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the Open Catalyst Project (OCP) dataset: 460,328 training configurations, with the rest drawn from the validation and test sets. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/198ab33a |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
485236 |
37726627 |
56 |
DFT-rPBE |
VASP |
135 |
||||
Configurations of Pd from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/8d5bdb05 |
Pd |
Christopher M. Andolina, Wissam A. Saidi |
3413 |
137688 |
1 |
DFT-PBE |
VASP |
135 |
||||
The JARVIS_Materials_Project_84K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains 84,000 configurations of 3D materials from the Materials Project database. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/46681ef7 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson |
83416 |
2339728 |
89 |
DFT-undefined |
VASP |
132 |
||||
The JARVIS_OQMD_no_CFID dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Open Quantum Materials Database (OQMD), created to hold information about the electronic structure and stability of inorganic materials for the purpose of aiding in materials discovery. Calculations were performed at the DFT level of theory, using the PAW-PBE functional as implemented in VASP. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/82cb32aa |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Scott Kirklin, James E Saal, Bryce Meredig, Alex Thompson, Jeff W Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton |
811368 |
5015282 |
89 |
DFT-PBE |
VASP |
131 |
||||
ANI-1 is a dataset of 20 million conformations with calculated non-equilibrium energy values. The conformations are based on a subset of the GDB-11 dataset, each molecule containing between 1 and 8 heavy atoms, with heavy-atom species limited to C, N and O. Configuration sets are provided for standard and high-energy conformations (high energy being defined as more than 275 kcal/mol above the lowest-energy conformer) and, within these, by number of heavy atoms per molecule. |
10.60732/a57b3cb3 |
C, H, N, O |
Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg |
24389594 |
392138641 |
4 |
DFT-ωB97X |
Gaussian 09 |
131 |
||||
The JARVIS-Polymer-Genome dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Polymer Genome dataset, as created for the linked publication (Huan, T., Mannodi-Kanakkithodi, A., Kim, C. et al.). Structures were curated from existing sources and the original authors' works, removing redundant, identical structures before calculations, and removing redundant datapoints after calculations were performed. Band gap energies were calculated using two different DFT functionals: rPW86 and HSE06; atomization energy was calculated using rPW86. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/37f5fcea |
Al, C, Ca, Cd, Cl, F, H, Hf, Mg, N, O, Pb, S, Sn, Ti, Zn, Zr |
Tran Doan Huan, Arun Mannodi-Kanakkithodi, Chiho Kim, Vinit Sharma, Ghanshyam Pilania, Rampi Ramprasad |
1073 |
34441 |
17 |
DFT-rPW86, DFT-HSE06 |
VASP |
130 |
||||
The n-tetradecane testing split of the QM-22 datasets. This split includes DFT calculated atomic forces. Metadata includes energy difference in cm^-1 between given structure and the zig-zag minimum. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/7cec33e0 |
C, H |
Chen Qu, Paul L. Houston, Thomas Allison, Barry I. Schneider, Joel M. Bowman |
89648 |
5375749 |
2 |
DFT-B3LYP |
Gaussian 16 |
128 |
||||
ANI-2x-wB97X-631Gd is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in Gaussian 09 at the wB97X level of theory using the 6-31G(d) basis set. Configuration sets are divided by number of atoms per structure. |
10.60732/ac84253d |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
9650934 |
146725202 |
7 |
DFT-ωB97X |
Gaussian 09 |
128 |
||||
1090 structures uniformly selected from the MD/tfMC simulation during the training process of CGM-MLPs. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/535052eb |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
1091 |
362898 |
2 |
DFT-PBE+D3 |
CP2K |
128 |
||||
Configurations of Mg from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/f4d2f0b8 |
Mg |
Christopher M. Andolina, Wissam A. Saidi |
2938 |
57353 |
1 |
DFT-PBE |
VASP |
128 |
||||
A comprehensive DFT data set was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ge configurations. |
10.60732/4552d3fd |
Ge |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
228 |
14072 |
1 |
DFT-PBE |
VASP |
126 |
||||
The rattled-1000 training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/6994f9f0 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
11388475 |
161511768 |
89 |
DFT-PBE+U |
VASP |
126 |
||||
The Tropolone set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/b76ce2d6 |
C, H, O |
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu |
6768 |
101520 |
3 |
DFT-B3LYP |
Gaussian 16 |
125 |
||||
The aimd-from-PBE-1000-npt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/25f16f85 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
21269486 |
179930890 |
89 |
DFT-PBE+U |
VASP |
125 |
||||
Training set for magnetic Moment Tensor Potentials (mMTPs) that fit to magnetic forces for the bcc Fe-Al system. Contains 2632 configurations of 16-atom Fe-Al supercells with collinear atomic magnetic moments and magnetic forces (negative derivatives of energy with respect to magnetic moments, in eV/mu_B; zero for equilibrium magnetic moments). Configurations generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. Fitting to magnetic forces is demonstrated to improve reliability of the fitted mMTPs compared to fitting only to energies and forces. mMTP ensembles with 2, 3, and 4 magnetic basis functions are evaluated for predicting Fe-Al properties at 0 K and lattice parameters at 300 K. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment and magnetic force data. |
Al, Fe |
Alexey S. Kotykhov, Konstantin Gubaev, Vadim Sotskov, Christian Tantardini, Max Hodapp, Alexander V. Shapeev, Ivan S. Novikov |
1018 |
42096 |
2 |
DFT-PBE |
ABINIT |
125 |
|||||
MSR-ACC/TAE25 (Microsoft Research Accurate Chemistry Collection, Total Atomization Energies 2025) provides 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level obtained with the W1-F12 composite wavefunction protocol implemented in Molpro 2024.1. This is the canonical training split comprising 71,871 molecules (99% of molecules remaining after removing overlap with the W4-17 and GMTKN55 benchmark sets). The dataset covers the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures containing up to 5 non-hydrogen atoms drawn from elements H through Ar, excluding rare gases. Molecular structures were generated by exhaustive graph enumeration and degree-sequence sampling, then optimized through a cascade of GFN2-xTB, r2SCAN-3c, and B3LYP-D3(BJ)/def2-TZVPP levels of theory (ORCA). Structures were filtered to exclude those with significant multireference character (%TAE[(T)] > 6% at CCSD(T)/6-31G*), triplet electronic ground states, or dissociated fragments. The W1-F12 protocol includes Hartree-Fock extrapolation to the complete basis set limit (cc-pVDZ-F12 and cc-pVTZ-F12, alpha=5), CCSD-F12b correlation, perturbative triples delta(T) using jul-cc-pV(D+d)Z and jul-cc-pV(T+d)Z basis sets (alpha=3.22), and a core-valence correction using cc-pwCVTZ. The dataset spans 45.1% organic and 54.9% inorganic molecules and provides broader chemical diversity than comparable datasets such as GDB-9 or VQM24/DMC. Additional data available in the source files, including DFT atomization energies at approximately 90 levels of theory, singlet-triplet gaps, %TAE[(T)] multireference diagnostics, and W1-F12 energy components, can be downloaded from ColabFit Exchange. |
Al, B, Be, C, Cl, F, H, Li, Mg, N, Na, O, P, S, Si |
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Klaas J.H. Giesbertz, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton |
71871 |
532242 |
15 |
W1-F12/CCSD(T)-CBS |
Molpro 2024.1 |
124 |
|||||
A comprehensive DFT data set was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Cu configurations. |
10.60732/49de06ae |
Cu |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
262 |
27416 |
1 |
DFT-PBE |
VASP |
124 |
||||
Configurations of azobenzene featuring a cis to trans thermal inversion through three channels: inversion, rotation, and rotation assisted by inversion; and configurations of glycine as a simpler comparison molecule. All calculations were performed in FHI-aims software using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional with the Tkatchenko-Scheffler (TS) method to account for van der Waals (vdW) interactions. The azobenzene sets contain calculations from several different MD simulations, including two long simulations initialized at 300 K; short simulations (300 steps) initialized at 300 K with a shorter (0.5 fs) timestep; four simulations, two starting from each of the cis and trans isomers, at 750 K (initialized at 3000 K); and simulations at 50 K (initialized at 300 K). The glycine isomerization set was built using one MD simulation starting from each of two different minima. Initialization and simulation temperatures were 500 K. |
10.60732/71f8031b |
C, H, N, O |
Valentin Vassilev-Galindo, Gregory Fonseca, Igor Poltavsky, Alexandre Tkatchenko |
69174 |
1520162 |
4 |
DFT-PBE |
FHI-aims |
124 |
||||
A reference set of configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. These configurations were used to evaluate training on a GAP model. |
10.60732/54bc0a93 |
H, Si |
Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi |
114 |
24895 |
2 |
DFT-PBE |
Quantum ESPRESSO |
118 |
||||
ANI-2x-wB97MV-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated at the wB97MV level of theory using the def2TZVPP basis set. Configuration sets are divided by number of atoms per structure. |
10.60732/5209eb00 |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
9649797 |
146703867 |
7 |
DFT-ωB97M-V |
ORCA 4.2.1 |
118 |
||||
COMP6v2-wB97MV-def2TZVPP is the portion of COMP6v2 calculated at the wB97MV/def2TZVPP level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory. |
10.60732/e98b76e7 |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
156338 |
3786615 |
7 |
DFT-ωB97M-V |
ORCA 4.2.1 |
117 |
||||
This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially-trained MTP model, 2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed, 3) the MTP was retrained, including the structures that were re-computed with DFT, 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset included about 375,000 binary and ternary structures, enumerating all possible unit cells with different symmetries (BCC, FCC, and HCP) and different numbers of atoms. |
10.60732/7b56ca82 |
Al, Ni, Ti |
Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev |
2666 |
24851 |
3 |
DFT-undefined |
VASP |
117 |
||||
Configurations of nitrophenol from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/d91bf8fd |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119995 |
1799925 |
4 |
DFT-PBE0 |
Gaussian 09 |
116 |
||||
The aimd-from-PBE-1000-nvt training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in sub-datasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/de0e0690 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
20256650 |
169879539 |
86 |
DFT-PBE+U |
VASP |
115 |
||||
A comprehensive DFT data set was generated for six elements: Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Si configurations. |
10.60732/e16c3975 |
Si |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
25 |
1525 |
1 |
DFT-PBE |
VASP |
113 |
||||
ANI-2x-wB97MD3BJ-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the wB97M level of theory with D3 and BJ energy corrections, using the def2-TZVPP basis set. Configuration sets are divided by number of atoms per structure. Uncorrected SCF energy values and dipoles are recorded in the metadata. |
10.60732/5bd01ed9 |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
9649788 |
146703426 |
7 |
DFT-ωB97M+D3(BJ) |
ORCA 4.2.1 |
113 |
||||
This dataset was created to investigate the role of surface water and hydroxyl groups in facilitating spontaneous CO₂ activation at Cu⁺ sites and the formation of monodentate formate species in the context of using CO₂ hydrogenation to produce methanol. |
10.60732/cea60472 |
C, Cu, H, Mg, O |
Estefanía Fernández Villanueva, Pablo Germán Lustemberg, Minjie Zhao, Jose Soriano, Patricia Concepción, María Verónica Ganduglia Pirovano |
14955 |
1043206 |
5 |
DFT-PBE+D3 |
VASP 6.3.0 |
112 |
||||
Approximately 7,600 configurations of Ag used as part of a training dataset for a DP-GEN-based ML model for an Ag-Au nanoalloy potential. |
10.60732/93adbc95 |
Ag |
Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang |
7589 |
152114 |
1 |
DFT-PBE+D3 |
VASP |
111 |
||||
Configurations of Au from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/7c0dfbc8 |
Au |
Christopher M. Andolina, Wissam A. Saidi |
3585 |
89006 |
1 |
DFT-PBE |
VASP |
111 |
||||
MSR-ACC/TAE25 (Microsoft Research Accurate Chemistry Collection, Total Atomization Energies 2025) provides 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level obtained with the W1-F12 composite wavefunction protocol implemented in Molpro 2024.1. This is the canonical validation split comprising 730 molecules (1% of molecules remaining after removing overlap with the W4-17 and GMTKN55 benchmark sets). The dataset covers the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures containing up to 5 non-hydrogen atoms drawn from elements H through Ar, excluding rare gases. Molecular structures were generated by exhaustive graph enumeration and degree-sequence sampling, then optimized through a cascade of GFN2-xTB, r2SCAN-3c, and B3LYP-D3(BJ)/def2-TZVPP levels of theory (ORCA). Structures were filtered to exclude those with significant multireference character (%TAE[(T)] > 6% at CCSD(T)/6-31G*), triplet electronic ground states, or dissociated fragments. The W1-F12 protocol includes Hartree-Fock extrapolation to the complete basis set limit (cc-pVDZ-F12 and cc-pVTZ-F12, alpha=5), CCSD-F12b correlation, perturbative triples delta(T) using jul-cc-pV(D+d)Z and jul-cc-pV(T+d)Z basis sets (alpha=3.22), and a core-valence correction using cc-pwCVTZ. The dataset spans 45.1% organic and 54.9% inorganic molecules and provides broader chemical diversity than comparable datasets such as GDB-9 or VQM24/DMC. Additional data available in the source files, including DFT atomization energies at approximately 90 levels of theory, singlet-triplet gaps, %TAE[(T)] multireference diagnostics, and W1-F12 energy components, can be downloaded from ColabFit Exchange. |
Al, B, Be, C, Cl, F, H, Li, Mg, N, Na, O, P, S, Si |
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Klaas J.H. Giesbertz, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton |
730 |
5444 |
15 |
W1-F12/CCSD(T)-CBS |
Molpro 2024.1 |
110 |
|||||
OC20_S2EF_train_200K is the 200K training split of the OC20 Structure to Energy and Forces (S2EF) task. |
10.60732/6ccdeb1d |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
200000 |
14631937 |
56 |
DFT-rPBE |
VASP |
109 |
||||
Configurations of Sr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/69de7b6b |
Sr |
Christopher M. Andolina, Wissam A. Saidi |
3037 |
48387 |
1 |
DFT-PBE |
VASP |
109 |
||||
The Acetaldehyde (triplet) set of the QM-22 datasets, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/d66c9888 |
C, H, O |
Bina Fu, Yong-Chang Han, Joel M. Bowman, Luca Angelucci, Nadia Balucani, Francesca Leonori, Piergiorgio Casavecchia |
51530 |
360710 |
3 |
CCSD(T) |
MOLPRO |
108 |
||||
ANI-2x-wB97X-def2TZVPP is a portion of the ANI-2x dataset, which includes DFT-calculated energies for structures from 2 to 63 atoms in size containing H, C, N, O, S, F, and Cl. This portion of ANI-2x was calculated in ORCA at the wB97X level of theory using the def2TZVPP basis set. Configuration sets are divided by number of atoms per structure. Dipoles are recorded in the metadata. |
10.60732/61569e2c |
C, Cl, F, H, N, O, S |
Christian Devereux, Justin S. Smith, Kate K. Huddleston, Kipton Barros, Roman Zubatyuk, Olexandr Isayev, Adrian E. Roitberg |
8481522 |
127828812 |
7 |
DFT-ωB97X |
ORCA 4.2.1 |
107 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Cu configurations. |
10.60732/7c69274d |
Cu |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
31 |
3178 |
1 |
DFT-PBE |
VASP |
107 |
||||
Configurations of Cu from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/e0a72dd8 |
Cu |
Christopher M. Andolina, Wissam A. Saidi |
3355 |
96328 |
1 |
DFT-PBE |
VASP |
107 |
||||
Approximately 46,000 configurations of copper, including small and bulk structures, surfaces, interfaces, point defects, and randomly modified variants. Also includes structures with displaced or missing atoms. |
10.60732/c712b78a |
Cu |
Yury Lysogorskiy, Cas van der Oord, Anton Bochkarev, Sarath Menon, Matteo Rinaldi, Thomas Hammerschmidt, Matous Mrovec, Aidan Thompson, Gábor Csányi, Christoph Ortner, Ralf Drautz |
46327 |
307430 |
1 |
DFT-PBE |
FHI-aims |
107 |
||||
Configurations of Pb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/bddd3245 |
Pb |
Christopher M. Andolina, Wissam A. Saidi |
5254 |
117186 |
1 |
DFT-PBE |
VASP |
106 |
||||
129 molecules of composition C7O2H10 from the QM9 dataset with 5000 conformational geometries apiece. Molecular dynamics data was simulated using the Fritz-Haber Institute ab initio simulation software. |
10.60732/ad0a0039 |
C, H, O |
Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky |
640791 |
12175029 |
3 |
DFT-PBE+TS |
FHI-aims |
105 |
||||
AIMNet2(2025) is the extended training dataset for the AIMNet2 (second generation atoms-in-molecules network) neural network interatomic potential, curated to improve the model's description of noncovalent interactions (NCIs) including hydrogen bonding, pi-pi stacking, dispersion, sigma-hole, ionic, and electrostatic contacts. The dataset covers neutral and charged closed-shell molecular systems composed of up to 14 non-metal elements (H, B, C, N, O, F, Si, P, S, Cl, As, Se, Br, I) with up to 193 atoms per system. Structures were drawn from three complementary sources: (a) molecular geometries from SPICE v2.0.1 (solvated systems, amino acid-ligand pairs, water clusters) and the CREMP dataset (macrocyclic peptides); (b) small neutral and charged molecules from PubChem sampled via normal mode sampling and metadynamics-guided geometry exploration; (c) dimer geometries assembled from Cambridge Structural Database (CSD) monomers (up to 14 supported elements, fewer than 200 atoms) and pre-optimized with AIMNet2-wB97M-D3(2023) to remove steric clashes while preserving configurational diversity. All quantum chemical calculations used ORCA 6.0.1 with the composite B97-3c DFT functional under restricted Kohn-Sham (RKS) formalism. SCF convergence was enforced with TightSCF and SlowConv; RIJCOSX integral acceleration and DEFGRID2 integration grid were applied throughout. AIMNet2(2025) was initialized from AIMNet2(2023) weights and continually pretrained on this dataset without weight freezing or regularization, using a multi-task loss over energy (w=1.0), forces (w=0.2), and Hirshfeld partial charges (w=0.5). |
As, B, Br, C, Cl, F, H, I, N, O, P, S, Se, Si |
Kamal Singh Nayal, Ilkwon Cho, Runtian Nick Gao, Peikun Zheng, Olexandr Isayev |
3764666 |
130288462 |
14 |
DFT-B97-3c |
ORCA 6.0.1 |
105 |
|||||
The H2CO/HCOH set of the QM-22 datasets, representing the isomerization of formaldehyde to cis and trans-hydroxycarbene, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/c04a4e90 |
C, H, O |
Chen Qu, Qi Yu, Brian L. Van Hoozen Jr, Joel M. Bowman, Rodrigo A. Vargas-Hernández |
34750 |
139000 |
3 |
MRCI |
MOLPRO |
103 |
||||
The Hydronium set of the QM-22 datasets, with energies calculated at the CCSD(T)/MRCI level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/cd74ffdf |
H, O |
Chen Qu, Qi Yu, Brian L. Van Hoozen Jr, Joel M. Bowman, Rodrigo A. Vargas-Hernández |
32141 |
128564 |
2 |
CCSD(T), MRCI |
MOLPRO |
103 |
||||
The train split of the dataset Alex_MP-20. This dataset contains structures from the Alexandria (Schmidt et al. 2022) and MP-20 (Materials Project 2020) datasets. Data has been modified as follows: Exclude structures containing the elements Tc, Pm, or any element with atomic number 84 or higher. Relax structures with DFT using a PBE functional in order to have consistent energies. For the training set, remove any structure with more than 20 atoms inside the unit cell. For the training set, remove any structure with energy above the hull higher than 0.1 eV/atom. |
10.60732/8d6afc67 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Te, Ti, Tl, Tm, V, W, Y, Yb, Zn, Zr |
Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie |
540162 |
5184565 |
76 |
DFT-PBE |
VASP |
103 |
||||
Configurations of Ti from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/f08eba7c |
Ti |
Christopher M. Andolina, Wissam A. Saidi |
5436 |
148209 |
1 |
DFT-PBE |
VASP |
103 |
||||
The dataset consists of energies and forces for monolayer graphene, bilayer graphene, graphite, and diamond in various states, including strained static structures and configurations drawn from ab initio MD trajectories. A total of 4788 configurations was generated from DFT calculations using the Vienna Ab initio Simulation Package (VASP). The energies and forces are stored in the extended XYZ format, with one file per configuration. |
10.60732/e65112ef |
C |
Mingjian Wen, Ellad B. Tadmor |
4769 |
228396 |
1 |
DFT-PBE+MBD |
VASP |
102 |
||||
The JARVIS-C2DB dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations from the Computational 2D Database (C2DB), which contains a variety of properties for 2-dimensional materials across more than 30 different crystal structures. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/37c26dae |
Ag, Al, As, Au, B, Ba, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Sten Haastrup, Mikkel Strange, Mohnish Pandey, Thorsten Deilmann, Per S Schmidt, Nicki F Hinsche, Morten N Gjerding, Daniele Torelli, Peter M Larsen, Anders C Riis-Jensen, Jakob Gath, Karsten W Jacobsen, Jens Jørgen Mortensen, Thomas Olsen, Kristian S Thygesen |
3520 |
17990 |
61 |
DFT-PBE |
GPAW |
102 |
||||
Training set for magnetic Moment Tensor Potentials (mMTPs) for the bcc Fe-Al system. Contains 2012 configurations of 16-atom Fe-Al supercells with collinear atomic magnetic moments. Configurations were generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. The fitted mMTPs (with 2 magnetic basis functions) predict formation energy, lattice parameters, and total magnetic moments of bcc Fe-Al at 0 K across varying Al concentrations. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment data. |
Al, Fe |
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov |
434 |
32192 |
2 |
DFT-PBE |
ABINIT |
101 |
|||||
The Malonaldehyde set of the QM-22 datasets, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/e77ca63e |
C, H, O |
Yimin Wang, Bastiaan J. Braams, Joel M. Bowman, Stuart Carter, David P. Tew |
11145 |
100305 |
3 |
CCSD(T) |
MOLPRO |
101 |
||||
127,000 configurations from a dataset used to benchmark and train a modified DeePMD model called DeepPot-SE, or Deep Potential - Smooth Edition. |
10.60732/d5518670 |
Al, C, Co, Cr, Cu, Fe, Ge, H, Mn, Mo, N, Ni, O, Pt, S, Si, Ti |
Linfeng Zhang, Jiequn Han, Han Wang, Wissam A. Saidi, Roberto Car, Weinan E |
126631 |
26210897 |
17 |
DFT-PBE |
CP2K, Quantum ESPRESSO |
101 |
||||
The Ethanol set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/b52743ef |
C, H, O |
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu |
11011 |
99099 |
3 |
DFT-B3LYP |
Gaussian 16 |
100 |
||||
Validation configurations of bulk water from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs. |
10.60732/142b62c8 |
H, O |
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith |
2112 |
405504 |
2 |
DFT-revPBE+D3 |
VASP |
100 |
||||
ANI-1xBB is a dataset of approximately 13.1 million nonequilibrium conformers of small organic molecules (H, C, N, O only; up to 7 heavy atoms; up to 23 atoms total), designed to support the training of reactive machine learning interatomic potentials. Single-point quantum chemistry properties were computed at three electronic temperatures (T_el = 0, 1000, and 5000 K) using B97-3c composite DFT in ORCA 4.2.1 via finite-temperature DFT (Fermi smearing). All geometries were treated as closed-shell (charge = 0, mult = 1); Fermi smearing at T_el = 5000 K approximates the superposition of closed- and open-shell states during bond dissociation and is the primary labeling scheme used for model training in the associated publication. This dataset contains the T_el = 5000 K (b973c_etemp5000) energies and forces; data at T_el = 0 K and 1000 K are available in the original source files. Configuration sets represent: constrained geometry optimization steps (snap_source='opt', ~9% of data) and fixed-distance NVT MD snapshots (snap_source='md', ~91% of data). |
C, H, N, O |
Shuhao Zhang, Roman Zubatyuk, Yinuo Yang, Adrian Roitberg, Olexandr Isayev |
13144877 |
184872744 |
4 |
DFT-B97-3c |
ORCA 4.2.1 |
100 |
|||||
Training and testing configurations of bulk water from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs. |
10.60732/7f3ffd0b |
H, O |
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith |
700 |
134400 |
2 |
DFT-revPBE+D3 |
VASP |
99 |
||||
158,000 diverse atomic environments of elemental tungsten. Includes DFT-PBE energies, forces and stresses for tungsten; periodic unit cells in the range of 1-135 atoms, including bcc primitive cell, 128-atom bcc cell, vacancies, low index surfaces, gamma-surfaces, and dislocation cores. |
10.60732/8d093f34 |
W |
Wojciech J. Szlachta, Albert P. Bartók, Gábor Csányi |
9471 |
158304 |
1 |
DFT-PBE |
CASTEP 6.01 |
99 |
||||
COMP6v2-wB97MD3BJ-def2TZVPP is the portion of COMP6v2 calculated at the wB97MD3BJ/def2TZVPP level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory. |
10.60732/19db27ec |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
156353 |
3787055 |
7 |
DFT-ωB97M-V |
ORCA 4.2.1 |
98 |
||||
Configurations of acrolein from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/0f9d02a8 |
C, H, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119993 |
959944 |
3 |
DFT-PBE0 |
Gaussian 09 |
96 |
||||
The N-methyl acetamide set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/89997b6f |
C, H, N, O |
Apurba Nandi, Chen Qu, Joel M. Bowman |
6607 |
79284 |
4 |
DFT-B3LYP |
MOLPRO |
95 |
||||
The test set of a train/test pair from the aspirin dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated by all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set CCSD/cc-pVDZ was used for aspirin. All calculations were performed with the Psi4 software suite. |
10.60732/083a6253 |
C, H, O |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
500 |
10500 |
3 |
CCSD |
Psi4 |
95 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Si configurations. |
10.60732/c2471ffc |
Si |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
214 |
13233 |
1 |
DFT-PBE |
VASP |
94 |
||||
The JARVIS-QM9-DGL dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset, originally created as part of the datasets at quantum-machine.org, as implemented with the Deep Graph Library (DGL) Python package. Units for r2 (electronic spatial extent) are a0^2; for alpha (isotropic polarizability), a0^3; for mu (dipole moment), D; for Cv (heat capacity), cal/mol K. Units for all other properties are eV. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/403cd4f2 |
C, F, H, N, O |
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld |
130831 |
2358210 |
5 |
DFT-B3LYP |
Gaussian 09 |
93 |
||||
The OCHCO cation set of the QM-22 datasets, with energies calculated at the CCSD(T) level of theory. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/8f92dba5 |
C, H, O |
Chen Qu, Qi Yu, Brian L. Van Hoozen Jr, Joel M. Bowman, Rodrigo A. Vargas-Hernández |
7800 |
39000 |
3 |
CCSD(T) |
MOLPRO |
92 |
||||
This is the dataset from npj Comp. Mater 7, 12 (2021), 'Predicting stable crystalline compounds using chemical similarity'. Stable crystal structure compositions of up to 12 atoms were gathered from the Materials Project database. These structures were mutated by replacing all of a given element with a similar element (see publication for details). |
10.60732/b9e7eedf |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Hai-Chen Wang, Silvana Botti, Miguel A. L. Marques |
219310 |
1711271 |
85 |
DFT-PBE |
VASP |
92 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Li configurations. |
10.60732/d8a6d50c |
Li |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
29 |
1320 |
1 |
DFT-PBE |
VASP |
91 |
||||
Training set from COLL. Consists of configurations taken from molecular collisions of different small organic molecules. Energies and forces for 140,000 random snapshots taken from these trajectories were recomputed with density functional theory (DFT). These calculations were performed with the revPBE functional and def2-TZVP basis, including D3 dispersion corrections. |
10.60732/f95867ef |
C, H, O |
Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann |
119965 |
1225234 |
3 |
DFT-revPBE+D3 |
ORCA |
91 |
||||
This dataset was originally designed to fit a GAP model for the Mo-Nb-Ta-V-W quinary system that was used to study segregation and defects in the body-centered-cubic refractory high-entropy alloy MoNbTaVW. |
10.60732/00dc545a |
Mo, Nb, Ta, V, W |
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova |
2329 |
127913 |
5 |
DFT-PBE |
VASP |
91 |
||||
The Methane set of the QM-22 datasets. QM-22 consists of CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/ca55415d |
C, H |
Apurba Nandi, Chen Qu, Joel M. Bowman |
9000 |
45000 |
2 |
DFT-B3LYP |
MOLPRO |
89 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Mo configurations. |
10.60732/3db3283a |
Mo |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
23 |
1189 |
1 |
DFT-PBE |
VASP |
89 |
||||
The JARVIS_OMDB dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Organic Materials Database (OMDB): a dataset of 12,500 crystal materials for the purpose of training models for the prediction of properties for complex and lattice-periodic organic crystals with large numbers of atoms per unit cell. Dataset covers 69 space groups, 65 elements; averages 82 atoms per unit cell. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/a375b3dc |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, U, V, W, Y, Zn, Zr |
Bart Olsthoorn, R. Matthias Geilhufe, Stanislav S. Borysov, Alexander V. Balatsky |
12497 |
1061362 |
65 |
DFT-PBE |
VASP |
89 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Li configurations. |
10.60732/63ab9206 |
Li |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
241 |
11576 |
1 |
DFT-PBE |
VASP |
88 |
||||
Dataset from "Surface segregation in high-entropy alloys from alchemical machine learning: dataset HEA25S". Includes 10000 bulk HEA structures (Dataset O), 2640 HEA surface slabs (Dataset A), together with 1000 bulk and 1000 surface slab snapshots from the molecular dynamics (MD) runs (Datasets B and C), and 500 MD snapshots of the 25-element Cantor-style alloy surface slabs. These splits, along with their respective train, test, and validation splits, are included as configuration sets. |
10.60732/3c5c6e72 |
Ag, Au, Co, Cr, Cu, Fe, Hf, Ir, Lu, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, Sc, Ta, Ti, V, W, Y, Zn, Zr |
Arslan Mazitov, Maximilian A. Springer, Nataliya Lopanitsyna, Guillaume Fraux, Sandip De, Michele Ceriotti |
15004 |
633387 |
25 |
DFT-PBEsol |
VASP |
88 |
||||
The validation set from OMol25. From the dataset creator: OMol25 represents the largest high quality molecular DFT dataset spanning biomolecules, metal complexes, electrolytes, and community datasets. OMol25 was generated at the ωB97M-V/def2-TZVPD level of theory. |
10.60732/8baea040 |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood |
2762021 |
283298012 |
83 |
DFT-ωB97M-V |
ORCA |
87 |
||||
Hessian QM9 is the first database of equilibrium configurations and numerical Hessian matrices, consisting of 41,645 molecules from the QM9 dataset at the wB97x/6-31G* level. Molecular Hessians were calculated in vacuum, as well as in water, tetrahydrofuran, and toluene using an implicit solvation model. |
10.60732/e8c8e0eb |
C, F, H, N, O |
Nicholas J. Williams, Lara Kabalan, Ljiljana Stojanovic, Viktor Zólyomi, Edward O. Pyzer-Knapp |
166580 |
3063848 |
5 |
DFT-ωB97X |
NWChem |
86 |
||||
The test set from the doped CsPbI3 energetics dataset. This dataset was created to explore the effect of Cd and Zn substitutions on the structural stability of inorganic lead halide perovskite CsPbI3. CsPbI3 undergoes a direct to indirect band-gap phase transition at room temperature. The dataset contains configurations of CsPbI3 with low levels of Cd and Zn, which were used to train a GNN model to predict the energetics of structures with higher levels of substitutions. |
10.60732/e2e38c83 |
Cd, Cs, I, Pb, Zn |
Roman A. Eremin, Innokentiy S. Humonen, Alexey A. Kazakov, Vladimir D. Lazarev, Anatoly P. Pushkarev, Semen A. Budennyy |
60 |
9600 |
5 |
DFT-PBE |
VASP |
86 |
||||
The full trajectories from the VASP runs used to generate the 23-Single-Element-DNPs training sets. Configuration sets are available for each element. |
10.60732/a4e0fea6 |
Ag, Al, Au, Co, Cu, Ge, I, Kr, Li, Mg, Mo, Nb, Ni, Os, Pb, Pd, Pt, Re, Sb, Sr, Ti, Zn, Zr |
Christopher M. Andolina, Wissam A. Saidi |
108644 |
2352424 |
23 |
DFT-PBE |
Quantum ESPRESSO |
86 |
||||
Test configurations from the 3BPA dataset with a fixed value of 120 degrees for dihedral beta in the alpha-gamma plane. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/09d00e4e |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
2347 |
63369 |
4 |
DFT-ωB97X |
ORCA |
86 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ni configurations |
10.60732/9a25df21 |
Ni |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
263 |
27420 |
1 |
DFT-PBE |
VASP |
85 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ge configurations |
10.60732/1a1e4a52 |
Ge |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
25 |
1568 |
1 |
DFT-PBE |
VASP |
84 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Mo configurations |
10.60732/3827e5e1 |
Mo |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
194 |
10087 |
1 |
DFT-PBE |
VASP |
84 |
||||
The JARVIS_Materials_Project_2020 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains 127,000 configurations of 3D materials from the Materials Project database. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/8122ca50 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, Kristin A. Persson |
126335 |
3725727 |
89 |
DFT-undefined |
VASP |
84 |
||||
Dataset containing MD trajectories of AT-AT DNA base pairs from the MD22 benchmark set. |
10.60732/3e801453 |
C, H, N, O |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
19999 |
1199940 |
4 |
DFT-PBE+MBD |
FHI-aims |
84 |
||||
Configurations of Os from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/1ec0df98 |
Os |
Christopher M. Andolina, Wissam A. Saidi |
4624 |
114840 |
1 |
DFT-PBE |
VASP |
83 |
||||
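The temperature rule quoted in the description above can be sketched as a small function. This is only an illustration: the function name is hypothetical, the melting temperatures in the usage lines are placeholder values rather than data from the dataset, and the handling of MT exactly equal to 2000 K is an assumption, since the description does not specify it.

```python
def md_sampling_temperatures(melting_temperature_k):
    """Return the MD temperatures (in K) implied by the rule in the description.

    Rule as stated: MT and 0.25*MT for MT < 2000 K;
    MT, 0.6*MT and 0.25*MT for MT > 2000 K.
    """
    if melting_temperature_k <= 0:
        raise ValueError("melting temperature must be positive")
    fractions = [1.0, 0.25]
    # Assumption: MT == 2000 K is not covered by the description;
    # it is treated like the low-MT case here.
    if melting_temperature_k > 2000.0:
        fractions = [1.0, 0.6, 0.25]
    return [fraction * melting_temperature_k for fraction in fractions]


if __name__ == "__main__":
    # Placeholder melting temperatures, chosen only to exercise both branches.
    for mt in (1000.0, 3000.0):
        print(mt, md_sampling_temperatures(mt))
```

Elements below the 2000 K threshold get two sampling temperatures, and those above it get three.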
Out-of-domain validation configurations for the structure to total energy and forces (S2EF) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces. |
10.60732/71142b0d |
Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Ti, Tl, V, W, Zn, Zr |
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick |
457249 |
36937329 |
52 |
DFT-PBE+U |
VASP |
82 |
||||
A comprehensive DFT data set was generated for six elements - Li, Mo, Ni, Cu, Si, and Ge. These elements were chosen to span a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). This dataset comprises only the Ni configurations |
10.60732/ef83b761 |
Ni |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
31 |
3158 |
1 |
DFT-PBE |
VASP |
81 |
||||
This data set was originally used to generate a linear SNAP potential for solid and liquid tantalum as published in Thompson, A.P. et al., J. Comp. Phys. 285 (2015) 316-330. |
10.60732/da9afef7 |
Ta |
Aidan P. Thompson, Laura P. Swiler, Christian R. Trott, Stephen M. Foiles, Garritt J. Tucker |
363 |
4224 |
1 |
DFT-PBE |
VASP |
80 |
||||
Dataset containing MD trajectories of the double-walled nanotube supramolecule from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-Pi software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution. |
10.60732/fce214af |
C, H |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
5032 |
1861840 |
2 |
DFT-PBE+MBD |
FHI-aims |
80 |
||||
A subset of the MAD-1.5 (Massive Atomic Diversity version 1.5) structures recomputed with the PBE GGA functional, covering the MAD-1 subsets (MC3D, MC3D-rattled, MC3D-random, MC3D-surface, MC3D-cluster, MC2D, SHIFTML-molcrys, SHIFTML-molfrags) plus monomers and MC3D-random-extended from the new MAD-1.5 subsets. All DFT settings are consistent with the r2SCAN calculations: FHI-aims (version 250806) all-electron code with tight NAO basis sets (species defaults 2020), 8 Angstrom^-1 k-point density for periodic systems, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). Cross-validation splits are consistent with the r2SCAN train/val/test splits; this file contains all three splits combined. PBE targets were used in PET-MAD-1.5 model training with separate prediction heads alongside r2SCAN targets, improving force accuracy by approximately 25% relative to r2SCAN-only training. As a lower level of theory, this dataset is less carefully curated than the primary r2SCAN dataset; PBE heads are discarded from the final released models. |
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov |
101493 |
2620250 |
102 |
DFT-PBE |
FHI-aims v250806 |
80 |
|||||
The main training dataset for GST_GAP_22, calculated using the PBEsol functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions. |
10.60732/f2d6e02c |
Ge, Sb, Te |
Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer |
2690 |
341004 |
3 |
DFT-PBEsol |
CASTEP |
79 |
||||
Training configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/495b736b |
Hf, O |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
27958 |
2683968 |
2 |
DFT-PBE |
VASP |
79 |
||||
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and the Materials Project. The v2025.2 PBE release contains 433,189 structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the PBE functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. v2025.2 removes a small number of duplicated structures present in v2025.1, and the original files add Bader charges and Bader magnetic moments per atom. The previous version of this dataset (MatPES-PBE-2025.1) is available from ColabFit. There is a companion dataset calculated with the r2SCAN functional (MatPES-R2SCAN-2025.2). |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong |
433163 |
3867177 |
89 |
DFT-PBE |
VASP 6.4.x |
78 |
|||||
MSR-ACC/TAE25 (Microsoft Research Accurate Chemistry Collection, Total Atomization Energies 2025) provides 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level obtained with the W1-F12 composite wavefunction protocol implemented in Molpro 2024.1. This is the complete MSR-ACC/TAE25 dataset of 73,040 molecules, comprising all structures prior to partitioning into canonical train and validation splits. The dataset covers the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures containing up to 5 non-hydrogen atoms drawn from elements H through Ar, excluding rare gases. Molecular structures were generated by exhaustive graph enumeration and degree-sequence sampling, then optimized through a cascade of GFN2-xTB, r2SCAN-3c, and B3LYP-D3(BJ)/def2-TZVPP levels of theory (ORCA). Structures were filtered to exclude those with significant multireference character (%TAE[(T)] > 6% at CCSD(T)/6-31G*), triplet electronic ground states, or dissociated fragments. The W1-F12 protocol includes Hartree-Fock extrapolation to the complete basis set limit (cc-pVDZ-F12 and cc-pVTZ-F12, alpha=5), CCSD-F12b correlation, perturbative triples delta(T) using jul-cc-pV(D+d)Z and jul-cc-pV(T+d)Z basis sets (alpha=3.22), and a core-valence correction using cc-pwCVTZ. The dataset spans 45.1% organic and 54.9% inorganic molecules and provides broader chemical diversity than comparable datasets such as GDB-9 or VQM24/DMC. Additional data available in the source files, including DFT atomization energies at approximately 90 levels of theory, singlet-triplet gaps, %TAE[(T)] multireference diagnostics, and W1-F12 energy components, can be downloaded from ColabFit Exchange. It includes molecules overlapping with the W4-17 and GMTKN55 benchmark sets that are excluded from the train and validation splits. |
Al, B, Be, C, Cl, F, H, Li, Mg, N, Na, O, P, S, Si |
Sebastian Ehlert, Jan Hermann, Thijs Vogels, Victor Garcia Satorras, Stephanie Lanius, Marwin Segler, Klaas J.H. Giesbertz, Derk P. Kooi, Kenji Takeda, Chin-Wei Huang, Giulia Luise, Rianne van den Berg, Paola Gori-Giorgi, Amir Karton |
73040 |
540810 |
15 |
W1-F12/CCSD(T)-CBS |
Molpro 2024.1 |
76 |
|||||
The test split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets -- in particular QM9 and ANI1x. |
10.60732/f26a9f60 |
C, H, N, O |
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther |
190261 |
2106595 |
4 |
DFT-ωB97X |
ORCA 5.0.2 |
76 |
||||
Test set from COLL. Consists of configurations taken from molecular collisions of different small organic molecules. Energies and forces for 140,000 random snapshots taken from these trajectories were recomputed with density functional theory (DFT). These calculations were performed with the revPBE functional and def2-TZVP basis, including D3 dispersion corrections |
10.60732/7b135132 |
C, H, O |
Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann |
9480 |
97886 |
3 |
DFT-revPBE+D3 |
ORCA |
76 |
||||
This is the filtered validation split of ODAC25. Open Direct Air Capture 2025 (ODAC25) is the largest high-quality DFT dataset for Direct Air Capture, containing over 15,000 Metal-Organic Frameworks (MOFs), including experimental, defective, synthetic, and amine-functionalized MOFs, with 4 adsorbates: CO2, H2O, N2, and O2. ODAC25 significantly improves upon ODAC23 by adding functionalized MOFs, new adsorbates (N2 and O2), higher k-point convergence, and re-relaxations of empty MOFs. The dataset contains three partitions: (1) mof_plus_adsorbate includes full DFT relaxations of different adsorbates on various MOFs; (2) mof includes re-relaxations of empty MOFs; (3) gcmc includes DFT single points of configurations derived from Grand Canonical Monte Carlo (GCMC) simulations. MOFs deemed problematic by Jin et al. (2025) have been excluded (see https://zenodo.org/records/14802658). |
Ag, Al, C, Cd, Cl, Co, Cu, Eu, F, Fe, Gd, H, I, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, P, Pr, S, Si, Sm, Sr, Tb, Y, Zn, Zr |
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl |
783702 |
171139115 |
32 |
DFT-PBE+D3 |
VASP 6.3 |
75 |
|||||
A dataset consisting of the energies of supercells containing from 1 to 250 atoms. The supercells represent energy-volume relations for 8 crystal structures of Ta, 5 uniform deformation paths between pairs of structures, vacancies, interstitials, surfaces with low-index orientations, 4 symmetrical tilt grain boundaries, γ-surfaces on the (110) and (211) fault planes, a [111] screw dislocation, liquid Ta, and several isolated clusters containing from 2 to 51 atoms. Some of the supercells contain static atomic configurations. However, most are snapshots of ab initio MD simulations at different densities, and temperatures ranging from 293 K to 3300 K. The BCC structure was sampled in the greatest detail, including a wide range of isotropic and uniaxial deformations. |
10.60732/7f6cac29 |
Ta |
Yi-Shen Lin, Ganga P. Purja Pun, Yuri Mishin |
3191 |
135706 |
1 |
DFT-PBE |
VASP |
75 |
||||
Configurations of Kr from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/6a060a77 |
Kr |
Christopher M. Andolina, Wissam A. Saidi |
2875 |
95033 |
1 |
DFT-PBE |
VASP |
75 |
||||
Approximately 7,400 configurations of titanium used for training a deep potential using the DeePMD-kit molecular dynamics package and DP-GEN training scheme. |
10.60732/85e47ff3 |
Ti |
Tongqi Wen, Rui Wang, Lingyu Zhu, Linfeng Zhang, Han Wang, David J. Srolovitz, Zhaoxuan Wu |
7376 |
143792 |
1 |
DFT-PBE |
VASP |
75 |
||||
Structures from the SAIT_semiconductors_ACS_2023_HfO dataset, separated into crystal, out-of-domain, and random (generated by randomly distributing 32 Hf and 64 O atoms within the unit cells of the HfO2 crystals) configuration sets. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/186c10bf |
Hf, O |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
191973 |
18429408 |
2 |
DFT-PBE |
VASP |
74 |
||||
Test configurations from the 3BPA dataset with a fixed value of 180 degrees for dihedral beta in the alpha-gamma plane. Used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/e9fb7e4a |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
2350 |
63450 |
4 |
DFT-ωB97X |
ORCA |
74 |
||||
The training set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerential Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density. |
10.60732/845cc1b5 |
Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn |
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka |
19954 |
554986 |
37 |
DFT-PBE |
VASP 5.4.4 |
73 |
||||
Dataset from "Stress-dependence of generalized stacking fault energies": DFT calculations of generalized stacking fault energies (GSFE) for Al, Cu, and Mg. |
10.60732/861da1bc |
Al, Cu, Mg |
Binglun Yin, Predrag Andric, W. A. Curtin |
272 |
3264 |
3 |
DFT-PBE |
VASP |
72 |
||||
This dataset contains structures of Cu, including Cu(111), Cu(100), Cu(110), and Cu(211). Slab settings are as follows: 3 x 3, 6-layered slabs for Cu(111), (100), and (110) surfaces; 1 x 3, 6-layered slabs for Cu(211) surface. Includes some structures representing interaction of H2 with one of the Cu surfaces and some structures of Cu sampled at different temperatures. |
10.60732/d0801836 |
Cu, H |
Wojciech G. Stark, Julia Westermayr, Oscar A. Douglas-Gallardo, James Gardner, Scott Habershon, Reinhard J. Maurer |
3413 |
191104 |
2 |
DFT-SRP48 |
FHI-aims |
72 |
||||
The validation split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets -- in particular QM9 and ANI1x. |
10.60732/8e8402d6 |
C, H, N, O |
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther |
264972 |
3743153 |
4 |
DFT-ωB97X |
ORCA 5.0.2 |
71 |
||||
The validation split of the Open Catalyst 2025 (OC25) dataset for solid-liquid interfaces. OC25 consists of single-point DFT calculations of catalyst/solvent/ion/adsorbate structures, covering 88 elements, 8 solvents (water, methanol, CCl4, DMSO, benzene, hexane, THF, diethyl ether), 9 ionic species (Cs+, OH-, Li+, SO4^2-, Ca^2+, [Me4N]+, HCO3-, H+, F-), and adsorbates from the OC20 set plus reactive intermediates. Surfaces are derived from 39,821 Materials Project bulk structures with Miller indices <= 3. Structures are highly off-equilibrium, sampled from short ab initio molecular dynamics simulations (10-50 steps, 1000 K, NVT) or short DFT relaxations (5 ionic steps). The validation split contains 203,630 structures representing out-of-distribution (OOD) bulk-solvent combinations (approximately 2.5% of ~260,000 unique pairings held out). Validation calculations used tighter DFT convergence (EDIFF=1e-6 eV) compared to the training set to provide higher-quality force labels. All DFT calculations used VASP 6.3.2 with the non-spin-polarized RPBE functional supplemented with D3 dispersion correction (zero damping), plane wave cutoff 400 eV, k-point reciprocal density of 40, and a dipole correction in the z-direction. |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, La, Li, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Xe, Y, Zn, Zr |
Sushree Jagriti Sahoo, Mikael Maroschin, Daniel S. Levine, Zachary Ulissi, C. Lawrence Zitnick, Joel B Varley, Joseph A. Gauthier, Nitish Govindarajan, Muhammed Shuaibi |
203630 |
29341418 |
70 |
DFT-RPBE+D3 |
VASP 6.3.2 |
71 |
|||||
The validation split of OMC25. Open Molecular Crystals 2025 (OMC25) is a molecular crystal dataset produced by Meta. The OE62 dataset was used as a source for sampling molecules; crystals were generated with Genarris 3.0; from these, relaxation trajectories were generated and sampled to create the final dataset. See the publication for details. |
B, Br, C, Cl, F, H, I, N, O, P, S, Si |
Vahe Gharakhanyan, Luis Barroso-Luque, Yi Yang, Muhammed Shuaibi, Kyle Michel, Daniel S. Levine, Misko Dzamba, Xiang Fu, Meng Gao, Xingyu Liu, Haoran Ni, Keian Noori, Brandon M. Wood, Matt Uyttendaele, Arman Boromand, C. Lawrence Zitnick, Noa Marom, Zachary W. Ulissi, Anuroop Sriram |
1386816 |
178106924 |
12 |
DFT-PBE |
VASP 6.3 |
71 |
|||||
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and the Materials Project. The v2025.2 r2SCAN release contains 386,544 structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the r2SCAN meta-GGA functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. v2025.2 removes a small number of duplicated structures present in v2025.1, and the original files add Bader charges and Bader magnetic moments per atom. The previous version of this dataset (MatPES-R2SCAN-2025.1) is available from ColabFit. There is a companion dataset calculated with the PBE functional (MatPES-PBE-2025.2). |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong |
386520 |
3049029 |
89 |
DFT-R2SCAN |
VASP 6.4.x |
71 |
|||||
Dataset containing MD trajectories of DHA (docosahexaenoic acid) from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-Pi software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution. |
10.60732/9d9083b8 |
C, H, O |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
69744 |
3905664 |
3 |
DFT-PBE+MBD |
FHI-aims |
70 |
||||
Test configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records. |
10.60732/e9f3507f |
C, H, N, O |
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu |
198977 |
33030182 |
4 |
DFT-M06-2X |
ORCA 4.2.1 |
70 |
||||
The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states. |
10.60732/eeb61a0d |
C |
Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe |
20194 |
5191888 |
1 |
DFT-LDA |
VASP |
69 |
||||
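The data-point totals quoted in the amorphous carbon description can be reproduced from the snapshot counts it lists. The sketch below is only a consistency check. It assumes that the six independent simulations were run at each of the 11 melt-quench densities, which the description implies but does not state outright. All variable names are illustrative.

```python
# Snapshots kept per melt-quench run, as listed in the description.
SNAPSHOTS_PER_RUN = {
    "equilibrium MD at 9000 K": 30,
    "cooling from 9000 K to 5000 K": 100,
    "equilibrium MD at 5000 K": 25,
    "cooling from 5000 K to 300 K": 100,
}

# Densities in g/cm3, copied from the description.
MELT_QUENCH_DENSITIES = [1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.5]
DIAMOND_DENSITIES = [2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.5]

INDEPENDENT_RUNS_216 = 6   # assumed to apply to every density
INDEPENDENT_RUNS_512 = 1   # "a single simulation was conducted for each density"
DIAMOND_SNAPSHOTS = 80     # per density, from the 300 K trajectory

snapshots_per_run = sum(SNAPSHOTS_PER_RUN.values())
amorphous_216 = snapshots_per_run * len(MELT_QUENCH_DENSITIES) * INDEPENDENT_RUNS_216
diamond_216 = DIAMOND_SNAPSHOTS * len(DIAMOND_DENSITIES)
amorphous_512 = snapshots_per_run * len(MELT_QUENCH_DENSITIES) * INDEPENDENT_RUNS_512

if __name__ == "__main__":
    print("snapshots per run:", snapshots_per_run)        # 255
    print("216-atom melt-quench points:", amorphous_216)  # 16830
    print("216-atom diamond points:", diamond_216)        # 560
    print("512-atom melt-quench points:", amorphous_512)  # 2805
```

Under that assumption, the three subtotals match the 16,830, 560, and 2,805 data points stated in the description.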
Dihedral scan about one of the C-C bonds of the conjugated system. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/b03a4349 |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
45 |
675 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
68 |
||||
The test set of a train and test set pair. The combined datasets comprise approximately 275 configurations of monolayer quasi-hexagonal-phase fullerene (qHPF) membrane used to train and test an NEP model. |
10.60732/f1e6e9fa |
C |
Penghua Ying |
39 |
4680 |
1 |
DFT-PBE |
VASP |
67 |
||||
Out-of-domain configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/c2179a59 |
N, Si |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
1234 |
129570 |
2 |
DFT-PBE |
VASP |
67 |
||||
Configurations of Re from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000 K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000 K. |
10.60732/1e0997f8 |
Re |
Christopher M. Andolina, Wissam A. Saidi |
5011 |
100839 |
1 |
DFT-PBE |
VASP |
67 |
||||
Validation split of the MAD-1.5 (Massive Atomic Diversity version 1.5) dataset, a highly curated collection designed for training broadly applicable atomistic machine-learning models across the full periodic table. MAD-1.5 extends the original MAD dataset with targeted enrichment strategies covering 102 chemical elements (all isotopes with half-life above one day). All 216,803 structures are computed with a single standardized all-electron DFT workflow using the r2SCAN meta-GGA functional in FHI-aims (version 250806), with tight basis sets, 8 Angstrom^-1 k-point density, Gaussian smearing of 0.05 eV, and SCF convergence thresholds of 1e-6 eV (energy), 1e-4 eV/Angstrom (forces), and 1e-5 e*a0^-3 (electron density). The dataset spans molecules (monomers, dimers, trimers, molecular crystals), bulk crystals, surfaces, nanoclusters, and low-dimensional structures organized into 14 subsets. Quality is ensured by two-step outlier removal: heuristic filtering of structures with forces >100 eV/Angstrom, followed by LLPR uncertainty-based filtering. The validation split (~10% of cleaned data) uses a stratified split method consistent with the training and test splits. A companion PBE-functional dataset (Massive_Atomic_Diversity_MAD-1.5_PBE) was used during model training with separate prediction heads. |
Ac, Ag, Al, Am, Ar, As, At, Au, B, Ba, Be, Bi, Bk, Br, C, Ca, Cd, Ce, Cf, Cl, Cm, Co, Cr, Cs, Cu, Dy, Er, Es, Eu, F, Fe, Fm, Fr, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Md, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, No, Np, O, Os, P, Pa, Pb, Pd, Pm, Po, Pr, Pt, Pu, Ra, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov |
18305 |
320218 |
102 |
DFT-r2SCAN |
FHI-aims v250806 |
66 |
|||||
Validation set of the Open Polymers 2026 (OPoly26) dataset. OPoly26 contains over 6.57 million density functional theory (DFT) calculations on cluster fragments of up to 360 atoms derived from polymeric systems. The dataset encompasses variations in monomer composition, polymerization degree, chain architectures, and solvation environments to improve machine learning model performance for polymer property prediction. Calculations were performed at the B97M-V/def2-SVP level of theory using ORCA. |
Al, B, Br, C, Ca, Cl, Co, Cs, Cu, F, Fe, H, I, K, La, Li, Mg, N, Na, Ni, O, P, S, Sr, Zn |
Daniel S. Levine, Nicholas Liesen, Lauren Chua, James Diffenderfer, Helgi I. Ingolfsson, Matthew P. Kroonblawd, Nitesh Kumar, Amitesh Maiti, Supun S. Mohottalalage, Muhammed Shuaibi, Brian Van Essen, Brandon M. Wood, C. Lawrence Zitnick, Samuel M. Blau, Evan R. Antoniuk |
210302 |
37298046 |
25 |
DFT-ωB97M-V |
ORCA |
66 |
|||||
The test set of a train/test pair from the toluene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for toluene. All calculations were performed with the Psi4 software suite. |
10.60732/52a54ab9 |
C, H |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
501 |
7515 |
2 |
CCSD(T) |
Psi4 |
65 |
||||
Test set of decorrelated geometries sampled from 600 K xTB MD. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/c83f94bc |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
650 |
9750 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
64 |
||||
Dataset from "Modeling high-entropy transition-metal alloys with alchemical compression". Includes 25,000 structures utilized for fitting the aforementioned potential, with a focus on 25 d-block transition metals, excluding Tc, Cd, Re, Os and Hg. Each configuration includes a "class" field, indicating the crystal class of the structure. The class represents the following: 1: perfect crystals; 3-8 elements per structure, 2: shuffled positions (standard deviation 0.2 Å); 3-8 elements per structure, 3: shuffled positions (standard deviation 0.5 Å); 3-8 elements per structure, 4: shuffled positions (standard deviation 0.2 Å); 3-25 elements per structure. Configuration sets include divisions into fcc and bcc crystals, further split by class as described above. |
10.60732/7766f043 |
Ag, Au, Co, Cr, Cu, Fe, Hf, Ir, Lu, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, Sc, Ta, Ti, V, W, Y, Zn, Zr |
Nataliya Lopanitsyna, Guillaume Fraux, Maximilian A. Springer, Sandip De, Michele Ceriotti |
25625 |
1063584 |
25 |
DFT-PBEsol |
VASP |
64 |
||||
The JARVIS-MEGNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations with 3D materials properties from the 2018 version of Materials Project, as used in the training of the MEGNet ML model. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/b88c7676 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong |
69215 |
2070556 |
89 |
DFT-PBE |
VASP |
63 |
||||
Approximately 2800 configurations from a test dataset, one of a pair of train/test datasets of aluminum in crystal and melt phases, used for training and testing an ANI neural network model. |
10.60732/d1e27447 |
Al |
Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, Kipton Barros |
2769 |
357851 |
1 |
DFT-PBE |
Quantum ESPRESSO |
63 |
||||
Out-of-domain configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/83a90e9c |
Hf, O |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
6996 |
671616 |
2 |
DFT-PBE |
VASP |
63 |
||||
The amorphous LiSi data set comprises 45,169 atomic structures with compositions Li(x)Si (0.0≤x≤4.75) and the corresponding energies and interatomic forces, which were generated using an iterative approach based on an evolutionary algorithm and subsequent refinement, as described in detail in reference [15]. The data includes bulk, surface, and cluster structures with system sizes of up to 608 atoms. The energies and forces of the LiSi structures were obtained from DFT calculations using the Perdew-Burke-Ernzerhof [10] exchange-correlation functional and projector-augmented wave pseudopotentials [16], as implemented in the Vienna Ab-Initio Simulation Package (VASP) [17,18]. We employed a plane-wave basis set with an energy cutoff of 520 eV for the representation of the wavefunctions and a uniform gamma-centered k-point grid for the Brillouin zone integration, with a mesh density corresponding to a number of k points of at least 1000 divided by the number of atoms. The atomic positions and lattice parameters of all structures were optimized until residual forces were below 20 meV/Å. This dataset was also used for the construction of the ANN potential in Refs. [15] and [19]. [10] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). [15] N. Artrith, A. Urban, G. Ceder, J. Chem. Phys. 148 (2018) 241711. [16] P. E. Blöchl, Phys. Rev. B 50, 17953–17979 (1994). [17] G. Kresse, J. Furthmüller, Phys. Rev. B 54, 11169–11186 (1996). [18] G. Kresse, J. Furthmüller, Comput. Mater. Sci. 6, 15–50 (1996). [19] N. Artrith, A. Urban, Y. Wang, G. Ceder, arXiv:1901.09272, https://arxiv.org/pdf/1901.09272.pdf |
10.60732/ea8fd398 |
Li, Si |
Michael S. Chen, Tobias Morawietz, Thomas E. Markland, Nongnuch Artrith |
44651 |
5741119 |
2 |
DFT-PBE |
VASP |
63 |
||||
Configurations of o-hbdi from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/538deb26 |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119988 |
1799820 |
4 |
DFT-PBE0 |
Gaussian 09 |
62 |
||||
10,000 configurations of organosilicon compounds with energies predicted by an improved GFN-xTB Hamiltonian parameterization, using revPBE. |
10.60732/029be1b1 |
Br, C, Cl, F, H, N, O, P, S, Si |
Leonid Komissarov, Toon Verstraelen |
157348 |
4021653 |
10 |
DFT-revPBE |
ADF |
62 |
||||
Training configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/dbe982a6 |
N, Si |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
22494 |
1283591 |
2 |
DFT-PBE |
VASP |
62 |
||||
This dataset is a companion dataset to Carbon-24 Unique. Carbon X contains 480 carbon structures of duplicates which have the same cell shape and same number of atoms per unit cell (N=6), with different translations (X) of the fractional coordinates. Carbon_X has been cultivated from Carbon-24 (Pickard 2020, doi: 10.24435/materialscloud:2020.0026/v1). Material IDs from the original dataset are included in the metadata as 'original_id'. |
C |
Maya M. Martirossyan, Thomas Egg, Philipp Hoellmer, George Karypis, Mark Transtrum, Adrian Roitberg, Mingjie Liu, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani |
480 |
2880 |
1 |
DFT-PBE |
CASTEP |
62 |
|||||
The train set of a train/test pair from the toluene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for toluene. All calculations were performed with the Psi4 software suite. |
10.60732/05ec452e |
C, H |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
997 |
14955 |
2 |
CCSD(T) |
Psi4 |
61 |
||||
Validation configurations from the 'scaffold' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records. |
10.60732/c182723b |
C, H, N, O |
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu |
198978 |
33030348 |
4 |
DFT-M06-2X |
ORCA 4.2.1 |
61 |
||||
Approximately 50,000 configurations of Au, Ag and AuAg used as part of a training dataset for a DP-GEN-based ML model for an Ag-Au nanoalloy potential. |
10.60732/a222a0f6 |
Ag, Au |
Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang |
51702 |
1186478 |
2 |
DFT-PBE+D3 |
VASP |
61 |
||||
The JARVIS_QMOF dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Quantum Metal-Organic Frameworks (QMOF) dataset, comprising quantum-chemical properties for >14,000 experimentally synthesized MOFs. QMOF contains "DFT-ready" data: filtered to remove omitted, overlapping, unbonded or deleted atoms, along with other kinds of problematic structures commented on in the literature. Data were generated via high-throughput DFT workflow, at the PBE-D3(BJ) level of theory using VASP software. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/67cd629a |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, P, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Andrew S. Rosen, Shaelyn M. Iyer, Debmalya Ray, Zhenpeng Yao, Alán Aspuru-Guzik, Laura Gagliardi, Justin M. Notestein, Randall Q. Snurr |
20425 |
2321633 |
79 |
DFT-PBE+D3(BJ) |
VASP 5.4.4 |
61 |
||||
The train set of a train/test pair from the benzene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for benzene. All calculations were performed with the Psi4 software suite. |
10.60732/a3ca9725 |
C, H |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
999 |
11988 |
2 |
CCSD(T) |
Psi4 |
60 |
||||
Approximately 115,000 configurations of carbon with 200 atoms, with simulated melt, quench, reheat, then annealing at the noted temperature. Includes a variety of carbon structures. |
10.60732/8ecd90ee |
C |
John L. A. Gardner, Zoé Faure Beaulieu, Volker L. Deringer |
115199 |
23039800 |
1 |
IP-C-GAP-17 |
LAMMPS |
60 |
||||
Training dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active state self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values. |
10.60732/3fb520e9 |
C, H, N, O, S |
Zihan Pengmei, Yinan Shu, Junyu Liu |
43393 |
807456 |
5 |
SA-CASSCF |
OpenMolcas 22.06 |
60 |
||||
The JARVIS_QM9_STD_JCTC dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the QM9 dataset, originally created as part of the datasets at quantum-machine.org. Units for r2 (electronic spatial extent) are a0^2; for alpha (isotropic polarizability), a0^3; for mu (dipole moment), D; for Cv (heat capacity), cal/mol K. Units for all other properties are eV. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. For the first iteration of DFT calculations, Gaussian 09's default electronic and geometry thresholds have been used for all molecules. For those molecules which failed to reach SCF convergence, ultrafine grids have been invoked within a second iteration for evaluating the XC energy contributions. Within a third iteration on the remaining unconverged molecules, we identified those which had relaxed to saddle points, and further tightened the SCF criteria using the keyword scf(maxcycle=200, verytight). All those molecules which still featured imaginary frequencies entered the fourth iteration using keywords opt(calcfc, maxstep=5, maxcycles=1000). calcfc constructs a Hessian in the first step of the geometry relaxation for eigenvector following. Within the fifth and final iteration, all molecules which still failed to reach convergence have subsequently been converged using opt(calcall, maxstep=1, maxcycles=1000). |
10.60732/5935fa4d |
C, F, H, N, O |
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld |
130829 |
2359192 |
5 |
DFT-B3LYP |
Gaussian 09 |
60 |
||||
The rattled-1000-subsampled training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/ea43e8f5 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
3879731 |
55648760 |
89 |
DFT-PBE+U |
VASP |
60 |
||||
A set of validation configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. These configurations served to augment the reference set as a final benchmark for NEP model performance. |
10.60732/d2f86b68 |
H, Si |
Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi |
150 |
23000 |
2 |
DFT-PBE |
Quantum ESPRESSO |
59 |
||||
Validation configurations from the 'random' split of Chig-AIMD. This dataset covers the conformational space of chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations on a 10 amino acid protein, Chignolin, and finally collected 2 million biomolecule structures with quantum level energy and force records. |
10.60732/28c9171d |
C, H, N, O |
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu |
198985 |
33031510 |
4 |
DFT-M06-2X |
ORCA 4.2.1 |
59 |
||||
MatPES (Materials Potential Energy Surface) is a foundational PES dataset developed collaboratively by the Materials Virtual Lab and Materials Project. The v2025.1 r2SCAN release contains structures sampled via the DIRECT method from 300 K NpT molecular dynamics simulations seeded from Materials Project entries. Static DFT calculations were performed using VASP with the r2SCAN meta-GGA functional and MatPESStaticSet convergence settings optimized for energy, force, and stress calculations. There is a companion dataset calculated with the PBE functional (MatPES-PBE-2025.1). |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Aaron D. Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Janosh Riebesell, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong |
387856 |
3059679 |
89 |
DFT-r2SCAN |
VASP 6.4.x |
58 |
|||||
Approximately 145,000 configurations of alkane, aspirin, alpha-glucose and uracil, partly taken from the MD-17 dataset, used in training an 'Atomic Neural Net' model. |
10.60732/82344f5c |
C, H, N, O |
Hao Li, Musen Zhou, Jessalyn Sebastian, Jianzhong Wu, Mengyang Gu |
143756 |
1911045 |
4 |
DFT-PBE-vdW-TS |
Q-Chem |
58 |
||||
The val_aimd-from-PBE-1000-npt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/cdd647d5 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
202758 |
1710254 |
85 |
DFT-PBE+U |
VASP |
57 |
||||
Over 300,000 configurations in an expanded dataset of 19 hydrogen combustion reaction channels. Intrinsic reaction coordinate calculations (IRC) are combined with ab initio simulations (AIMD) and normal mode displacement (NM) calculations. |
10.60732/ebb9ca58 |
H, O |
Xingyi Guan, Akshaya Das, Christopher J. Stein, Farnaz Heidar-Zadeh, Luke Bertels, Meili Liu, Mojtaba Haghighatlari, Jie Li, Oufan Zhang, Hongxia Hao, Itai Leven, Martin Head-Gordon, Teresa Head-Gordon |
315943 |
1399037 |
2 |
DFT-ωB97X-V |
Q-Chem |
57 |
||||
The original DFT training data for the general-purpose silicon interatomic potential described in the associated publication. The kinds of configuration that we include are chosen using intuition and past experience to guide what needs to be included to obtain good coverage pertaining to a range of properties. |
10.60732/8e9bc5b0 |
Si |
Albert P. Bartók, James Kermode, Noam Bernstein, Gábor Csányi |
2231 |
162365 |
1 |
DFT-PW91, DFT-PBE |
CASTEP |
57 |
||||
Structures from discrepencies_and_error_metrics_NPJ_2023 test set; these include an interstitial. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K. |
10.60732/81a7ca9e |
Si |
Yunsheng Liu, Xingfeng He, Yifei Mo |
100 |
6500 |
1 |
DFT-PBE |
VASP 5.4.4 |
56 |
||||
The JARVIS-MEGNet2 dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains 133K materials with formation energy from the Materials Project, as used in the training of the MEGNet ML model. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/419ba77a |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong |
133407 |
3880004 |
89 |
DFT-PBE |
VASP |
56 |
||||
COMP6v2-wB97X-631Gd is the portion of COMP6v2 calculated at the wB97X/631Gd level of theory. COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0 is an extension of the COMP6 benchmark found in the following repository: https://github.com/isayev/COMP6. COMP6v2 is a data set of density functional properties for molecules containing H, C, N, O, S, F, and Cl. It is available at the following levels of theory: wB97X/631Gd (data used to train model in the ANI-2x paper); wB97MD3BJ/def2TZVPP; wB97MV/def2TZVPP; B973c/def2mTZVP. The 6 subsets from COMP6 (ANI-MD, DrugBank, GDB07to09, GDB10to13, Tripeptides, and s66x8) are contained in each of the COMP6v2 datasets corresponding to the above levels of theory. |
10.60732/cbced4c5 |
C, Cl, F, H, N, O, S |
Kate Huddleston, Roman Zubatyuk, Justin Smith, Adrian Roitberg, Olexandr Isayev, Ignacio Pickering, Christian Devereux, Kipton Barros |
157718 |
3897748 |
7 |
DFT-ωB97X |
Gaussian 09 |
56 |
||||
Train split from the 216-atom amorphous portion of the aC_JCP_2023 dataset. The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states. |
10.60732/ee630a62 |
C |
Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe |
13462 |
2907792 |
1 |
DFT-LDA |
VASP |
56 |
||||
This dataset is a companion dataset to Carbon-24 Unique, containing enantiomorph pairs discovered within the Carbon-24 dataset. Carbon-24_Unique_with_Enantiomorphs has been cultivated from Carbon-24 (Pickard 2020, doi: 10.24435/materialscloud:2020.0026/v1). Contains 4,330 entries of unique carbon structures, where enantiomorphs are treated as distinct. The metadata column indicates the index of the respective enantiomorph pair, if any, as well as the original id from Carbon-24. |
C |
Maya M. Martirossyan, Thomas Egg, Philipp Hoellmer, George Karypis, Mark Transtrum, Adrian Roitberg, Mingjie Liu, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani |
4330 |
48260 |
1 |
DFT-PBE |
CASTEP |
56 |
|||||
The val_aimd-from-PBE-1000-nvt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/65323852 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
195575 |
1643554 |
85 |
DFT-PBE+U |
VASP |
55 |
||||
Approximately 7,000 distinct configurations of 2D-silicene, silicon, and PbTe. Silicon data used from http://dx.doi.org/10.1103/PhysRevX.8.041048. Dataset includes predicted force, potential energy and virial values. |
10.60732/7cc0df9e |
Pb, Si, Te |
Zheyong Fan |
7077 |
528999 |
3 |
DFT-PW91, DFT-PBE |
CASTEP, VASP, Quantum ESPRESSO |
55 |
||||
The JARVIS_CFID_3D_8_18_2022 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 3D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/82106853 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza |
55581 |
561509 |
89 |
DFT-optB88-vdW, DFT-TBmBJ |
VASP |
55 |
||||
Approximately 2800 configurations from a train dataset, one of a pair of train/test datasets of aluminum in crystal and melt phases, used for training and testing an ANI neural network model. |
10.60732/af254882 |
Al |
Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, Kipton Barros |
2779 |
363129 |
1 |
DFT-PBE |
Quantum ESPRESSO |
54 |
||||
Approximately 5,000 configurations of GeTe used in training of a non-von Neumann multiplication-less DNN model. |
10.60732/c741fb0f |
Ge, Te |
Pinghui Mo, Chang Li, Dan Zhao, Yujia Zhang, Mengchao Shi, Junhua Li, Jie Liu |
5025 |
321600 |
2 |
DFT-GGA |
SIESTA |
54 |
||||
NENCI-2021 is a database of approximately 8000 benchmark Non-Equilibrium Non-Covalent Interaction (NENCI) energies computed for molecular dimers: intermolecular complexes of biological and chemical relevance, with a particular emphasis on close intermolecular contacts. Based on dimers from the S101 database. |
10.60732/5d2a1ceb |
Br, C, Cl, F, H, Li, N, Na, O, P, S |
Zachary M. Sparrow, Brian G. Ernst, Paul T. Joo, Ka Un Lao, Robert A. DiStasio, Jr |
7763 |
129402 |
11 |
CCSD(T), SAPT2+, MP2 |
Psi4 |
52 |
||||
The test set for UNEP-v1 (version 1 of Unified NeuroEvolution Potential), a model implemented in GPUMD. |
10.60732/b459b5f2 |
Ag, Al, Au, Cr, Cu, Mg, Mo, Ni, Pb, Pd, Pt, Ta, Ti, V, W, Zr |
Keke Song, Rui Zhao, Jiahui Liu, Yanzhou Wang, Eric Lindgren, Yong Wang, Shunda Chen, Ke Xu, Ting Liang, Penghua Ying, Nan Xu, Zhiqiang Zhao, Jiuyang Shi, Junjie Wang, Shuang Lyu, Zezhu Zeng, Shirong Liang, Haikuan Dong, Ligang Sun, Yue Chen, Zhuhua Zhang, Wanlin Guo, Ping Qian, Jian Sun, Paul Erhart, Tapio Ala-Nissila, Yanjing Su, Zheyong Fan |
4411 |
318910 |
16 |
DFT-PBE |
VASP |
52 |
||||
The training set for UNEP-v1 (version 1 of Unified NeuroEvolution Potential), a model implemented in GPUMD. |
10.60732/23c88dd7 |
Ag, Al, Au, Cr, Cu, Mg, Mo, Ni, Pb, Pd, Pt, Ta, Ti, V, W, Zr |
Keke Song, Rui Zhao, Jiahui Liu, Yanzhou Wang, Eric Lindgren, Yong Wang, Shunda Chen, Ke Xu, Ting Liang, Penghua Ying, Nan Xu, Zhiqiang Zhao, Jiuyang Shi, Junjie Wang, Shuang Lyu, Zezhu Zeng, Shirong Liang, Haikuan Dong, Ligang Sun, Yue Chen, Zhuhua Zhang, Wanlin Guo, Ping Qian, Jian Sun, Paul Erhart, Tapio Ala-Nissila, Yanjing Su, Zheyong Fan |
104799 |
6840534 |
16 |
DFT-PBE |
VASP |
51 |
||||
SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting with proteins. Subsets of the dataset include the following: dipeptides: these provide comprehensive sampling of the covalent interactions found in proteins; solvated amino acids: these provide sampling of protein-water and water-water interactions; PubChem molecules: these sample a very wide variety of drug-like small molecules; monomer and dimer structures from DES370K: these provide sampling of a wide variety of non-covalent interactions; ion pairs: these provide further sampling of Coulomb interactions over a range of distances. |
10.60732/a613a175 |
Br, C, Ca, Cl, F, H, I, K, Li, N, Na, O, P, S |
Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, Thomas E. Markland |
116504 |
3382829 |
14 |
DFT-ωB97M+D3(BJ) |
Psi4 1.4.1 |
51 |
||||
Configurations of urea from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/8f44aef0 |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119992 |
959936 |
4 |
DFT-PBE0 |
Gaussian 09 |
51 |
||||
The SN2 dataset was generated as a partner benchmark dataset, along with the 'solvated protein fragments' dataset, for measuring the performance of machine learning models, in particular PhysNet, at describing chemical reactions, long-range interactions, and condensed phase systems. SN2 probes chemical reactions of methyl halides with halide anions, i.e. X- + CH3Y -> CH3X + Y-, and contains structures for all possible combinations of X,Y = F, Cl, Br, I. The dataset also includes various structures for several smaller molecules that can be formed in fragmentation reactions, such as CH3X, HX, CHX or CH2X- as well as geometries for H2, CH2, CH3+ and XY interhalogen compounds. In total, the dataset provides reference energies, forces, and dipole moments for 452709 structures calculated at the DSD-BLYP-D3(BJ)/def2-TZVP level of theory using ORCA 4.0.1. |
10.60732/31df6835 |
Br, C, Cl, F, H, I |
Oliver T. Unke, Markus Meuwly |
394653 |
2194070 |
6 |
DFT-DSD-BLYP+D3(BJ) |
ORCA 4.0.1 |
51 |
||||
In-domain validation configurations for the initial structure to relaxed total energy (IS2RE) task of OC22. Open Catalyst 2022 (OC22) is a database of training trajectories for predicting catalytic reactions on oxide surfaces meant to complement OC20, which did not contain oxide surfaces. |
10.60732/ced227e5 |
Ag, Al, As, Au, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, Pb, Pd, Pt, Rb, Re, Rh, Ru, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick |
441623 |
35243458 |
57 |
DFT-PBE+U |
VASP |
51 |
||||
The JARVIS_Open_Catalyst_10K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the 10K training, rest validation and test dataset from the Open Catalyst Project (OCP). JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/b10d497c |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
34938 |
2719837 |
56 |
DFT-rPBE |
VASP |
51 |
||||
Benzene training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/6b905ba8 |
C, H |
Venkat Kapil, Edgar A. Engel |
54990 |
1601760 |
2 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
51 |
||||
This dataset is a companion dataset to Carbon-24 Unique. Carbon NXL is intended for use in training of minimal “overfitting” testing cases. Contains 353 carbon structures of duplicates which have different numbers of atoms per unit cell (N=6–16), different cell shapes L, and different translations X of the fractional coordinates. Carbon_NXL has been cultivated from Carbon-24 (Pickard 2020, doi: 10.24435/materialscloud:2020.0026/v1). Material IDs from the original dataset are included in the metadata as 'original_id'. Please cite Martirossyan et al. (https://arxiv.org/abs/2509.12178) if your work utilizes this dataset. |
C |
Maya M. Martirossyan, Thomas Egg, Philipp Hoellmer, George Karypis, Mark Transtrum, Adrian Roitberg, Mingjie Liu, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani |
353 |
2540 |
1 |
DFT-PBE |
CASTEP |
51 |
|||||
Carolina Materials contains structures used to train several machine learning models for the efficient generation of hypothetical inorganic materials. The database is built using structures from OQMD, Materials Project and ICSD, as well as ML generated structures validated by DFT. |
10.60732/f2f98394 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, F, Fe, Ga, Ge, H, Hf, Hg, I, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Po, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Yong Zhao, Mohammed Al-Fahdi, Ming Hu, Edirisuriya M. D. Siriwardane, Yuqi Song, Alireza Nasiri, Jianjun Hu |
214267 |
3168298 |
64 |
DFT-PBE |
VASP |
50 |
||||
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Cr surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/e25bae2e |
C, Cr |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
1192 |
298114 |
2 |
DFT-PBE+D3 |
CP2K |
50 |
||||
Configurations of I from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/57c54149 |
I |
Christopher M. Andolina, Wissam A. Saidi |
4436 |
113623 |
1 |
DFT-PBE |
VASP |
50 |
||||
Test configurations with MD simulations performed at 1200K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/397ba16b |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
2139 |
57753 |
4 |
DFT-ωB97X |
ORCA |
50 |
||||
10,000 configurations of SiO2 used as an example for the SIMPLE-NN machine learning model. Dataset includes three types of crystals: quartz, cristobalite and tridymite; amorphous; and liquid phase SiO2. Structures with distortion from compression, monoaxial strain and shear strain were also included in the training set. |
10.60732/9903bf08 |
O, Si |
Kyuhyun Lee, Dongsun Yoo, Wonseok Jeong, Seungwu Han |
9997 |
599820 |
2 |
DFT-PBE |
VASP |
50 |
||||
Training set for a magnetic Moment Tensor Potential (mMTP) for paramagnetic B1-CrN, created via active learning. Contains 2423 configurations of 64-atom CrN supercells with collinear atomic magnetic moments and magnetic forces (negative derivatives of energy with respect to magnetic moments, in eV/mu_B). Configurations generated using constrained DFT (cDFT) with ABINIT and PAW PBE pseudopotentials with a 6x6x6 k-point mesh and 25 Hartree plane-wave cutoff energy. The fitted mMTP accurately reproduces elastic constants, phonon spectrum, linear thermal expansion coefficient, and specific heat capacity of paramagnetic B1-CrN, with thermal properties (quasi-harmonic approximation) in good agreement with experimental results. Note: ColabFit dataset contains energy, atomic forces, and stress. Refer to the original files for per-atom magnetic moment and magnetic force data. |
Cr, N |
Alexey S. Kotykhov, Max Hodapp, Christian Tantardini, Konstantin Kravtsov, Ivan Kruglov, Alexander V. Shapeev, Ivan S. Novikov |
1702 |
150080 |
2 |
DFT-PBE |
ABINIT |
50 |
|||||
588 structures selected from the AIMD simulation of the Cu(111) slab, including both the C1-C18 clusters on the Cu(111) slab. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/9f0e607d |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
588 |
115460 |
2 |
DFT-PBE+D3 |
CP2K |
49 |
||||
Dataset containing DFT calculations of energy and forces for all configurations in the QM9 dataset, recalculated with the ωB97X functional and 6-31G(d) basis set. Recalculating the energy and forces causes a slight shift of the potential energy surface, which results in forces acting on most configurations in the dataset. The data was generated by running Nudged Elastic Band (NEB) calculations with DFT on 10k reactions while saving intermediate calculations. QM9x is used as a benchmarking and comparison dataset for the dataset Transition1x. |
10.60732/1edbb6e0 |
C, F, H, N, O |
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther |
133871 |
2407494 |
5 |
DFT-ωB97X |
ORCA 5.0.2 |
49 |
||||
Configurations of water, acetonitrile and methanol, simulated with ASE and modeled using a variety of software and methods: GAP, SchNet, GDML, ORCA and mbGDML. Forces and potential energy included; metadata includes kinetic energy and velocities. |
10.60732/717087e2 |
C, H, N, O |
Alex M. Maldonado |
24509 |
711324 |
4 |
IP-SchNet, GFN2-xTB, IP-mbGDML, IP-GAP, MP2 |
ORCA |
49 |
||||
Validation dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values. |
10.60732/cea2a8c1 |
C, H, N, O, S |
Zihan Pengmei, Yinan Shu, Junyu Liu |
21616 |
402369 |
5 |
SA-CASSCF |
OpenMolcas 22.06 |
49 |
||||
Dataset (DFT-10B) contains structures of the 10 binary alloys AgCu, AlFe, AlMg, AlNi, AlTi, CoNi, CuFe, CuNi, FeV, and NbNi. Each alloy system includes all possible unit cells with 1-8 atoms for face-centered cubic (fcc) and body-centered cubic (bcc) crystal types, and all possible unit cells with 2-8 atoms for the hexagonal close-packed (hcp) crystal type. This results in 631 fcc, 631 bcc, and 333 hcp structures, yielding 1595 x 10 = 15,950 unrelaxed structures in total. Lattice parameters for each crystal structure were set according to Vegard's law. Total energies were computed using DFT with projector-augmented wave (PAW) potentials within the generalized gradient approximation (GGA) of Perdew, Burke, and Ernzerhof (PBE) as implemented in the Vienna Ab Initio Simulation Package (VASP). The k-point meshes for sampling the Brillouin zone were constructed using generalized regular grids. |
10.60732/941b9553 |
Ag, Al, Co, Cu, Fe, Mg, Nb, Ni, Ti, V |
Chandramouli Nyshadham, Matthias Rupp, Brayden Bekker, Alexander V. Shapeev, Tim Mueller, Conrad W. Rosenbrock, Gábor Csányi, David W. Wingate, Gus L. W. Hart |
15920 |
116380 |
10 |
DFT-PBE |
VASP |
49 |
||||
Lowest-energy structures with up to 4 heavy atoms from Vector-QM24 (VQM24) with properties calculated using diffusion quantum Monte Carlo (DMC) after DFT optimization. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br. |
Br, C, Cl, F, H, N, O, P, S, Si |
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld |
10780 |
79933 |
10 |
DMC-PBE0-ccECP |
QMCPACK |
49 |
|||||
Configurations of Sb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/7980ece8 |
Sb |
Christopher M. Andolina, Wissam A. Saidi |
5107 |
115196 |
1 |
DFT-PBE |
VASP |
48 |
||||
Configurations from a cG-SchNet trained on a subset of the QM9 dataset. Model was trained with the intention of providing molecules with specified functional groups or motifs, relying on sampling of molecular fingerprint data. Relaxation data for the generated molecules is computed using ORCA software. Configuration sets include raw data from cG-SchNet-generated configurations, with models trained on several different types of target data and DFT relaxation data as a separate configuration set. Includes approximately 80,000 configurations. |
10.60732/de8af6a2 |
C, F, H, N, O |
Niklas W.A. Gebauer, Michael Gastegger, Stefaan S.P. Hessmann, Klaus-Robert Müller, Kristof T. Schütt |
23632 |
418729 |
5 |
IP-cgSchNet |
ORCA |
48 |
||||
This database contains computationally generated atomic structures of glass-ceramics lithium thiophosphates (gc-LPS) with the general composition (Li2S)x(P2S5)1-x. Total energies and interatomic forces from density-functional theory (DFT) calculations are included. The DFT calculations used projector-augmented-wave (PAW) pseudopotentials and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional as implemented in the Vienna Ab Initio Simulation Package (VASP) and a kinetic energy cutoff of 520 eV. The first Brillouin zone was sampled using VASP's fully automatic k-point scheme with a length parameter Rk = 25Å. The gc-LPS structures were generated using a combination of different sampling methods. Initial amorphous structure models were generated with ab initio molecular dynamics (AIMD) simulations of supercells at 1200 K using a Nose-Hoover thermostat with a time step of 1 fs. To obtain near-ground-state structures as reference for the machine-learning potential, 150 evenly spaced snapshots were extracted from the AIMD trajectories that were reoptimized with DFT geometry optimizations at zero Kelvin. Additional structures were generated by scaling the lattice parameters of the crystalline LPS structures (see below) by ±15% and perturbing atomic positions in AIMD simulations as described above. The resulting database was used to train a specialized ANN potential for the sampling of structures along the Li2S-P2S5 composition line with a genetic-algorithm (GA) as implemented in the atomistic evolution (ævo) package, following a previously reported protocol. Starting from supercells of the ideal crystal structures, either Li and S atoms were removed with a ratio of 2:1, or P and S atoms were removed with a ratio of 2:5, and low-energy configurations were determined with GA sampling. A population size of 32 trials and a mutation rate of 10% were employed. The ANN potential was iteratively refined by including additional sampled structures in the training.
For each composition, at least 10 lowest energy structure models identified with the ANN-GA approach were selected and fully relaxed with DFT. Also included in the present database are the XSF files of the previously reported crystalline phases LiPS3, Li2PS3, Li4P2S7, Li7P3S11, α-Li3PS4, β-Li3PS4, γ-Li3PS4, and Li48P16S61. The crystal structures were obtained from the Inorganic Crystal Structure Database (ICSD), the Materials Project (MP) database, the Open Quantum Materials Database (OQMD), and the AFLOW database. The configuration names indicate the journal reference and the database. |
10.60732/0a15fe72 |
Li, P, S |
Haoyue Guo, Nongnuch Artrith |
6055 |
264604 |
3 |
DFT-PBE |
VASP |
48 |
||||
Validation configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/1eaf36bf |
N, Si |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
2822 |
159951 |
2 |
DFT-PBE |
VASP |
48 |
||||
A dataset created as part of a combination DFT-ML approach to study three alkali metals (K, Li, Na) in model carbon systems at a range of densities and degrees of disorder. The purpose of the study was to investigate the properties of alkali metals in hard (non-graphitising) and nanoporous carbons as potential anode materials for battery technology. |
10.60732/441f40b7 |
C, K, Li, Na |
Jian-Xing Huang, Gábor Csányi, Jin-Bao Zhao, Jun Cheng, Volker L. Deringer |
1365 |
298050 |
4 |
DFT-optB88-vdW |
VASP 5.4.4 |
48 |
||||
Validation set from COLL. Consists of configurations taken from molecular collisions of different small organic molecules. Energies and forces for 140,000 random snapshots taken from these trajectories were recomputed with density functional theory (DFT). These calculations were performed with the revPBE functional and def2-TZVP basis, including D3 dispersion corrections. |
10.60732/a1ccb643 |
C, H, O |
Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann |
9999 |
101829 |
3 |
DFT-revPBE+D3 |
ORCA |
48 |
||||
The dataset consists of energies and forces for pristine and defected monolayer graphene, bilayer graphene, and graphite in various states. The configurations in the dataset are generated in two ways: (1) crystals with distortions (compression and stretching of the simulation cell together with random perturbations of atoms), and (2) configurations drawn from ab initio molecular dynamics (AIMD) trajectories at 300, 900, and 1500 K.
For monolayer graphene, the configurations include:
* pristine
- In-plane compressed and stretched monolayers
- AIMD trajectories
* defected
- Configurations from the minimization of a monolayer with a single vacancy
- AIMD trajectories of monolayers with a single vacancy
For bilayer graphene, the configurations include:
* pristine
- AB-stacked bilayers with compression and stretching in the basal plane
- Bilayers with different translational registry (e.g. AA, AB, and SP) at various layer separations
- Twisted bilayers with different twisting angles at various layer separations
- AIMD trajectories of twisted bilayers and bilayers in AB and AA stackings
* defected
- Configurations from the minimization of a bilayer with a single vacancy in each layer
- AIMD trajectories of a bilayer with a single vacancy in one layer and the other layer pristine
- AIMD trajectories of a bilayer with a single vacancy in each layer; initial configuration without interlayer bonds
- AIMD trajectories of a bilayer with a single vacancy in each layer; initial configuration with interlayer bonds formed
For graphite, the configurations include:
* pristine
- Graphite with compression and stretching in the basal plane
- Graphite with compression and stretching along the c-axis
- AIMD trajectories
|
10.60732/ce311990 |
C |
Mingjian Wen, Ellad B. Tadmor |
14179 |
656204 |
1 |
DFT-PBE |
VASP 5.x.x |
48 |
||||
Energies of the isolated atoms evaluated at the reference DFT settings. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/1e359db4 |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
3 |
3 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
47 |
||||
The training + validation set from the doped CsPbI3 energetics dataset. This dataset was created to explore the effect of Cd and Zn substitutions on the structural stability of inorganic lead halide perovskite CsPbI3. CsPbI3 undergoes a direct to indirect band-gap phase transition at room temperature. The dataset contains configurations of CsPbI3 with low levels of Cd and Zn, which were used to train a GNN model to predict the energetics of structures with higher levels of substitutions. |
10.60732/16af950e |
Cd, Cs, I, Pb, Zn |
Roman A. Eremin, Innokentiy S. Humonen, Alexey A. Kazakov, Vladimir D. Lazarev, Anatoly P. Pushkarev, Semen A. Budennyy |
140 |
22400 |
5 |
DFT-PBE |
VASP |
47 |
||||
Training configurations with MD simulations performed at 300K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/5f5bae68 |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
500 |
13500 |
4 |
DFT-ωB97X |
ORCA |
47 |
||||
Approximately 6,500 configurations of Sn, including Sn8, Sn16 and Sn32, used in developing a deep potential that predicts the phase diagram of Sn. |
10.60732/7d8a06fe |
Sn |
Tao Chen, Fengbo Yuan, Jianchuan Liu, Huayun Geng, Linfeng Zhang, Han Wang, Mohan Chen |
6612 |
111768 |
1 |
DFT-SCAN |
VASP |
47 |
||||
Configurations of dmabn from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/ad4e82a6 |
C, H, N |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119994 |
2519874 |
3 |
DFT-PBE0 |
Gaussian 09 |
47 |
||||
This dataset was used for the training of an MLIP for amorphous alumina (a-AlOx). Two configuration sets correspond to i) the actual training data and ii) additional reference data. Ab initio calculations were performed with the Vienna Ab initio Simulation Package. The projector augmented wave method was used to treat the atomic core electrons, and the Perdew-Burke-Ernzerhof functional within the generalized gradient approximation was used to describe the electron-electron interactions. The cutoff energy for the plane-wave basis set was set to 550 eV during the ab initio calculation. The obtained reference database includes the DFT energies of 41,203 structures. The supercell size of the AlOx reference structures varied from 24 to 132 atoms. K-point values are given for structures with: Al0, Al12, Al24, Al48 and Al192. |
10.60732/96296d27 |
Al, O |
Wenwen Li, Yasunobu Ando, Satoshi Watanabe |
123560 |
4541194 |
2 |
DFT-PBE |
VASP |
47 |
||||
The JARVIS_Open_Catalyst_100K dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations from the 100K training, rest validation and test dataset from the Open Catalyst Project (OCP). JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/ae1c7e2f |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
124929 |
9719646 |
56 |
DFT-rPBE |
VASP |
47 |
||||
TiO2 dataset that was designed to build atom neural network potentials (ANN) by Artrith et al. using the AENET package. This dataset includes various crystalline phases of TiO2 and MD data that are extracted from ab initio calculations. The dataset includes 7815 structures with 165,229 atomic environments in the stoichiometric ratio of 66% O to 34% Ti. |
10.60732/861c6a25 |
O, Ti |
Nongnuch Artrith, Alexander Urban |
7809 |
165080 |
2 |
DFT-PBE |
Quantum ESPRESSO |
47 |
||||
This dataset was created for the purpose of training an MLIP for silica (SiO2). For initial DFT computations, GPAW (in combination with ASE) was used with LDA, PBE and PBEsol functionals; and VASP with the SCAN functional. All calculations used the projector augmented-wave method. After comparison, it was found that SCAN performed best, and all values were recalculated using SCAN. An energy cut-off of 900 eV and a k-spacing of 0.23 Å-1 were used. |
10.60732/c2bee5fa |
O, Si |
Linus C. Erhard, Jochen Rohrer, Karsten Albe, Volker L. Deringer |
3074 |
268118 |
2 |
DFT-SCAN |
VASP |
47 |
||||
Reference C, H, O, and N atoms from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/bfdb46b7 |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
4 |
4 |
4 |
DFT-ωB97X |
ORCA |
46 |
||||
The JARVIS_EPC_2D dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations sourced from the JARVIS-DFT-2D dataset, rerelaxed with Quantum ESPRESSO. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/c7d2c9cd |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cl, Co, Cr, Cu, F, Fe, Ga, Ge, H, Hf, I, In, Ir, K, La, Li, Mg, Mo, N, Na, Nb, Ni, O, P, Pb, Pd, Pt, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Te, Ti, Tl, V, W, Y, Zn, Zr |
Daniel Wines, Kamal Choudhary, Adam J. Biacchi, Kevin F. Garrity, Francesca Tavazza |
161 |
788 |
55 |
DFT-PBEsol |
Quantum ESPRESSO |
46 |
||||
Succinic acid test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/cfea7523 |
C, H, O |
Venkat Kapil, Edgar A. Engel |
200 |
5600 |
3 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
46 |
||||
Training data generated for GAP-20. GAP-20 describes the properties of the bulk crystalline and amorphous phases, crystal surfaces, and defect structures with an accuracy approaching that of direct ab initio simulation, but at a significantly reduced cost. The final potential is fitted to reference data computed using the optB88-vdW density functional theory (DFT) functional. |
10.60732/9d095830 |
C |
Patrick Rowe, Volker L. Deringer, Piero Gasparotto, Gábor Csányi, Angelos Michaelides |
6088 |
400275 |
1 |
DFT-optB88-vdW |
VASP |
46 |
||||
Validation dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values. |
10.60732/bd646241 |
C, H, N, O, S |
Zihan Pengmei, Yinan Shu, Junyu Liu |
21605 |
402142 |
5 |
DFT-M06 |
Psi4 |
46 |
||||
ANI-1xnr was developed to train the ANI-1xnr model, intended to model reactive chemistry. Specifically, ANI-1xnr is meant to represent carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. The dataset was generated using an active learning method in which ab initio nanoreactor simulations supplied MLIP training; the MLIP was subsequently tested and new simulations were generated based on structures tested with high uncertainty to supply the next cycle of MLIP training. |
10.60732/ad56ac0a |
C, H, N, O |
Shuhao Zhang, Małgorzata Z. Makoś, Ryan B. Jadrich, Elfi Kraka, Kipton Barros, Benjamin T. Nebgen, Sergei Tretiak, Olexandr Isayev, Nicholas Lubbers, Richard A. Messerly, Justin S. Smith |
196550 |
27209270 |
4 |
KS-DFT-BLYP+D3 |
CP2K |
45 |
||||
Dataset containing MD trajectories of the buckyball-catcher supramolecule from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution. |
10.60732/3ac33c6f |
C, H |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
6102 |
903096 |
2 |
DFT-PBE+MBD |
FHI-aims |
45 |
||||
The rattled-300-subsampled training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/42702fb9 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
3463993 |
49674369 |
88 |
DFT-PBE+U |
VASP |
45 |
||||
This dataset comprises a training dataset for magnetic multi-component machine-learning potentials for Fe-Al systems, including different concentrations of Fe and Al (Al concentrations from 0%-50%), with fully equilibrated and perturbed atomic positions, lattice vectors and magnetic moments represented. |
10.60732/9d635e27 |
Al, Fe |
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov |
434 |
6944 |
2 |
DFT-PBE |
ABINIT |
45 |
||||
1590 configurations of H2O/water with total energy and forces calculated using a hybrid approach at DFT/revPBE0-D3 level of theory. |
10.60732/07f8deb4 |
H, O |
Bingqing Cheng, Edgar A. Engel, Jörg Behler, Christoph Dellago, Michele Ceriotti |
1588 |
304896 |
2 |
DFT-revPBE0+D3 |
CP2K |
45 |
||||
This dataset was originally designed to fit a GAP potential with a specific focus on properties relevant for simulations of radiation-induced collision cascades and the damage they produce, including a realistic repulsive potential for short-range many-body cascade dynamics and a good description of the liquid phase. |
10.60732/6367ea51 |
W |
Jesper Byggmästar, Ali Hamedani, Kai Nordlund, Flyura Djurabekova |
3528 |
42068 |
1 |
DFT-PBE |
VASP |
45 |
||||
The rattled-300 training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/444965da |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
6319089 |
89791992 |
88 |
DFT-PBE+U |
VASP |
44 |
||||
Binning-binning configurations from the CA-9 dataset used during the validation step for the NNP_BB potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials. |
10.60732/95c19122 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
4003 |
233034 |
1 |
DFT-PBE |
VASP |
44 |
||||
Training dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values. |
10.60732/8e7f6d7c |
C, H, N, O, S |
Zihan Pengmei, Yinan Shu, Junyu Liu |
43385 |
807298 |
5 |
DFT-M06 |
Psi4 |
44 |
||||
Approximately 850 configurations of CoSb3 and Mg3Sb2 generated using a dual adaptive sampling (DAS) method for use with machine learning of interatomic potentials (MLIP). |
10.60732/d28a2c1d |
Mg, Sb |
Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, Wenqing Zhang |
846 |
247744 |
2 |
DFT-PBE |
VASP |
43 |
||||
tmQM_wB97MV contains configurations from the tmQM dataset, with several structures from tmQM that were found to be missing hydrogens filtered out, and energies of all other structures recomputed at the wB97M-V/def2-SVPD level of DFT. |
10.60732/4144e554 |
Ag, As, Au, B, Br, C, Cd, Cl, Co, Cr, Cu, F, Fe, H, Hf, Hg, I, Ir, La, Mn, Mo, N, Nb, Ni, O, Os, P, Pd, Pt, Re, Rh, Ru, S, Sc, Se, Si, Ta, Tc, Ti, V, W, Y, Zn, Zr |
Aaron G. Garrison, Javier Heras-Domingo, John R. Kitchin, Gabriel dos Passos Gomes, Zachary W. Ulissi, Samuel M. Blau |
86501 |
5710563 |
44 |
DFT-ωB97M-V |
Q-Chem |
43 |
||||
Test split from the 216-atom amorphous portion of the aC_JCP_2023 dataset. The amorphous carbon dataset was generated using ab initio calculations with VASP software. We utilized the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed. In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure. During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points. Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points. To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states. |
10.60732/4ca1927e |
C |
Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe |
3366 |
727056 |
1 |
DFT-LDA |
VASP |
43 |
||||
Structures from the SAIT_semiconductors_ACS_2023_SiN dataset, separated into N-only, Si-only, SiN, and out-of-domain melt, quench and relax configuration sets. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/ef14d3da |
N, Si |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
88111 |
5201559 |
2 |
DFT-PBE |
VASP |
42 |
||||
The rattled-1000 validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/4623b5e6 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
117004 |
1657765 |
86 |
DFT-PBE+U |
VASP |
42 |
||||
The a-AQUA dataset was generated to address the need for a training set for a water PES that includes 2-body, 3-body and 4-body interactions calculated at the CCSD(T) level of theory. Structures were selected from the existing HBB2-pol and MB-pol datasets. For each water dimer structure, CCSD(T)/aug-cc-pVTZ calculations were performed with an additional 3s3p2d1f basis set; exponents equal to (0.9, 0.3, 0.1) for sp, (0.6, 0.2) for d, and 0.3 for f. This additional basis is placed at the center of mass (COM) of each dimer configuration. The basis set superposition error (BSSE) correction was determined with the counterpoise scheme. CCSD(T)/aug-cc-pVQZ calculations were then performed with the same additional basis set and BSSE correction. Final CCSD(T)/CBS energies were obtained by extrapolation over the CCSD(T)/aug-cc-pVTZ and CCSD(T)/aug-cc-pVQZ 2-b energies. All ab initio calculations were performed using the Molpro package. Trimer structures were calculated at CCSD(T)-F12a/aug-cc-pVTZ with BSSE correction. Four-body structure calculations were performed at the CCSD(T)-F12 level. |
10.60732/e8b084a9 |
H, O |
Qi Yu, Chen Qu, Paul L. Houston, Riccardo Conte, Apurba Nandi, Joel M. Bowman |
120162 |
877128 |
2 |
CCSD(T)/CBS, CCSD(T)-F12a, CCSD(T)-F12 |
MOLPRO |
42 |
||||
The rattled-500 training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/95b8a22e |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
6922153 |
98860300 |
88 |
DFT-PBE+U |
VASP |
42 |
||||
9,200 configurations of beta-Ga2O3, including two configuration sets. One contains DFT data for 8400 configurations simulated between temperatures of 50K - 600K. The second contains configurations with 0K simulation temperature. |
10.60732/6fd38e1f |
Ga, O |
Ruiyang Li, Zeyu Liu, Andrew Rohskopf, Kiarash Gordiz, Asegun Henry, Eungkyu Lee, Tengfei Luo |
9200 |
2944000 |
2 |
DFT-QUICKSTEP |
CP2K |
42 |
||||
Training simulations from CGM-MLP_natcomm2023 of carbon on a Cu metal surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/76552006 |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
520 |
122294 |
2 |
DFT-PBE+D3 |
CP2K |
41 |
||||
The rattled-500-subsampled training split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/ed9a1102 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
3975399 |
56846329 |
89 |
DFT-PBE+U |
VASP |
41 |
||||
Dataset for "Interplay between ferroelectricity and metallicity in BaTiO3", exploring properties of ferroelectric barium titanate (BaTiO3), including the effects of electron and hole doping. Includes configuration sets for unit cells and supercells of BaTiO3. |
10.60732/9abdf618 |
Al, Ba, K, La, Nb, O, Sc, Ti, V |
Veronica F. Michel, Tobias Esswein, Nicola A. Spaldin |
1062 |
18715 |
9 |
DFT-PBEsol |
VASP |
41 |
||||
Dataset generated using a committee-based active learning strategy to build a training dataset for modeling complex aqueous systems. |
10.60732/07d278f0 |
B, C, F, H, Mo, N, O, S, Ti |
Christoph Schran, Fabian L. Thiemann, Patrick Rowe, Erich A. Müller, Ondrej Marsalek, Angelos Michaelides |
1786 |
681912 |
9 |
DFT-optB88-vdW, DFT-PBE+D3, DFT-revPBE0+D3, DFT-BLYP+D3 |
CP2K |
41 |
||||
Configurations of Nb from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/2146db76 |
Nb |
Christopher M. Andolina, Wissam A. Saidi |
3114 |
54086 |
1 |
DFT-PBE |
VASP |
41 |
||||
Test dataset from xxMD-DFT. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values. |
10.60732/690e82cc |
C, H, N, O, S |
Zihan Pengmei, Yinan Shu, Junyu Liu |
21661 |
402856 |
5 |
DFT-M06 |
Psi4 |
41 |
||||
53,841 structures of alpha-brass (less than 40% Zinc). Includes atomic forces and total energy. Calculated using VASP at the DFT level of theory. |
10.60732/f127f7e7 |
Cu, Zn |
Jan Weinreich, Anton Römer, Martín Leandro Paleico, Jörg Behler |
53475 |
2951436 |
2 |
DFT-PBE |
VASP |
41 |
||||
The training split of the Transition1x dataset. Transition1x is a benchmark dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The configurations contained in this dataset allow a better representation of features in transition state regions when compared to other benchmark datasets, in particular QM9 and ANI1x. |
10.60732/b1104cc5 |
C, H, N, O |
Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther |
62988 |
535993 |
4 |
DFT-ωB97X |
ORCA 5.0.2 |
41 |
||||
The MAD benchmark dataset, containing a selection of MAD test, MPtrj, Alexandria, SPICE, MD22 and OC2020 datasets, computed with MAD DFT settings. Part of the MAD (Massive Atomic Diversity) dataset family. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures. |
10.60732/b1f21e20 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti |
1884 |
44748 |
81 |
DFT-PBEsol |
VASP |
40 |
||||
Configurations of urocanic acid from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/4b1f8c83 |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119986 |
1919776 |
4 |
DFT-PBE0 |
Gaussian 09 |
40 |
||||
Glycine validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/b102ddfd |
C, H, N, O |
Venkat Kapil, Edgar A. Engel |
200 |
7120 |
4 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
39 |
||||
The solvated protein fragments dataset was generated as a partner benchmark dataset, along with SN2, for measuring the performance of machine learning models, in particular PhysNet, at describing chemical reactions, long-range interactions, and condensed phase systems. The dataset contains structures for all possible "amons" (hydrogen-saturated covalently bonded fragments) of up to eight heavy atoms (C, N, O, S) that can be derived from chemical graphs of proteins containing the 20 natural amino acids connected via peptide bonds or disulfide bridges. For amino acids that can occur in different charge states due to (de)protonation (i.e., carboxylic acids that can be negatively charged or amines that can be positively charged), all possible structures with up to a total charge of ±2e are included. In total, the dataset provides reference energies, forces, and dipole moments for 2,731,180 structures calculated at the revPBE-D3(BJ)/def2-TZVP level of theory using ORCA 4.0.1. |
10.60732/c4731f07 |
C, H, N, O, S |
Oliver T. Unke, Markus Meuwly |
2730942 |
58390211 |
5 |
DFT-revPBE+D3(BJ) |
ORCA 4.0.1 |
39 |
||||
The DFT-2D-3-12-2021 dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This subset contains configurations of 2D materials. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/8a437fac |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, F, Fe, Ga, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza |
887 |
6230 |
81 |
DFT-optB88-vdW, DFT-TBmBJ |
VASP |
39 |
||||
Dataset for "Analysis of minerals as electrode materials for Ca-based rechargeable batteries". Includes DFT structures of pyroxenes, garnet and carbonates. Dataset was produced to pursue identification of Ca-based high specific energy cathode materials. |
10.60732/134fd579 |
C, Ca, Cr, Mn, O, Si |
M. Elena Arroyo-de Dompablo, Jose Luis Casals |
4726 |
550074 |
6 |
DFT-PBE |
VASP |
39 |
||||
Data from the publication "Enlisting Potential Cathode Materials for Rechargeable Ca Batteries". The development of rechargeable batteries based on a Ca metal anode demands the identification of suitable cathode materials. This work investigates the potential application of a variety of compounds, which are selected from the Inorganic Crystal Structure Database (ICSD) considering 3d-transition metal oxysulphides, pyrophosphates, silicates, nitrides, and phosphates with a maximum of four different chemical elements in their composition. Cathode performance of CaFeSO, CaCoSO, CaNiN, Ca3MnN3, Ca2Fe(Si2O7), CaM(P2O7) (M = V, Cr, Mn, Fe, Co), CaV2(P2O7)2, Ca(VO)2(PO4)2 and α-VOPO4 is evaluated through the calculation of operation voltages, volume changes associated with the redox reaction and mobility of Ca2+ ions. Some materials exhibit attractive specific capacities and intercalation voltages combined with energy barriers for Ca migration around 1 eV (CaFeSO, Ca2FeSi2O7 and CaV2(P2O7)2). Based on the DFT results, αI-VOPO4 is identified as a potential Ca-cathode with a maximum theoretical specific capacity of 312 mAh/g, an average intercalation voltage of 2.8 V and calculated energy barriers for Ca migration below 0.65 eV (GGA functional). |
10.60732/49ecd7c5 |
Ca, Co, Fe, Mn, N, Ni, O, P, S, Si, V |
M. Elena Arroyo-de Dompablo, Jose Luis Casals |
10839 |
1034708 |
11 |
DFT-PBE |
VASP |
39 |
||||
The rattled-300 validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/b3c0c67d |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
62451 |
883431 |
84 |
DFT-PBE+U |
VASP |
39 |
||||
40 graphite structures with different lattice constants ranging from 2.0 to 3.2 Å, with a 0.03 Å increment. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/85590078 |
C |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
41 |
1968 |
1 |
DFT-PBE+D3 |
CP2K |
38 |
||||
Data from "On-the-fly assessment of diffusion barriers of disordered transition metal oxyfluorides using local descriptors". The dataset contains the results of 48 nudged elastic band (NEB) calculations of Li(2-x)VO2F diffusion barriers. The NEB calculations were performed with VASP, using the projector augmented-wave (PAW) method to describe the electron-ion interaction. The disordered rock salt cells were created using a 3 x 4 x 4 supercell containing 96 atoms (in the case of no vacancies). PBE was used as the XC functional, and a rotationally invariant Hubbard U correction was applied to the d orbitals of V with a U value of 3.25 eV. |
10.60732/ada99db2 |
F, Li, O, V |
Jin Hyun Chang, Peter Bjørn Jørgensen, Simon Loftager, Arghya Bhowmik, Juan María García Lastra, Tejs Vegge |
233 |
20670 |
4 |
DFT-PBE+U |
VASP |
38 |
||||
The rattled-500 validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/906de541 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
68830 |
985338 |
85 |
DFT-PBE+U |
VASP |
38 |
||||
The training dataset for GST_GAP_22, recalculated using the PBE functional. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions. |
10.60732/164f9a70 |
Ge, Sb, Te |
Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer |
2690 |
341004 |
3 |
DFT-PBE |
CASTEP |
38 |
||||
Glycine training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/358cb5ee |
C, H, N, O |
Venkat Kapil, Edgar A. Engel |
3582 |
109570 |
4 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
38 |
||||
Configurations of Ge from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/33767c4f |
Ge |
Christopher M. Andolina, Wissam A. Saidi |
2810 |
188884 |
1 |
DFT-PBE |
VASP |
38 |
||||
Dataset containing MD trajectories of the tetrasaccharide stachyose from the MD22 benchmark set. MD22 represents a collection of datasets in a benchmark that can be considered an updated version of the MD17 benchmark datasets, including more challenges with respect to system size, flexibility and degree of non-locality. The datasets in MD22 include MD trajectories of the protein Ac-Ala3-NHMe; the lipid DHA (docosahexaenoic acid); the carbohydrate stachyose; nucleic acids AT-AT and AT-AT-CG-CG; and the buckyball catcher and double-walled nanotube supramolecules. Each of these is included here in a separate dataset, as represented on sgdml.org. Calculations were performed using FHI-aims and i-PI software at the DFT-PBE+MBD level of theory. Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution. |
10.60732/e2a66d93 |
C, H, O |
Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller |
27272 |
2372664 |
3 |
DFT-PBE+MBD |
FHI-aims |
38 |
||||
The JARVIS_AGRA_CO dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/3e9ff0b1 |
C, Co, Cu, Fe, Mo, Ni, O |
Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh |
194 |
12804 |
7 |
DFT-PBE |
VASP |
37 |
||||
This iron nanoparticles database contains dimers; trimers; bcc, fcc, hexagonal close-packed (hcp), simple cubic, and diamond crystalline structures. A wide range of cell parameters, as well as rattled structures, bcc-fcc and bcc-hcp transitional structures, surface slabs cleaved from relaxed bulk structures, nanoparticles and liquid configurations are included. The energy, forces and virials for the atomic structures were computed at the DFT level of theory using VASP with the PBE functional and standard PAW pseudopotentials for Fe (with 8 valence electrons, 4s^2 3d^6). The kinetic energy cutoff for plane waves was set to 400 eV and the energy threshold for convergence was 10^-7 eV. All the DFT calculations were carried out with spin polarization. |
10.60732/20ba88af |
Fe |
Richard Jana, Miguel A. Caro |
198 |
20097 |
1 |
DFT-PBE |
VASP |
37 |
||||
A set of training configurations of hydrogenated liquid and amorphous silicon from the datasets for Si-H-GAP. Includes virial sigmas used for configurations used in the corresponding publication (virial-sigma-paper) as well as an alternate configuration defined by doubled virial sigma prefactors (from 0.025 to 0.05). |
10.60732/43a0cef7 |
H, Si |
Davis Unruh, Reza Vatan Meidanshahi, Stephen M. Goodnick, Gábor Csányi, Gergely T. Zimányi |
392 |
65909 |
2 |
DFT-PBE |
Quantum ESPRESSO |
37 |
||||
133,855 configurations of stable small organic molecules composed of CHONF. A subset of GDB-17, with calculations of energies, dipole moment, polarizability and enthalpy. Calculations performed at B3LYP/6-31G(2df,p) level of theory. |
10.60732/b82731e4 |
C, F, H, N, O |
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld |
133877 |
2407626 |
5 |
DFT-B3LYP |
Gaussian 09 |
37 |
||||
OC20_S2EF_val_ood_ads is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring unseen adsorbates. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate id, materials project bulk id and miller index. |
10.60732/d820e77c |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
999838 |
72858155 |
56 |
DFT-rPBE |
VASP |
37 |
||||
This dataset contains four trajectories of amorphous zeolitic imidazolate frameworks (ZIF-4), liquids calculated at four different volumes and at temperatures of 1500K and 1750K; and three trajectories of the ZIF-4 crystal: one at 300K and two at 1500K. Data was generated at the DFT-PBE-D3 level of theory. |
10.60732/a6b0da5e |
C, H, N, Zn |
Nicolas Castel, Dune Andre, Connor Edwards, Jack D. Evans, Francois-Xavier Coudert |
1189732 |
323607104 |
4 |
DFT-PBE+D3 |
CP2K |
37 |
||||
A dataset of 64-atom silicon configurations in four phases: cubic-diamond, (beta)-tin, R8, and liquid. MD simulations are run at 300, 600 and 900 K for solid phases; up to 2500 K for the L phase. All relaxations performed at zero pressure. Additional configurations prepared by random distortion of crystal structures. VASP was used with a PAW pseudopotential and PBE exchange correlation. k-point mesh was optimized for energy convergence of 0.5 meV/atom and stress convergence of 0.1 kbar. The plane wave energy cutoff was set to 300 eV. To reduce the correlation between data points, MD data were thinned by using one of every 100 consecutive structures from the MD simulations at 300 K and one of every 20 structures from higher temperature MD simulations. |
10.60732/68b1a5ad |
Si |
Ekin D. Cubuk, Brad D. Malone, Berk Onat, Amos Waterland, Efthimios Kaxiras |
1110 |
71040 |
1 |
DFT-PBE |
VASP |
37 |
||||
3,000 Al-Ga-In sesquioxides with energies and band gaps. Relaxed and Vegard's Law geometries with formation energy and band gaps at DFT-PBE level of theory of (Alx-Gay-Inz)2O3 oxides, x+y+z=1. Contains all structures from the NOMAD 2018 Kaggle challenge training and leaderboard data. The formation energy and bandgap energy were computed by using the PBE exchange-correlation DFT functional with the all-electron electronic structure code FHI-aims with tight setting. |
10.60732/e4af85f8 |
Al, Ga, In, O |
Christopher Sutton, Luca M. Ghiringhelli, Takenori Yamamoto, Yury Lysogorskiy, Lars Blumenthal, Thomas Hammerschmidt, Jacek R. Golebiowski, Xiangyue Liu, Angelo Ziletti, Matthias Scheffler |
3000 |
185070 |
4 |
DFT-PBE |
FHI-aims |
37 |
||||
GAP-20 describes the properties of the bulk crystalline and amorphous phases, crystal surfaces, and defect structures with an accuracy approaching that of direct ab initio simulation, but at a significantly reduced cost. The final potential is fitted to reference data computed using the optB88-vdW density functional theory (DFT) functional. |
10.60732/fd1b78a8 |
C |
Patrick Rowe, Volker L. Deringer, Piero Gasparotto, Gábor Csányi, Angelos Michaelides |
16906 |
1270764 |
1 |
DFT-optB88-vdW |
VASP |
37 |
||||
The DFT with D2 vdW corrections split of the Graphene-hBN_and_Graphene-Graphene dataset. This dataset family (see other Graphene-hBN_and_Graphene_Graphene datasets) contains data for Graphene-Graphene and Graphene-hexagonal boron nitride (hBN) ab initio calculations for structures with different interlayer distances and disregistries, calculated using DFT with D2 van der Waals corrections, DFT with D3 van der Waals corrections, and QMC methods. |
B, C, N |
Kittithat Krongchon, Lucas K. Wagner, Tawfiqur Rakib, Daniel Palmer, Elif Ertekin, Harley T. Johnson |
368 |
13248 |
3 |
DFT-PBE+D2 |
Quantum ESPRESSO |
37 |
|||||
Succinic acid test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/0b56af0a |
C, H, O |
Venkat Kapil, Edgar A. Engel |
500 |
14000 |
3 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
36 |
||||
The JARVIS-2DMatPedia dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This subset contains configurations with 2D materials from the 2DMatPedia database, generated through two methods: a top-down exfoliation approach, using structures of bulk materials from the Materials Project database; and a bottom-up approach, replacing each element in a 2D material with another from the same group (according to column number). JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/a2df077f |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Jun Zhou, Lei Shen, Miguel Dias Costa, Kristin A. Persson, Shyue Ping Ong, Patrick Huck, Yunhao Lu, Xiaoyang Ma, Yiming Chen, Hanmei Tang, Yuan Ping Feng |
6351 |
66295 |
83 |
DFT-optB88-vdW |
VASP |
36 |
||||
Configurations of toluene from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/c6e1f25a |
C, H |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
99995 |
1499925 |
2 |
DFT-PBE0 |
Gaussian 09 |
36 |
||||
Test configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/e7100354 |
Hf, O |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
3510 |
336960 |
2 |
DFT-PBE |
VASP |
36 |
||||
Approximately 9,100 configurations of Li10SiP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiSiPS datasets from this source. The other uses the PBEsol functional, rather than the PBE functional. |
10.60732/a82feb87 |
Li, P, S, Si |
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E |
9150 |
2100050 |
4 |
DFT-PBE |
VASP 5.4.4 |
36 |
||||
Test configurations from the SAIT_semiconductors_ACS_2023_SiN dataset. This dataset contains SiN, Si and N configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/821cd3a8 |
N, Si |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
2866 |
165559 |
2 |
DFT-PBE |
VASP |
36 |
||||
This data set was originally used to generate a multi-component linear SNAP potential for tungsten and beryllium as published in Wood, M. A., et al. Phys. Rev. B 99 (2019) 184305. This data set was developed for the purpose of studying plasma material interactions in fusion reactors. |
10.60732/7500db4b |
Be, W |
Mitchell A. Wood, Mary Alice Cusentino, Brian D. Wirth, Aidan P. Thompson |
25055 |
524332 |
2 |
DFT-PBE |
VASP |
36 |
||||
Glycine training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/76aa925f |
C, H, N, O |
Venkat Kapil, Edgar A. Engel |
29067 |
952530 |
4 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
36 |
||||
Random-random configurations from CA-9 dataset used during validation step for NNP_RR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials. |
10.60732/1005a764 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
4001 |
218129 |
1 |
DFT-PBE |
VASP |
36 |
||||
Structures from Vector-QM24 (VQM24) that converged to saddle points during relaxation, with properties calculated using DFT. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br. |
Br, C, Cl, F, H, N, O, P, S, Si |
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld |
51072 |
524617 |
10 |
DFT-ωB97X+D3 |
Psi4 |
36 |
|||||
Test set from W_LML-retrain dataset, containing bulk tungsten calculations. The W_LML-retrain dataset contains DFT calculations used in testing a linear-in-descriptor machine learning potential that accounts for dislocation-defect interactions in tungsten. Density functional simulations were performed using VASP. The PBE generalised gradient approximation was used to describe effects of electron exchange and correlation together with a projector augmented wave (PAW) basis set with a cut-off energy of 550 eV. Occupancies were smeared with a Methfessel-Paxton scheme of order one with a 0.1 eV smearing width. The Brillouin zone was sampled with a Monkhorst-Pack k-point grid for the 2D cluster simulations periodic along the dislocation line and a single k-point was used for the calculations with 3D spherical QM regions. The values of these parameters were chosen after a series of convergence tests on forces with a tolerance of a few meV/Å. |
10.60732/9d48595f |
W |
Berk Onat, Christoph Ortner, James R. Kermode |
8 |
1996 |
1 |
DFT-PBE |
VASP |
35 |
||||
The JARVIS_AGRA_CHO dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/91b60711 |
C, Co, Cu, Fe, H, Mo, Ni, O |
Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh |
216 |
14472 |
8 |
DFT-PBE+D3 |
VASP |
35 |
||||
Configurations of alanine from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/f74456fc |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119991 |
1559883 |
4 |
DFT-PBE0 |
Gaussian 09 |
35 |
||||
All DFT single-point calculations for the OrbNet Denali training set were carried out in Entos Qcore version 0.8.17 at the ωB97X-D3/def2-TZVP level of theory using in-core density fitting with the neese=4 DFT integration grid. |
10.60732/b6043e2b |
B, Br, C, Ca, Cl, F, H, I, K, Li, Mg, N, Na, O, P, S, Si |
Anders S. Christensen, Sai Krishna Sirumalla, Zhuoran Qiao, Michael B. O'Connor, Daniel G. A. Smith, Feizhi Ding, Peter J. Bygrave, Animashree Anandkumar, Matthew Welborn, Frederick R. Manby, Thomas F. Miller III |
2337230 |
104937852 |
17 |
DFT-ωB97X+D3 |
ENTOS QCORE 0.8.17 |
35 |
||||
Starting from a single reference ab initio simulation, we use active learning to expand into new state points and to describe the quantum nature of the nuclei. The final model, trained on 814 reference calculations, yields excellent results under a range of conditions, from liquid water at ambient and elevated temperatures and pressures to different phases of ice, and the air-water interface — all including nuclear quantum effects. |
10.60732/d1024453 |
H, O |
Christoph Schran, Krystof Brezina, Ondrej Marsalek |
8814 |
2304144 |
2 |
DFT-revPBE0+D3 |
CP2K |
35 |
||||
Validation configurations of Li8Mo2Ni7Ti7O32 from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs. |
10.60732/40d9f4e8 |
Li, Mo, Ni, O, Ti |
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith |
1792 |
100352 |
5 |
DFT-SCAN |
VASP |
35 |
||||
Dataset for H2CO, with and without added noise for testing the effects of noise on quality of fit. Configuration sets are included for clean energy values with different levels of Gaussian noise added to atomic forces (including a set with no noise added), and energies perturbed at different levels (including a set with no perturbation). Configuration sets correspond to individual files found at the data link. |
10.60732/76701b84 |
C, H, O |
Sugata Goswami, Silvan Käser, Raymond J. Bemish, Markus Meuwly |
28808 |
115232 |
3 |
MP2 |
Gaussian 09 |
35 |
||||
Test dataset from xxMD-CASSCF. The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries. xxMD is divided into two datasets, each with corresponding train, test and validation splits. xxMD-CASSCF contains calculations generated using state-averaged complete active space self-consistent field (SA-CASSCF) electronic theory. xxMD-DFT contains recalculated single-point spin-polarized (unrestricted) DFT values. |
10.60732/f48ed7f0 |
C, H, N, O, S |
Zihan Pengmei, Yinan Shu, Junyu Liu |
21700 |
403800 |
5 |
SA-CASSCF |
OpenMolcas 22.06 |
35 |
||||
The DFT with D3 vdW corrections split of the Graphene-hBN_and_Graphene-Graphene dataset. This dataset family (see other Graphene-hBN_and_Graphene_Graphene datasets) contains data for Graphene-Graphene and Graphene-hexagonal boron nitride (hBN) ab initio calculations for structures with different interlayer distances and disregistries, calculated using DFT with D2 van der Waals corrections, DFT with D3 van der Waals corrections, and QMC methods. |
B, C, N |
Kittithat Krongchon, Lucas K. Wagner, Tawfiqur Rakib, Daniel Palmer, Elif Ertekin, Harley T. Johnson |
368 |
13248 |
3 |
DFT-PBE+D3 |
Quantum ESPRESSO |
35 |
|||||
The ChIMES C 2.0 Small dataset consists of initial structures of carbon calculated at the DFT level using VASP and trajectories produced using the ChIMES model. See links for the model code and ChIMES simulation evaluation library. |
10.60732/ef8a9926 |
C |
Rebecca K. Lindsey, Nir Goldman, Laurence E. Fried |
601 |
117976 |
1 |
DFT-PBE |
ChIMES |
34 |
||||
Glycine test PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/a85ea3a9 |
C, H, N, O |
Venkat Kapil, Edgar A. Engel |
200 |
6880 |
4 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
34 |
||||
Succinic acid validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/55319022 |
C, H, O |
Venkat Kapil, Edgar A. Engel |
200 |
5600 |
3 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
34 |
||||
This dataset contains structures of materials from the N (15th), O (16th) and F (17th) columns of the periodic table used for generating a 2-body non-bonded vdW potential. |
10.60732/5617dd04 |
As, At, Bi, O, P, Po, S, Sb, Se, Te |
Peng Geng, Sergey Zybin, Saber Naserifar, William A. Goddard, III |
262 |
1494 |
10 |
DFT-PBE |
VASP 5.4.4 |
34 |
||||
This dataset investigates the effect of defects, such as copper and oxygen vacancies, in cuprous oxide films. Structures include oxygen vacancies formed in proximity of a reconstructed Cu2O(111) surface, where the outermost unsaturated copper atoms are removed, thus forming non-stoichiometric surface layers with copper vacancies. Surface and bulk properties are addressed by modelling a thick and symmetric slab consisting of 8 atomic layers and 736 atoms. Configuration sets include bulk, slab, vacancy and oxygen gas. Version v1 |
10.60732/7fd4eb34 |
Cu, O |
Nanchen Dongfang, Marcella Iannuzzi, Yasmine Al-Hamdani |
855 |
604801 |
2 |
DFT-PBE+U+D3 |
CP2K |
34 |
||||
A training dataset of 90,000 configurations with interaction properties between H2 and Pt(111) surfaces. |
10.60732/831d1c4a |
H, Pt |
Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky |
90731 |
5705442 |
2 |
DFT-PBE |
VASP |
34 |
||||
Configurations of Zn from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/dfa1e792 |
Zn |
Christopher M. Andolina, Wissam A. Saidi |
3852 |
102160 |
1 |
DFT-PBE |
VASP |
34 |
||||
The validation set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerred Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density. |
10.60732/5bc5a5cc |
Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn |
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka |
2498 |
69420 |
37 |
DFT-PBE |
VASP 5.4.4 |
34 |
||||
Configurations of Li from Andolina & Saidi, 2023. One of 23 minimalist, curated sets of DFT-calculated properties for individual elements for the purpose of providing input to machine learning of deep neural network potentials (DNPs). Each element set contains on average ~4000 structures with 27 atoms per structure. Configuration metadata includes Materials Project ID where available, as well as temperatures at which MD trajectories were calculated. These temperatures correspond to the melting temperature (MT) and 0.25*MT for elements with MT < 2000K, and MT, 0.6*MT and 0.25*MT for elements with MT > 2000K. |
10.60732/f5cc2a19 |
Li |
Christopher M. Andolina, Wissam A. Saidi |
2531 |
93579 |
1 |
DFT-PBE |
VASP |
34 |
||||
A training dataset of diverse atomic configurations of Zn, varying in aggregation states, crystal structures, defect types, and sizes. The aim was to derive a potential capable of accurately describing a broad spectrum of local atomic configurations in Zn. |
10.60732/54902e18 |
Zn |
Haojie Mei, Luyao Cheng, Liang Chen, Feifei Wang, Jinfu Li, Lingti Kong |
13299 |
276240 |
1 |
DFT-PBE |
VASP |
34 |
||||
Approximately 20,000 configurations of Au used as part of a training dataset for a DP-GEN-based ML model for an Ag-Au nanoalloy potential. |
10.60732/c4492535 |
Au |
Yinan Wang, Xiaoyang Wang, Linfeng Zhang, Ben Xu, Han Wang |
9754 |
161580 |
1 |
DFT-PBE+D3 |
VASP, DP-GEN |
34 |
||||
The training split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures. |
10.60732/f5b6ea1b |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Po, Pr, Pt, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti |
76482 |
2064229 |
85 |
DFT-PBEsol |
VASP |
33 |
||||
Training set (DFT output) for CE models and MC simulation output for the manuscript 'Phase behaviour of (Ti:Mo)S2 binary alloys arising from electron-lattice coupling'. The DFT calculations are performed using VASP 5.4.3, compiled with Intel MPI and Intel MKL support. |
10.60732/864a2df0 |
Mo, S, Ti |
Andrea Silva, Tomas Polcar, Denis Kramer |
259 |
3996 |
3 |
DFT-SCAN+rVV10 |
VASP 5.4.3 |
33 |
||||
This dataset provides DFT (as implemented in VASP) calculations for pure magnesium. Configuration sets include bulk, generalized stacking fault energies, stable stacking fault, decohesion, relaxed surfaces, dimer, corner and rod, and vacancy configurations of Mg. |
10.60732/28f038f2 |
Mg |
Binglun Yin, Markus Stricker, W. A. Curtin |
405 |
10730 |
1 |
DFT-PBE |
VASP |
33 |
||||
Glycine test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/d68e42bc |
C, H, N, O |
Venkat Kapil, Edgar A. Engel |
500 |
17710 |
4 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
33 |
||||
Training configurations of Li8Mo2Ni7Ti7O32 from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs. |
10.60732/1c1bf708 |
Li, Mo, Ni, O, Ti |
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith |
824 |
46144 |
5 |
DFT-SCAN |
VASP |
33 |
||||
Carbon_GAP_20 dataset from CGM-MLP_natcomm2023. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/b996b7e0 |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
6178 |
400485 |
2 |
DFT-PBE+D3 |
CP2K |
33 |
||||
Configurations of o-hbdi from WS22. The WS22 database combines Wigner sampling with geometry interpolation to generate 1.18 million molecular geometries equally distributed into 10 independent datasets of flexible organic molecules with varying sizes and chemical complexity. In addition to the potential energy and forces required to construct potential energy surfaces, the WS22 database provides several other quantum chemical properties, all obtained via single-point calculations for each molecular geometry. All quantum chemical calculations were performed with the Gaussian 09 program. |
10.60732/5dce8a9a |
C, H, N, O |
Max Pinheiro Jr, Shuang Zhang, Pavlo O. Dral, Mario Barbatti |
119995 |
2639890 |
4 |
DFT-PBE0 |
Gaussian 09 |
33 |
||||
Test configurations from CA-9 dataset used to evaluate trained NNPs. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials. |
10.60732/5a57f6ad |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
2726 |
206238 |
1 |
DFT-PBE |
VASP |
33 |
||||
The JARVIS_mlearn dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the Organic Materials Database (OMDB): a dataset of 12,500 crystal materials for the purpose of training models for the prediction of properties for complex and lattice-periodic organic crystals with large numbers of atoms per unit cell. Dataset covers 69 space groups, 65 elements; averages 82 atoms per unit cell. This dataset also includes classical force-field inspired descriptors (CFID) for each configuration. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/f3f6ad68 |
Cu, Ge, Li, Mo, Ni, Si |
Yunxing Zuo, Chi Chen, Xiangguo Li, Zhi Deng, Yiming Chen, Jörg Behler, Gábor Csányi, Alexander V. Shapeev, Aidan P. Thompson, Mitchell A. Wood, Shyue Ping Ong |
1566 |
115742 |
6 |
DFT-PBE |
VASP 5.4.1 |
33 |
||||
The test set from HME21. The high-temperature multi-element 2021 (HME21) dataset comprises approximately 25,000 configurations, including 37 elements, used in the training of a universal NNP called PreFerential Potential (PFP). The dataset specifically contains disordered and unstable structures, and structures that include irregular substitutions, as well as varied temperature and density. |
10.60732/bddeac8f |
Ag, Al, Au, Ba, C, Ca, Cl, Co, Cr, Cu, F, Fe, H, In, Ir, K, Li, Mg, Mn, Mo, N, Na, Ni, O, P, Pb, Pd, Pt, Rh, Ru, S, Sc, Si, Sn, Ti, V, Zn |
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, Tasuku Onodera, Takafumi Ishii, Takao Kudo, Hideki Ono, Ryohto Sawada, Ryuichiro Ishitani, Marc Ong, Taiki Yamaguchi, Toshiki Kataoka, Akihide Hayashi, Nontawat Charoenphakdee, Takeshi Ibuka |
2495 |
69572 |
37 |
DFT-PBE |
VASP 5.4.4 |
33 |
||||
Approximately 9,850 configurations of CO2 with a movable Ni(100) surface. |
10.60732/b44e9fd6 |
C, Ni, O |
Yaolong Zhang, Junfan Xia, Bin Jiang |
9845 |
383955 |
3 |
DFT-PBE |
VASP |
33 |
||||
The data used for training the DFT models were created running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Forces and energies were computed using all-electrons at the generalized gradient approximation level of theory with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional, treating van der Waals interactions with the Tkatchenko-Scheffler (TS) method. All calculations were performed with FHI-aims. The final training data was generated by subsampling the full trajectory under preservation of the Maxwell-Boltzmann distribution for the energies. |
10.60732/18404d62 |
C, H |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
49862 |
598344 |
2 |
DFT-PBE+TS |
FHI-aims |
33 |
||||
DFT dataset consisting of 6828 resampled Pt-Ni alloys used for training an NNP. The energy and forces of each structure in the resampled database are calculated using DFT. All reference DFT calculations for the training set of 6828 Pt-Ni alloy structures have been performed using the Vienna Ab initio Simulation Package (VASP) with the spin-polarized revised Perdew-Burke-Ernzerhof (rPBE) exchange-correlation functional. |
10.60732/9d0ff0eb |
Ni, Pt |
Shuang Han, Giovanni Barcaro, Alessandro Fortunelli, Steen Lysgaard, Tejs Vegge, Heine Anton Hansen |
6820 |
1072856 |
2 |
DFT-rPBE |
VASP |
33 |
||||
Training sets from Si_Al_Ti_Seko_PRB_2019. This dataset comprises 10,000 selected structures from the ICSD, divided into training and test sets. The dataset was generated for the purpose of training an MLIP with introduced high-order linearly independent rotational invariants up to the sixth order based on spherical harmonics. DFT calculations were carried out with VASP using the PBE exchange-correlation functional and an energy cutoff of 400 eV. |
10.60732/59585f0a |
Al, Si, Ti |
Atsuto Seko, Atsushi Togo, Isao Tanaka |
3989 |
197628 |
3 |
DFT-PBE |
VASP |
33 |
||||
The rattled-1000-subsampled validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/8e8871be |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
38271 |
549832 |
87 |
DFT-PBE+U |
VASP |
33 |
||||
Approximately 11,500 configurations of In2Se3, including monolayer (20-atom slab) and bulk (30-atom supercell) models. |
10.60732/a8e05a1b |
In, Se |
Jing Wu, Liyi Bai, Jiawei Huang, Liyang Ma, Jian Liu, Shi Liu |
11516 |
248370 |
2 |
DFT-PBE |
VASP |
33 |
||||
This dataset was designed to enable machine-learning of Nb elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations. |
10.60732/a90f7f6e |
Nb |
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova |
3787 |
45641 |
1 |
DFT-PBE |
VASP |
33 |
||||
All structures calculated for Vector-QM24 (VQM24) with properties calculated using DFT. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br. |
Br, C, Cl, F, H, N, O, P, S, Si |
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld |
784838 |
8079877 |
10 |
DFT-ωB97X+D3 |
Psi4 |
33 |
|||||
We establish the sign of the linear magnetoelectric (ME) coefficient, α, in chromia, Cr₂O₃. Cr₂O₃ is the prototypical linear ME material, in which an electric (magnetic) field induces a linearly proportional magnetization (polarization), and a single magnetic domain can be selected by annealing in combined magnetic (H) and electric (E) fields. Opposite antiferromagnetic domains have opposite ME responses, and which antiferromagnetic domain corresponds to which sign of response has previously been unclear. We use density functional theory (DFT) to calculate the magnetic response of a single antiferromagnetic domain of Cr₂O₃ to an applied in-plane electric field at 0 K. We find that the domain with nearest neighbor magnetic moments oriented away from (towards) each other has a negative (positive) in-plane ME coefficient, α⊥, at 0 K. We show that this sign is consistent with all other DFT calculations in the literature that specified the domain orientation, independent of the choice of DFT code or functional, the method used to apply the field, and whether the direct (magnetic field) or inverse (electric field) ME response was calculated. Next, we reanalyze our previously published spherical neutron polarimetry data to determine the antiferromagnetic domain produced by annealing in combined E and H fields oriented along the crystallographic symmetry axis at room temperature. We find that the antiferromagnetic domain with nearest-neighbor magnetic moments oriented away from (towards) each other is produced by annealing in (anti-)parallel E and H fields, corresponding to a positive (negative) axial ME coefficient, α∥, at room temperature. Since α⊥ at 0 K and α∥ at room temperature are known to be of opposite sign, our computational and experimental results are consistent. This dataset contains the input data to reproduce the calculation of the magnetoelectric effect as plotted in Fig. 3 of the manuscript, for Elk, Vasp, and Quantum Espresso. |
10.60732/85b7fa44 |
Cr, O |
Eric Bousquet, Eddy Lelièvre-Berna, Navid Qureshi, Jian-Rui Soh, Nicola Ann Spaldin, Andrea Urru, Xanthe Henderike Verbeek, Sophie Francis Weber |
165 |
1650 |
2 |
DFT-LDA |
VASP |
32 |
||||
DFT-optimized geometries and properties for Li-S electrolytes. These make up the Computational Database for Li-S Batteries (ComBat), calculated using Gaussian 16 at the B3LYP/6-31+G* level of theory. |
10.60732/682b12b1 |
C, F, H, Li, N, O, P, S, Si |
Rasha Atwi, Matthew Bliss, Maxim Makeev, Nav Nidhi Rajput |
174 |
4719 |
9 |
DFT-B3LYP |
Gaussian 16 |
32 |
||||
The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-N dataset: a collection assembled to train a machine learning model for the purposes of assisting catalyst design by predicting chemical reactivity of transition-metal surfaces. The adsorption systems contained in this dataset consist of {100}-terminated Pt-based bimetallic surfaces doped with a third element. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/81faec41 |
Ag, Au, Cd, Co, Cr, Cu, Fe, H, Hf, Ir, Mn, Mo, N, Nb, Ni, O, Os, Pd, Pt, Re, Rh, Ru, Sc, Tc, V, W, Zn |
Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin |
329 |
6251 |
27 |
DFT-rPBE |
VASP |
32 |
||||
Dataset for "Appraisal of calcium ferrites as cathodes for calcium rechargeable batteries: DFT, synthesis, characterization and electrochemistry of Ca4Fe9O17" created to explore Fe-based cathode materials for Ca-ion batteries. Structures include CaFe(2+n)O(4+n), where 0 < n < 3. |
10.60732/c8fdee31 |
Ca, Fe, O |
M. Elena Arroyo-de Dompablo, José Luis Casals |
345 |
35462 |
3 |
DFT-PBE |
VASP 4.6.35 |
32 |
||||
The test set of a train/test pair from the malonaldehyde dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite. |
10.60732/c459d6f4 |
C, H, O |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
500 |
4500 |
3 |
CCSD(T) |
Psi4 |
32 |
||||
The test set of a train/test pair from the ethanol dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVTZ was used for ethanol. All calculations were performed with the Psi4 software suite. |
10.60732/76c53b98 |
C, H, O |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
1000 |
9000 |
3 |
CCSD(T) |
Psi4 |
32 |
||||
The JARVIS_ALIGNN_FF dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset is a subset of the JARVIS DFT dataset, filtered to contain just the first, last, middle, maximum energy and minimum energy structures. Additionally, calculation run snapshots are filtered for uniqueness, and the dataset contains only perfect structures. DFT energies, stresses and forces in this dataset were used to train an atomistic line graph neural network (ALIGNN)-based FF model. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/45deafd8 |
Ac, Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Kamal Choudhary, Brian DeCost, Lily Major, Keith Butler, Jeyan Thiyagalingam, Francesca Tavazza |
304146 |
3178329 |
89 |
IP-ALIGNN-FF |
VASP |
32 |
||||
A comprehensive database generated using density functional theory simulations, encompassing a wide range of crystal structures, point defects, extended defects, and disordered structures. |
10.60732/41115bd2 |
O, Si |
Karim Zongo, Hao Sun, Claudiane Ouellet-Plamondon, Laurent Karim Beland |
1061 |
71594 |
2 |
DFT-PBE |
Quantum ESPRESSO |
32 |
||||
Training and simulation data from machine learning force field model applied to steps of the hydrogenation of carbon dioxide to methanol over an indium oxide catalyst, with and without platinum doping. |
10.60732/d16f9667 |
C, H, In, O, Pt |
Lars Schaaf, Edvin Fako, Sandip De, Ansgar Schäfer, Gábor Csányi |
1994 |
163746 |
5 |
DFT-PBE |
Quantum ESPRESSO |
32 |
||||
Test configurations with MD simulations performed at 600 K from 3BPA, used to showcase the performance of linear atomic cluster expansion (ACE) force fields in a machine learning model to predict the potential energy surfaces of organic molecules. |
10.60732/cf8c3842 |
C, H, N, O |
Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice E. A. Allen, Daniel J. Cole, Christoph Ortner, Gábor Csányi |
2138 |
57726 |
4 |
DFT-ωB97X |
ORCA |
32 |
||||
This dataset contains dimer molecules of Co(II) with potential energy calculations for structures with ferromagnetic and antiferromagnetic spin configurations. Calculations were carried out in Gaussian 16 with the PBE exchange-correlation functional and 6-31+G* basis set. All molecules contain the same atomic core region, consisting of the tetrahedral and octahedral Co centers and the three PO2R2 bridging ligands. The ligand exchange provides a broad range of exchange energies (ΔEJ), from +50 to -200 meV, with 80% of the ligands yielding ΔEJ < 10 meV. |
10.60732/16b96cbc |
C, Cl, Co, H, N, O, P, S |
Sijin Ren, Eric Fonseca, William Perry, Hai-Ping Cheng, Xiao-Guang Zhang, Richard Hennig |
2158 |
188149 |
8 |
DFT-PBE |
Gaussian 16 |
32 |
||||
Validation configurations from the SAIT_semiconductors_ACS_2023_HfO dataset. This dataset contains HfO configurations from the SAIT semiconductors datasets. SAIT semiconductors datasets comprise two rich datasets for the important semiconductor thin film materials silicon nitride (SiN) and hafnium oxide (HfO), gathered for the development of MLFFs. DFT simulations were conducted under various conditions that include differing initial structures, stoichiometry, temperature, strain, and defects. |
10.60732/bde379de |
Hf, O |
Geonu Kim, Byunggook Na, Gunhee Kim, Hyuntae Cho, Seung-Jin Kang, Hee Sun Lee, Saerom Choi, Heejae Kim, Seungwon Lee, Yongdeok Kim |
3510 |
336960 |
2 |
DFT-PBE |
VASP |
32 |
||||
The dataset consists of energies, forces and virials for DFT-VASP-generated Ag-Pd systems. The data was used to fit an active learned dataset which was used to compare MTP- and SOAP-GAP-generated potentials. |
10.60732/b0e39006 |
Ag, Pd |
Conrad W. Rosenbrock, Konstantin Gubaev, Alexander V. Shapeev, Livia B. Pártay, Noam Bernstein, Gábor Csányi, Gus L. W. Hart |
993 |
7260 |
2 |
DFT-PBE |
VASP |
32 |
||||
Succinic acid training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/45375ebf |
C, H, O |
Venkat Kapil, Edgar A. Engel |
1800 |
50400 |
3 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
32 |
||||
Test sets from Si_Al_Ti_Seko_PRB_2019. This dataset comprises 10,000 selected structures from the ICSD, divided into training and test sets. The dataset was generated for the purpose of training an MLIP with introduced high-order linearly independent rotational invariants up to the sixth order based on spherical harmonics. DFT calculations were carried out with VASP using the PBE exchange-correlation functional and an energy cutoff of 400 eV. |
10.60732/9b58ca47 |
Al, Si, Ti |
Atsuto Seko, Atsushi Togo, Isao Tanaka |
36152 |
1774526 |
3 |
DFT-PBE |
VASP |
32 |
||||
One configuration of an enzyme: training data for a quantum-guided molecular mechanics model. |
10.60732/e75f2602 |
C, H, N, O, S |
Taylor R. Quinn, Himani N. Patel, Kevin H. Koh, Brandon E. Haines, Per-Ola Norrby, Paul Helquist, Olaf Wiest |
1 |
117 |
5 |
DFT-RM06 |
Gaussian 09 |
31 |
||||
NEB path of proton transfer reaction between the two forms of acetylacetone. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/88a37621 |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
15 |
225 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
31 |
||||
This dataset includes Mg and Mg-Zn alloy structures with solute atoms at the prism edge locations. The dataset was created to study the strengthening effect of solute atoms at the prism edge locations in Mg alloys. |
10.60732/95b38454 |
Mg, Zn |
Masoud Rahbar Niazi, W. A. Curtin |
94 |
28615 |
2 |
DFT-PBE |
VASP |
31 |
||||
The test set of a train/test pair from the benzene dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for benzene. All calculations were performed with the Psi4 software suite. |
10.60732/81df086b |
C, H |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
500 |
6000 |
2 |
CCSD(T) |
Psi4 |
31 |
||||
Test set of decorrelated geometries sampled from 300 K xTB MD. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/9ed20baf |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
650 |
9750 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
31 |
||||
The main part of the dataset consists of structures of liquid water at 300 K from first-principles molecular dynamics (FPMD) simulations using a hybrid density functional with dispersion corrections. The dataset is expanded to include nuclear quantum effects by adding structures from path-integral molecular dynamics (PIMD) simulations. The final dataset contains 814 structures of liquid water at different temperatures and pressures, water slab, and ice Ih and ice VIII. These systems cover a wide range of structural and dynamical properties of water and ice. This dataset builds on the dataset from Schran et al. (2020), https://doi.org/10.1063/5.0016004 |
10.60732/39dba9fb |
H, O |
Zekun Chen, Margaret L. Berrens, Kam-Tung Chan, Zheyong Fan, Davide Donadio |
814 |
216144 |
2 |
DFT-revPBE0+D3 |
CP2K |
31 |
||||
This dataset contains pristine monolayer phosphorene as well as structures with monovacancies which were used to train an artificial neural network (ANN) for use with a high-dimensional neural network potentials molecular dynamics (HDNNP-MD) simulation. The publication investigates the mechanism and rates of the processes of defect diffusion, as well as monovacancy-to-divacancy defect coalescence. |
10.60732/87b2341a |
P |
Lukáš Kývala, Andrea Angeletti, Cesare Franchini, Christoph Dellago |
5085 |
722033 |
1 |
DFT-PBE |
VASP |
31 |
||||
6095 isomers of C7O2H10. Energetics were calculated at the G4MP2 level of theory. |
10.60732/64be4f16 |
C, H, O |
Raghunathan Ramakrishnan, Pavlo Dral, Matthias Rupp, O. Anatole von Lilienfeld |
6094 |
115786 |
3 |
G4MP2 |
Gaussian 09 |
31 |
||||
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Ti surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/4e8857ac |
C, Ti |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
1309 |
259636 |
2 |
DFT-PBE+D3 |
CP2K |
31 |
||||
Includes CHON molecules of 4-15 atoms, developed in counterpoint to the MD17 dataset, run at higher total energies (above 500 K) and with a broader configuration space. |
10.60732/999055f6 |
C, H, O |
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, Qi Yu |
6762 |
101430 |
3 |
DFT-B3LYP |
MOLPRO |
31 |
||||
Approximately 28,000 configurations split into 4 datasets, each using a different functional, used in the training of a high-dimensional neural network potential (HDNNP). |
10.60732/4f9f05e5 |
H, O |
Tobias Morawietz, Jörg Behler |
14537 |
1523796 |
2 |
DFT-RPBE+D3, DFT-BLYP, DFT-rPBE, DFT-BLYP+D3 |
FHI-aims |
31 |
||||
Random-random configurations from CA-9 dataset used for training NNP_RR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials. |
10.60732/4096ff5c |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
20012 |
1099992 |
1 |
DFT-PBE |
VASP |
31 |
||||
This dataset was designed to enable machine learning of Mo elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations. |
10.60732/31dbb6ee |
Mo |
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova |
3785 |
45667 |
1 |
DFT-PBE |
VASP |
31 |
||||
Approximately 45,000 configurations of metal oxides of Mg, Ag, Pt, Cu and Zn, with initial training structures taken from the Materials Project database. |
10.60732/009ff6b1 |
Ag, Cu, Mg, O, Pt, Zn |
Pandu Wisesa, Christopher M. Andolina, Wissam A. Saidi |
44010 |
1975080 |
6 |
DFT-PBE |
VASP |
31 |
||||
16748 configurations of magnesium with gathered energy, stress and forces at the DFT level of theory. |
10.60732/4b13be86 |
Mg |
Marvin Poul |
16746 |
78239 |
1 |
DFT-PBE |
VASP 5.4.4 |
31 |
||||
The val_aimd-from-PBE-3000-npt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/3b112bfe |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
59516 |
4036396 |
85 |
DFT-PBE+U |
VASP |
31 |
||||
Approximately 20,000 configurations from a dataset of alpha-iron and hydrogen. Properties include forces and potential energy, calculated using VASP at the DFT level using the GGA-PBE functional. |
10.60732/6e08b70b |
Fe, H |
Fan-Shun Meng, Jun-Ping Du, Shuhei Shinzato, Hideki Mori, Peijun Yu, Kazuki Matsubara, Nobuyuki Ishikawa, Shigenobu Ogata |
20800 |
1857588 |
2 |
DFT-PBE |
VASP |
31 |
||||
Binning-binning configurations from CA-9 dataset used for training NNP_BB potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials. |
10.60732/f3bbbd36 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
20006 |
1053753 |
1 |
DFT-PBE |
VASP |
31 |
||||
The test split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures. |
10.60732/e55c4ce1 |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Po, Pr, Pt, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti |
9546 |
259376 |
85 |
DFT-PBEsol |
VASP |
30 |
||||
Structures from discrepencies_and_error_metrics_NPJ_2023 test set; these include a single migrating vacancy. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K. |
10.60732/63c3da57 |
Si |
Yunsheng Liu, Xingfeng He, Yifei Mo |
100 |
6300 |
1 |
DFT-PBE |
VASP 5.4.4 |
30 |
||||
The dataset for "Origin of high strength in the CoCrFeNiPd high-entropy alloy", containing DFT-calculated values of the high-entropy alloy CoCrFeNiPd, created to explore the reasons behind experimental findings of the increased strength of CoCrFeNiPd in comparison to CoCrFeNi. |
10.60732/74f33a37 |
Co, Cr, Fe, Ni, Pd |
Binglun Yin, W. A. Curtin |
102 |
8508 |
5 |
DFT-PBEsol |
VASP |
30 |
||||
Structures from discrepencies_and_error_metrics_NPJ_2023 training set; includes some structures with vacancies. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K. |
10.60732/5a780a3a |
Si |
Yunsheng Liu, Xingfeng He, Yifei Mo |
218 |
13389 |
1 |
DFT-PBE |
VASP 5.4.4 |
30 |
||||
Dataset created for "Vanadium is an optimal element for strengthening in both fcc and bcc high-entropy alloys", to explore the effect of V in the high-entropy systems fcc Co-Cr-Fe-Mn-Ni-V and bcc Cr-Mo-Nb-Ta-V-W-Hf-Ti-Zr. Structures include pure V, misfit volumes of V in Ni, and misfit volumes of Ni2V random alloys |
10.60732/2a29960c |
Ni, V |
Binglun Yin, Francesco Maresca, W. A. Curtin |
232 |
21148 |
2 |
DFT-PBE |
VASP |
30 |
||||
The JARVIS_AGRA_COOH dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the CO2 reduction reaction (CO2RR) dataset from Chen et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/692a415c |
C, Co, Cu, Fe, H, Mo, Ni, O |
Zhi Wen Chen, Zachary Gariepy, Lixin Chen, Xue Yao, Abu Anand, Szu-Jia Liu, Conrard Giresse Tetsassi Feugmo, Isaac Tamblyn, Chandra Veer Singh |
280 |
19040 |
8 |
DFT-PBE |
VASP |
30 |
||||
This dataset contains configurations of lithium titanate from the publication "Kinetic Pathways of ionic transport in fast-charging lithium titanate". In order to understand the origin of various EELS (electron energy-loss spectroscopy) spectra features, EELS spectra were simulated using the Vienna Ab initio Simulation Package (VASP). For a specific Li in a given configuration, this is done by calculating the DOS and integrated DOS considering a Li core-hole on the position of the specific Li and calculating the EELS based on the DOS. The minimum energy paths (MEP) and migration energy of Li were calculated in various compositions, including Li4Ti5O12 with an additional Li carrier, Li5Ti5O12 with an additional Li carrier, and Li7Ti5O12 with a Li vacancy carrier. |
10.60732/03896523 |
Be, Li, O, Ti |
Tina Chen, Dong-hwa Seo |
848 |
149914 |
4 |
DFT-PBE |
VASP |
30 |
||||
Benzene test PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/de02dca3 |
C, H |
Venkat Kapil, Edgar A. Engel |
1000 |
29736 |
2 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
30 |
||||
~100,000 configurations of water, ethanol, malondialdehyde and uracil gathered at the PBE/def2-SVP level of theory using ORCA. |
10.60732/b0a10262 |
C, H, N, O |
Kristof T. Schütt, Michael Gastegger, Alexandre Tkatchenko, Klaus-Robert Müller, Reinhard J. Maurer |
91966 |
887691 |
4 |
DFT-PBE |
ORCA |
30 |
||||
Approximately 6,500 configurations of Li10GeP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiGePS datasets from this source. The other uses the PBEsol functional, rather than the PBE functional. |
10.60732/5ebf5a54 |
Ge, Li, P, S |
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E |
6549 |
1478600 |
4 |
DFT-PBE |
VASP 5.4.4 |
30 |
||||
Energies computed with LR-CCSD and hybrid DFT (B3LYP & SCAN0) for 7211 molecules in QM7b and 52 molecules in the AlphaML showcase database. |
10.60732/8fb1d4c7 |
C, Cl, H, N, O, S |
Yang Yang, Ka Un Lao, David M. Wilkins, Andrea Grisafi, Michele Ceriotti, Robert A. DiStasio Jr |
7255 |
112218 |
6 |
CCSD, DFT-B3LYP |
Psi4 |
30 |
||||
Succinic acid training PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/e3efb796 |
C, H, O |
Venkat Kapil, Edgar A. Engel |
29211 |
817908 |
3 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
30 |
||||
This dataset was designed to enable machine-learning of V elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations. |
10.60732/aad06a25 |
V |
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova |
3801 |
46454 |
1 |
DFT-PBE |
VASP |
30 |
||||
Binning-random configurations from CA-9 dataset used for training NNP_BR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials |
10.60732/07b7d297 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
20013 |
1072779 |
1 |
DFT-PBE |
VASP |
30 |
||||
The QMC-calculated split of the Graphene-hBN_and_Graphene-Graphene dataset. This dataset family (see other Graphene-hBN_and_Graphene_Graphene datasets) contains data for Graphene-Graphene and Graphene-hexagonal boron nitride (hBN) ab initio calculations for structures with different interlayer distances and disregistries, calculated using DFT with D2 van der Waals corrections, DFT with D3 van der Waals corrections, and QMC methods. |
B, C, N |
Kittithat Krongchon, Lucas K. Wagner, Tawfiqur Rakib, Daniel Palmer, Elif Ertekin, Harley T. Johnson |
75 |
2700 |
3 |
IP-QMC |
QMCPACK |
30 |
|||||
Structures from Vector-QM24 (VQM24) that represent constitutional isomers, or the most stable conformers, with properties calculated using DFT. Vector-QM24 is a quantum chemistry dataset of ~836 thousand small organic and inorganic molecules. Dataset covers all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br. |
Br, C, Cl, F, H, N, O, P, S, Si |
Danish Khan, Anouar Benali, Scott Y. H. Kim, Guido Falk von Rudorff, O. Anatole von Lilienfeld |
258242 |
2430476 |
10 |
DFT-ωB97X+D3 |
Psi4 |
30 |
|||||
Structures from discrepencies_and_error_metrics_NPJ_2023 validation set, enhanced by inclusion of rare events. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K. |
10.60732/9c77bb8c |
Si |
Yunsheng Liu, Xingfeng He, Yifei Mo |
50 |
3198 |
1 |
DFT-PBE |
VASP 5.4.4 |
29 |
||||
Structures from discrepencies_and_error_metrics_NPJ_2023 training set, enhanced by inclusion of interstitials. The full discrepencies_and_error_metrics_NPJ_2023 dataset includes the original mlearn_Si_train dataset, modified with the purpose of developing models with better diffusivity scores by replacing ~54% of the data with structures containing migrating interstitials. The enhanced validation set contains 50 total structures, consisting of 20 structures randomly selected from the 120 replaced structures of the original training dataset, 11 snapshots with vacancy rare events (RE) from AIMD simulations, and 19 snapshots with interstitial RE from AIMD simulations. We also construct interstitial-RE and vacancy-RE testing sets, each consisting of 100 snapshots of atomic configurations with a single migrating vacancy or interstitial, respectively, from AIMD simulations at 1230 K. |
10.60732/f0a44294 |
Si |
Yunsheng Liu, Xingfeng He, Yifei Mo |
218 |
13629 |
1 |
DFT-PBE |
VASP 5.4.4 |
29 |
||||
This is the verification dataset (see companion training dataset: datasets_for_magnetic_MTP_NatSR2024_training) used in training a magnetic multi-component machine-learning potential for Fe-Al systems. The configurations from the verification set include different levels of magnetic moment perturbation than configurations from the training set. For this reason, the authors refer to this dataset as a "verification set", rather than a "validation set". |
10.60732/acd42be9 |
Al, Fe |
Alexey S. Kotykhov, Konstantin Gubaev, Max Hodapp, Christian Tantardini, Alexander V. Shapeev, Ivan S. Novikov |
210 |
3360 |
2 |
DFT-PBE |
ABINIT |
29 |
||||
The face-centered cubic medium-entropy alloy NiCoCr has received considerable attention for its good mechanical properties, uncertain stacking fault energy, etc, some of which have been attributed to chemical short-range order (SRO). Here, we examine the yield strength and misfit volumes of NiCoCr to determine whether SRO has measurably influenced mechanical properties. Polycrystalline strengths show no systematic trend with different processing conditions. Measured misfit volumes in NiCoCr are consistent with those in random binaries. Yield strength prediction of a random NiCoCr alloy matches well with experiments. Finally, we show that standard spin-polarized density functional theory (DFT) calculations of misfit volumes are not accurate for NiCoCr. This implies that DFT may be inaccurate for other subtle structural quantities such as atom-atom bond distance so that caution is required in drawing conclusions about NiCoCr based on DFT. These findings all lead to the conclusion that, under typical processing conditions, SRO in NiCoCr is either negligible or has no systematic measurable effect on strength. |
10.60732/aa9d7982 |
Co, Cr, Ni |
Binglun Yin, William Curtin |
428 |
40624 |
3 |
DFT-PBE |
VASP |
29 |
||||
493 structures available from the GAP-20 database, excluding any structures present in the training set. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/3e23c305 |
C |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
494 |
32279 |
1 |
DFT-PBE+D3 |
CP2K |
29 |
||||
500 decorrelated geometries sampled from 300 K xTB MD run. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations was recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/e359e8ed |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
500 |
7500 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
29 |
||||
This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially-trained MTP model, 2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed, 3) the MTP was retrained, including the structures that were re-computed with DFT, 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset included 40,000 unrelaxed configurations with BCC, FCC, and HCP lattices. |
10.60732/1058e01c |
Cu, Pd |
Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev |
522 |
2450 |
2 |
DFT-undefined |
VASP |
29 |
||||
The JARVIS_AGRA_OH dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the training set for the oxygen reduction reaction (ORR) dataset from Batchelor et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. Bulk calculations were performed with k-point = 8 x 8 x 4. Training adsorption energies were calculated on slabs, k-point = 4 x 4 x 1, while testing energies used k-point = 3 x 3 x 1. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/4db2a2e7 |
H, Ir, O, Pd, Pt, Rh, Ru |
Thomas A.A. Batchelor, Jack K. Pedersen, Simon H. Winther, Ivano E. Castelli, Karsten W. Jacobsen, Jan Rossmeisl |
877 |
15786 |
7 |
DFT-rPBE |
GPAW |
29 |
||||
The JARVIS_AGRA_O dataset is part of the joint automated repository for various integrated simulations (JARVIS) DFT database. This dataset contains data from the training set for the oxygen reduction reaction (ORR) dataset from Batchelor et al., as used in the automated graph representation algorithm (AGRA) training dataset: a collection of DFT training data for training a graph representation method to extract the local chemical environment of metallic surface adsorption sites. Bulk calculations were performed with k-point = 8 x 8 x 4. Training adsorption energies were calculated on slabs, k-point = 4 x 4 x 1, while testing energies used k-point = 3 x 3 x 1. JARVIS is a set of tools and datasets built to meet current materials design challenges. |
10.60732/a3177807 |
Ir, O, Pd, Pt, Rh, Ru |
Thomas A.A. Batchelor, Jack K. Pedersen, Simon H. Winther, Ivano E. Castelli, Karsten W. Jacobsen, Jan Rossmeisl |
1000 |
17000 |
6 |
DFT-rPBE |
GPAW |
29 |
||||
The rattled-relax validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/7a878cdf |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
91043 |
764266 |
84 |
DFT-PBE+U |
VASP |
29 |
||||
OC20_S2EF_val_ood_both is the out-of-domain validation set of the OC20 Structure to Energy and Forces (S2EF) dataset featuring both unseen catalyst composition and unseen adsorbate. Features include energy, atomic forces and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID and Miller index. |
10.60732/f8398b5c |
Ag, Al, As, Au, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
999944 |
84604635 |
55 |
DFT-rPBE |
VASP |
29 |
||||
Configurations from CA-9 dataset used during validation step for NNP_CA-9 potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials |
10.60732/5cebd981 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
8000 |
436601 |
1 |
DFT-PBE |
VASP |
29 |
||||
2558 structures selected from the GAP-20 database. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/f340d1d9 |
C |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
2558 |
168066 |
1 |
DFT-PBE+D3 |
CP2K |
29 |
||||
Dataset used to train a machine learning model to calculate density functional theory-quality formation energies of all ~2 x 10^6 pristine ABC2D6 elpasolite crystals that can be made up from main-group elements (up to bismuth). |
10.60732/f87a64e4 |
Al, Ar, As, B, Ba, Be, Bi, Br, C, Ca, Cl, Cs, F, Ga, Ge, H, He, I, In, K, Kr, Li, Mg, N, Na, Ne, O, P, Pb, Rb, S, Sb, Se, Si, Sn, Sr, Te, Tl, Xe |
Felix Faber, Alexander Lindmaa, O. Anatole von Lilienfeld, Rickard Armiento |
21881 |
218810 |
39 |
DFT-PBE |
VASP 5.2.2 |
29 |
||||
Binning-random configurations from CA-9 dataset used during validation step for NNP_BR potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials |
10.60732/0f8a1418 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
4002 |
214310 |
1 |
DFT-PBE |
VASP |
29 |
||||
Partial dataset for "Accuracy evaluation of different machine learning force field features". The included data is limited to that hosted directly on the repository at the related GitHub link. From publication abstract: Predicting energies and forces using machine learning force field (MLFF) depends on accurate descriptions (features) of chemical environment. Despite the numerous features proposed, there is a lack of controlled comparison among them for their universality and accuracy. In this work, we compared several commonly used feature types for their ability to describe physical systems. These different feature types include cosine feature, Gaussian feature, moment tensor potential (MTP) feature, spectral neighbor analysis potential feature, simplified smooth deep potential with Chebyshev polynomials feature and Gaussian polynomials feature, and atomic cluster expansion feature. We evaluated the training root mean square error (RMSE) for the atomic group energy, total energy, and force using linear regression model regarding to the density functional theory results. We applied these MLFF models to an amorphous sulfur system and carbon systems, and the fitting results show that MTP feature can yield the smallest RMSE results compared with other feature types for either sulfur system or carbon system in the disordered atomic configurations. Moreover, as an extending test of other systems, the MTP feature combined with linear regression model can also reproduce similar quantities along the ab initio molecular dynamics trajectory as represented by Cu systems. Our results are helpful in selecting the proper features for the MLFF development. |
10.60732/209e0c9c |
C, H, Mg, Ni, O, Si |
Ting Han, Jie Li, Liping Liu, Fengyu Li, Lin-Wang Wang |
17255 |
918240 |
6 |
DFT-PBE |
PWmat |
29 |
||||
The MAD benchmark dataset, containing a selection of MAD test, MPtrj, Alexandria, SPICE, MD22 and OC2020 datasets, computed with MPtrj DFT settings. Part of the MAD (Massive Atomic Diversity) dataset family. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures. |
10.60732/30653c33 |
Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti |
2114 |
58755 |
85 |
DFT-PBEsol |
VASP |
28 |
||||
Benzene validation PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/eb286cb6 |
C, H |
Venkat Kapil, Edgar A. Engel |
200 |
6072 |
2 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
28 |
||||
Example dataset for MISPR (Materials Informatics for Structure-Property Relationships) materials science simulation software, with DFT-calculated configuration properties for three different MISPR workflows: nuclear magnetic resonance (NMR) chemical shifts, electrostatic partial charges (ESP) and bond dissociation energies (BDE). |
10.60732/2b830270 |
C, Cl, F, H, N, O, P, S, Si |
Rasha Atwi, Matthew Bliss, Maxim Makeev, Nav Nidhi Rajput |
503 |
8996 |
9 |
DFT-ωB97X, DFT-B3LYP |
Gaussian 16 |
28 |
||||
The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-O dataset: a collection assembled to train a machine learning model for the purposes of assisting catalyst design by predicting chemical reactivity of transition-metal surfaces. The adsorption systems contained in this dataset consist of {111}-terminated metal surfaces. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/9541fb8b |
Ag, Al, Au, Bi, Cd, Co, Cr, Cu, Fe, Ga, Hf, In, Ir, La, Mn, Mo, Nb, Ni, O, Os, Pb, Pd, Pt, Re, Rh, Ru, Sc, Sn, Ta, Ti, Tl, V, W, Y, Zn, Zr |
Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin |
747 |
12699 |
36 |
DFT-PBE |
Quantum ESPRESSO |
28 |
||||
Benzene validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum ESPRESSO v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π Å^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/01b77268 |
C, H |
Venkat Kapil, Edgar A. Engel |
1000 |
29712 |
2 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
28 |
||||
The val_aimd-from-PBE-3000-nvt validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/6f64849f |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
76478 |
5186115 |
84 |
DFT-PBE+U |
VASP |
28 |
||||
This dataset contains structural calculations of LaMnO3 carried out in Quantum ESPRESSO at the DFT-PBEsol+U level of theory. The dataset was built to explore strained and stoichiometric and oxygen-deficient LaMnO3. |
10.60732/9772459c |
Ba, La, Mn, O, Ti |
Chiara Ricca, Nicolas Niederhauser, Ulrich Aschauer |
4513 |
174298 |
5 |
DFT-PBE+U |
Quantum ESPRESSO |
28 |
||||
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Cu surface. This appears similar to CGM-MLP_natcomm2023_CU-C_deposition, as there are no O atoms present in this set. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/ae9380c5 |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
1693 |
326182 |
2 |
DFT-PBE+D3 |
CP2K |
28 |
||||
Training simulations from CGM-MLP_natcomm2023 of carbon on an oxygen-contaminated Cu surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/215303a5 |
C, Cu, O |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
1717 |
387151 |
3 |
DFT-PBE+D3 |
CP2K |
28 |
||||
Approximately 28,500 configurations of hafnia (HfO2) used in the training of a DP model for the prediction of properties of various hafnia polymorphs, including transition barriers between different phases. |
10.60732/9fbd0fcb |
Hf, O |
Jing Wu, Yuzhi Zhang, Linfeng Zhang, Shi Liu |
28506 |
2736576 |
2 |
DFT-PBE |
VASP |
28 |
||||
The chalcopyrite Cu(In,Ga)S2 has gained renewed interest in recent years due to its potential application in tandem solar cells. In this contribution, a combined theoretical and experimental approach is applied to investigate stable and metastable phases forming in sputtered CuInS2 (CIS) thin films. Ab initio calculations are performed to obtain formation energies, X-ray diffraction patterns, and Raman spectra of various CIS polytypes and related compounds. Multiple low-energy CIS structures with zinc-blende and wurtzite-derived lattices are identified and their XRD/Raman patterns are shown to contain many overlapping features, which could lead to misidentification unless the techniques are duly combined and analyzed. The results are verified against experimental XRD/Raman spectra measured on a series of CIS films with different compositions and treated at different temperatures, revealing the formation of several CIS polymorphs and secondary phases. The characteristic features and the mechanisms behind the formation of different phases are discussed with the focus on the thin-film photovoltaic application of CIS. The dataset contains structures and VASP output files used to derive the discussed trends. Version 2. |
10.60732/bcce3f87 |
Cu, In, Na, S |
Jes Larsen, Kostiantyn Sopiha, Clas Persson, Charlotte Platzer-Björkman, Marika Edoff |
3103 |
117852 |
4 |
DFT-PBE |
VASP |
28 |
||||
A dataset used to train machine-learning interatomic potentials (moment tensor potentials) for multicomponent alloys to ab initio data in order to investigate the disordered body-centered cubic (bcc) TiZrHfTax system with varying Ta concentration. |
10.60732/434db566 |
Hf, Ta, Ti, Zr |
Konstantin Gubaev, Yuji Ikeda, Ferenc Tasnádi, Jörg Neugebauer, Alexander V. Shapeev, Blazej Grabowski, Fritz Körmann |
3622 |
223930 |
4 |
DFT-PBE |
VASP |
28 |
||||
The rattled-300-subsampled validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/de580670 |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
34244 |
490880 |
85 |
DFT-PBE+U |
VASP |
28 |
||||
This dataset was designed to enable machine-learning of Ta elastic, thermal, and defect properties, as well as surface energetics, melting, and the structure of the liquid phase. The dataset was constructed by starting with the dataset from J. Byggmästar et al., Phys. Rev. B 100, 144105 (2019), then rescaling all of the configurations to the correct lattice spacing and adding in gamma surface configurations. |
10.60732/43837a12 |
Ta |
Jesper Byggmästar, Kai Nordlund, Flyura Djurabekova |
3773 |
45385 |
1 |
DFT-PBE |
VASP |
28 |
||||
130,000 configurations of zeolite from the Database of Zeolite Structures. Calculations performed using Amsterdam Modeling Suite software. |
10.60732/7eb1fefb |
Al, Ba, Be, C, Ca, Cs, F, Ge, H, K, Li, N, Na, O, Si |
Leonid Komissarov, Toon Verstraelen |
12929 |
1841496 |
15 |
DFT-revPBE+D3(BJ) |
BAND |
28 |
||||
This dataset contains data from density functional theory calculations of various atomic configurations of pure Zr, pure Sn, and Zr-Sn alloys with different structures, defects, and compositions. Energies, forces, and stresses are calculated at the DFT level of theory. Includes 23,956 total configurations. |
10.60732/8f77465e |
Sn, Zr |
Haojie Mei, Liang Chen, Feifei Wang, Guisen Liu, Jing Hu, Weitong Lin, Yao Shen, Jinfu Li, Lingti Kong |
23232 |
680289 |
2 |
DFT-PBE |
VASP |
28 |
||||
The validation split of the MAD (Massive Atomic Diversity) dataset. From the creators: Starting from relatively small sets of stable structures, the dataset is built to contain “massive atomic diversity” (MAD) by aggressively distorting these configurations, with near-complete disregard for the stability of the resulting configurations. The electronic structure details, on the other hand, are chosen to maximize consistency rather than to obtain the most accurate prediction for a given structure, or to minimize computational effort. The MAD dataset we present here, despite containing fewer than 100k structures, has already been shown to enable training universal interatomic potentials that are competitive with models trained on traditional datasets with two to three orders of magnitude more structures. |
10.60732/8ff541c9 |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Po, Pr, Pt, Rb, Re, Rh, Rn, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Arslan Mazitov, Sofiia Chorna, Guillaume Fraux, Marnik Bercx, Giovanni Pizzi, Sandip De, Michele Ceriotti |
9566 |
257052 |
85 |
DFT-PBEsol |
VASP |
27 |
||||
This dataset consists of graphene superlattices with tungsten adatoms with properties calculated at the DFT level of theory. The authors modeled the placement of tungsten adatoms on a graphene monolayer. The resulting superlattice structures were then used to calculate electronic band structure and phonon dispersion relations. The dataset was used to investigate the effect of adatom placement on electronic band structure and phonon dispersion relations of graphene superlattices. The creation of the dataset involved the following steps: 1. Selection of the graphene monolayer as the starting point for the superlattice construction. 2. Placement of tungsten adatoms in the center of the unit cell. 3. Calculation of the electronic structure and other properties of the resulting superlattice using DFT. 4. Generation of a set of reduced Brillouin zones representing the symmetry of the superlattice. 5. Calculation of the electronic band structure and phonon dispersion relations for each superlattice structure in the dataset. |
10.60732/bdd0d26b |
C, Cr, Ir, Mo, Nb, Os, Re, Rh, Ru, Ta, W |
Anastasiia Skurativska, Stepan S. Tsirkin, Fabian D Natterer, Titus Neupert, Mark H Fischer |
18 |
774 |
11 |
DFT-PBE |
VASP |
27 |
||||
This data was assembled to investigate rare-earth-catalyzed benzylic C(sp3)-H addition of pyridines to olefins. All calculations were performed with the Gaussian 09 software package. The B3PW91 functional was used for geometric optimization without any symmetric constraints. Each optimized structure was subsequently analyzed by harmonic vibrational frequencies at the same level of theory for characterization of a minimum (NImag = 0) or a transition state (NImag = 1) to obtain the thermodynamic data. The 6-31G(d) basis set was used for C, H, and N atoms, and Stuttgart/Dresden relativistic effective core potentials (RECPs) as well as the associated valence basis sets were used for the Y atom. To obtain more accurate energies, single-point energy calculations were performed with a larger basis set. In such single-point calculations, the M06-L functional, which often shows good performance in the treatment of transition-metal systems, was used together with the CPCM solvation model for consideration of the toluene solvation effect. The same basis set together with associated pseudopotentials as in geometry optimization was used for the Y atom, and the 6-311+G(d,p) basis set was used for the remaining atoms. |
10.60732/445f826b |
C, H, N, Y |
Guangli Zhou, Gen Luo, Xiaohui Kang, Zhaomin Hou, Yi Luo |
58 |
3514 |
4 |
DFT-M06-L |
Gaussian 09 |
27 |
||||
192 structures were uniformly selected from the AIMD simulation, excluding any structures that are part of the training set. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/eb6e9ead |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
193 |
38004 |
2 |
DFT-PBE+D3 |
CP2K |
27 |
||||
The train set of a train and test set pair. The combined datasets comprise approximately 275 configurations of monolayer quasi-hexagonal-phase fullerene (qHPF) membrane used to train and test an NEP model. |
10.60732/906d79f3 |
C |
Penghua Ying |
237 |
28440 |
1 |
DFT-PBE |
VASP |
27 |
||||
Glycine validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/2d6ebecc |
C, H, N, O |
Venkat Kapil, Edgar A. Engel |
500 |
17800 |
4 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
27 |
||||
The JARVIS_TinNet dataset is part of the joint automated repository for various integrated simulations (JARVIS) database. This dataset contains configurations from the TinNet-OH dataset: a collection assembled to train a machine learning model for the purposes of assisting catalyst design by predicting chemical reactivity of transition-metal surfaces. The adsorption systems contained in this dataset consist of {111}-terminated metal surfaces. JARVIS is a set of tools and collected datasets built to meet current materials design challenges. |
10.60732/401689c1 |
Ag, Al, Au, Bi, Cd, Co, Cr, Cu, Fe, Ga, H, Hf, In, Ir, La, Mn, Mo, Nb, Ni, O, Os, Pb, Pd, Pt, Re, Rh, Ru, Sc, Sn, Ta, Ti, Tl, V, W, Y, Zn, Zr |
Shih-Han Wang, Hemanth Somarajan Pillai, Siwen Wang, Luke E. K. Achenie, Hongliang Xin |
748 |
13464 |
37 |
DFT-PBE |
Quantum ESPRESSO |
27 |
||||
This dataset of molecular structures was extracted, using the NOMAD API, from all available structures in the NOMAD Archive that only include C, H, O, and N. This dataset consists of 50.42% H, 30.41% C, 10.36% N, and 8.81% O and includes 96 804 atomic environments in 5217 structures. |
10.60732/c5e37779 |
C, H, N, O |
Berk Onat, Christoph Ortner, James R. Kermode |
3774 |
60197 |
4 |
DFT-PBE, DFT-HSE06, DFT-mPW1PW91, DFT-B1B95, DFT-M06-2X, DFT-B3PW91, DFT-B88-LYP, DFT-LDA-PW-PZ, DFT-LDA-PZ_MOD, DFT-LDA-C_VWN, DFT-B2PLYP, DFT-TPSSh, DFT-PBE0 |
Octopus, Gaussian, VASP, exciting, FHI-aims |
27 |
||||
This dataset is formed from two parts: single-species datasets for Al, Ni, and Cu from the NOMAD Encyclopedia and multi-species datasets that include Al, Ni and Cu from NOMAD Archive. Duplicates have been removed from NOMAD Encyclopedia data. For the multi-species data, only the last configuration steps for each NOMAD Archive record were used because the last configuration typically corresponds to a fully relaxed configuration. In this dataset, the NOMAD unique reference access IDs are retained along with a subset of their meta information that includes whether the supplied configuration is from a converged calculation as well as the Density Functional Theory (DFT) code, version, and type of DFT functionals with the total potential energies. This dataset consists of 39.1% Al, 30.7% Ni, and 30.2% Cu and has 27,987 atomic environments in 3337 structures. |
10.60732/2744ff4e |
Al, Cu, Ni |
Berk Onat, Christoph Ortner, James R. Kermode |
1016 |
4646 |
3 |
DFT-undefined |
GPAW, VASP, exciting, FHI-aims |
27 |
||||
Training simulations from CGM-MLP_natcomm2023 of carbon deposition on a Cu surface. This dataset was one of the datasets used in training during the process of producing an active learning dataset for the purposes of exploring substrate-catalyzed deposition on metal surfaces such as Cu(111), Cr(110), Ti(001), and oxygen-contaminated Cu(111) as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on Cu, Cr and Ti surfaces. |
10.60732/c3b4e684 |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
1177 |
204591 |
2 |
DFT-PBE+D3 |
CP2K |
27 |
||||
The extended training dataset for GST_GAP_22, calculated using the PBEsol functional. New configurations, simulated under external electric fields, were labelled with DFT and added to the original reference database. GST-GAP-22 contains configurations of phase-change materials on the quasi-binary GeTe-Sb2Te3 (GST) line of chemical compositions. Data was used for training a machine learning interatomic potential to simulate a range of germanium-antimony-tellurium compositions under realistic device conditions. |
10.60732/37c76fa8 |
Ge, Sb, Te |
Yuxing Zhou, Wei Zhang, Evan Ma, Volker L. Deringer |
2913 |
398991 |
3 |
DFT-PBEsol |
CASTEP |
27 |
||||
Approximately 15,000 configurations of copper used to demonstrate the DP-GEN data generator for PES machine learning models. |
10.60732/2060021e |
Cu |
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, Weinan E |
15269 |
297369 |
1 |
DFT-PBE |
VASP |
27 |
||||
Configurations from the CA-9 dataset used for training the NNP_CA-9 potential. CA-9 consists of configurations of carbon with curated subsets chosen to test the effects of intentionally choosing dissimilar configurations when training neural network potentials. |
10.60732/8b765383 |
C |
Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
39993 |
2195024 |
1 |
DFT-PBE |
VASP |
27 |
||||
Data from the paper 'Ferrimagnetism induced by thermal vibrations in oxygen-deficient manganite heterostructures'. Includes Quantum ESPRESSO calculations of SrCaMnO3 and SrMnO3, stoichiometric and defective cells. |
Ca, Mn, O, Sr |
Moloud Kaviani, Chiara Ricca, Ulrich Aschauer |
11594 |
459546 |
4 |
DFT-PBEsol+U |
Quantum ESPRESSO |
27 |
|||||
Succinic acid validation PBE-TS dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/fa9d54f4 |
C, H, O |
Venkat Kapil, Edgar A. Engel |
500 |
14000 |
3 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
26 |
||||
500 configurations of Mg2 for MD prediction using a model fitted on Al, W, Mg and Si. |
10.60732/965983d1 |
Mg |
Connor Allen, Albert P. Bartok |
500 |
1000 |
1 |
IP-GAP |
CASTEP |
26 |
||||
500 decorrelated geometries sampled from 600 K xTB MD run. Acetylacetone dataset generated from a long molecular dynamics simulation at 300 K using a Langevin thermostat at the semi-empirical GFN2-xTB level of theory. Configurations were sampled at an interval of 1 ps and the resulting set of configurations were recomputed with density functional theory using the PBE exchange-correlation functional with D3 dispersion correction and def2-SVP basis set and VeryTightSCF convergence settings using the ORCA electronic structure package. |
10.60732/7239a192 |
C, H, O |
Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor N. C. Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, Gábor Csányi |
500 |
7500 |
3 |
DFT-PBE+D3 |
ORCA 5.0 |
26 |
||||
This data set was used to generate a multi-element linear SNAP potential for InP, as published in Cusentino, M. A. et al., J. Chem. Phys. (2020). It is intended to produce an interatomic potential for indium phosphide capable of capturing high-energy defects that result from radiation damage cascades. |
10.60732/50cc0906 |
In, P |
Mary Alice Cusentino, Mitchell A. Wood, Aidan P. Thompson |
1802 |
106761 |
2 |
DFT-LDA |
VASP |
26 |
||||
Configurations of water clusters from HO_LiMoNiTi_NPJCM_2020 used in the training of an ANN, whereby total energy is extrapolated by a Taylor expansion as a means of reducing computational costs. |
10.60732/b633b325 |
H, O |
April M. Cooper, Johannes Kästner, Alexander Urban, Nongnuch Artrith |
1847 |
33246 |
2 |
DFT-BLYP+D3 |
VASP |
26 |
||||
Training data only from the Co_dimer_JPCA_2022 dataset. This dataset contains dimer molecules of Co(II) with potential energy calculations for structures with ferromagnetic and antiferromagnetic spin configurations. Calculations were carried out in Gaussian 16 with the PBE exchange-correlation functional and 6-31+G* basis set. All molecules contain the same atomic core region, consisting of the tetrahedral and octahedral Co centers and the three PO2R2 bridging ligands. The ligand exchange provides a broad range of exchange energies (ΔEJ), from +50 to -200 meV, with 80% of the ligands yielding ΔEJ < 10 meV. |
10.60732/07315f04 |
C, Cl, Co, H, N, O, P, S |
Sijin Ren, Eric Fonseca, William Perry, Hai-Ping Cheng, Xiao-Guang Zhang, Richard Hennig |
1794 |
154593 |
8 |
DFT-PBE |
Gaussian 16 |
26 |
||||
Benzene training PBE0-MBD dataset from "Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid, and glycine". DFT reference energies and forces were calculated using Quantum Espresso v6.3. The calculations were performed with the semi-local PBE xc functional, Tkatchenko-Scheffler dispersion correction, optimised norm-conserving Vanderbilt pseudopotentials, a Monkhorst-Pack k-point grid with a maximum spacing of 0.06 x 2π A^-1, and a plane-wave energy cut-off of 100 Ry for the wavefunction. |
10.60732/8d563e8a |
C, H |
Venkat Kapil, Edgar A. Engel |
1799 |
49512 |
2 |
DFT-PBE+TS |
Quantum ESPRESSO v6.3 |
26 |
||||
About 2,500 configurations of alpha-Fe used in the training and testing of a ML model with the goal of building magneto-elastic machine-learning interatomic potentials for large-scale spin-lattice dynamics simulations. |
10.60732/fe28ef5e |
Fe |
Svetoslav Nikolov, Mitchell A. Wood, Attila Cangi, Jean-Bernard Maillet, Mihai-Cosmin Marinica, Aidan P. Thompson, Michael P. Desjarlais, Julien Tranchida |
2157 |
44480 |
1 |
DFT-PBE |
VASP |
26 |
||||
The rattled-500-subsampled validation split of OMat24 (Open Materials 2024). OMat24 is a large-scale open dataset of density functional theory (DFT) calculations. The dataset is available in subdatasets and subsampled sub-datasets based on the structure generation strategy used. There are two main splits in OMat24: train and validation, each divided into the aforementioned subsampling and sub-datasets. |
10.60732/6f9ded6d |
Ac, Ag, Al, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, I, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pm, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Xe, Y, Yb, Zn, Zr |
Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi |
39464 |
564068 |
85 |
DFT-PBE+U |
VASP |
26 |
||||
This dataset was generated using the following active learning scheme: 1) candidate structures were relaxed by a partially-trained MTP model, 2) structures for which the MTP had to perform extrapolation were passed to DFT to be re-computed, 3) the MTP was retrained, including the structures that were re-computed with DFT, 4) steps 1-3 were repeated until the MTP no longer extrapolated on any of the original candidate structures. The original candidate structures for this dataset included about 27,000 configurations that were bcc-like and close-packed (fcc, hcp, etc.) with 8 or fewer atoms in the unit cell and different concentrations of Co, Nb, and V. |
10.60732/f2c623f1 |
Co, Nb, V |
Konstantin Gubaev, Evgeny V. Podryabinkin, Gus L.W. Hart, Alexander V. Shapeev |
383 |
2812 |
3 |
DFT-undefined |
VASP |
25 |
||||
468 structures uniformly selected from the MD/tfMC simulation, excluding any structures that are part of the training set. This dataset was one of the datasets used in testing screening parameters during the process of producing an active learning dataset for Cu-C interactions for the purposes of exploring substrate-catalyzed deposition as a means of controllable synthesis of carbon nanomaterials. The combined dataset includes structures from the Carbon_GAP_20 dataset and additional configurations of carbon clusters on a Cu(111) surface. |
10.60732/8ad1a886 |
C, Cu |
Di Zhang, Peiyun Yi, Xinmin Lai, Linfa Peng, Hao Li |
469 |
156312 |
2 |
DFT-PBE+D3 |
CP2K |
25 |
||||
The train set of a train/test pair from the ethanol dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVTZ was used for ethanol. All calculations were performed with the Psi4 software suite. |
10.60732/c254fdb2 |
C, H, O |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
998 |
8982 |
3 |
CCSD(T) |
Psi4 |
25 |
||||
A dataset of DFT-calculated energies created to investigate the effect of hydrogen doping on the crystal structure and the electronic state in SmNiO3. Configuration sets include sets for apically and side-bonded hydrogen atoms for 1-9 hydrogen atoms. |
10.60732/b834b71f |
H, Ni, O, Sm |
Kunihiko Yamauchi, Ikutaro Hamada |
3318 |
156419 |
4 |
DFT-PBE+U |
VASP |
25 |
||||
This dataset contains DFT calculations that were carried out in conjunction with experimental investigation of a cationic phenoxyimine yttrium complex as an isoprene polymerization catalyst. Calculations were performed using the Gaussian 09 D.01 suite of programs. Electronic structure calculations were performed at the DFT level using the B3PW91 functional. The Stuttgart-Cologne small-core quasi-relativistic pseudopotential ECP28MWB and its available basis set including up to the g function were used to describe yttrium. Similarly, silicon and phosphorus were represented by a Stuttgart-Dresden-Bonn pseudopotential along with the related basis set augmented by a d function of polarization (αd(P) = 0.387 and αd(Si) = 0.284). Other atoms were described by a polarized all-electron triple-ζ 6-311G(d,p) basis set. Bulk solvent effect of toluene or THF was simulated using the SMD continuum model. The Grimme empirical correction with the original D3 damping function was used to include the dispersion correction as a single-point calculation. Transition-state optimization was followed by frequency calculations to characterize the stationary point. Intrinsic reaction coordinate calculations were performed to confirm the connectivity of the transition states. Gibbs energies were estimated within the harmonic oscillator approximation and estimated at 298 K and 1 atm. |
10.60732/bd18acbe |
Al, B, C, F, H, N, O, Si, Y |
Alexis D. Oswald, Ludmilla Verrieux, Pierre-Alain R. Breuil, Hélène Olivier-Bourbigou, Julien Thuilliez, Florent Vaultier, Mostafa Taoufik, Lionel Perrin, Christophe Boisson |
109 |
9074 |
9 |
DFT-B3PW91+D3(BJ) |
Gaussian 09 |
24 |
||||
The train set of a train/test pair from the malonaldehyde dataset from sGDML. To create the coupled cluster datasets, the data used for training the models were created by running ab initio MD in the NVT ensemble using the Nosé-Hoover thermostat at 500 K during a 200 ps simulation with a resolution of 0.5 fs. Energies and forces were recalculated using all-electron coupled cluster with single, double and perturbative triple excitations (CCSD(T)). The Dunning correlation-consistent basis set cc-pVDZ was used for malonaldehyde. All calculations were performed with the Psi4 software suite. |
10.60732/b53c02ad |
C, H, O |
Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko |
1000 |
9000 |
3 |
CCSD(T) |
Psi4 |
23 |
||||
6000 configurations of liquid and amorphous HfO2 generated for use with an active learning ML model. |
10.60732/dcb4440a |
Hf, O |
Ganesh Sivaraman, Anand Narayanan Krishnamoorthy, Matthias Baur, Christian Holm, Marius Stan, Gábor Csányi, Chris Benmore, Álvaro Vázquez-Mayagoitia |
5999 |
575904 |
2 |
DFT-PBE |
VASP 5.4.4 |
23 |
||||
Approximately 2,800 configurations of Li10GeP2S12, based on crystal structures from the Materials Project database, material ID mp-696129. One of two LiGePS datasets from this source. The other uses the PBE functional, rather than the PBEsol functional. |
10.60732/03312bdd |
Ge, Li, P, S |
Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E |
2835 |
504350 |
4 |
DFT-PBEsol |
VASP 5.4.4 |
23 |
||||
This dataset contains all frames from the trajectories for the training configurations in the OC20 initial structure to relaxed energy (IS2RE) and initial structure to relaxed structure (IS2RS) tasks of Open Catalyst 2020 (OC20). Dataset corresponds to the "All IS2RE/S training" data split under the "Relaxation Trajectories" section of the Open Catalyst Project page. |
10.60732/d63dce0c |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
92897924 |
7522584885 |
56 |
DFT-rPBE |
VASP |
13 |
||||
DFT reference structures used to train neuroevolution potentials (NEP) for a study disentangling lone-pair chemistry and geometric effects in the octahedral tilting of halide double perovskites. The dataset contains the training configurations (with energies, forces, and stresses) for three representative compounds: Cs2AgAlBr6, Cs2AgBiBr6, and Cs2InBiBr6. Reference calculations used VASP with the SCAN+rVV10 meta-GGA functional (BPARAM=15.7, CPARAM=0.0093), a 520 eV plane-wave cutoff, Gaussian smearing (SIGMA=0.1 eV), and Gamma-centered Brillouin-zone sampling (KSPACING=0.25). Configuration sets group the structures by compound. |
Ag, Al, Bi, Br, Cs, In |
Mehmet Baskurt, Erik Fransson, Madeleine Lindvik, Paul Erhart, Julia Wiktor |
1389 |
89760 |
6 |
DFT-SCAN+rVV10 |
VASP |
11 |
|||||
Density functional theory dataset for cobalt, platinum, and CoPt bimetallic catalysts investigated for the dry reforming of methane (DRM). It comprises bulk metals and alloys (Co, Pt, CoPt L1_0 and fcc), (111) surface slab models, and minimum-energy reaction paths for CH4 and CO2 dissociation obtained with the machine-learning nudged elastic band (ML-NEB) method. All ionic relaxation/path images are included. Calculations used VASP 6.4.2 with the GGA-PBE functional, Grimme D3 dispersion with Becke-Johnson damping (IVDW=12), a plane-wave cutoff of 400 eV for slabs and 520 eV for bulk, spin polarization for cobalt-containing systems, Methfessel-Paxton smearing (ISMEAR=1), Gamma-centered k-point meshes, and dipole corrections normal to the surfaces. Transition states were located with the CatLearn ML-NEB module. Configuration sets separate bulk structures, surface slabs, and reaction-barrier images. |
C, Co, H, O, Pt |
David Niedbalka, Hector Prats, Estefanía Díaz López, Marcel Janák, Diana Piankova, Anna Loiudice, Raffaella Buonsanti, Aleix Comas-Vives, Christoph R. Müller, Paula M. Abdala |
4028 |
248920 |
5 |
DFT-PBE+D3(BJ) |
VASP 6.4.2 |
11 |
|||||
VASP single-point (SCF) DFT calculations underpinning a mechanistic study of A-site doping in lithium-lanthanum-titanate (LMTO/LLTO) perovskite nanorods and their interfaces with a p(MTFSI) polymer electrolyte, aimed at understanding interfacial Li-ion and Na-ion transport for composite polymer electrolyte design. The dataset includes bulk/reference systems (Li, Na, MTFSI, LiMTFSI, NaMTFSI, MTFSI dimers) and LMTO/polymer interface slabs at varying A-site compositions. Calculations used VASP with the r2SCAN meta-GGA functional (with rVV10 nonlocal correlation for surfaces) via pymatgen's MPScanRelaxSet, PBE PAW (version 54) potentials, EDIFF=1e-5 eV, and Gaussian smearing (ISMEAR=0). Energies and forces are taken from the SCF OUTCAR; structures from the paired CONTCAR. Configuration sets group calculations by reference/interface category. |
C, F, H, K, La, Li, N, Na, O, S, Ti |
Lauren B. Shepard, Ji-young Ock, Amit Bhattacharya, Tao Wang, Albina Borisevich, Michelle Lehmann, Sheng Dai, Raphaële Clément, Alexei P. Sokolov, X. Chelsea Chen, Susan B. Sinnott |
50 |
16097 |
11 |
DFT-R2SCAN |
VASP |
10 |
|||||
Density functional theory reference data for constructing a machine-learning force field (MLFF) of cerium oxide (CeO2) surfaces containing an oxygen vacancy, generated with VASP on-the-fly machine-learning and stored in ML_AB training files. The dataset follows a dataset-merging strategy, combining six independently sampled surface families that vary the surface orientation (CeO2(100) Ce-terminated and CeO2(111)), slab thickness (two- vs three-layer), and oxygen-vacancy content (zero or one vacancy), for roughly 1,700 configurations carrying total energies, atomic forces, and stresses. Configuration sets group the data by surface family. Reference calculations used VASP with the spin-polarized PBE functional plus Grimme D3 dispersion and a Hubbard U correction on the Ce 4f states (DFT+U, Ueff=5.0 eV), a 520 eV plane-wave cutoff, and a 1x1x1 Gamma-centered k-point grid. |
Ce, O |
Kai Oshiro, Min Gao, Jun-ya Hasegawa |
1746 |
51361 |
2 |
DFT-PBE+U+D3 |
VASP 6.4.2 |
9 |
|||||
DFT reference dataset for an Allegro machine-learned interatomic potential for silica (SiO2) valid up to 15000 K, spanning the high-temperature melt, melt-quench amorphization, and mechanical-deformation regimes. The configurations were selected by HYAL active learning - an Allegro/LAMMPS sampler proposing structures of alpha-quartz, beta-cristobalite, coesite, and amorphous silica across an initial set and melt, high-temperature melt, melt-quench, and mechanical shear/tension sampling stages - and each was then labeled with a single-point VASP calculation (roughly 2780 successfully labeled configurations with total energies, atomic forces, and stresses). Calculations used VASP 6.3.2 with the r2SCAN meta-GGA functional (PAW_PBE potentials), a 1000 eV plane-wave cutoff, an electronic convergence of 1e-6 eV, Gaussian smearing (ISMEAR=0, SIGMA=0.1 eV), and Gamma-point Brillouin-zone sampling; each r2SCAN calculation was preceded by a PBE pre-convergence step. The resulting Allegro potential was used to study dynamic fracture and energy dissipation in silica glass. Configuration sets group the data by silica system (quartz, cristobalite, coesite, amorphous). |
O, Si |
Henrik Andersen Sveinsson |
2781 |
558432 |
2 |
DFT-r2SCAN |
VASP 6.3.2 |
9 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering the cubic-tetragonal phase transition. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
862 |
34520 |
3 |
DFT-PBE+D3(BJ) |
VASP |
9 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a neutral iodide interstitial. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
2008 |
82328 |
3 |
DFT-PBE+D3(BJ) |
VASP |
9 |
|||||
Density functional theory study of the adsorption of hexachlorobenzene (C6Cl6, HCB) on a montmorillonite clay surface and the effect of partial hydration, with explicit co-adsorbed water molecules. Each configuration is a VASP geometry optimization of an HCB molecule (with water) on a montmorillonite slab; all ionic relaxation steps are included. The dataset also contains the isolated montmorillonite-slab and HCB-molecule reference calculations used to evaluate interaction energies. Calculations used VASP 6.2.0 with the PBE functional, the Tkatchenko-Scheffler dispersion correction with iterative Hirshfeld partitioning (IVDW=21), Gaussian smearing (ISMEAR=0), and Gamma-point Brillouin-zone sampling. |
Al, C, Ca, Cl, F, Fe, H, Mg, Na, O, Si |
Daniel Tunega, Peter Grančič, Martin H. Gerzabek, Leonard Böhm |
9929 |
3551246 |
11 |
DFT-PBE |
VASP 6.2.0 |
8 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a positively charged iodide interstitial. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
2176 |
89216 |
3 |
DFT-PBE+D3(BJ) |
VASP |
8 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a positively charged iodide vacancy. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
1827 |
71253 |
3 |
DFT-PBE+D3(BJ) |
VASP |
8 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a neutral iodide vacancy. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
2715 |
105885 |
3 |
DFT-PBE+D3(BJ) |
VASP |
7 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a negatively charged iodide interstitial. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
2217 |
90897 |
3 |
DFT-PBE+D3(BJ) |
VASP |
7 |
|||||
VASP machine-learning-force-field (ML_AB) training set for the CsPbI3 halide perovskite, covering migration of a negatively charged iodide vacancy. This is one of seven sister datasets from the same publication, each providing a VASP ML_AB training set for a slightly different characteristic of the CsPbI3 perovskite (the cubic-tetragonal phase transition and the migration of iodide vacancies and interstitials in neutral, positively charged, and negatively charged states), all generated to train machine-learned force fields for ion migration. Each configuration carries the total energy, atomic forces, and stress from VASP single-point reference calculations using the PBE functional with Grimme D3 dispersion and Becke-Johnson damping (PBE-D3-BJ), on 2x2x2 cubic supercells (~40 atoms). (Plane-wave cutoff and k-point details are reported only in the paper's Supporting Information.) |
Cs, I, Pb |
Viren Tyagi, Mike Pols, Geert Brocks, Shuxia Tao |
2371 |
114129 |
3 |
DFT-PBE+D3(BJ) |
VASP |
7 |
|||||
Spin-polarized density functional theory structural relaxations probing hydrogen-induced lattice strain in a chemically complex Fe-based (bcc) hybrid steel. A 55-atom supercell of composition VMoCrMnFe47NiAlSiC is relaxed with 0, 1, 2, and 5 hydrogen atoms inserted into interstitial sites; all ionic steps of each cell-and-ion optimization are included, yielding configurations with total energies, atomic forces, and stresses. Calculations used VASP 6.4.2 (within the MedeA environment) with the GGA-PBE functional, a 400 eV plane-wave cutoff, spin polarization (ISPIN=2), Methfessel-Paxton smearing (SIGMA=0.2 eV), a 2x2x2 Gamma-centered k-point mesh, and full relaxation of lattice vectors and atomic positions (IBRION=2, ISIF=3, EDIFFG=-0.02 eV/Angstrom). Configuration sets group the relaxations by hydrogen content. |
Al, C, Cr, Fe, H, Mn, Mo, Ni, Si, V |
Ammar Aksoy, Cem Örnek, Beste Payam, Bilgehan M. Şeşen, Çağatay Yelkarası, Steve Ooi |
214 |
12324 |
10 |
DFT-PBE |
VASP 6.4.2 |
7 |
|||||
DFT-optimized structures and total energies of graphene interacting with urea and water molecules, supporting a combined experimental and first-principles study of a graphene field-effect-transistor (FET) sensor for the detection of urea in water. The configurations span graphene with one or more urea/water molecules in various adsorption geometries. Calculations used VASP with the optB86b-vdW exchange-correlation functional (GGA=MK, LUSE_VDW, PARAM1=0.1234, PARAM2=1.0), a 900 eV plane-wave cutoff, Gaussian smearing (SIGMA=0.01 eV), and a 3x3x1 Monkhorst-Pack k-point mesh. The relaxed geometries (CONTCAR) were not archived, so each input geometry (POSCAR) is paired with the energy of the first ionic step from OSZICAR, i.e., the single-point energy of that geometry. |
C, H, N, O |
Ondřej Špaček, Linda Supalová, Jindřich Mach, David Nezval, Tomáš Šikola, Miroslav Bartošík |
13 |
1764 |
4 |
DFT-optB86b-vdW |
VASP |
6 |
|||||
OC20_S2EF_train_all is the ~63 million structure full training set of the OC20 Structure to Energy and Forces (S2EF) dataset. Features include energy, atomic forces, and data from the OC20 mappings file, including adsorbate ID, Materials Project bulk ID, and Miller index. |
10.60732/a9baab35 |
Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mn, Mo, N, Na, Nb, Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr |
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi |
133934018 |
9810895377 |
56 |
DFT-rPBE |
VASP |
3 |
||||
The test set of OMol25. OMol25 (Open Molecules 2025) is a large dataset of structures with up to 350 atoms, calculated with DFT at a high level of theory (ωB97M-V/def2-TZVPD). This dataset is intended to provide a broad sampling of chemical complexity and structural diversity. OMol25 includes biomolecules, metal complexes, electrolytes, and community datasets that have been recalculated at this higher level of theory. Included community datasets are: ANI-2X, Transition-1X, ANI-1xBB, OrbNet Denali, SPICE2, and Solvated Protein Fragments. OMol25 also includes 30% of the GEOM dataset, with these systems optimized and a fraction of them having their initial positions randomly perturbed. |
Ag, Al, Ar, As, Au, B, Ba, Be, Bi, Br, C, Ca, Cd, Ce, Cl, Co, Cr, Cs, Cu, Dy, Er, Eu, F, Fe, Ga, Gd, Ge, H, He, Hf, Hg, Ho, I, In, Ir, K, Kr, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ne, Ni, O, Os, P, Pb, Pd, Pm, Pr, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Ti, Tl, Tm, V, W, Xe, Y, Yb, Zn, Zr |
Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, Nathan C. Frey, Xiang Fu, Vahe Gharakhanyan, Aditi S. Krishnapriyan, Joshua A. Rackers, Sanjeev Raja, Ammar Rizvi, Andrew S. Rosen, Zachary Ulissi, Santiago Vargas, C. Lawrence Zitnick, Samuel M. Blau, Brandon M. Wood |
2766167 |
342021649 |
83 |
DFT-ωB97M-V |
ORCA 6.0.0 |
3 |