from jarvis.db.figshare import data
# Load the QM9 dataset (a list of dicts, one per molecule) and show the first entry
d = data('qm9_std_jctc')
print(d[0])
{'mu': -1.77790756800166,
'alpha': -7.59467417670514,
'HOMO': -6.71425764235072,
'LUMO': 2.24686567442436,
'gap': 5.35591684810335,
'R2': -4.11464477806684,
'ZPVE': -3.14893653207103,
'U0': 5.70989371834825,
'U': 5.69336539320842,
'H': 5.68508295617329,
'G': 5.75764468354196,
'Cv': -6.18353212813309,
'omega1': -1.3203823354756,
'SMILES': 'C',
'SMILES_relaxed': 'C',
'id': '000001',
'atoms': {'lattice_mat': [[60, 0, 0], [0, 60, 0], [0, 0, 60]],
'coords': [[0.4999998496686667, 0.5000001250963333, 0.4999999923633333],
[0.5002473255336667, 0.481802867173, 0.4998995777733333],
[0.5170736659886667, 0.5062992418296667, 0.4998712520133333],
[0.49119790078366665, 0.5060288326963334, 0.48525591384666666],
[0.4914812580253333, 0.5058689332046666, 0.5149732640033333]],
'elements': ['C', 'H', 'H', 'H', 'H'],
'abc': [60.0, 60.0, 60.0],
'angles': [90.0, 90.0, 90.0],
'cartesian': False,
'props': ['', '', '', '', '']}}
5
gdb 1 157.7118 157.70997 157.70699 0. 13.21 -0.3877 0.1171 0.5048 35.3641 0.044749 -40.47893 -40.476062 -40.475117 -40.498597 6.469
C -0.0126981359 1.0858041578 0.0080009958 -0.535689
H 0.002150416 -0.0060313176 0.0019761204 0.133921
H 1.0117308433 1.4637511618 0.0002765748 0.133922
H -0.540815069 1.4475266138 -0.8766437152 0.133923
H -0.5238136345 1.4379326443 0.9063972942 0.133923
1341.307 1341.3284 1341.365 1562.6731 1562.7453 3038.3205 3151.6034 3151.6788 3151.7078
C C
InChI=1S/CH4/h1H4 InChI=1S/CH4/h1H4
Line        Content
----        -------
1           Number of atoms na
2           Properties 1-17 (see below)
3,...,na+2  Element type, coordinate (x,y,z) (Angstrom), and Mulliken partial charge (e) of atom
na+3        Frequencies (3na-5 or 3na-6)
na+4        SMILES from GDB9 and for relaxed geometry
na+5        InChI for GDB9 and for relaxed geometry
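To make the line layout above concrete, here is a minimal parsing sketch in plain Python. The function names (`to_float`, `parse_qm9_record`) are my own and not part of any library, the property order is taken from the property table of the same README, and the handling of `*^` exponents is an assumption about how some files in the original dataset write small numbers.

```python
# Property names for line 2, in the order given in the QM9 README.
PROPERTY_NAMES = [
    "tag", "index", "A", "B", "C", "mu", "alpha", "homo", "lumo",
    "gap", "r2", "zpve", "U0", "U", "H", "G", "Cv",
]


def to_float(token):
    # Assumption: some files write exponents as "*^" (e.g. 1.5*^-6);
    # rewrite that form so Python's float() accepts it.
    return float(token.replace("*^", "e"))


def parse_qm9_record(text):
    """Parse one record: atom count, properties, atoms, frequencies, SMILES, InChI."""
    lines = text.strip().splitlines()
    na = int(lines[0])                                  # line 1
    fields = lines[1].split()                           # line 2
    props = {"tag": fields[0], "index": int(fields[1])}
    for name, token in zip(PROPERTY_NAMES[2:], fields[2:]):
        props[name] = to_float(token)
    atoms = []
    for line in lines[2:na + 2]:                        # lines 3..na+2
        element, x, y, z, charge = line.split()
        atoms.append((element, to_float(x), to_float(y),
                      to_float(z), to_float(charge)))
    return {
        "na": na,
        "props": props,
        "atoms": atoms,
        "frequencies": [to_float(t) for t in lines[na + 2].split()],  # line na+3
        "smiles": lines[na + 3].split(),                # line na+4
        "inchi": lines[na + 4].split(),                 # line na+5
    }


# The methane record shown earlier in this post.
METHANE = """5
gdb 1 157.7118 157.70997 157.70699 0. 13.21 -0.3877 0.1171 0.5048 35.3641 0.044749 -40.47893 -40.476062 -40.475117 -40.498597 6.469
C -0.0126981359 1.0858041578 0.0080009958 -0.535689
H 0.002150416 -0.0060313176 0.0019761204 0.133921
H 1.0117308433 1.4637511618 0.0002765748 0.133922
H -0.540815069 1.4475266138 -0.8766437152 0.133923
H -0.5238136345 1.4379326443 0.9063972942 0.133923
1341.307 1341.3284 1341.365 1562.6731 1562.7453 3038.3205 3151.6034 3151.6788 3151.7078
C C
InChI=1S/CH4/h1H4 InChI=1S/CH4/h1H4
"""

record = parse_qm9_record(METHANE)
print(record["props"]["homo"])  # -0.3877 (Hartree, as stored in the raw file)
```

Parsing the raw file this way gives the unnormalized values in their original units, which is what the JARVIS values would need to be compared against.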
The properties stored in the second line of each file:
I.  Property  Unit         Description
--  --------  -----------  --------------
 1  tag       -            "gdb9"; string constant to ease extraction via grep
 2  index     -            Consecutive, 1-based integer identifier of molecule
 3  A         GHz          Rotational constant A
 4  B         GHz          Rotational constant B
 5  C         GHz          Rotational constant C
 6  mu        Debye        Dipole moment
 7  alpha     Bohr^3       Isotropic polarizability
 8  homo      Hartree      Energy of Highest occupied molecular orbital (HOMO)
 9  lumo      Hartree      Energy of Lowest unoccupied molecular orbital (LUMO)
10  gap       Hartree      Gap, difference between LUMO and HOMO
11  r2        Bohr^2       Electronic spatial extent
12  zpve      Hartree      Zero point vibrational energy
13  U0        Hartree      Internal energy at 0 K
14  U         Hartree      Internal energy at 298.15 K
15  H         Hartree      Enthalpy at 298.15 K
16  G         Hartree      Free energy at 298.15 K
17  Cv        cal/(mol K)  Heat capacity at 298.15 K
I. = Property index (properties are given in this order)
For the 6095 isomers, properties 12-16 were calculated at the G4MP2 level of theory.
All other calculations were done at the DFT/B3LYP/6-31G(2df,p) level of theory.
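I loaded the dataset with the code at the top of this post.
The dictionary printed right after it is the first entry of the QM9 dataset as obtained from JARVIS.
Below that is the original first entry of the QM9 dataset (methane), followed by the README description of the file format.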
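Comparing the two, it looks like the JARVIS values have been unit-converted and then normalized.
For example, homo, lumo, and so on appear to be converted from Hartree to eV
and then standardized using the mean and standard deviation over the entire dataset.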
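To make my guess concrete, here is a minimal sketch of the transformation I suspect, using only the Python standard library. Everything in it is an assumption on my side: the conversion factor, the use of the population standard deviation, and the helper names `standardize` and `destandardize` are mine and are not taken from the JARVIS code. Only the methane HOMO value comes from the raw file; the other two numbers are placeholders.

```python
import statistics

# Assumption: about 27.2114 eV per Hartree; the factor used by JARVIS may differ slightly.
HARTREE_TO_EV = 27.211386


def standardize(values):
    """Return z-scores together with the mean and std used to compute them."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std, not sample std
    return [(v - mean) / std for v in values], mean, std


def destandardize(z, mean, std):
    """Invert the z-score to recover the value in physical units."""
    return z * std + mean


# HOMO energies in Hartree: the first is methane from the raw file,
# the other two are made-up placeholders, not real QM9 values.
homo_hartree = [-0.3877, -0.2500, -0.3000]

# Step 1: unit conversion, Hartree -> eV.
homo_ev = [h * HARTREE_TO_EV for h in homo_hartree]

# Step 2: standardize over the whole (here: toy) dataset.
z, mean, std = standardize(homo_ev)

# With the mean and std known, the physical values can be recovered.
recovered = [destandardize(zi, mean, std) for zi in z]
```

This is why I need the actual mean and std per property: a single normalized value such as the methane HOMO cannot be mapped back to eV without them.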
How can I get the unit and the mean/std normalization factors for each property?