суббота, 24 июня 2017 г.

The basics of high-accuracy ab initio computations (Feller–Peterson–Dixon method)

  For performing high-level ab initio computations (implying the usage of post-HF methods), three cornerstones must be taken into account: the level of theory for describing the electron correlation effects, the basis set size, and the level of theory for describing relativistic effects.  
  Currently, the coupled cluster-based techniques (CC) are very powerful ab initio methods for describing the electron correlation effects (for inorganic molecules). The hierarchy of these methods includes CCSD (with singles and doubles excitations), CCSDT (also triple excitations), CCSDTQ, etc. Within this series, the energy quickly converges to a hypothetical point of “full correlation” (for a specified basis set). The advantage of the coupled cluster theory, compared to the configuration interaction (CI) technique, is that with CC the energy converges more quickly upon increasing the excitation rank than with CI.
  CCSD(T) is called the “golden standard” of quantum chemistry. Unlike the CCSDT method, in CCSD(T) the triples (triple excitations) are treated non-iteratively, in a perturbative fashion, which reduces computation cost quite significantly. CCSD does not give such good describing of electronic correlation, CCSDT is quite costly, while the CCSD(T) method is a good compromise; in addition, the success of the CCSD(T) approach stems from fortuitous error cancellation (however at Ref. [14] it is stated that CCSD(T) is worse than CCSDT for open shell systems, in contrast to closed shell systems).
  Note that for all post-HF methods, upon increasing the basis set size the total energy converges relatively slowly; that’s why, it is a kind of naïve to expect “golden quality” results from the CCSD(T) method with the basis sets as small as DZ or even TZ (the DZ and TZ abbreviations stand for double-zeta and triple-zeta quality basis sets, respectively).
  As we have written before, the so-called complete basis set (CBS) limit means that you first compute with cc-pVDZ, then cc-pVTZ, then cc-pVQZ, then cc-pV5Z, etc., and the energy should converge to a hypothetical “complete” basis set limit. At the same time, there are more than 10 extrapolation schemes which give nearly the same result after performing only 2-3 computations (however, these extrapolation schemes are empirical to some extent).
  In the work of a quantum chemistry guru Kirk Peterson [1] the following is stated: “To directly achieve chemical accuracy with CCSD(T) without extrapolation would require correlation of both the valence and outer-core electrons with a basis set of at least aug-cc-pCV6Z or augcc-pCV7Z quality”. The “chemical accuracy” term is usually interpreted in the thermochemistry literature as the error of 1 kcal/mol ( 4.184 kJ/mol) for atomization energies; Peterson offers the following arbitrary estimates for spectroscopic parameters: +-0.005 Å for bond lengths and +-15 cm-1 for vibrational frequencies. Without the extrapolation schemes, such computations are possible for very small molecules only; besides that, for heavy atoms (starting, e.g., from post-3d elements), the relativistic effects must be properly included to attain even semiquantitative accuracy [2, 3].    An efficient way of reducing the computation cost is the “composite approach”, which is based on the concept that smaller components are additive and that these terms can be most efficiently calculated at different levels of theory (but comparable levels of accuracy).
  Previous composite methods like G1, G2, etc., or W1, W2 comprise computing chemical properties with molecular geometries obtained at HF/6-31G* or MP2/6-31G* level, and are much less reliable than fully ab-initio Feller–Peterson–Dixon method (see below) (unlike the FPD approach, these methods are a “black box”).
  For very small molecules, the HEAT model (high-accuracy extrapolated ab initio thermochemistry) can be used; it produced a MAD of 0.09 kcal/mol relative to well-established experimental DHf(0 K) values (for a test set of 26 small molecules involving 5 atoms (H, C, N, O and F)) [5]. The maximum observed error was 0.33 kcal/mol (C2H2). The HEAT approach implies optimizing geometries at CCSD(T)(CV)//cc-pVQZ, then estimating the basis set limit by a 1/lmax 3 extrapolation [4] of aug-cc-pCVQZ and aug-cc-pCV5Z energies.
 In the Feller–Peterson–Dixon (FPD) approach, the total energy of atomization of the molecule is usually computed as the sum of the following terms:
1) Estimated complete basis set (CBS) limit of the CCSD(T) energy;
2) Energy of correlation of outer-core electrons with themselves and with the valence electrons (core-valence (CV) term). This term is estimated at CBS limit too, if possible;
3) Scalar relativistic effects term (or, when these effects are already accounted for in the first two stages, the elimination of errors due to using the pseudopotential approximation);
Sometimes it is advised to use all-electron DK-contracted basis sets already for the first two stages if available relativistic sets are of the same size as non-relativistic ones.
4) Spin-orbit (SO) interaction term. This is a relativistic term too; more specifically, there are two levels of spin-orbit effects, the effects of second order are not always necessary to include. For closed shell species, only first order effects should be taken into account;
5) Corrections to the total atomization energy beyond the CCSD(T) level;
6) Zero-point energy (the energy of the first vibrational state). Usually this correction is computed in the harmonic approximation. For polyatomic molecules, this term may be reliably obtained from harmonic frequencies computed at lower levels of theory (DFT, MP2).

  The approach provides flexibility in terms of the use of a particular level of theory for computing individual contributions. In some cases, a few of these components may be excluded without severe loss of accuracy. Below we present two sample formulas of computing the total energy in FPD approach, appropriate for a molecule comprising 2-5 atoms (the first formula demands lower hardware level than the latter):

 Particularly, for the Q5 extrapolation algorithm:


 

N=4 for QZ, 5 for 5Z; here we have got 2 equations with two variables (Ecbs and A).
∆ESO includes atomic corrections taken from experimental data (NIST) and computed corrections for molecules.
  The explanation of this formula is presented in Ref. [6]. We know that the ∆ESO can be computed using the Molpro program (via the state-interacting approach within the first-order perturbation theory) or the Dirac program (spin-orbit coupling is already included at the HF level of theory).
  In the FPD approach, when high accuracy is necessary, the molecular geometry is usually optimized using CCSD(T) method with a sequence of the correlation-consistent basis sets in an effort to evaluate energy differences at or near the CBS limit geometries. Besides that, reasonable results can be usually obtained with lower level geometries (e.g. computed at DFT level).
  The FPD approach currently seems to be the best solution for computing high-accuracy energies and other properties of relatively small molecules (usually inorganic). As an indication of the performance of the FPD procedure for atomization energies, a mean signed deviation of -0.04 kcal/mol, RMS = 0.28 kcal/mol and MAD = 0.17 kcal/mol is found when comparing against 121 molecules (small organic or inorganic molecules containing s and p elements, like CH4, C2N2, C2H2O, C4H6, SO3, HBr, H2SiO, I2, etc) whose experimental uncertainties are ±1 kcal/mol or less [7].
  Now we should make an important statement. There is a persuasive hypothesis that the FPD approach never produces wrong results, if one reaches  convergence in each of the computed contributions to the energy. And when the computation is performed by an experienced quantum chemist, he can always tell whether the convergence is reached in a particular component. For reliability of the FPD method it is required that the additivity approach works, and an experienced quantum chemist can test this approach too.
  The term “wrong result” needs to be interpreted in more details. Any computation can be characterized by a specific accuracy. In chemistry, the concepts of “spectroscopic accuracy” and “chemical accuracy” are used. The spectroscopic accuracy (1 cm-1) is needed for the spectroscopy, the chemical accuracy (1 kcal/mole) is needed for thermochemistry, and for the majority of tasks probably even smaller accuracy is required. Using the FPD method on personal computers, one can reach chemical accuracy for the overwhelming majority of small stable inorganic molecules (diatomic and symmetrical polyatomic), or for small- to medium-sized  organic molecules. It should be noted that for first and second row compounds the said accuracy can be achieved, using only the first and second step of the FPD procedure (elimination of basis set incompleteness and taking into account the core-valence correlation).
  It has been shown [12], that for molecules containing d and f elements the experimental error exceeds 1 kcal/mol, and because of that, for these compounds it was proposed [13] to consider the chemical accuracy being 3 kcal/mol for d elements and 5 kcal/mol for f elements.
    The most important point here is as follows: an experienced quantum chemist, if he can’t reach the required accuracy in the computation, is usually aware of that, because the convergence of any FPD component (contribution to the energy) can be estimated via its computation on a higher level of theory. The necessity to analyze individual components is both the disadvantage of the FPD approach (because it makes this method relatively laborious, demanding a high qualification of the chemist and the use of appropriate software), and its advantage (because the quantum chemist can decide which contributions should be calculated at a high level of theory, and which of them may be neglected). Because of that, the FPD method is definitely not a “black box”. And besides that, the error of the computation can be often estimated by comparing the results of the computations for similar molecules with the experimental data (mainly high-accuracy, particularly spectroscopic). For example, to estimate the computational errors for the NdI3 molecule, one can calculate the Nd3+ ion and compare the relative energies of its electronic states with the experiment.
  At the moment, only a few groups are performing the FPD computations: groups of Peterson K.A., DeYonker N., Hill J. G., Solomonik V.G.
  It should be mentioned that the abbreviation "FPD" should not be considered as some prescribed procedure; this term does not imply using any particular approaches when any energy contribution is computed. A researcher who applies this method must decide himself, what level of theory should be used in the computation of a particular energy contribution (depending on the task level and available hardware and software). So, a significant advantage of the FPD method is that it can immediately benefit from any newest developments of methods, basis sets, and hardware.
  While modeling spectra, it is desirable to ensure that the Born-Oppenheimer approach is suitable for the molecule studied. By the term “an approach is non-suitable” we mean that the accuracy of the computation within this approach is worse than required. The Born-Oppenheimer approach can yield large errors, if there are low-lying electronic states in the system: in this case, the reciprocal influence of the PESs of different electronic states will be observed, and this will lead to significant discrepancies between the computed and experimental spectra. When the Born-Oppenheimer approach fails, the electronic and vibrational states interact, and the electronic and vibrational (nuclear) solutions cannot be separated [8]. Ab initio methods, in fact, allow one to deal with the electronic part, while the Born-Oppenheimer approach is related to the nuclear one.
  For very heavy elements, other relativistic elements can play a role, in particular the effects of quantum electrodynamics (first of all, the Lamb shift). Kirk Peterson writes:
While generally neglected in most composite methodologies, it has been shown previously by Dyall et al. that the contribution of the Lamb shift to the atomization energy can approach 3–5% of the scalar relativistic contribution. In cases where the scalar relativistic correction is large, for example when the valence s orbital occupation strongly changes, the Lamb shift can contribute ~0.1 to 0.2 kcal/mol to the atomization energy for even relatively light systems like AlF3 and GaF3.
    The FPD method is usually applied for inorganic molecules. It should be mentioned, that for molecules with complicated electronic structure, even small, the convergence of all contributions for achieving the chemical accuracy still can’t be reached.
  Nevertheless, at the moment the properties of the majority of molecules with up to 4 atoms  can be computed by the FPD method with accuracy, in practice not worse than the experimental one.
  The FPD method is based on the assumption of the additivity of the contributions, which are enlisted above. In some cases, when the researcher needs a very high (spectroscopic) accuracy, the standard approach implying separate accounting for various contributions may fail to yield such accuracy. This can occur when individual contributions (for example, spin-orbit) are relatively high. In such situations several contributions should be taken into account simultaneously.
  For molecules, which are to big to be computed via the “traditional” FPD, the following approach becomes righteous: the geometry is optimized at DFT level with triple-zeta basis set, and then a single point FPD computation is performed (for example, in the Peterson’s work [9]).
  It should be noted that for the majority of small first and second row compounds, as it can be seen, the chemical accuracy can be reached at the DFT level (not systematically however). In Ref. [10] the computations of 211 molecules were performed using the PBE and PBE0 functionals and quintuple zeta-quality basis sets (aug-cc-pV5Z), and the authors obtained MAEs of lower than 1 kcal/mol and MaxAEs of 2-5 kcal/mol. See this blog [11] for more information.
  

Refs:

[1] Theor Chem Acc (2012) 131:1079

[2] Methods in Computational Chemistry;  Wilson  , S., Ed.Vol. 2; Plenum Press:  New York, 1988.

[3] Hess, B. A.; Dolg, M. Relativistic Quantum Chemistry with Pseudopotentials and  transformed Hamiltonians. Wiley Series in Theoretical Chemistry; John Wiley & Sons:  Chichester , 2002; Vol. 57.

[4] Helgaker T, Klopper W, Koch H, Noga J (1997) J Chem Phys. 106:9639

[5] Chemical accuracy in ab initio thermochemistry and spectroscopy: current strategies and  future challenges. Kirk A. Peterson • David Feller • David A. Dixon. Theor Chem Acc (2012) 131:1079 DOI 10.1007/s00214-011-1079-5

[6] David A. Dixon, David Feller, Kirk A. Peterson. Annual Reports in Computational Chemistry. Volume 8, 2012, Pages 1–28

[7] D. Feller, K.A. Peterson, and D.A. Dixon, "A survey of factors contributing to accurate theoretical predictions of atomization energies and molecular structures", J. Chem. Phys. 129, 204105 (2008).

[8] https://en.wikipedia.org/wiki/Born–Oppenheimer_approximation

[9] R. Craciun, D. Picone, R.T. Long, S. Li, D.A. Dixon, K.A. Peterson, and K.O. Christe, “Third row transition metal hexafluorides, extraordinary oxidizers and Lewis acids: Electron affinities, fluoride affinities, and heats of formation of WF6, ReF6, OsF6, IrF6, PtF6, and AuF6”, Inorg. Chem. 49, 1056 (2010).

[10] Jensen, Stig Rune; Saha, Santanu; Flores-Livas, José Abdenago; Huhn, William; Blum, Volker; Goedecker, Stefan; Frediani, Luca, 2017, "GGA-PBE and hybrid-PBE0 energies and dipole moments with MRChem, FHI-aims, NWChem and ELK", doi:10.18710/0EM0EL, UiT Open Research Data Dataverse, V3

[11] http://www.compchemhighlights.org/2017/03/the-elephant-in-room-of-density.html

[12] DeYonker et al. J. Phys. Chem. A, Vol. 111, No. 44, 2007

[13] Stephanie Grimmel, George Schoendorff, and Angela K Wilson. J. Chem. Theory Comput., 05 Feb 2016, DOI: 10.1021/acs.jctc.5b01193.  

[14] http://www.compchemhighlights.org/2016/01/assessment-of-accuracy-of-coupled.html

воскресенье, 20 ноября 2016 г.

Integration of Chemcraft with programs such as Excel, Origin, etc.



  Sometimes users ask us to implement a feature like plotting multiple spectra (from different files) in a single graph. We do not plan to implement such features, because:
1) This will require a reconstruction of the user interface, and old users will have to learn how to use Chemcraft again;
2) For building graphs of any type, powerful software packages already exist – Excel, Origin, etc. We do not want to compete with them. But Chemcraft allows one to export any graph, spectrum, diagram in text format, copying its data to Excel via Clipboard. So, multiple spectra in one picture can be easily obtained as follows: you should open several output files in a row, copy the data with spectra to Excel and then combine these spectra in Excel. Note that the broadened spectrum (Doppler, Lorentzian broadening of bands) can be exported too.
  See, for example, this graph built in Chemcraft:



  This is a Gaussian09 Born-Oppenheimer Molecular Dynamics (BOMD) computation, and the graph shows the distance between atoms C1 and H1 versus time in femtoseconds. There seems to be a problem, or a bug, with visualization of such files – Chemcraft reads the first 3 points with zero time. We found the format of the output file quite difficult. Maybe this problem will be fixed in future, but we are not even sure that this is necessary: if you need to plot a good graph “Distance vs Time”, click on the “Copy” button, insert the data into Excel and manually delete the invalid lines. Of course you should check your graph against the output file – if you work with BOMD computations, you should understand them well enough.
  We always give our users the possibility to verify the data visualization in Chemcraft – for example, when you visualize the molecular orbitals, you can click the “Check orbitals” button to check whether Chemcraft extracted all orbitals correctly. When we implement visualization of MOS from output files of new software, it is sometimes difficult to thoroughly verify whether the MOS are visualized correctly (probably free software is characterized with such problems to a greater extent). And don’t forget to check the data shown by Chemcraft using the Source mode.

вторник, 21 июня 2016 г.

DFT advices

We have gathered some information from computational chemistry forums, which can be helpful for you. It should be noted that the following text cannot be considered as a professional guide; such guides will possibly appear in the future.

1. The choice of a DFT functional

First of all, you should read this manual:


  One of the thesises in this book is that the GGA functionals are ususally more universal (let’s say, closer to “Ab initio”) than the hybrid functionals (this statement, however, has some weak points). This also means that the errors with these functionals are more systematic: for example, the PBE functional usually overestimates the bond lengths and underestimates the vibrational frequencies. In our opinion, if you choose between, e.g., the PBE and B3LYP functionals, you should note that the latter should be more accurate for most organic molecules, but it should be less accurate in some problematic cases; so, the PBE functional is more reliable. Because of that, at one forum we found the following advice (written in 2010): always use the PBE functional and don’t worry. It is CGA, so it must be more universal than, e.g., the B3LYP functional.
  It is written in this manual that the B3LYP functional has shown good results for organic molecules, but it is worse for transition metal compounds and for large molecules. TPSSh probably is a good functional for transition metal compounds (according to this manual).
  The B3LYP functional is commonly used in chemistry, while the PBE, PBE0 functionals are commonly used in applications to extended systems (materials) [13].


Here is another compilation on the subject:



Here is a screenshot from this paper:


It is not clear from this list, whether the dispersion correction should be always used. However, at other forums we found the advices to use the dispersion correction always if possible. In Ref. [1] you can see that the ωB97X-D is the best single-component functional, while PBE0-D3 perform almost as well. Besides that, on the CCL list one can read that B3LYP-D3 is usually better than B3LYP.
Dispersion correction is the interaction of induced dipoles. This correction becomes important if two parallel benzene rings interact (stacking). So, the dispersion correction is important for computing such molecules as tetraphenylporphyrin, bilirubin, etc.
Here is a list of favorable and non-favorable DFT functionals from the DFT 2015 poll for computing particular properties:



  With all due respect to the creators of the above list, we must mention that we tried to compute the properties of bilirubin molecule (having intermolecular H-bonds) using the PBE, B3LYP and wB97XD functionals, and we found that the PBE functional is the worst at describing intermolecular H bonds (the PMR spectra computed using the PBE/6-311G(D,P) method are in poorer agreement with the experimental ones than the PMR spectra computed using the B3LYP/6-311G(D,P) or wB97XD/6-311G(D,P) methods). So, we found that the PBE functional is not good at describing H-bonds, in contrast to the conclusions drawn above. So, we think that you should not fully trust these tables.
Another post from CCL states the following:

 - Recommended GGA methods: revPBE-D3, B97-D3

 - Recommended meta-GGA methods: oTPSS-D3, TPSS-D3

 - Hybrid functionals: PW6B95-D3, M062X-D3

 - Double-hybrids are the most accurate DFT methods on the market: DSD-BLYP-D3, DSD-PBEP86-D3, PWPB95-D3
  In Ref. [2], a thorough energy benchmark study of various density functionals (DFs) was carried out. The authors write:
“In summary, we recommend on the GGA level the B97-D3 and revPBE-D3 functionals. The best meta-GGA is oTPSS-D3 although meta-GGAs represent in general no clear improvement compared to numerically simpler GGAs. Notably, the widely used B3LYP functional performs worse than the average of all tested hybrids and is also very sensitive to the application of dispersion corrections.”
“The ωB97X-D functional seems to be a promising method. The most robust hybrid is Zhao and Truhlar's PW6B95 functional in combination with DFT-D3”.
“If higher accuracy is required, double-hybrids should be applied. The corresponding DSD-BLYP-D3 and PWPB95-D3 variants are the most accurate and robust functionals of the entire study.”

The tests in this paper were performed on GMTKN30 set – this set covers mainly molecules containing main group elements, mostly organic (link).

  So, the double-hybrids seem to be the best DFT methods at the moment. This is illustrated by the following chart from the aforementioned paper:

Another advantage of PBE is that this functional is “cheap”.
Note that the PBE and PBE0 methods are quite different: PBE is a CGA, while PBE0 is a hybrid method. However, if one compares e.g.  BP86, BLYP, BPW91 functionals (GGA) with PBE0, he finds that PBE0 is "less semi-empirical".
  
  Here is another comparison of DFT functionals. In Ref. [3], a few DFT functionals were benchmarked for 14 compounds (calculation of vertical excitation energies by TDDFT and their comparison to experiment). Here are two pictures from this paper:



Ref. [4] reports that the CAM-B3LYP and BHandH functionals yield the best agreement between computed and experimental vertical absorption energies for a set of some simple organic molecules (involving first and second row atoms).

  We have performed some benchmark NMR computations with different functionals. The 1H NMR spectra of 26 simple organic molecules (not containing internal hydrogen bonds) were computed at PCM wB97XD/6-31G(D,P), PCM B3LYP/6-31G(D,P), PCM B3LYP-D3/6-31G(D,P), PCM B3LYP-D3/aug-cc-pVTZ and some other levels, and the following conclusions were made:

1) The methods PCM wB97XD/6-31G(D,P) and PCM B3LYP-D3/6-31G(D,P) yield very similar standard deviations (SD) from the experiment of 0.1411 ppm and 0.13005 ppm, respectively; note that the signals of the protons not attached to carbons do not fit into common correlation). This, however, does not mean that these two functionals produce similar results (correlation of the values computed by them has an SD of 0.06849 ppm);
2) Switching from PCM B3LYP-D3/6-31G(D,P) to PCM B3LYP-D3/aug-cc-pVTZ does not improve the agreement with the experimental data: the SD is 0.13005 for the former and 0.13539 ppm for the latter. This is even rather strange for us, why the enlargement of the basis set does not lead to the improvement of the agreement with experiment; maybe, the main source of disagreement is the experimental error or some fundamental problems of NMR computation algorithms.
Note that we have performed some benchmark IR spectra computations (mentioned in a previous post in this blog), and we found that switching from wB97XD/6-31G(D,P) to wB97XD/aug-cc-pVTZ method improves the agreement with the experiment approximately by a factor of 1.2;
3) It is of no real importance, whether to perform a full geometry optimization at PCM B3LYP-D3/aug-cc-pVTZ level of theory, or just perform the geometry optimization at PCM B3LYP-D3/6-31G(D,P) level and then do a single point with aug-cc-pVTZ basis set. The NMR shift values obtained by these two approaches correlate with SD=0.01442 ppm;
4) Taking into account the solvation effects with PCM model improves the agreement with the experiment with an almost negligible increase in computational costs;
5) Switching from PCM B3LYP/6-31G(D,P) to PCM B3LYP-D3/6-31G(D,P) improves the agreement with the experiment by 0.4% only (SD changes from 0,13056 to 0,13005). This is because only such molecules as phenol, biphenyl, anthracene, hexane, etc. were computed. If we had computed such molecule as benzene dimer, the dispersion correction would become important. Besides that, for our molecules the energies differ rather significantly with the B3LYP and B3LYP-D3 methods.
6) Two NMR computation schemes – GIAO and CSGT – produce almost identical results (SD between them is 0.016 ppm).

  The combination of these advices can confuse an inexperienced user. As for us, we decided that we should use PBE-D3 for inorganic molecules and ωB97X-D or B3LYP-D3 for organic ones, since we deal with the Gaussian09A package. Such an advice should be useful only for “amateurs” who are unable to gather more information.
  Anyway, it is better to use several functionals to ensure that they produce similar results. MP2 should not also be forgotten (SCS-MP2 seems to be better than conventional MP2, as written in the paper above; as far as we know, SOS-MP2 is better too).
  Recently, the B3LYP/6-31G(D,P) method has been quite popular. We think that using this method for computing organic molecules (not containing d and f elements) is still rather adequate, but the snobs can interpret the use of this method as the sign of amateurishness (at least, if you don’t employ different functionals and/or basis sets in the same study). See, for example, this and this posts on the CCL list.

  The flaws of this famous B3LYP/6-31G* model chemistry are discussed in Ref. [5]:
  The authors write that the relatively good performance of B3LYP/6-31G*, which made it so popular, is caused by a hidden error cancellation. The B3LYP-gCP-D3/6-31G* method, according to the authors, is much better (it removes the two major deficiencies: missing London dispersion effects and basis set superposition error). The B3LYP-D3/6-31G* method is slightly worse as it does not provide a BSSE elimination. This picture illustrates the aforesaid:


  As far as we know, the density fitting / RI (Resolution of the Identity) approximation is usually a good thing, as it speeds up your calculations without significant loss of accuracy (it least, this is written in Orca manual). However, in some cases it can lead to bad SCF convergence or give the error of 1-2 kcal/mol in energies.
  Here is a picture from Ref. [13] illustrating the availability of DFT functionals:


2. The choice of basis set

  As far as we know, at the moment the optimal basis sets for high-accuracy computations are Dunning family sets: cc-pVnZ, aug-cc-pVnZ, cc-pCVnZ, cc-pwCVnZ (n=2,3.4,5, etc). These basis sets are correlation consistent; this means, that they were optimized using correlated methods, unlike the 6-31G** basis sets. In Ref. [6] the following is stated:
  "One of the primary reasons for the cc basis set family’s lasting
popularity is due to a series of empirical observations that as
the cardinal number (n in cc-pVnZ) of the basis set is increased,
energies and various properties converge smoothly toward the
complete basis set (CBS) limit."
  The so-called complete basis set (CBS) limit means that you first compute with cc-pVDZ, then cc-pVTZ, then cc-pVQZ, then cc-pV5Z, etc., and the energy should converge to a hypothetical “complete” basis set limit. At the same time, there are more than 10 extrapolation schemes which give nearly the same result after performing only 2-3 computations (however, these extrapolation schemes are empirical to some extent).
  For heavy elements (Z>29), relativistic effects are strong and must be taken into account either using the methods like ZORA, DKH, or using effective core potentials (ECPs, PPs). The main relativistic effects include relativistic contraction and spin-orbit interaction. For many tasks, even such elements as Fe, Co, Ni do not require including relativistic effects in the computation (you will have a lot of problems besides relativism with these atoms).
  The Ref. [6] provides an overview of the development of Gaussian basis sets for molecular calculations, with a focus on four popular families of modern bases ("Gaussian basis set" means any basis set with Gaussian (not Slater) functions, not a specific set for the GAUSSIAN program). The authors write about the cases when using ECPs is not advisable (in particular, electron paramagnetic resonance), and it is written that using the DFT-based ZORA or DKH models with segmented all-electron relativistic contracted (SARC) basis sets produce good agreement with experiment and higher level ab initio computations.
  One interesting point is mentioned in Ref. [7]: the authors report that the computations with 6-311++G** basis set gave better molecular geometries than the more costly aug-cc-pVDZ (the methods used were MP2 and CCSD). In addition, the smaller 6-311++G** invariably leads to lower calculated total energies than aug-cc-pVDZ. So, it seems that the aug-cc-pVDZ can be worse than the 6-311++G** set (nevertheless, we suppose that if you need an expensive basis set or CBS (complete basis set) extrapolation, you should use cc-pVTZ, cc-pVQZ, cc-pV5Z, etc).

  Some people say that it is not actual to use basis sets larger than cc-pVTZ with DFT. However, in Ref. [14] the authors performed energy computations of 211 small first and second row compounds (mostly organic), and they concluded that the 5Z basis set (aug-cc-pV5Z) is required to get the MAE of atomization energies below 1 kcal/mol. See this blog for more information.
 
  The same is written at this handbook “Practical Advice for Quantum Chemistry Computations”:


  For some small organic molecules, we have found that the basis sets 6-31++G(D,P) and AUG-cc-pVDZ give almost identical results (protonation energies of 16 amide-containing molecules computed with wB97XD/6-31++G(D,P) and wB97XD/AUG-cc-pVDZ methods correlate with R= 0,99966; this difference is almost negligible for our applied tasks). In contrast to the results reported in the aforementioned paper, the total energies computed with wB97XD/AUG-cc-pVDZ method are 3-30 kJ/mol lower than the energies computed with wB97XD/6-31++G(D,P).
  At the same time, with the basis set AUG- cc-pVDZ the computation time was 3-6 times higher than with the 6-31++G(D,P) basis set. So, the 6-31++G** basis set should be still considered good enough.
 
  It is usually considered that the computation of anions or significantly electronegative atoms (which show big negative Mulliken charge) requires the use of diffuse functions (“++” for 6-31G or “aug” for cc-pVnZ). However, in the paper [8] this conclusion is criticized to a significant extent. The authors write:
“We conclude that the use of diffuse functions for calculating geometrical parameters for PAH anions in general is unnecessary and does not improve the calculated results significantly. Energy calculations are affected in much the same way.”.
  As the authors write, the only case when the diffuse functions are important are the computations of absolute values of chemical shifts; however, in most cases, when the experimental data are available, it is no necessary to obtain their absolute values as the correlations between the computed and experimental values can be built instead.
  On the other hand, D. Truhlar who investigated the use of diffuse functions writes here:
"How should one add diffuse functions to the basis set? Diffuse functions are known to be critical in describing the electron distribution of anions (as discussed in my book), but they are also quite important in describing weak interactions, like hydrogen bonds, and can be critical in evaluating activation barriers and other properties."
  The Truhlar group recommends using the "jun-" basis sets (see below).
  One more source of information is the review "Basis sets in quantum chemistry" by C. David. Sherill. The author writes in this review about the diffuse functions:

  Our knowledge of the subject and our personal experience says that the diffuse functions indeed should be used when calculating anions. We have computed the energies of deprotonation of 12 carbon acids (with PCM solvation model), both with diffuse functions and without them (wB97XD/6-31++G(D,P) method and the wB97XD/6-31G(D,P) method), and the values calculated by the first method correlate much better with experimental PKa values than the values computed without diffuse function (the correlation coefficients R are correspondingly 0,99522 for wB97XD/6-31++G(D,P) and 0,98884 for wB97XD/6-31G(D,P)).
  
Some recommendations concerning the choice of basis sets can be found on Orca input library:. These recommendations are:
  
 - Rule of thumb: Energies and geometries are usually fairly converged at the DFT level when using a balanced polarized triple-zeta basis set (such as def2-TZVP) while MP2 and other post-HF methods converge slower w.r.t. the basis set. Ab initio methods are much more basis set sensitive than DFT methods

 - Stick with one family of basis sets that is available for all the elements of your system. Mixing and matching basis sets from different families can lead to problems.

 - Calculations on heavy elements can either be performed using an all-electron approach or effective core potentials (ECPs).

Here is a picture from the Orca input library:

So, it seems that diffuse functions are really important for computing electron affinities.

  As far as we know, usually it is not needed to use a larger basis set than cc-pVTZ with DFT: further increasing basis set size will not improve the accuracy of the computation.  In contrast, this is not true for ab initio computations, which will benefit from using larger basis sets, such as cc-pVQZ, cc-pV5Z, etc.
  Some papers, in which the results of DFT computations are compared to those of ab initio methods and to the experimental data, conclude that DFT performs not worse (or even slightly better) [10, 11, 12]. This is caused by employing modest basis sets (not larger than cc-pVTZ) in these papers.
  So, the choice between DFT or ab initio methods depends on which properties are calculated and what accuracy is required.
  The larger the basis set, the more difficult the SCF convergence is (especially if diffuse-augmented basis sets are used). We recommend to always specify SCF=XQC in GAUSSIAN input files. With this keyword, the scf is firstly converged using the default DIIS algorithm, and if the convergence is not achieved, Gaussian switches to more reliable and costly quadratically convergent SCF procedure.
  Ref. [9] describes the role of diffuse functions in computations. It is known, that for many tasks using the diffuse functions will not lead to significant increase of computational accuracy, but will increase the cost of the calculation; besides that, using the diffuse functions can lead to SCF convergence problems and can increase the basis set superposition error (BSSE). The authors write: “We conclude that much current practice includes more diffuse functions than are needed. Often, better accuracy could be achieved if the additional cost were invested in higher-ζ basis set or more polarization functions.”
  The popular basis set family cc-pVnZ (of Dunning and co-workers) comprises the diffuse functions, if  “aug-” prefix is used. The authors notice that chemists usually utilize “fully augmented” basis sets, and this may not be optimal for large molecules. For example, the cc-pVTZ basis set for methane has s, p, d, and f functions on C and s, p, and d functions on H; aug-cc-pVTZ contains diffuse s, p, d, and f functions on C and diffuse s, p, and d functions on H atoms.
  In contrast, the earlier “plus” basis sets originally systematized by Pople and co-workers contained only diffuse s and p functions on non-hydrogen atoms and no diffuse functions on hydrogen atoms. In Ref. [9] this is called “minimal augmentation”. The maug-cc-pVTZ basis set retains the diffuse s and p functions on carbon with the exponential parameters optimized for the aug case but deletes all other diffuse functions.
  So, the authors (Truhlar et al.) conclude that using the minimal augmentation is usually more optimal than using the full augmentation (particularly with DFT). The authors recommend the so-called “calendar” basis sets, in particular the “jun” level of augmentation – for example, the jun-cc-pVTZ set is recommended in comparison to aug-cc-pVDZ or cc-pVTZ. When increasing the zeta number in Dunning basis sets (i.e. switching from cc-pVDZ to cc-pVTZ, then to cc-pVQZ, etc), augmentation becomes less important, and using the “calendar” basis sets provides a more efficient sequence of basis sets (than unaugmented, minimally augmented, or fully augmented sets) for basis set extrapolation to the complete basis set limit. We know, however, that many researchers have criticized the approach proposed by the authors.


  
3. DFT quackery

  Anyway, density functional theory is a “black box”. Look at this picture from Ref. [13]:


  Our comment on this picture:
  First and second points: In contrast to ab initio methods, DFT is not hierarchical. Ab initio (non-empirical) methods are hierarchical: this means that if we increase basis set size, level of taking into account the electronic correlation (excitation rank), and possibly the level of taking into account the relativistic effects (for heavy elements), we approach the exact solution (within the Born–Oppenheimer approximation). More specifically, if we go, e.g., through CCSD/cc-pVDZ ->  CCSDT/cc-pVDZ ->  CCSDTQ/cc-pVDZ -> CCSDTQ5/cc-pVDZ, etc., the results of the computation systematically approach some limit; if we go through CCSD/cc-pVDZ ->  CCSD/cc-pVTZ -> CCSD/cc-pVQZ -> CCSD/cc-pV5Z -> CCSD/cc-pV6Z, etc., the results systematically approach the complete basis set (CBS) limit. For the first row, the improvement can be non-monotonic, while for the second case the improvement seems to be always monotonic.
  So, we can verify the accuracy of an ab initio method by comparing its results with the results of a higher level computation. For DFT, this possibility is much less available.
  The points mentioned below are mostly our private opinion, maybe not fully right.
  As far as we know, DFT is often used to “confirm” an experiment. This means that if the experiment and a DFT computation lead to similar conclusions, this increases the reliability of the investigation. On the contrary, if the experiment and the DFT calculation give different results, this can be either a discovery or a failure (inaccuracy of the computation, or maybe the experiment).
  Speaking of “confirming” an experiment, it should be noted that this approach is only good with an independent experiment. We know some cases when the experiment was “adjusted” for better agreement with the computation (both at DFT and ab initio levels).
  As mentioned above, it is a good practice to perform the computation with several different DFT functionals, to ensure that they all give the same results. And as far as we know, some researchers, being not honest enough, meaningly avoid using more than one functional, because if different functionals give contradictory results in their work, this makes this whole work less “publishable”. 

  Here you can read an ironical essay “Obituary : Density Functional Theory. 1927-1993”:

  The author claims that the density functional theory in current implementation is not a mathematically correct approach:

  “The Hohenberg-Kohn argument is what mathematicians call an existence proof, as opposed to a constructive proof. That is, although we now know that, in theory, DFT can extract as much information from r(r) as her brother can from  Y  ( r 1,  r 2, ... ,  r n) , no-one knew how to dress her so that she could achieve this in practice. All quantum mechanical theories are created equal, but some are more equal than others.”

  The hybrid functionals, which appeared in 1993, are even more unreliable and not correct from the theoretical point of view; in other words, using such functionals may be a kind of “shamanism”, or maybe even “scientific charlatanism”. The author thinks that the density functional theory finally died (we should add, it died as a well-grounded scientific theory) in 1993, after the spreading of hybrid functionals.

  On the other hand, in Ref. [13] the author states the following:


"I believe that a fundamental principle underlies the success 
of DFT, which is that local approximations are a peculiar type of semiclassical approximation to the many-electron problem. For the last 6 years, with both my group and many collaborators, I have been trying to uncover this connection, and make use of it. The underlying math is very challenging, and some must be invented."

  The "DFT shamanism" can exist in the following form: if different functionals are applied to the same object, the user may select any results consistent with experimental data (even the latter are invalid or erroneous) and explain them. We suggest calling such practive "DFT quackery".

  In Ref. [13] the following is proposed: "Users should stick to the standard functionals (as most do, according to Fig. 1), or explain very carefully why not."


4. DFT future

  Here is a picture from Ref. [13]:
  A fragment of the paper [13]:


"XII. THE FUTURE

  So, where does this leave us? It is clearly both the best and worst of times for DFT. More calculations, both good and bad, are being performed than ever. One of the most frequently asked questions of developers of traditional approaches to electronic structure is: “When will DFT go away?.” Judging from Fig. 1, the answer is clearly no time soon. Although based on exact theorems, as shown in Fig. 2, these theorems give no simple prescription for constructing approximations. This leads to the many frustrations of the now manifold users listed in Table I.Without such guidance, the swarm of available approximations of Fig. 3 will continue to evolve and reproduce, perhaps ultimately undermining the entire field. But I expect that some of the many excellent ideas being developed by the DFT community will come to fruition, i.e., produce new and more general standard approximations, well before that happens."

References



[3] S.S.Leang, F.Zahariev, M.S.Gordon, J.Chem.Phys.,  136, 104101 (2012)

[4] G.Garcı´a, C.Adamo, I.Ciofini, Phys. Chem. Chem. Phys., 2013, 15, 20210--20219




[8] Calculations of PAH anions: When are diffuse functions necessary? Noach Treitel1, Roy Shenhar, Ivan Aprahamian, Tuvia Sheradsky and Mordecai Rabinovitz. P h y s . C h e m . C h e m . P h y s . , 2 0 0 4 , 6 , 1 1 1 3 – 1 1 2 1


[10] Do Practical Standard Coupled Cluster Calculations
Agree Better than Kohn–Sham Calculations with
Currently Available Functionals When Compared
to the Best Available Experimental Da...
Article in Journal of Chemical Theory and Computation · May 2015
Impact Factor: 5.5 · DOI: 10.1021/acs.jctc.5b00081


[11] On the dissociation energy of Ti(OH,)+.

An MCSCF, CCSD(T), and DFT study
A. Irigoras, J.M. Ugalde, X. Lopez, and C. Sarasola
Can. J. Chem. 74: 1824-1829 (1996). Printed in Canada / Imprimt au Canada




[13] J. Chem. Phys. 136, 150901 (2012). Perspective on density functional theory. Kieron Burke.

[14]
Jensen, Stig Rune; Saha, Santanu; Flores-Livas, José Abdenago; Huhn, William; Blum, Volker; Goedecker, Stefan; Frediani, Luca, 2017, "GGA-PBE and hybrid-PBE0 energies and dipole moments with MRChem, FHI-aims, NWChem and ELK", doi:10.18710/0EM0EL, UiT Open Research Data Dataverse, V3


пятница, 1 апреля 2016 г.

Some useful links for people dealing with quantum chemistry

At this page we are collecting links we find useful in our own research work.

1) Gaussian Error Messages


Here 26 types of Gaussian error messages are explained (e.g., memory allocation problems, change in point group or standard orientation, convergence failure, etc.).

2) Practical Advice for Quantum Chemistry Computations, by C. David Sherrill.


Here many simple hints are provided:
 - The choice of basis set;
 - The choice of ab initio/DFT methods for different tasks.
The author writes that standard HF, semiempirical, and DFT techniques are not appropriate for van der Waals complexes or systems dominated by London dispersion forces. However, Ref. reported that for the compounds under investigation the B3LYP-gCP-D3/6-31G* method could correctly compute London dispersion effects (and it does not have the basis set superposition error);
 - Hints on the problem of SCF convergence;
 - Common hints on imaginary frequencies and saddle points;

 - Hints on optimization with symmetry.