Latest Event Updates

Yet another Runge Kutta method in Haskell

Posted on Updated on

Although I still have interest in Python, when I met the language ‘Haskell’ recently, I instantly fell in love with it. With the elegance in grammer the language have, I can focus on the essence of the problem to be solved by the program. You don’t have to be bothered by memory control, nested loops, multi-thread synchronization and that kind of stuff that has nothing to do with the problem. Furthermore, its execution time is attractive to me since I am more used to C++ languages. Mapping operations in Haskell also reminds me of STL in C++.

Monad and functor are still difficult to me. So for a coding practice, I chose Runge-Kutta method which I learned when I was a student. The problem was taken from here. A sample code was shown on this site, but it was hard for me to understand, so I rewrote it in more straightforwardly.

import Data.Ratio
import Text.Printf

start_time = 0 :: Ratio Integer
end_time = 10 :: Ratio Integer
initial_value = 1.0 :: Double
initial_err = 0.0 :: Double

dt = 1 % 10 :: Ratio Integer
steps = truncate $ (end_time - start_time)/dt + 1 :: Int

rk4 :: Floating a => (a -> a -> a) -> a -> a -> a -> a
rk4 f dt t y = y + (dy1 + 2*dy2 + 2*dy3 + dy4)/6
    where
        dy1 = dt * f t y
        dy2 = dt * f (t + dt/2) (y + dy1/2)
        dy3 = dt * f (t + dt/2) (y + dy2/2)
        dy4 = dt * f (t + dt) (y + dy3)

src :: Floating a => a -> a -> a
src t y = t * sqrt y

propagator :: Floating y => (Ratio Integer, y, y) -> (Ratio Integer, y, y)
propagator (t,y,_) = (t',y',err)
    where
        t'= t + dt
        y'= rk4 src (fromRational dt) (fromRational t) y
        err = y' - (1/16) * ((fromRational t')^2 + 4)^2

main :: IO ()
main = mapM_ (\(t,v,e) -> printf "(%d, %.15f, %.15e)\n" (numerator t) v e) result
    where
        result = filter (\(t,_,_) -> ((1==).denominator) t) $ take steps $ iterate propagator (start_time, initial_value, initial_err)

(0, 1.000000000000000, 0.000000000000000e0)
(1, 1.562499854278108, -1.457218921085968e-7)
(2, 3.999999080520799, -9.194792007782837e-7)
(3, 10.562497090437551, -2.909562448749625e-6)
(4, 24.999993765090636, -6.234909363911356e-6)
(5, 52.562489180302585, -1.081969741534294e-5)
(6, 99.999983405403580, -1.659459641700778e-5)
(7, 175.562476482271250, -2.351772874931157e-5)
(8, 288.999968434798600, -3.156520142510999e-5)
(9, 451.562459276839660, -4.072316033898460e-5)
(10, 675.999949016709700, -5.098329029351589e-5)

I used Rational data type for time variables since adding a time interval ‘dt’ repeatedly makes a tiny error at the final time, like  t = 9.9999999 instead of t=10. Because of this modification, the error value diminished a little bit compared to the sample code.

Changing my career

Posted on

I have changed my career as an environmental consultant for some reason.

But I’m still continuing pharmaceutical research in a faculty of medicine in a Japanese university as a part-time assistant,…

which means I’m too busy to write articles.

But I’ll try to write something if I find something interesting.

 

Listing AA sequence entries for a specific strain

Posted on Updated on

If you study bacteria, you should sometimes need amino acid sequences for your strain including mutants. Following codes are what I wrote for that purpose.

Let’s say we want to check UniProt for S. aureus ParC.
We, firstly, need the taxonomy ID for S. aureus. Since UniProt uses the same taxonomy classes as in NCBI, here we access the site with SOAP.

from suds.client import Client

wsdl_file = 'http://www.ncbi.nlm.nih.gov/soap/v2.0/eutils.wsdl'
client = Client( wsdl_file )

print client

Suds ( https://fedorahosted.org/suds/ )  version: 0.4.1 (beta)  build: R703-20101015

Service ( eUtilsService ) tns="http://www.ncbi.nlm.nih.gov/soap/eutils/"
   Prefixes (6)
      ns0 = "http://www.ncbi.nlm.nih.gov/soap/eutils/egquery"
      ns1 = "http://www.ncbi.nlm.nih.gov/soap/eutils/einfo"
      ns2 = "http://www.ncbi.nlm.nih.gov/soap/eutils/elink"
      ns3 = "http://www.ncbi.nlm.nih.gov/soap/eutils/epost"
      ns4 = "http://www.ncbi.nlm.nih.gov/soap/eutils/esearch"
      ns5 = "http://www.ncbi.nlm.nih.gov/soap/eutils/esummary"
   Ports (1):
      (eUtilsServiceSoap)
         Methods (7):
            run_eGquery(xs:string term, xs:string tool, xs:string email, )
            run_eInfo(xs:string db, xs:string tool, xs:string email, )
            run_eLink(xs:string db, xs:string[] id, xs:string reldate, xs:string mindate, xs:string maxdate, xs:string datetype, xs:string term, xs:string dbfrom, xs:string linkname, xs:string WebEnv, xs:string query_key, xs:string cmd, xs:string tool, xs:string email, )
            run_ePost(xs:string db, xs:string id, xs:string WebEnv, xs:string tool, xs:string email, )
            run_eSearch(xs:string db, xs:string term, xs:string WebEnv, xs:string QueryKey, xs:string usehistory, xs:string tool, xs:string email, xs:string field, xs:string reldate, xs:string mindate, xs:string maxdate, xs:string datetype, xs:string RetStart, xs:string RetMax, xs:string rettype, xs:string sort, )
            run_eSpell(xs:string db, xs:string term, xs:string tool, xs:string email, )
            run_eSummary(xs:string db, xs:string id, xs:string WebEnv, xs:string query_key, xs:string retstart, xs:string retmax, xs:string tool, xs:string email, )
         Types (34):
            ns1:DbInfoType
            ns1:DbListType
            ns5:DocSumType
            ns4:ErrorListType
            ns1:FieldListType
            ns1:FieldType
            ns2:FirstCharsType
            ns2:IconUrlType
            ns2:IdCheckListType
            ns2:IdLinkSetType
            ns2:IdListType
            ns4:IdListType
            ns2:IdType
            ns2:IdUrlListType
            ns2:IdUrlSetType
            ns3:InvalidIdListType
            ns5:ItemType
            ns2:LinkInfoType
            ns1:LinkListType
            ns2:LinkSetDbHistoryType
            ns2:LinkSetDbType
            ns2:LinkSetType
            ns1:LinkType
            ns2:LinkType
            ns2:ObjUrlType
            ns2:ProviderType
            ns0:ResultItemType
            ns4:TermSetType
            ns4:TranslationSetType
            ns4:TranslationStackType
            ns4:TranslationType
            ns2:UrlType
            ns4:WarningListType
            ns0:eGQueryResultType

From the output above we are able to know that the service provides the method, run_eSearch(). The meanings of its arguments are listed in http://www.ncbi.nlm.nih.gov/books/NBK43082/.

Specifying the taxonomy DB from NCBI, you can easily get the ID of S.aureus.

reply = client.service.run_eSearch(db='taxonomy', term='Staphylococcus aureus[all names]')

print reply
if (reply.Count>0):
    taxoID = reply.IdList[0][0]
(reply){
   Count = "1"
   RetMax = "1"
   RetStart = "0"
   IdList = 
      (IdListType){
         Id[] = 
            "1280",
      }
   TranslationSet = ""
   TranslationStack = 
      (TranslationStackType){
         TermSet = 
            (TermSetType){
               Term = "Staphylococcus aureus[all names]"
               Field = "all names"
               Count = "1"
               Explode = "N"
            }
         OP = "GROUP"
      }
   QueryTranslation = "Staphylococcus aureus[all names]"
 }

Next, we post a query to UniProt as we did in here.

import urllib

args = {
        'query':'gene:ParC taxonomy:%s active:yes reviewed:yes' % taxoID,
        'format':'xml',
        'offset':0
        }
encoded_args = urllib.urlencode( args )

url = "http://www.uniprot.org/uniprot/"

xml_data = urllib.urlopen( url, data = encoded_args )

The returned XML will be analyzed accordingly.

from xml.dom import minidom

dom = minidom.parse( xml_data )
entries = dom.getElementsByTagName("entry")

for entry in entries:
    ent_name = entry.getElementsByTagName("name")
    print ent_name[0].childNodes[0].data
    organism = entry.getElementsByTagName("organism")
    sci_name = organism[0].getElementsByTagName("name")
    print sci_name[0].childNodes[0].data
PARC_STAA8
Staphylococcus aureus (strain NCTC 8325)
PARC_STAAC
Staphylococcus aureus (strain COL)
PARC_STAAM
Staphylococcus aureus (strain Mu50 / ATCC 700699)
PARC_STAAN
Staphylococcus aureus (strain N315)
PARC_STAAR
Staphylococcus aureus (strain MRSA252)
PARC_STAAS
Staphylococcus aureus (strain MSSA476)
PARC_STAAU
Staphylococcus aureus
PARC_STAAW
Staphylococcus aureus (strain MW2)

We’ve got eight entries.

Reading Szabo’s MO program (part 5)

Posted on

We are finally going into the last subroutine SCF(). Actually, the subroutines we have seen so far were just “preparation routines” for SCF().

A molecular orbit (MO) \vert \psi_a \rangle can be decomposed into atomic orbits (AOs) \vert \phi_\mu \rangle as

\begin{array}{rcl}  \displaystyle \vert \psi_a \rangle &=& \displaystyle \sum_{\mu} \vert \phi_\mu \rangle \langle \phi_\mu \vert \psi_a \rangle \\  C_{\mu a} &\equiv& \langle \phi_\mu \vert \psi_a \rangle  \end{array}.

Our ultimate objective is to determine the coefficient matrix C_{\mu a}. Once the MOs \psi_a are fixed, every phisical observable is expressed by \phi_\mu s which already have been given as linear combinations of Gaussian-type functions.

The coefficients C_{\mu a} are determined self-consistently so that the nonlinear Roothaan equation (3.158) is satisfied:

F(C)C = SC\varepsilon

In the subroutine the iteration loop begins at

C START OF ITERATION LOOP
   20 ITER=ITER+1

and ends at

C IF MAXIMUM NUMBER NOT YET REACHED THEN
C GO BACK FOR ANOTHER ITERATION
      IF (ITER.LT.MAXIT) GO TO 20

In the loop, the Fock matrix is firstly evaluated as

C ADD CORE HAMILTONIAN TO GET FOCK MATRIX
      DO 60 I=1,2
      DO 60 J=1,2
      F(I,J)=H(I,J)+G(I,J)
   60 CONTINUE

which corresponds to (3.154):

\begin{array}{rcl}  F_{\mu\nu} &=& \displaystyle H_{\mu\nu}^{core} + \sum_{\lambda\sigma}[P_{\lambda\sigma}(\mu\nu\vert\sigma\lambda)-\frac{1}{2}(\mu\lambda\vert\sigma\nu)]\\  &=& H_{\mu\nu}^{core} + G_{\mu\nu}.  \end{array}

In this equation H_{\mu\nu}^{core} and (\mu\nu\vert\sigma\lambda) are already evaluated in (Part 4) as H(I,J) and TT(I,J,K,L) each. The only unknown variable, therefore, is the density matrix, P_{\mu\nu}, which is defined in (3.145) as

\displaystyle  P_{\mu\nu} = 2 \sum_{a}^{N/2} C_{\mu a} C_{\nu_a}^{\ast}.

Because the value of C_{\mu a} changes in the iteration, G_{\mu\nu} , therefore F_{\mu\nu} must be evaluated in each step.

One may consider the system where there is no electron-electron interaction would be suitable state to begin our SCF calculation. In that case, we set all components of the matrix P_{\mu\nu} to zero before getting into the loop.

Given the P_{\mu\nu}, the system’s electron energy (EN) is evaluated as in (3.184):
\displaystyle  E_{0}=\frac{1}{2}\sum_{\mu}\sum_{\nu} P_{\nu\mu}(H_{\mu\nu}^{core} + F_{\mu\nu})

C CALCULATE ELECTRON ENRGY
      EN=0.0D0
      DO 70 I=1,2
      DO 70 J=1,2
      EN=EN+0.5D0*P(I,J)*(H(I,J)+F(I,J))
   70 CONTINUE

After that, just follow the procedure in section 3.4.6. Using X matrix prepared in (Part 4), we can obtain the primed Fock matrix F'=X^{\dagger}FX.

The transformed Roothaan equation (3.178),
\displaystyle  F'C' = C'\varepsilon
means that if you diagonalize the matrix F' with the matrix C, you will get the energy matrix \varepsilon with F'‘s eigenvalues as its diagonal elements, \varepsilon = diag(\varepsilon_1, \varepsilon_2, ...).

Since F' is just a 2×2 matrix, its diagonalization is performed analytically, that is,
\begin{array}{rcl}   \displaystyle   C' &=&   \begin{pmatrix}    \cos \theta & \sin \theta \\    \sin \theta & -\cos \theta    \end{pmatrix}   \\   \displaystyle   \varepsilon &=&    \begin{pmatrix}    F'_{11} \cos^2 \theta + F'_{22} \sin^2 \theta + F'_{12} \sin 2\theta & 0 \\    0 & F'_{22} \cos^2 \theta + F'_{11} \sin^2 \theta - F'_{12} \sin 2\theta   \end{pmatrix}   \\   \displaystyle   \theta &=& \frac{1}{2} \displaystyle\tan^{-1} \left(\frac{2F'_{12}}{F'_{11}-F'_{22}} \right)  \end{array}
Those equations are implemented in the subroutine DIAG(). From (3.174) the original coefficients C_{\mu\nu} are recovered as C=XC' .

The convergence of the calculation is judged using \delta P defined in page 149:

\displaystyle   \delta P = \frac{1}{2} \sqrt{\sum_{\mu\nu}(P_{\mu\nu}^{\mbox{\small (new)}} - P_{\mu\nu}^{\mbox{\small (previous)}})}.

This parameter is evaluated in every iteration step and checked whether the value is less than the criterion constant (CRIT).

If the calculation is successfully converged, the system’s total energy is obtained by adding the nuclear-nuclear repulsion energy as in (3.185):

\displaystyle  E_{tot} = E_{0} + \frac{Z_A Z_B}{R_{AB}}.

Well, this is the end of the series. Actually the most difficult part was the evaluation of the molecular integrals. Once those values are obtained, the SCF calculation can be executed more systematically than I thought.

Although this calculation was one of the simplest ones, it was helpful enough to speculate what’s going on inside of the more sophisticated MO programs.

Another beautiful example of π–π stacking

Posted on Updated on

As a researcher in structural biology, I’m always attracted to the mysterious properties of π-orbitals. So the Chymase’s new cocrystal structure from Matsumoto, et al.(pmid 24121339) fascinated me a lot.

π–π stacking between  His57 and a benzimidazole ring
π–π stacking between His57 and a benzimidazole ring (pdbid: 4kp0, drawn by PyMol)

From their FMO calculations, they discuss the possibility of charge transfer type interaction between the LUMO on the imidazole ring of His57 and the HOMO on the benzimidazole ring of ligand leaving their energy gap, 3.764 eV. This type of interactions, I think, must be existing more commonly than we think and play important roles in protein-ligand interactions.

Electron density in 1T31
Electron density around His57 and a ligand (pdbid: 1T31, drawn by PyMol)

I drew an electron density contour for another Chymase structure (pdbid 1T31). In this picture we can see a bulge on the imidazole. Does it indicate the existence of an bonding MO or just a coincidence, I wander.

Reading Szabo’s MO program (part 4)

Posted on Updated on

Subroutine COLECT() will be understood straightforwardly.

First, we construct our core-Hamiltonian matrix. Since our atomic orbitals includes only the Helium’s 1s orbital, \phi_1 , and the Hydrogen’s one, \phi_2 , the matrix defined in (3.149) is quite simple:

H_{\mu\nu}^{core} =  \begin{pmatrix}  \langle \phi_1 \vert \hat{h} \vert \phi_1 \rangle & \langle \phi_1 \vert \hat{h} \vert \phi_2 \rangle \\  \langle \phi_2 \vert \hat{h} \vert \phi_1 \rangle & \langle \phi_2 \vert \hat{h} \vert \phi_2 \rangle   \end{pmatrix}

where
\displaystyle  \langle \textbf{r} \vert \hat{h} \vert \textbf{r} \rangle = -\frac{1}{2}\nabla^2 -\frac{Z_A}{|\textbf{r} - \textbf{R}_A|}-\frac{Z_B}{|\textbf{r} - \textbf{R}_B|}  .

Referring to part 3, we can rewrite the \hat{H}^{core}_{\mu\nu} as

H_{\mu\nu}^{core} =  \begin{pmatrix}  T11 + V11A + V11B & T12 + V12A + V12B \\  T21 + V21A + V21B & T22 + V22A + V22B   \end{pmatrix}  .

Second, we make our overlap matrix, S_{\mu|nu}, and diagonalize it. From the definition of (3.136),

S_{\mu\nu} =  \begin{pmatrix}  \langle \phi_1 \vert \phi_1 \rangle & \langle \phi_1 \vert \phi_2 \rangle \\  \langle \phi_2 \vert \phi_1 \rangle & \langle \phi_2 \vert \phi_2 \rangle   \end{pmatrix}  =  \begin{pmatrix}  1 & S_{12} \\  S_{12} & 1  \end{pmatrix}

where S_{12} was defined in part 3.

When we try to convert the S_{\mu\nu} matrix into unity, as in (3.165), \textbf{X}^{\dagger} \textbf{SX} = \textbf{1} , \textbf{X} can be taken as

\textbf{X} = \textbf{U}\textbf{s}^{-1/2}

where \textbf{U} is a unitary matrix to diagonalize \textbf{S} and \textbf{s} is its diagonalized matrix. The relation between \textbf{S} and \textbf{U} is

\begin{array}{rcl}  \textbf{U}^{\dagger}\textbf{SU} &=& \textbf{s} = diag(\lambda_1, \lambda_2) \\  \textbf{s}^{-1/2} &=& diag(1/\sqrt{\lambda_1}, 1/\sqrt{\lambda_2})  \end{array}  .

Since S_{\mu\nu} is a 2\times 2 matrix, \textbf{U} and its eigen values are easily solved:

\begin{array}{rcl}  \textbf{U} &=& \textbf{U}^{\dagger} = \frac{1}{\sqrt{2}}    \begin{pmatrix}      1 & 1 \\      1 & -1    \end{pmatrix} \\  \textbf{s} &=&     \begin{pmatrix}    1+S_{12} & 0 \\    0 & 1-S_{12}    \end{pmatrix}  \end{array}  .

And, therefore,
\textbf{X} = \textbf{U}\textbf{s}^{-1/2} =   \begin{pmatrix}  \displaystyle \frac{1}{\sqrt{2(1+S_{12})}} & \displaystyle \frac{1}{\sqrt{2(1-S_{12})}} \\  \displaystyle \frac{1}{\sqrt{2(1+S_{12})}} & \displaystyle \frac{-1}{\sqrt{2(1-S_{12})}}  \end{pmatrix}
.

Finally, we prepare some arrays, XT(I,J) and TT(I,J,K,L), for the later convenience.

Now, we are ready for SCF calculation!
(To be continued to part 5)

Reading Szabo’s MO program (part3)

Posted on Updated on

Here we will look into the subroutine INTGRL(). In this function we will calculate a lot of integrals among the Gaussian primitives. Since the forms of primitives are fixed, the values of those integrals do not change during the SCF process (The things that will change are the coefficient on the primitives).

In the next subroutine, COLECT(), those integrals are put together as ‘building blocks’ to constuct values with more physical meanings, like Hamiltonian, energies of electric interactions, and so on.

At the beginning of the subroutine, there are the definitions of COEF and EXPON

COEF =  \begin{pmatrix}  1 & 0.678914 & 0.444635 \\  0 & 0.430129 & 0.535328 \\  0 & 0 & 0.154329  \end{pmatrix}

EXPON =  \begin{pmatrix}  0.270950 & 0.151623 & 0.109818 \\  0 & 0.851819 & 0.405771 \\  0 & 0 & 2.22766  \end{pmatrix}

In those matrices, each column corresponds to STO-LG with L=1,2,3. So you use only one of those three columns (If you consider the computing power of today’s computers, there is no reason to choose L=1 or L=2). With those parapeters the 1s atomic orbital is expanded as in (3.295):

\displaystyle  \phi^{CGF}_{1s}(\zeta=1.0) = \sum_{i=1}^{L} d_{i,1s} g_{1s}(\alpha_{i,1s})

where d_{i,1s} = COEF(i,L) and \alpha_{i,1s} = EXPON(i,L). To apply this formula to our system, the scale factor \zeta must be set to \zeta_1=2.0925 for Helium nuclei and \zeta_2=1.24 for Hydrogen nuclei.

For general \zeta value \alpha scales in accordance with the formula (3.224):
\alpha \to \alpha' = \alpha \cdot \zeta^2  .

Using this scaled \alpha', the 1s atomic orbital for general \zeta value is expressed as
\begin{array}{rcl}      \phi^{CGF}_{1s}(\zeta) &=& \displaystyle\sum_{i=1}^{L} d_{i,1s} g_{1s}(\alpha_{i,1s}') \\      &=& \displaystyle\sum_{i=1}^{L} d_{i,1s} (2\alpha_{i,1s}'/\pi)^{3/4} \tilde{g}_{1s}(\alpha_{i,1s}')  \end{array}  .
where
\tilde{g}_{1s}(\alpha) \equiv e^{-\alpha |\textbf{r}-\textbf{R}|^2}  .

In the Szabo’s program following variants are defined:
\begin{array}{rcl}      A1(I) &=& \alpha_{i,1s} \cdot \zeta_1^2 \\      A2(I) &=&  \alpha_{i,1s} \cdot \zeta_2^2 \\      D1(I) &=& d_{i,1s} \cdot (2\alpha_{i,1s}'|_{\zeta = \zeta_1}/\pi)^{3/4} \\      D2(I) &=& d_{i,1s} \cdot (2\alpha_{i,1s}'|_{\zeta = \zeta_2}/\pi)^{3/4}  \end{array}  .

Those definitions make it possible to utilize the formulas defined in my previous post, “Reading Szabl’s MO program (part 2)”. For example, the overlap integral, S_{12}, is expressed as

\begin{array}{rcl}      S_{12} &=& \displaystyle\int\! d^3 r\;\phi_{1s}^{CGF\ast}(\textbf{r}-\textbf{R}_A) \phi_{1s}^{CGF}(\textbf{r}-\textbf{R}_B) \\      &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D2(J) \int\! d^3 r\;\tilde{g}_{1s}^{\ast}(A1(I),\textbf{r}-\textbf{R}_A)\tilde{g}_{1s}(A2(J),\textbf{r}-\textbf{R}_B) \\      &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D2(J) \cdot (A|B)  \end{array}
where (A|B) was defined in part 2.

Now that we have worked our way up to this point, it is not difficult to understand the other integrals:
Kinetic energies
\begin{array}{rcl}  T11 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D1(J) \left(A \middle| -\frac{\nabla^2}{2} \middle| A \right) \\    T12 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D2(J) \left(A \middle| -\frac{\nabla^2}{2} \middle| B \right) \\    T22 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D2(I)^{\ast} D2(J) \left(B \middle| -\frac{\nabla^2}{2} \middle| B \right) \\  \end{array}

Nuclear attraction potentials
\begin{array}{rcl}  V11A &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D1(J) \left(A \middle| -\frac{Z_A}{|\textbf{r}-\textbf{R}_A|} \middle| A \right) \\    V12A &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D2(J) \left(A \middle| -\frac{Z_A}{|\textbf{r}-\textbf{R}_A|} \middle| B \right) \\    V22A &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D2(I)^{\ast} D2(J) \left(B \middle| -\frac{Z_A}{|\textbf{r}-\textbf{R}_A|} \middle| B \right) \\    V11B &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D1(J) \left(A \middle| -\frac{Z_B}{|\textbf{r}-\textbf{R}_B|} \middle| A \right) \\    V12B &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D1(I)^{\ast} D2(J) \left(A \middle| -\frac{Z_B}{|\textbf{r}-\textbf{R}_B|} \middle| B \right) \\    V22B &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} D2(I)^{\ast} D2(J) \left(B \middle| -\frac{Z_B}{|\textbf{r}-\textbf{R}_B|} \middle| B \right) \\    \end{array}

Electron-electron interactions
\begin{array}{rcl}  V1111 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} \sum_{K=1}^{L} \sum_{L=1}^{L} D1(I)^{\ast} D1(J)^{\ast} D1(K) D1(L) \left(AA \middle| AA \right) \\    V2111 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} \sum_{K=1}^{L} \sum_{L=1}^{L} D2(I)^{\ast} D1(J)^{\ast} D1(K) D1(L) \left(BA \middle| AA \right) \\    V2121 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} \sum_{K=1}^{L} \sum_{L=1}^{L} D2(I)^{\ast} D1(J)^{\ast} D2(K) D1(L) \left(BA \middle| BA \right) \\    V2211 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} \sum_{K=1}^{L} \sum_{L=1}^{L} D2(I)^{\ast} D2(J)^{\ast} D1(K) D1(L) \left(BB \middle| AA \right) \\    V2221 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} \sum_{K=1}^{L} \sum_{L=1}^{L} D2(I)^{\ast} D2(J)^{\ast} D2(K) D1(L) \left(BB \middle| BA \right) \\    V2222 &=& \displaystyle\sum_{I=1}^{L} \sum_{J=1}^{L} \sum_{K=1}^{L} \sum_{L=1}^{L} D2(I)^{\ast} D2(J)^{\ast} D2(K) D2(L) \left(BB \middle| BB \right) \\    \end{array}  .

Reading Szabo’s MO program (part 2)

Posted on Updated on

HFCALC
This is the main routine of the program. In this routine three subroutines are called, those are INTGRL(), COLECT() and SCF(). The optimization process is executed in SCF(). Both INTGRL() and COLECT() are preparation processes for the optimization. Basically if we understand what is going on in INTGRL() and COLECT(), our purpose here is almost achieved, because the procedure to obtain the self consistent field is literally a “routine work” and that is the beauty of Roothaan equation.

Functions
Before investigating each subroutine, let’s take a look at some functions.

F0()
Even for Gaussian Type Orbitals, the integrals are still difficult to calculate. I found a good material about this topic on the internet (http://folk.uio.no/helgaker/talks/SostrupIntegrals_10.pdf). This lecture note covers general formulas of molecular integrals for Gaussian type orbital. In those formulas Boys function defined below plays an important role:

\displaystyle F_{n}(x) = \int_{0}^{1}dt e^{-xt^2}t^{2n}
\displaystyle F_{n}(x) = \frac{2xF_{n+1}(x) + e^{-x} }{2n+1}

In Szabo’s book molecular integrals for 1s orbitals are mentioned in Appendix A. Because our molecular orbitals are expanded in 1s type functions, implementing only F_{0}() function is sufficient for our calculations.

DERF()
This is another implementation of error function. The comment in the program says this implementation is more precise than the built-in one. To call this function, instead of the build-in one, one has to declare “EXTERNAL DEFF” before calling this function.

\displaystyle erf(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt = \frac{2xF_{0}(x^2)}{\sqrt{\pi}}

S()
This is the easiest integral, that is, the function to calculate overlaps between two un-normalized primitives.

For
\displaystyle\tilde{g}_{1s}(\textbf{r} -\textbf{R}_A) = e^{ -\alpha |\textbf{r} -\textbf{R}_A|^2} (A.1),
the overlaps become
\displaystyle S(A,B,R_{AB}^2) \equiv (A|B) = \int d^3 r \tilde{g}_{1s}(\textbf{r} - \textbf{R}_A)\tilde{g}_{1s}(\textbf{r} - \textbf{R}_B) = \left( \frac{\pi}{\alpha + \beta} \right)^{3/2} exp{\left( -\frac{\alpha\beta}{(\alpha + \beta)} | \textbf{R}_A - \textbf{R}_B |^2 \right) } as in (A.9).

T()
The kinetic energy integrals among un-normalized primitives are expressed as
\displaystyle T(A,B,R_{AB}^2) \equiv (A|-\frac{1}{2}\nabla^2 |B) = \frac{\alpha\beta}{\alpha + \beta} \left( 3-2\frac{\alpha\beta}{\alpha+\beta}| \textbf{R}_A - \textbf{R}_B |^2 \right) \left( \frac{\pi}{\alpha+\beta} \right)^{3/2} exp{\left( -\frac{\alpha\beta}{(\alpha + \beta)} | \textbf{R}_A - \textbf{R}_B |^2 \right) } as in (A.11).

V()
The nuclear attraction energy integrals among un-normalized primitives are expressed as
\displaystyle V(A,B,R_{AB}^2,R_{CP}^2,Z_C) \equiv (A|-\frac{Z_C}{r_C} |B) = -\frac{2\pi}{\alpha + \beta} Z_C exp \left( -\frac{\alpha\beta}{\alpha + \beta} | \textbf{R}_A - \textbf{R}_B |^2 \right) F_{0} \left( (\alpha + \beta)| \textbf{R}_P - \textbf{R}_C |^2 \right) as in (A.33) where \textbf{R}_P = (\alpha \textbf{R}_A + \beta \textbf{R}_B)/(\alpha + \beta)

TWOE()
The two-electron integrals among un-normalized primitives become

\displaystyle TWOE(A,B,C,D,R_{AB}^2,R_{CD}^2,R_{PQ}^2) \equiv (AB|CD) \\ = \frac{2\pi^{5/2}}{(\alpha + \beta)(\gamma + \delta)(\alpha + \beta + \gamma + \delta)^{1/2}} exp{\left( -\frac{\alpha\beta}{(\alpha + \beta)} | \textbf{R}_A - \textbf{R}_B |^2 -\frac{\gamma\delta}{(\gamma + \delta)} | \textbf{R}_C - \textbf{R}_D |^2\right) } F_{0} \left( \frac{(\alpha + \beta)(\gamma + \delta)}{\alpha + \beta + \gamma + \delta} | \textbf{R}_P - \textbf{R}_Q |^2  \right)
as in (A.41) where \textbf{R}_P = (\alpha \textbf{R}_A + \beta \textbf{R}_B)/(\alpha + \beta) and \textbf{R}_Q = (\delta \textbf{R}_C + \gamma \textbf{R}_D)/(\delta + \gamma)

Whew! A lot of formulas. But this is exactly why we use gaussian type orbitals, by which we can calculate the physical values analytically to some extent.

Reading Szabo’s MO program (part 1)

Posted on Updated on

Since I finished reading Szabo’s “Modern Quantum Chemistry”, I’ve wanted to understand the FORTRAN program in Appendix B. The program is a simple MO program to solve Hartree-Fock equation for HeH+ system. The book is so famous that someone might have try to explain the program in his or her blog. But who cares? I’ll try to analyse the program here to teach myself 😉

FORTRAN builder
Since I didn’t have FORTRAN environment, I introduced [Photran](http://www.eclipse.org/photran/), a FORTRAN plugin for Eclipse. With the help of eclipse’s modern tools you can write a code and debug it easily.
eclipse
Following [this page](http://wiki.eclipse.org/PTP/photran/documentation/photran8installation), I downloaded [Eclipse for Parallel Application Developers package](http://www.eclipse.org/downloads/packages/node/822). At the first time I tried to debug my code I noticed that my breakpoints are all ignored. It took some time to find that there are two types of breakpoints in the package, C/C++ Breakpoints and Global Breakpoints. If you are writing a code for parallel computing, you must use the latter ones. If you are writing a code for single process, use the former ones or the breakpoint will be skipped. You can change the types from Run/Breakpoint types.

The Code
You can type the FORTRAN code from Szabo’s book by hand. Or you can download it from [GitHub](https://github.com/hkamis/szabo).

Conditions for calculation

HeHplus

* Helium nuclear with electric charge ZA = +2
* Hydrogen nuclear with electric charge ZB = +1
* The distance between Helium nuclear and Hydrogen nuclear is a fixed value of R = 1.4632D0
* Two circulating electrons with charge e = -1 each
* Closed shell (the two electrons will be accomodated in a same molecular orbit)
* Internal OPtion is controlled by the IOP variable
* Atomic Orbit(AO) of each nuclear is explicitly solved for 1s, which is expressed as in (3.202);
\phi_{1s}^{SF}(\zeta,\textbf{r}-\textbf{R}_A) = (\zeta^3/\pi)^{1/2}e^{-\zeta|\textbf{r}-\textbf{R}_A|}
* For Helium nucleus \zeta_1=2.0925D0
* For Hydrogen nucleus \zeta_2=1.24D0
* AOs are approximated by Contracted Gaussian functions(CGF).
\phi_{1s}^{SF}(\zeta,\textbf{r}-\textbf{R}_A) \simeq \phi_{1s}^{CGF}(\zeta,\textbf{r}-\textbf{R}_A) = \displaystyle\sum_{i=1}^{N}d_{i\mu}g_{1s}(\alpha_{i,1s}\zeta^2,\textbf{r}-\textbf{R}_A)
where N=1,2 and 3 corresponds to STO-1G, STO-2G and STO-3G each. And
g_{1s}(\tilde{\alpha},\textbf{r}-\textbf{R}_A) = (2\tilde{\alpha}/\pi)^{3/4}e^{-\tilde{\alpha} r^2} .
* Using AOs, Molecular Orbits(\psi_i) are expressed as in (3.282):
\psi_i(\textbf{r}) = \displaystyle\sum_{\mu=1}^K C_{\mu i}\phi_{\mu}^{CGF} = C_{1i}\phi_{1s}^{CGF}(\zeta_1,\textbf{r}-\textbf{R}_A) + C_{2i}\phi_{1s}^{CGF}(\zeta_2,\textbf{r}-\textbf{R}_B)

Those equations may seem intimidating, but because we know the exact form of \phi_{1s}^{CGF}, all we have to do is to get C_{\mu i} s which are determined by a matrix equation, Roothaan equation:
FC = SC\varepsilon
Once C_{\mu i} s are determined, \psi_i s are determined, which means the problem is solved! That is because you can calculate all the physical observables  using \psi_i s.

Exporting eps files from sdfile

Posted on

I learned that pybel is able to export svg files. But when I tried to convert the svg files into eps files programatically, I found that it was not so easy. After looking around so many pages, I finally found the way.

import pybel
import rsvg, cairo

mols = list( pybel.readfile("sdf","approved.sdf") )
print "Compound Total:",len(mols)

molnames = []
for mol in mols[100:105]:
    base_name = mol.data['DRUGBANK_ID']
    mol.title = base_name
    
    # make svg handle
    svg_data = mol.write("svg")
    h_svg = rsvg.Handle(data=svg_data)
    
    # make cairo context
    srf = cairo.PSSurface(base_name + ".eps", h_svg.props.width, h_svg.props.height)
    srf.set_eps(True)
    ctx = cairo.Context( srf )

    # make the rsvg handle draw on the cairo context
    h_svg.render_cairo(ctx)
    
    #finish drawing
    srf.finish()

If you need png files, you can replace the rendering part with this:

srf.write_to_png(base_name + ".png")

Altanatively, you can create PostScript files with rdkit.

from rdkit.Chem import SDMolSupplier,AllChem, Draw

suppl = SDMolSupplier("approved.sdf")

for i in range(100,105):
    mol = suppl[i]
    t = AllChem.Compute2DCoords(mol)
    base_name = mol.GetProp('DRUGBANK_ID')
    Draw.MolToFile(mol, base_name + ".ps", imageType="ps")

This is much simpler. But you have to convert the ps-files with another tool like “ps2eps”, which is not what I meant.