The problem with quantum gravity I

OK, there are lots of problems with coming up with a theory of quantum gravity. The particular one we’re going to discuss in this series of posts is the non-quadraticity of the free part of the Lagrangian for the general theory of relativity.

Put simply, all the classic quantum field theories, including QED and QCD, have a ‘free’, non-interacting part of the Lagrangian which is quadratic in the fields, and quadratic Lagrangians -> Gaussian integrals -> good times, as Gaussian integrals are basically the only ones we can solve explicitly. However, the Lagrangian for general relativity is not quadratic in the relevant fields, and this becomes a bit of a pain if we want to solve anything analytically.
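To make that chain concrete: the key result underlying it is the one-dimensional Gaussian integral with a linear source term (a standard result, easily checked by completing the square),

\displaystyle \int_{-\infty}^{\infty} e^{-\frac{1}{2}a x^2 + Jx}\,\mathrm{d}x = \sqrt{\frac{2\pi}{a}}\, e^{J^2/2a}, \qquad a > 0,

and it is the functional generalisation of this, with a promoted to a quadratic operator and J to a source field, that gives the free propagator of any theory with a quadratic Lagrangian.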

In this series of posts we’ll dive into this, do some approximations, and see if we can find anything that will help.

First off, this Lagrangian: the Lagrangian for general relativity (the Einstein-Hilbert Lagrangian) takes the pretty simple form \mathcal{L}=\sqrt{-\mathrm{det}g}g^{ij}R_{ij}, where g is the metric (with components g_{ij} and components of its inverse g^{ij}) and R_{ij} is the Ricci tensor:

\displaystyle R_{ij} = \frac{\partial}{\partial x^l} \Gamma^{l}_{ij} - \frac{\partial}{\partial x^j}\Gamma^{l}_{il}+\Gamma^m_{ij}\Gamma^l_{lm} - \Gamma^{m}_{il}\Gamma^l_{jm},

where the Christoffel symbols \Gamma^m_{ij} are given in terms of the metric and its inverse by

\displaystyle \Gamma^m_{\,ij}=\frac{1}{2}g^{mk}\left(g_{ki,j}+g_{kj,i}-g_{ij,k}\right),

which also highlights that throughout these posts we shall be switching freely between using \frac{\partial y}{\partial x^l} and y_{,l} to represent the partial derivative of y with respect to x^l.

We can immediately see that this Lagrangian is horribly non-linear in g, so its equations of motion will be horribly non-linear second-order partial differential equations. We’re not even going to try playing with the Lagrangian in its raw form: instead, we’re going to try the simplest and easiest approximation we can think of, which is to expand the Lagrangian in terms of small perturbations of the metric about a Minkowski background, g_{ij}=\eta_{ij} + h_{ij}, where all |h_{ij}|\ll 1, so g^{ij} = \eta^{ij} - h^{ij} + O(h^2).
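As a quick numerical sanity check of that inverse expansion, here’s a little numpy sketch (not part of the derivation; the perturbation is just a small random symmetric matrix):

```python
import numpy as np

# Minkowski background, signature (-,+,+,+)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# a small, symmetric, otherwise arbitrary perturbation h_{ij}
rng = np.random.default_rng(0)
h = 1e-4 * rng.standard_normal((4, 4))
h = (h + h.T) / 2

g = eta + h                # g_{ij} = eta_{ij} + h_{ij}
g_inv = np.linalg.inv(g)   # exact inverse components g^{ij}

# first-order approximation g^{ij} = eta^{ij} - h^{ij}, where the indices
# on h are raised with eta: h^{ij} = eta^{ia} eta^{jb} h_{ab}
g_inv_approx = eta - eta @ h @ eta

# the residual should be O(h^2), i.e. around 1e-8 for h ~ 1e-4
residual = np.max(np.abs(g_inv - g_inv_approx))
```

Shrink h by a factor of 10 and the residual shrinks by a factor of 100, which is exactly the O(h^2) behaviour claimed above.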

Because quantum theories that are quadratic in their fields are at least a bit easier to solve, we’re going to expand \mathcal{L} up to quadratic terms in h. Let’s get started!

Manipulations

\setlength\arraycolsep{2pt}\begin{array}{rl}\displaystyle \Gamma^m_{ij} &\displaystyle= \frac{1}{2}\eta^{mk}\left(h_{ki,j} + h_{kj,i}-h_{ij,k}\right) \\ &\displaystyle - \frac{1}{2}h^{mk}\left(h_{ki,j} + h_{kj,i}-h_{ij,k}\right)\end{array},

which is correct up to the second order in h that we shall need (remember that g^{mk}=\eta^{mk}-h^{mk} itself only holds up to O(h^2) corrections), which is a good start. For simplicity we identify these two terms as

\displaystyle \Gamma^m_{ij} = {\Gamma^{(1)}}^m_{ij} - {\Gamma^{(2)}}^m_{ij},

with the superscripts (1) and (2) referring to the corresponding powers of h.

Keeping terms just up to second order in h, this results in the Ricci tensor looking like

\setlength\arraycolsep{2pt}\begin{array}{rl}\displaystyle R_{ij} &\displaystyle= \frac{\partial}{\partial x^l} \left({\Gamma^{(1)}}^l_{ij} - {\Gamma^{(2)}}^l_{ij}\right) \\ &\displaystyle - \frac{\partial}{\partial x^j}\left({\Gamma^{(1)}}^l_{il}-{\Gamma^{(2)}}^l_{il}\right) \\ & \displaystyle+ {\Gamma^{(1)}}^m_{ij}{\Gamma^{(1)}}^l_{lm} - {\Gamma^{(1)}}^m_{il}{\Gamma^{(1)}}^l_{jm} + O(h^3)\end{array}.

Multiplying by g^{ij} = \eta^{ij} - h^{ij} + O(h^2) results in

\setlength\arraycolsep{2pt}\begin{array}{rl}\displaystyle g^{ij}R_{ij} &\displaystyle= \eta^{ij}\frac{\partial}{\partial x^l} \left({\Gamma^{(1)}}^l_{ij} - {\Gamma^{(2)}}^l_{ij}\right) -  h^{ij}\frac{\partial}{\partial x^l}{\Gamma^{(1)}}^l_{ij} \\ &\displaystyle- \eta^{ij}\frac{\partial}{\partial x^j} \left({\Gamma^{(1)}}^l_{il}-{\Gamma^{(2)}}^l_{il}\right) + h^{ij}\frac{\partial}{\partial x^j}{\Gamma^{(1)}}^l_{il} \\ &\displaystyle+ \eta^{ij}{\Gamma^{(1)}}^m_{ij}{\Gamma^{(1)}}^l_{lm} - \eta^{ij} {\Gamma^{(1)}}^m_{il}{\Gamma^{(1)}}^l_{jm} + O(h^3)\end{array}.

The remaining component of the Lagrangian is the square root of (minus) the determinant of the metric. Now using \frac{\partial \mathrm{det} g}{\partial g_{\mu\nu}} = (\mathrm{det}g) g^{\mu\nu}, and the Taylor expansion

\displaystyle \mathrm{det}g = \mathrm{det}\eta + h_{\mu\nu}\left.\frac{\partial \mathrm{det} g}{\partial g_{\mu\nu}}\right|_{g=\eta}+O(h^2),

along with \mathrm{det}\eta=-1, we get that

\setlength\arraycolsep{2pt}\begin{array}{rl}\displaystyle \sqrt{-\mathrm{det}g} &\displaystyle= 1 + \frac{1}{2} \eta^{\mu\nu}h_{\mu\nu}+O(h^2)\\&\displaystyle = 1 + \frac{1}{2} h + O(h^2)\end{array},

where h=\eta^{\mu\nu}h_{\mu\nu} is the trace of the metric variation.
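As a quick numerical sanity check of this expansion, here’s a numpy sketch with a random small symmetric perturbation (again, not part of the derivation):

```python
import numpy as np

# Minkowski background, signature (-,+,+,+), with det(eta) = -1
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# a small, symmetric perturbation h_{ij}
rng = np.random.default_rng(1)
h = 1e-4 * rng.standard_normal((4, 4))
h = (h + h.T) / 2

g = eta + h

# the trace h = eta^{mu nu} h_{mu nu}
trace_h = np.trace(eta @ h)

exact = np.sqrt(-np.linalg.det(g))
approx = 1 + trace_h / 2

# the difference should be O(h^2), around 1e-8 for h ~ 1e-4
error = abs(exact - approx)
```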

We’re now ready to write down our quadratic expression for the Lagrangian,

\displaystyle \mathcal{L} = \mathcal{L}^{(1)} + \mathcal{L}^{(2)} = \eta^{ij} {\chi^{(1)}}_{ij} + \eta^{ij} {\chi^{(2)}}_{ij} + \left(\frac{1}{2} \eta^{ij} h - h^{ij}\right){\chi^{(1)}}_{ij} + O(h^3),

where

\displaystyle {\chi^{(1)}}_{ij} = \frac{\partial}{\partial x^\rho}{\Gamma^{(1)}}^\rho_{ij} - \frac{\partial}{\partial x^j}{\Gamma^{(1)}}^\rho_{\rho i}

and

\setlength\arraycolsep{2pt}\begin{array}{rl}\displaystyle {\chi^{(2)}}_{ij} &\displaystyle = \frac{\partial}{\partial x^\rho}{\Gamma^{(2)}}^\rho_{ij} - \frac{\partial}{\partial x^j}{\Gamma^{(2)}}^\rho_{\rho i} \\ &\displaystyle + {\Gamma^{(1)}}^\rho_{\rho \lambda}{\Gamma^{(1)}}^\lambda_{ij} - {\Gamma^{(1)}}^\rho_{\lambda j}{\Gamma^{(1)}}^\lambda_{i \rho}\end{array}.

Linearised EFE

We have a quadratic Lagrangian now, and can begin to make progress on solving the theory. But first, let’s check that our Lagrangian reproduces a known result, the linearised Einstein Field Equations.

The Einstein Field Equations are the equations of motion of the Lagrangian, given by

\displaystyle \frac{\delta \mathcal{L}}{\delta h^{\alpha\beta}} - \frac{\partial}{\partial x^\gamma}\frac{\delta \mathcal{L}}{\delta h^{\alpha\beta}_{,\gamma}}=0.

First, let us define some useful derivatives:

\displaystyle \frac{\delta {\Gamma^{(1)}}^m_{ij}}{\delta h^{\alpha\beta}} = 0,

\displaystyle \frac{\delta {\Gamma^{(1)}}^m_{ij}}{\delta h^{\alpha\beta}_{,\gamma}} = -\frac{1}{2} \eta^{mk}\left(\delta^\gamma_j g_{ki} + \delta^\gamma_i g_{kj} - \delta^\gamma_k g_{ij}\right)g_{\alpha\beta},

\displaystyle \frac{\delta {\Gamma^{(2)}}^m_{ij}}{\delta h^{\alpha\beta}} = \frac{1}{2} \delta^m_\alpha \delta^k_\beta \left(h_{ki,j}+h_{kj,i}-h_{ij,k}\right),

\displaystyle \frac{\delta {\Gamma^{(2)}}^m_{ij}}{\delta h^{\alpha\beta}_{,\gamma}} = -\frac{1}{2} h^{mk}\left(\delta^\gamma_j g_{ki} + \delta^\gamma_i g_{kj} - \delta^\gamma_k g_{ij}\right)g_{\alpha\beta}.

Now, to make progress on the equations of motion, we break the required analysis up into terms of different order and first analyse \frac{\delta \mathcal{L}^{(1)}}{\delta h^{\alpha\beta}} - \frac{\partial}{\partial x^\gamma}\frac{\delta \mathcal{L}^{(1)}}{\delta h^{\alpha\beta}_{,\gamma}}.

The first term is simple: \frac{\delta \mathcal{L}^{(1)}}{\delta h^{\alpha\beta}} = 0. The second term can be expanded and analysed without too much difficulty, giving

\setlength\arraycolsep{2pt}\begin{array}{rl}\displaystyle \frac{\delta \mathcal{L}^{(1)}}{\delta h^{\alpha\beta}} - \frac{\partial}{\partial x^\gamma}\frac{\delta \mathcal{L}^{(1)}}{\delta h^{\alpha\beta}_{,\gamma}} & \displaystyle =-\eta^{ij}\frac{\partial}{\partial x^\gamma}\frac{\delta {\chi^{(1)}}_{ij}}{\delta h^{\alpha\beta}_{,\gamma}} \\ & \displaystyle = \eta_{\alpha \beta} \left(h^{ij}_{,i,j} - \eta^{ij} h_{,i,j}\right)\end{array}

Note that this term is not zero, so the linear term in the Lagrangian contributes to the classical equations of motion of the theory.

The second-order contribution is more complex:

\setlength\arraycolsep{2pt}\begin{array}{rl} \displaystyle \frac{\delta \mathcal{L}^{(2)}}{\delta h^{\alpha\beta}} & \displaystyle = \eta^{ij} \frac{\delta {\chi^{(2)}}_{ij}}{\delta h^{\alpha\beta}} - \frac{1}{2} \eta^{ij} \eta^{\mu\nu} \eta_{\mu \alpha}\eta_{\nu\beta} {\chi^{(1)}}_{ij} - {\chi^{(1)}}_{\alpha\beta} + O(h^2) \\ &\displaystyle = \frac{1}{2}\left(\eta^{ij} h_{\alpha\beta ,i,j} + h_{,\alpha ,\beta} - h^i_{\alpha ,\beta ,i} - h^i_{\beta ,\alpha ,i} + \left(h^{ij}_{,i,j} - \eta^{ij}h_{,i,j}\right)\eta_{\alpha\beta}\right)+O(h^2)\end{array}

while for the derivative term, maintaining only terms linear in h, we find

\setlength\arraycolsep{2pt}\begin{array}{rl} \displaystyle \frac{\partial}{\partial x^\gamma}\frac{\delta \mathcal{L}^{(2)}}{\delta h^{\alpha\beta}_{,\gamma}} & \displaystyle = \eta^{ij} \frac{\partial}{\partial x^\gamma}\frac{\delta {\chi^{(2)}}_{ij}}{\delta h^{\alpha\beta}_{,\gamma}} +O(h^2) \\ &\displaystyle = \frac{1}{2} \eta_{\alpha\beta} \left(2 h^{ij}_{,i,j} - \eta^{ij} h_{,i,j} - \eta_{lm}\eta^{ij}h^{lm}_{,i,j}\right)+O(h^2)\end{array}

Now to leading order in h we have that \eta_{ij}h^{ij} = \eta^{ij} h_{ij} = h, so bringing everything together we find

\setlength\arraycolsep{2pt}\begin{array}{rl} \displaystyle 0 & \displaystyle = \frac{\delta \mathcal{L}}{\delta h^{\alpha\beta}} - \frac{\partial}{\partial x^\gamma}\frac{\delta \mathcal{L}}{\delta h^{\alpha\beta}_{,\gamma}} \\ &\displaystyle = \frac{1}{2} \left(h_{,\alpha ,\beta} + \eta^{ij} h_{\alpha\beta ,i,j} - h^i_{\alpha ,\beta ,i} - h^i_{\beta ,\alpha ,i} + \eta_{\alpha\beta} h^{ij}_{,i,j} - \eta_{\alpha\beta} \eta^{ij} h_{,i,j}\right) + O(h^2)\end{array}.

Writing the d’Alembert operator \Box = \eta^{\mu\nu} \frac{\partial^2}{\partial x^\mu \partial x^\nu} this can be written to leading order in h as

\displaystyle \frac{1}{2} \left(h_{,\alpha ,\beta} + \Box h_{\alpha\beta} - h^i_{\alpha ,\beta ,i} - h^i_{\beta ,\alpha ,i} + \eta_{\alpha\beta} h^{ij}_{,i,j} - \eta_{\alpha\beta} \Box h\right)  =0,

which agrees with the form of the linearised EFE on, for example, Wikipedia, up to an overall sign.
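As a concrete check of this linearised equation, we can feed it the standard transverse-traceless plane-wave form of a gravitational wave and confirm that every component of the left-hand side vanishes. This is a sympy sketch; the wave ansatz is the textbook one, not something derived in this post:

```python
import sympy as sp

# coordinates and the Minkowski metric, signature (-,+,+,+)
X = sp.symbols('x0 x1 x2 x3')
eta = sp.diag(-1, 1, 1, 1)

# transverse-traceless plane wave travelling in the x3 direction:
# h_{11} = -h_{22} = f(x0 - x3), all other components zero
f = sp.Function('f')
h = sp.zeros(4, 4)
h[1, 1] = f(X[0] - X[3])
h[2, 2] = -f(X[0] - X[3])

# trace h = eta^{mu nu} h_{mu nu}, and indices on h raised with eta
h_tr = sum(eta[m, m] * h[m, m] for m in range(4))
h_up = eta * h * eta

def dd(expr, i, j):
    """Second partial derivative with respect to x^i and x^j."""
    return sp.diff(expr, X[i], X[j])

def linearised_efe(a, b):
    """The left-hand side of the linearised EFE quoted above."""
    box_h_ab = sum(eta[i, i] * dd(h[a, b], i, i) for i in range(4))
    box_tr = sum(eta[i, i] * dd(h_tr, i, i) for i in range(4))
    div_div = sum(dd(h_up[i, j], i, j) for i in range(4) for j in range(4))
    mixed_a = sum(dd(eta[i, i] * h[i, a], b, i) for i in range(4))  # h^i_{a,b,i}
    mixed_b = sum(dd(eta[i, i] * h[i, b], a, i) for i in range(4))  # h^i_{b,a,i}
    return sp.simplify(dd(h_tr, a, b) + box_h_ab - mixed_a - mixed_b
                       + eta[a, b] * div_div - eta[a, b] * box_tr) / 2

# every component should vanish identically for this wave
all_zero = all(linearised_efe(a, b) == 0 for a in range(4) for b in range(4))
```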

Great success! Our quadratic Lagrangian successfully recreates the linear equations of motion, and so can be used to derive effects like gravitational waves. In the next post we will move on to examine this quadratic Lagrangian in more detail, and attempt to create a proper quantum theory.

First keras MNIST model

I’ve been working in the machine learning world for a while now, and spent a fair amount of time playing with Session-level TensorFlow models, but until now I had never tried out keras, the higher-level TensorFlow API. I’m now regretting all that wasted time of initialising and saving complex TensorFlow tensors, because for simple models keras is brilliant!

To demonstrate just how easy it is to get started with keras, I’m going to walk through a simple model for categorising the MNIST database of handwritten images. I’m using this as a baseline for testing some analytical ideas, which I might write up here later if they work.

The model we’re going to implement is inspired by the one described in the paper by Hinton et al. that introduced the idea of dropout. We’ll make a model without dropout for now, but as we’ll see it’s very easy to add dropout or other regularisation methods.

To get started, let’s import the images and scale the amplitude of the pixels to 0-1, rather than 0-255 (this standardisation of the length scale means that the activation functions are a bit more under control later on).

import numpy as np
import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

x_train, x_test = x_train / 255.0, x_test / 255.0

That’s it, the data is in and available. It’s difficult to imagine anything simpler than that. Similarly, defining the model topology is also incredibly simple:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(800, activation=tf.nn.relu),
    tf.keras.layers.Dense(800, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

This model takes in the 28×28 MNIST images; flattens them to row-vectors; feeds them through two dense layers of 800 nodes each; and then takes the softmax of the 10 outputs to get probabilities for the image being each of the 10 numbers 0-9.
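As a quick check on the size of this network, we can count its trainable parameters by hand; the arithmetic below just follows the layer sizes given above, and the total should agree with what model.summary() reports:

```python
# each Dense layer has (inputs x outputs) weights plus one bias per output
flatten_to_dense1 = 28 * 28 * 800 + 800  # 628,000
dense1_to_dense2 = 800 * 800 + 800       # 640,800
dense2_to_softmax = 800 * 10 + 10        # 8,010

total_params = flatten_to_dense1 + dense1_to_dense2 + dense2_to_softmax
print(total_params)  # 1,276,810 trainable parameters
```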

We could now immediately compile and fit this model: but to show off how much more power keras allows us to wield without much more complexity, let’s implement some slightly non-trivial learning rate and momentum schedules.

A learning rate schedule is where the learning rate of the neural network is not constant over the epochs, but rather changes over time. Hinton et al.’s paper uses a learning rate that exponentially decreases over epochs, multiplying by 0.998 every epoch. This is fairly simple to implement in keras:

import math
def step_decay(epoch):
    initial_lrate = 1.0
    drop = 0.998
    epochs_drop = 1
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate

lrate = tf.keras.callbacks.LearningRateScheduler(step_decay)

The math.pow function implements the compounding effect of multiplying by 0.998 many times over. Keras has a built-in method of applying learning rate schedules, the LearningRateScheduler callback, which makes it easy to set your own schedules for changing the learning rate.
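Incidentally, because epochs_drop is 1 here, the floor does nothing and the schedule collapses to a plain exponential decay. A quick check of that equivalence (re-stating the function so the snippet is self-contained):

```python
import math

def step_decay(epoch):
    initial_lrate = 1.0
    drop = 0.998
    epochs_drop = 1
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

# with epochs_drop = 1 this is just initial_lrate * drop**(epoch + 1)
for epoch in (0, 10, 49):
    assert abs(step_decay(epoch) - 1.0 * 0.998 ** (epoch + 1)) < 1e-12
```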

Modifying the momentum schedule is a bit more involved, as there isn’t a built-in MomentumScheduler. So let’s build our own!

To do this, I have modified and simplified the underlying source code for the LearningRateScheduler, removing some features we don’t need for this simple example. If you wanted all the features of the LearningRateScheduler, it would not be too difficult to re-implement them. Anyway, without further ado, let’s define the MomentumScheduler class.

class MomentumScheduler(tf.keras.callbacks.Callback):
    def __init__(self, schedule, verbose=0):
        super(MomentumScheduler, self).__init__()
        self.schedule = schedule
        self.verbose = verbose

    def on_epoch_begin(self,epoch,logs=None):
        if not hasattr(self.model.optimizer,'momentum'):
            raise ValueError('Optimizer must have a "momentum" attribute')
        momentum = float(tf.keras.backend.get_value(self.model.optimizer.momentum))
        try:
            momentum = self.schedule(epoch,momentum)
        except TypeError:
            momentum = self.schedule(epoch)
        if not isinstance(momentum, (float,np.float32,np.float64)):
            raise ValueError('The output of the "schedule" function should be float')
        tf.keras.backend.set_value(self.model.optimizer.momentum,momentum)
        if self.verbose > 0:
            print('\nEpoch {}: MomentumScheduler setting momentum to {}'.format(epoch+1,momentum))
    
    def on_epoch_end(self,epoch,logs=None):
        logs = logs or {}
        logs['momentum'] = tf.keras.backend.get_value(self.model.optimizer.momentum)

The try/except block in this code is designed to handle different formats of scheduler, but we’ll only be using the second version. In exactly the same way as we defined the step_decay function above, we can now define our own momentum schedule, inspired again by Hinton et al.‘s paper:

def momentum_schedule(epoch):
    initial_momentum = 0.5
    final_momentum   = 0.99
    epoch_max_value  = 20.0
    if epoch<epoch_max_value:
        return ((final_momentum - initial_momentum)/epoch_max_value) * epoch + initial_momentum
    else:
        return final_momentum

moment = MomentumScheduler(momentum_schedule)

This function gives a linearly increasing momentum between 0.5 and 0.99 over the first epoch_max_value epochs, with a constant value of 0.99 after that. This offers a nice compromise between a low momentum to allow exploration of the cost landscape, and a high momentum to focus down on an identified minimum.
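A couple of spot-checks on that schedule (re-stating the function so the snippet is self-contained):

```python
def momentum_schedule(epoch):
    initial_momentum = 0.5
    final_momentum = 0.99
    epoch_max_value = 20.0
    if epoch < epoch_max_value:
        return ((final_momentum - initial_momentum) / epoch_max_value) * epoch + initial_momentum
    return final_momentum

assert momentum_schedule(0) == 0.5                # low momentum while exploring
assert abs(momentum_schedule(10) - 0.745) < 1e-9  # halfway up the linear ramp
assert momentum_schedule(20) == 0.99              # constant once the ramp ends
```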

It’s also suggested to modify the learning rate by the local momentum: so change line 6 in the block above:

#lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop)) * (1.0 - momentum_schedule(epoch))

We’ve now got everything in place to set up our model. Let’s pick an optimiser (simple stochastic gradient descent; it’s unfashionable but works really well) and compile the model, with a cross-entropy cost function.

sgd = tf.keras.optimizers.SGD(clipnorm=15)
model.compile(optimizer=sgd,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

We’re also going to look at the accuracy as we train the model. The only other thing to mention here is that we’ve set a maximum value of 15 on the norm of the gradients (the clipnorm argument): this is again inspired by Hinton et al.‘s paper, and works pretty well to regularise the results.

Let’s train the model.

model.fit(x_train, y_train, epochs=50, batch_size=100, callbacks=[lrate,moment])

We’re using reasonably small batches of 100 images, and only 50 epochs of training so things get done in a sensible time. You could of course train for longer if you so desired. Notice that we’ve also included our callbacks into the fit call to set the learning rate and momentum schedules.

To see how well we’ve done, we’re going to count the number of predictions that we get correct out of the 10,000 test set images.

predictions = model.predict(x_test)
correct = [np.argmax(pred) == true for pred, true in zip(predictions, y_test)]
print('Number incorrect: {}'.format(len(correct)-np.sum(correct)))
print('Accuracy on test: {:.3f}'.format(np.mean(correct)))

When I run this, for only 50 epochs, I get an accuracy of 0.987, and only 134 misclassifications out of 10,000 test images. Not that long ago that would have been a result competitive with the best in the world!

Hopefully this has shown how simple it is to implement basic models in keras, without having to worry about all the minutiae of low-level TensorFlow. Come back soon for more developments and delvings into machine learning and keras!

mPIMC, v1.00 released!

‘Released’ is a bit of an overstatement in the title there.  What I mean is, I have finally got my Path Integral Monte Carlo code mPIMC in a state where it appears to be working, and can calculate things that look like they might be nearly right.  And so I have put it on the internet, albeit hidden in a heretofore unreferenced GitHub site.

To clarify: ever since my post on developing a Path Integral Monte Carlo (PIMC) code a while back, I’ve been spending a couple of evenings a week writing it, testing it, and documenting it.  It now appears to be working, and so I have given it a name, mPIMC, for Mathematica Path Integral Monte Carlo code.  The name definitely needs work.

I have also released it into the almost-wild.  v1.00 of mPIMC is now available on GitHub, under the GNU General Public License.  So basically, if you go and look for it, you can do anything you like with mPIMC as long as you keep it under that license.  However, caveat emptor (or rather caveat usor): I certainly haven’t found every bug in the code, and I wouldn’t be surprised if there are some bigger mistakes in there as well.

That said, please do go and use mPIMC, at least to have a play around with.  It will not be the quickest code in the world for large simulations, but it is very easy to modify and very quick to obtain sample results with.  That’s what I’ll be using it for over the next few weeks, anyway.

[Figure: a plot of the density of two bosonic particles in an harmonic trap, calculated in mPIMC, compared with the exact density of a single particle (the blue line).  mPIMC automatically calculates lots of expectation values like the density.]

Book Waffle: Hyperion

Hyperion by Dan Simmons is one of those books I’ve been meaning to read for a couple of years now.  Every time I go to the bookshop I read the blurb, remember that I’ve read it before and it didn’t really sound like my kind of thing, and buy something else instead.  I now have a very large bone to pick with whoever wrote that blurb, because Hyperion is definitely my kind of thing.

However, the blurb is not entirely inaccurate.  The book is a far-future science fiction novel, it does have nearly omniscient AI overlords, and it is the story of seven(-ish) pilgrims on their way to the Time Tombs on Hyperion.  So far, so pulpy.  But then you read it, and all the pulp falls away.

Because Simmons has taken that most venerable of literary forms, the journey told through its characters, and given it a new lease of life.  In the style of the Canterbury Tales each of the pilgrims takes their turn telling what brought them to this journey, and it is this conceit that makes Hyperion work so well.  Each character has their own personality, and each tells their tale in their own way.  We have a war story for the Colonel, a murder mystery for the detective, and a time-hopping love story (apparently) for the Consul.  The scholar’s tale was so heart-rending it brought me to tears, and the final twists of the Consul’s tale made me want to immediately re-read the whole book to pick up the clues to it I’d missed first time around.  I’m sure there were clues: other twists and turns are hinted at before they appear, with the clues only making sense at the denouement.  I’ll leave it a year or so, but I will return to Hyperion.

Probably not, however, in the sense of the sequels.  Because there are sequels, but I can’t possibly see how they can avoid damaging the enjoyment of this original.  I actually don’t want to know the answer to the ultimate question of where the Shrike comes from: I’m very much enjoying having the same level of knowledge as the characters, who exist in a universe they don’t entirely understand.  Hyperion leaves the reader on a bittersweet cliff edge, watching the protagonists walk away to face what neither we nor they understand.  One clue was given in the Colonel’s story to what happens next, but even that could be read either way: the Colonel saw the bodies of pilgrims in the clutches of the Shrike, but refused to say if it was only one body (possibly accounted for in the book) or more.  And again, I’m perfectly happy with that.

Hyperion is not my favourite book.  It’s not even my favourite science fiction book.  It has flaws: I think the last story could have been given more depth and detail, and the dynamics between the pilgrims were not fleshed out as fully as they could have been (I wouldn’t be surprised if some of them never exchanged a sentence).  But Hyperion is a truly great science fiction book, pushing itself easily into my top three or four.  For its structure, for the brilliance of its world-building, for the bravery of not answering all its questions, I heartily recommend it.

Book Waffle: The Forever War

I recently finished re-reading The Forever War by Joe Haldeman, and thought I’d put down some thoughts on it.  A quote on the back cover says that Peter Hamilton thought that “The Forever War is damn near perfect”, and I’d be very tempted to agree with him.

The Forever War is the story of humanity’s first interstellar war, told from the viewpoint of one of the first soldiers in the war, William Mandella.  I identified strongly with Mandella: he was a physics student before being drafted, and his natural reaction on coming across an unexpected military situation was to try some physical analysis to work out what was going to happen.  But the fates were not kind to Mandella.  When first contact became a shooting war, the best and brightest on earth were drafted into the front-line military.  Presumably this is Haldeman taking a side-swipe at the way the rich, well-connected and well-educated tended to be able to dodge the draft in Vietnam, which was contemporaneous with the writing of the novel, but it was probably the single least believable thing in the book.  Surely the best and brightest would be drafted into back-line research roles, rather than used as cannon fodder?  But that’s the premise, so I’ll let it slide.

Because Mandella as a physicist is the perfect foil to explain what happens to him, as the realities of fighting an interstellar war become clear.  Or rather, the realities of travelling to an interstellar war.  In Haldeman’s universe there is no Star Trek-style warp, or Star Wars hyperspace: there are collapsars, which allow (quasi-)instantaneous travel across the galaxy, but to get to them you have to put in the long slog of real-space travel.  Which means, with the apparently limitless energy available to their spacecraft, an awful lot of flying about near the speed of light, and the commensurate relativistic effects.  So although Mandella starts the book in the 1990s, his next few subjective years stretch across millennia of earth-time.

Obviously this isn’t very good for Mandella’s personal relationships.  The first time he comes home on leave his father is dead, and his mother dies soon after, despite them being young when he left.  Soon afterwards the whole world is changed beyond recognition, and when he is reassigned duties separately from his beloved Marygay Potter they age at wildly disparate rates.

But on a grander scale than the sufferings of the protagonist, Haldeman’s vision of the earth’s future is not entirely pessimistic.  He sees war coming close to destroying the economy and social cohesion of the planet, but then humanity evolving, sociologically, to overcome these problems.  Mandella’s views on this aren’t always positive (and his natural homophobia, which he works hard to overcome, reads a little strangely from a modern viewpoint), but Haldeman’s future of an increasingly cohesive species could be seen either way.  Humanity overcomes its problems through unity rather than division, at the cost of individual identity.  In a modern telling this would probably be written as a pure dystopia, as individualism has become a defining characteristic of the (Western) human philosophy, but Haldeman constructs a society that can allow regions of difference whilst preferring cohesion for the majority.  And it was this cohesion that, in a way purposefully never fully explained, brought an end to the time-bending Forever War.

So whilst The Forever War might be presented as a novel about an over-qualified soldier aging his way through technologically-extreme warfare, it also comments on the nature of the human condition and the way we communicate with ourselves and others.  It shows the worst of jingoistic humanity, and also some of the best as we come together to solve the problems we ourselves created.  And in the end, there is a sweet, satisfying and unexpectedly emotional resolution for Mandella and his time-stretching partner Potter, which ties up the story perfectly.  Very much recommended.

PIMC, a development story

So a group in the Cavendish is carrying out what sounds like some quite interesting experiments with ultra-cold atomic gases in optical traps.  So I think I’d quite like to have a go at numerically modelling that system, see if I can contribute anything useful.  What tool to use?  Other people have done Hubbard-type calculations, not much to be added there.  But I have some experience with diffusion Monte Carlo, a method for calculating ground-state properties starting from a trial wavefunction.  But if I want to model bosons, there aren’t that many freely-available diffusion Monte Carlo codes out there, and none that would be easy for me to modify.  So the conclusion has to be that I need to write my own simulation code.

I could write a diffusion Monte Carlo code.  But it can only do ground states, needs a trial wavefunction, and would be a bit boring because I’ve used diffusion Monte Carlo before.  Time to turn to other methods!

The Monte Carlo bit of diffusion Monte Carlo, evaluating high-dimensional integrals using Markov chain type sampling, is quite effective when you’ve got lots of particles.  And that means the obvious choice for a method to develop is a finite-temperature unbiased Monte Carlo method, of which the most obvious is Path Integral Monte Carlo (PIMC).  This uses the Feynman formulation of quantum mechanics in terms of path integrals instead of the wavefunction directly, completely eliminating the need for a trial wavefunction, and natively includes finite-temperature effects.  Perfect.

Choice of language to write the method in?  Obvious, it’s going to be Mathematica.  It’s not the quickest language in the world, which for a big expensive code isn’t ideal, but I think this is more than made up for by the quick development cycle through ease of use, possibility of rapid sanity checks, and all-in-one structure.  So here we are.

For the basics of the PIMC code I followed this course guide to writing a PIMC code, but found that it quite quickly ran out of suggestions once I got beyond the very basics.  In particular, it has no information on how to calculate energy expectation values (they were provided as ready-made functions for the people taking the course), which it turns out is a bit more complicated than I was expecting.  On to the literature, where these notes by David Ceperley proved very useful and these two old papers helped me with the correct form for the energy expectation value.  I’m still not entirely happy that the code can correctly calculate the energy of a single particle in an harmonic trap: it seems to get the density profile right, but is still a way off on the energy.  Sounds like a fixable problem though.

I’m now off to fiddle with the acceptance rates for proposed Monte Carlo moves, let’s see if that helps at all … might get rid of the highest-energy states that contribute at the moment?

What the &#@; are all those symbols? Part IV

In previous outings, we’ve looked at function definitions, pure functions, and header addition and replacement using the @ symbol.  This time, we’re going to be looking at one of the most misunderstood parts of Mathematica code: the true meaning of the semicolon, ;.

If, like me, you come from a C-based background, you might think that this is going to be a really trivial discussion.  I mean, in C the semicolon is just an expression delimiter, nothing more.  You put it at the end of a line to tell the compiler it’s the end of a line.  Isn’t that what we’re doing in Mathematica as well?  Oh, no.

First off, the semicolon in Mathematica is definitely not a delimiter.  I know you might use it to finish off an expression, maybe a function definition, without having it give any output: f[x_]=x^2;.  But I’m afraid to say that actually this expression does give some output.  You just can’t normally see it.

To explain this, we need to know that the semicolon is, like all the other symbols we’ve covered, a shorthand for a full in-built Mathematica function.  In this case, the corresponding Mathematica function is known as CompoundExpression. To examine this, try evaluating FullForm@Hold[a;b;c], which returns the underlying Mathematica functions (through FullForm), without evaluating them (Hold). You should find the output is
Hold[CompoundExpression[a,b,c]].

What this means is that when we think we’re suppressing output using a semicolon, Mathematica thinks that we’re combining expressions into one big compound expression. Luckily, this is one of those rare cases where misunderstanding what we’re actually doing doesn’t make too much difference: the only effect of the CompoundExpression is to return the evaluated form of the last argument, and carry out any internal calculations in the other arguments without returning anything. Which is exactly what we thought we would get by suppressing output.

The difference comes when we put another semicolon in the expression: FullForm@Hold[a;b;c;]. This gives
Hold[CompoundExpression[a,b,c,Null]].
So the trailing semicolon has actually added a whole new expression to Mathematica’s understanding of what’s going on: the blank expression Null. This is what Mathematica then returns when you evaluate the cell containing an expression ending in a semicolon. Luckily, Mathematica doesn’t deign to return
Null,
and so all we as users see is an empty screen, and we can quite happily go about our days treating the semicolon as a delimiter.

(As a postscript, I found this StackExchange answer from m_goldberg very useful for getting to understand the semicolon in Mathematica. And for beginners to Mathematica, I’d recommend the whole thread as a great place to learn some tips!)

What the &#@; are all those symbols? Part III

After looking at pattern matching and pure functions, this time we’re going to look at one symbol and its uses: @, the ‘at’ symbol.  There are (at least) four different uses of the @ symbol, and we’ll take them one at a time.  First things first: just one ‘at’ symbol.

@

And just when you thought you’d escaped from functions, here they are again.  The single solitary @ is used to apply a function to an argument: f@x is equivalent to f[x].  As far as I’m aware, the best reason to use an @ instead of the traditional brackets is to avoid the layers and layers of nested brackets that can sometimes result.  Even with smart colouring and grouping in the notebook, those forests of brackets are a bit of a mess.  But that’s one symbol done and dusted, right?  Well, nearly …
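As a quick illustration (with made-up numbers), these two lines do exactly the same thing:

Total[Flatten[{{1, 2}, {3, 4}}]]
Total@Flatten@{{1, 2}, {3, 4}}

Both return 10, but the second avoids any nesting of brackets: @ groups to the right, so Total@Flatten@x means Total[Flatten[x]].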

@@

When you put two ‘at’ symbols next to each other, things start to get a bit complicated.  There’s definitely an analogy between the effect of one ‘at’ symbol and two of them, but it’s going to take us a little while to get to it.  First, we’re going to take a little diversion into the concept of heads and levels within Mathematica.

Let’s define an object, which we’ll call, for hopefully obvious reasons, vect: let’s say vect={{a,b},c[x],{d,{e,f}}}.  Now in the terminology of Mathematica, the ‘head’ of an expression is the top-level construct that it belongs to.  So vect has the head List: it is a list of further items. But we can see that there is more structure within vect; some of the elements of the list are themselves structured. This cascade of structure is referred to as the ‘level’ structure of an expression. Level 0 is the whole expression, with its head List. Each element of the list is then a level-1 part: here, the level-1 parts are a list, a function expression, and another list. The last element also has level-2 parts, one of which is again a list.
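You don’t have to take my word for this structure: the built-in functions Head and Level let you inspect it. With vect defined as above,

Head[vect]
Level[vect, {1}]

return List and {{a,b},c[x],{d,{e,f}}} respectively.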

So, why are these heads and levels useful?  Well, because Mathematica keeps track of the heads and levels of all its expressions, we can manipulate them directly.  In particular, the double ‘at’ symbol @@ (shorthand for the built-in function Apply) is used to replace the head of an expression. So applying the function g to our test list, g@@vect, results in
g[{a,b},c[x],{d,{e,f}}].
As you can see, the outer set of squiggle brackets (that’s a technical term), which tell us that the head of vect is List, has been replaced by the square brackets of the head g. This can be useful for building up a list of arguments for a complicated function, and then simply applying the function to them using @@, which swaps in the new head for you.
One further use of @@ is to use the same arguments with several different functions. Because ‘the function f‘ is itself just the head of the expression f[x], we can replace it using @@. Evaluating g@@f[x] gives
g[x].
So we can think of the double ‘at’ symbol as an operation that takes a head and replaces it, in contrast to the single ‘at’ symbol, which wraps a new (function) head around an expression.
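A classic concrete use of this head-replacement is summing a list:

Plus @@ {1, 2, 3}

turns List[1,2,3] into Plus[1,2,3], which evaluates to 6.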

@@@

There are more @s! Three of them is getting a little silly, but there we go, that’s Mathematica. Luckily, using our new knowledge of heads and levels, the effect of @@@ is easy to explain: instead of replacing the head (the level-0 structure) of an expression, @@@ replaces the heads of the level-1 parts. So, to explain by example, g@@@vect gives
{g[a,b],g[x],g[d,{e,f}]}.
It’s left the outer List head intact, but all the level-1 parts now have the head g. Note that this includes the level-1 part that was already a function, c[x], whose head c has been replaced. I’ll leave it up to you to come up with situations where this could be useful.
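For one concrete situation where this is handy, try summing pairs:

Plus @@@ {{1, 2}, {3, 4}}

which replaces each inner List head with Plus and returns {3, 7}.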

/@

OK, I cheated, I included another symbol. But this operation, mapping as it’s known, is so useful I really wanted to include it here. It also combines the effect of a single and three ‘at’ symbols, so it fits here conceptually as well.

Recall from above that a single @ symbol wraps a new function head around an expression, and @@@ replaces the heads of the level-1 parts. Mapping /@ (shorthand for the built-in function Map) wraps a new function around each level-1 part, pushing the current level-1 structure down to form new level-2 structure, and so on down the cascade. The resulting expression has one more layer of structure than the one we started with. So, using our list vect={{a,b},c[x],{d,{e,f}}} from above, the action of g/@vect is to give
{g[{a,b}],g[c[x]],g[{d,{e,f}}]}.
Now why was I waxing lyrical about how useful this operation is? Well, mapping is a very elegant way to apply the same function to lots of different arguments in one line of code. Simply construct an array of the arguments, which is not an uncommon occurrence anyway, and then map a function over them. We can even combine this with the pure functions we discussed last time: (1+g[#])&/@{a,b,c,d} gives
{1+g[a],1+g[b],1+g[c],1+g[d]}.
This combination of techniques is incredibly powerful, and doesn’t require the dummy variables of most other methods that give the same result (like using the Table function). One caveat is that plain /@ evaluates sequentially; if you need parallelism, Mathematica provides ParallelMap as a near drop-in replacement. But for most purposes mapping is the simplest and best method to generate a list of outputs from a function.
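To make the comparison with Table concrete, these two lines produce exactly the same list:

#^2 & /@ Range[5]
Table[i^2, {i, 5}]

Both return {1, 4, 9, 16, 25}; only the Table version needs the dummy variable i.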

Anyway, there we go, the @ symbol ticked off. Of the symbols from the title we only have the semi-colon to go. And that’s quite a subtle one, so make sure to check back for it!

What the &#@; are all those symbols? Part II

Last time out we looked at two of the many symbols that make up the syntax of Mathematica: the colon and the underscore, both in the context of function definition.  This time round we’ll keep going with function definition, but switch to look at two more symbols: the hash # and the ampersand &.  This is not intended to be a full guide to pure functions, which is the Mathematica name for what we’ll be looking at: rather, this is a quick, how-to-get-results run through the simplest methods involved.

Pure functions

The simplest, perhaps most intuitive, way to define a function is the way we discussed previously: f[x_]:=x^2.  This uses pattern matching to identify the argument passed to the function with the variable x on the right hand side.  Whatever you pass as an argument to the function f, the result is its square.

But Mathematica provides another, shorter, syntax for the same job: f1:=#^2 & behaves exactly like the function above.  In the figure below I’ve plotted both functions, and you can see their outputs are exactly the same.

[Figure: pure_function_plot. The traditionally-defined function f and the pure function f1.]

You can use the function f1 in exactly the same way as the function f: just pass it an argument in the usual way, f1[y] instead of f[y]. In fact the two definitions behave so similarly that Mathematica’s test for identical expressions, f1[y]===f[y], returns
True.

So what does the shorter but more symbolic syntax of f1 mean?  Well, at a practical level, it does pretty much the same as the “normal” function definition: it takes the argument you pass to the function and substitutes it in place of the hash, with the ampersand marking where the body of the function ends.   There are a couple of slight differences between the definitions though.  Let’s start with some possible benefits, and then some possible drawbacks, of the new pure function definition.

Benefits

The hash # in f1 is actually short for #1, which stands for the first argument of the pure function. So although passing too many arguments to the traditionally-defined function, f[x,y], just returns the expression unevaluated (the pattern doesn’t match), passing too many arguments to f1 makes it grab the first and evaluate using that: f1[x,y] returns
x^2.
This can be useful if you want to pass the same arguments to lots of different functions, which don’t all need all of the information in the arguments. When you decide you need to add one more argument for one function, you can add it on the end of the list and not have to change the definition of all the other functions.

As well as #1, you can also use #2, #3, and so on, for the second and third arguments etc. This can be a more elegant way of constructing functions than using explicit arguments like in the function f above, as the placeholder name you come up with for an argument is never actually important. By using pure functions we get rid of information that was never important in the first place. You could consider this as being philosophically similar to the way operators or functions work in “proper” maths.

As an example, let’s imagine we want a function that multiplies all its arguments together. Traditionally you could define this as tradmultiply[x_,y_,z_]:=Times[x,y,z], but the placeholders x, y, and z are already starting to get a little unwieldy. Instead we could write this as a pure function, puremultiply:=Times[#1,#2,#3]&, which does exactly the same job for the first three arguments.  But what if we wanted to use all the arguments of the function, without first having to specify how many there were going to be?  Well, with pure functions you can do that too!  The definition looks like allmultiply:=Times[##]&, with the double hash standing for the whole sequence of arguments.
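To check that allmultiply really does cope with however many arguments you throw at it, try:

allmultiply := Times[##] &
allmultiply[2, 3, 4]
allmultiply[2, 3, 4, 5]

which return 24 and 120 respectively.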

The chief benefits of pure functions over the traditional way of defining functions are flexibility in the arguments, and conceptual simplicity in not having to use arbitrary placeholder variables.  But they do also have a couple of disadvantages in their use.

Drawbacks

To turn the argument above on its head, if we have a function whose arguments are all fundamentally distinct from each other, some of them being, say, options or settings, it can be quite difficult to keep track of which one goes where when using a pure function.  In this case the placeholder names in traditional functions serve a useful purpose by reminding you what that argument is meant to represent.

Pure functions also have fairly low precedence in the order in which Mathematica parses expressions (the ampersand would come quite low down BODMAS, in an analogy to the order of mathematical operations).  So you sometimes need to wrap them in brackets to make sure that each ampersand gets associated with its proper hashes.  This can also be difficult to debug, as Mathematica will try its best to work out what you meant, and occasionally get it wrong.
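The classic illustration of this low precedence involves rules (a warning that appears in the pure functions documentation itself):

x -> # + 1 &
x -> (# + 1 &)

The first parses as (x -> # + 1) &, a pure function that builds a rule, while the second is a rule whose right-hand side is a pure function; only the brackets in the second line give you the latter.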

Conclusions

Taking into account both the advantages and disadvantages of the pure function method of function definition, I’d sum up by saying that they can be a useful method for defining functions that are not too complicated and have arguments that are all in some way “similar”.  They can be useful as the lowest-level functions of a programme that have very simple jobs to do.  But as soon as you get to higher-level functions where the arguments are already complex, the traditional method of defining functions becomes much more readable and controllable, and therefore easier to use.

Next time, we’ll be taking a look at the at symbol @, and its use in (yet more!) function constructions as well as mapping.

What the &#@; are all those symbols? Part I

Complicated Mathematica code can, at first sight, look a little bit like the outcome of a particularly nasty accident in a printing press: various symbols are scattered across the screen, sometimes apparently randomly, sometimes with the same symbol apparently meaning two different things.  I’m not going to try and do a full explanation of what the syntax of Mathematica is really doing.  Instead, I’m going to give some examples of some of the basic expressions that might initially look a little like gobbledygook, but are actually fairly sensible and useful tools.

Let’s start right at the beginning: function definition.  This most basic of operations already involves (at least) two concepts that can already cause a little bit of confusion: evaluation time, and pattern matching.

Immediate vs delayed evaluation

The two simplest ways to define a function in Mathematica are to write f1[x_]={x,AbsoluteTime[]} and f2[x_]:={x,AbsoluteTime[]}.  Look similar?  They are.  And for most purposes, the two definitions would do the same job.  But let’s quickly unpack the difference that the colon makes.

The first function, f1, is evaluated immediately. That is, as soon as you run the cell containing the definition of f1, the name f1 is associated with the already-evaluated form of {x,AbsoluteTime[]}. The inbuilt function AbsoluteTime[] gives the number of seconds that have passed since the first of January 1900. So as soon as you run the cell defining f1, it looks up the time and bakes it into the definition. As I’m writing this, this means that whatever argument I pass to f1, the second part of the output stays the same: f1[y] gives
{y, 3.6830157373836465*10^9},
and f1[1] gives
{1, 3.6830157373836465*10^9},
and f1["Hello World!"] gives
{"Hello World!", 3.6830157373836465*10^9}.
In all cases, the second part of the output is the same.

Now let’s compare that with what the function f2 does. When you run the cell containing this definition, Mathematica doesn’t immediately evaluate it. Instead, the definition is kept (held) in an unevaluated form; the evaluation is delayed. The definition is only evaluated when you actually use the function. So if you run f2 a few times with different arguments, evaluating it at different times, you’ll get different values for the second element of the output. For example, I get
{y, 3.6830157474096105*10^9}
or
{y, 3.6830159672879374*10^9},
depending on when I use the function f2[y].

Obviously that was a fairly contrived example. You probably actually don’t care about how many seconds it is since the first of January 1900. But you might care about what else has happened in your notebook before you get round to evaluating your function. Using immediate evaluation (=) means that your function will always give the same result, regardless of what order you evaluate the contents of your notebook in. Using delayed evaluation (:=) can mean that the function can adjust itself to what else has happened in the notebook since you defined the function.

So which version of function definition should you use? As always, that depends what you want to do. But to paraphrase the Mathematica documentation, if you want to set the final value of an expression, use the immediate evaluation syntax.  But if you want to set a “command” for finding the value of an expression, use the delayed evaluation := syntax.  In most cases, delayed evaluation is probably the best choice for function definition.
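Here’s a slightly less contrived example of the difference, using a throwaway variable n and the hypothetical function names h1 and h2:

n = 2;
h1[x_] = x^n;
h2[x_] := x^n;
n = 3;

Now h1[5] returns 25, because x^n was evaluated down to x^2 the moment h1 was defined, while h2[5] returns 125, because the held definition x^n only looks up n when the function is used.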

Pattern matching

The other funny symbol in the definition of the functions above was the underscore next to the argument.  Let’s look a bit closer at that underscore, and see what it is and why we need it.

We’ll start with a fresh new function, the very complicated g[x_]:=x. The underscore tells Mathematica that the x in square brackets is just a placeholder: wherever it finds an x on the right-hand side of the definition, it should substitute the argument of the function. So for example, g[y] gives
y,
or g[1] gives
1,
and so on. Basically, this is the definition that does what you might intuitively expect a function to do.

So what happens if we forget about the underscore? Let’s define g1[x]:=x. Then g1[y] gives
g1[y],
and g1[1] gives
g1[1].
So this looks kind of useless, as though it isn’t doing anything at all. But if we try g1[x] we find
x.
So if the argument of the function g1 explicitly matches the form given in the definition, the function does evaluate as expected. And it might not look like it, but this can actually be really useful.
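One famous use of these literal, underscore-free definitions is memoisation, where a function caches its own results by creating new definitions as it runs. The standard example is the Fibonacci sequence:

fib[0] = 0;
fib[1] = 1;
fib[n_] := fib[n] = fib[n - 1] + fib[n - 2]

Each time fib meets a new argument, the inner fib[n] = ... stores the answer as a literal definition, so any given value is only ever computed once; fib[10] returns 55.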

Let’s imagine we have a very expensive function to evaluate. This might be a horrifically complicated integral, or a differential equation, or some other detailed numerical process. For now we’ll use the simple test function func[x_,a_]:={Pause[1];x^a}, which spends a second waiting before returning {x^a}. But let’s also imagine there’s a corner case where you can do the integral analytically, or the differential equation has a trivial solution, or you can otherwise get the result of the function much more simply than the full definition allows. For our example, let’s say func[x,0] is easy to evaluate for any value of x, as long as a is identically 0. We’d write the definition of func as

func[x_,a_]:={Pause[1];x^a}
func[x_,0]:={x^0}

to achieve this. Then func[y,1.5]//AbsoluteTiming (sorry about the double slashes … we’ll discuss those in a future entry!) returns
{1.00022, {y^1.5}},
so it’s taken a second to run, but func[y,0]//AbsoluteTiming runs almost instantly,
{7.93066*10^-6, {y^0}}.
By smart use of underscores we’ve given the function more knowledge about possible arguments, which can make it easier to evaluate.

The pattern matching of underscores can be used to give even more information to functions. Instead of just single values, you can match whole classes of arguments and have the function behave differently for each.

Imagine that as well as 0, the function func2[x,a] is really easy to evaluate for a being any integer. We can give this information to Mathematica by using

func2[x_,a_]:={Pause[1];x^a}
func2[x_,a_Integer]:={x^a}

as the function definition. This works by checking the form of the second argument of the function against Mathematica’s inbuilt knowledge of what constitutes an integer. If a is an integer it uses the second line of the definition, and otherwise it uses the first line.
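As an aside, the same effect can be achieved with a pattern test rather than a head: the hypothetical alternative definition

func2[x_, a_?IntegerQ] := {x^a}

checks the argument with the predicate function IntegerQ instead of matching the head Integer. For built-in types like integers the head-based form above is the more idiomatic choice.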

So, to give a couple of examples, func2[y,1.5]//AbsoluteTiming returns
{1.00102, {y^1.5}},
while func2[y,1]//AbsoluteTiming returns
{7.46415*10^-6, {y}}.

That’s enough about function definitions for now; although next time we’ll be returning to the topic, to explore a few more symbols in the use of pure functions. But to keep you occupied until then, you could think a bit more about the pattern matching above: for example, why does func2[y,1.]//AbsoluteTiming take a second to run, and return
{1.00042, {y^1.}}?