Backward Automatic Differentiation Explained


Today I will try to explain how the forward and backward mode in automatic differentiation work. I will only cover the principle, not actual algorithms and the optimizations they apply. While the so-called forward mode is quite intuitive, it is not so easy to wrap your head around the backward mode. I will try to go through all steps and not leave out anything seemingly trivial.

We consider the computation of a function f: \mathbb{R}^n \rightarrow \mathbb{R}^m with independent variables x_1, \dots , x_n and dependent variables y_1, \dots , y_m. The ultimate goal is to compute the Jacobian

J = \begin{pmatrix}  \frac{\partial y_1}{\partial x_1} &  & \dots &  & \frac{\partial y_1}{\partial x_n} \\   & \ddots & & & \\  \vdots &  & \frac{\partial y_i}{\partial x_j} &  & \vdots \\   &  &                                   & \ddots & \\  \frac{\partial y_m}{\partial x_1} & & \dots & & \frac{\partial y_m}{\partial x_n}  \end{pmatrix}

We view the function f as a composite of elementary operations

u_k = \Phi_k( \{u_\kappa\}_{(\kappa,k) \in \Lambda})

for k > n, where we set u_k = x_k for k=1,\dots,n (i.e. we reserve these indices for the start values of the computation) and u_{K-m+i} = y_i for i=1,\dots,m (i.e. the last m indices hold the final results of the computation). The notation should suggest that u_k depends on prior results u_\kappa with (\kappa,k) in some index set \Lambda. Note that (k,l)\in\Lambda refers to a direct dependency of u_l on u_k, i.e. if u_l depends on u_j, but u_j does not enter the calculation of u_l directly, then (j,l) \notin \Lambda.

As an example consider the function

f(x_1, x_2) = x_1 + x_2^2

for which we would have u_1 = x_1, u_2 = x_2, u_3 = u_2^2, u_4 = y_1 = u_1+u_3. The direct dependencies are (1,4), (2,3) and (3,4), but not (2,4), because x_2=u_2 does not enter the expression for u_4 directly.

We can view the computation chain as a directed graph with vertices u_k and edges (k,l) whenever (k,l)\in\Lambda. No cycles are allowed in this graph (it is an acyclic graph), and it consists of K vertices.

We write |i,j| for the length of the longest path from u_i to u_j and call that number the distance from i to j. Note that this is not the usual notion of distance, which is normally the length of the shortest path.

If u_j is not reachable from u_i we set |i,j| = \infty. If u_j is reachable from u_i the distance is finite, since the graph is acyclic.

We can compute a partial derivative \partial u_m / \partial u_k using the chain rule

\frac{\partial u_m}{\partial u_k} = \sum_{l|(l,m)\in\Lambda} \frac{\partial u_m}{\partial u_l} \frac{\partial u_l}{\partial u_k}

This suggests a forward propagation scheme: we start at the initial nodes u_1, \dots , u_n. For all nodes u_l with maximum distance 1 from these nodes (the maximum taken over those initial nodes from which u_l is reachable at all) we compute

c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_{i}

where we can choose c_i for i=1,\dots,n freely at this stage. This assigns the dot product of the gradient of u_l w.r.t. x_1, \dots, x_n and (c_1,\dots,c_n) to the node u_l.

If we choose c_k=1 for one specific k\in\{1,\dots,n\} and zero otherwise, we get the partial derivative of u_l by u_k, but we can compute any other directional derivatives using other vectors (c_1,\dots,c_n). (Remember that the directional derivative is the gradient times the direction w.r.t. which the derivative shall be computed.)

Next we consider nodes with maximum distance 2 from all nodes u_1,\dots,u_n. For such a node u_l

c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_i = \sum_{i=1,\dots,n} \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} \frac{\partial u_k}{\partial u_i} c_i = \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} c_k

where we can assume that the c_k were computed in the previous step, because their maximum distance to the initial nodes u_1,\dots,u_n must be less than 2, hence 1.

Also note that k \in \{1,\dots,n\} may well occur here; in that case \partial u_k / \partial u_i = 1 if i=k and zero otherwise, so \sum_{i=1,\dots,n} \partial u_k / \partial u_i \, c_i = c_k trivially. Or seemingly trivially.

The same argument can be iterated for nodes with maximum distance 3, 4, \dots until we reach the final nodes u_{K-m+1},\dots,u_K. This way we can work forward through the computational graph and compute the directional derivative we seek.
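To make the forward sweep concrete, here is a minimal sketch in plain C++ (nothing to do with any AD library; the Dual type is made up for illustration) that propagates the pairs (u_k, c_k) for the example f(x_1, x_2) = x_1 + x_2^2 above. Choosing (c_1,c_2)=(0,1) yields the partial derivative w.r.t. x_2.

#include <iostream>

// value/derivative pair (u_k, c_k); the operators implement the forward propagation rule
struct Dual {
    double u, c;
};

Dual operator+(const Dual &a, const Dual &b) { return {a.u + b.u, a.c + b.c}; }
Dual operator*(const Dual &a, const Dual &b) {
    return {a.u * b.u, a.c * b.u + a.u * b.c}; // product rule
}

int main() {
    Dual x1 = {3.0, 0.0}, x2 = {2.0, 1.0}; // direction (c_1, c_2) = (0, 1)
    Dual u3 = x2 * x2;                     // u_3 = u_2^2,     c_3 = 2 u_2 c_2
    Dual u4 = x1 + u3;                     // u_4 = u_1 + u_3, c_4 = c_1 + c_3
    std::cout << "f = " << u4.u << ", df/dx2 = " << u4.c << std::endl; // prints 7 and 4
    return 0;
}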

In the backward mode we do very similar things, but in a dual way: We start at the final nodes and compute for all nodes u_l with maximum distance 1 from all of these nodes

\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i}

Note that we now compute a weighted sum over the dependent variables. By setting a specific \overline{c_i} to 1 and the rest to zero we can again compute the partial derivatives of a single final variable. Again using the chain rule we can compute

\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i} = \sum_{i=K-m+1,\dots,K}\sum_{k|(l,k)\in\Lambda} \frac{\partial u_i}{\partial u_k}\frac{\partial u_k}{\partial u_l} \overline{c_i} = \sum_{k|(l,k)\in\Lambda} \frac{\partial u_k}{\partial u_l} \overline{c_k}

for all nodes u_l with maximum distance of 2 from all the final nodes.

Note that the chain rule formally requires including all variables u_k on which u_i depends. However, if u_k does not depend on u_l the whole term is effectively zero, so we can drop these summands from the beginning. Also, we may include indices k on which u_i does not depend in the first place, which is not harmful for the same reason.

As above we can assume all \overline{c_k} to have been computed in the previous step, so that we can iterate backwards to the initial nodes to get all partial derivatives of the weighted sum of the final nodes w.r.t. the initial nodes.
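For completeness, here is an equally minimal sketch of the backward sweep for the same example (again purely illustrative C++, no optimizations): the forward pass records each node’s value and its local partial derivatives w.r.t. its direct predecessors on a small tape, then the adjoints \overline{c_k} are propagated from the output back to the inputs, giving both partial derivatives in one sweep.

#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// one node u_k of the graph: its value and the local partials w.r.t. its
// direct predecessors, i.e. the edges (kappa, k) in Lambda
struct Node {
    double value;
    std::vector<std::pair<int, double> > partials; // (predecessor index, du_k/du_pred)
};

int main() {
    std::vector<Node> tape;
    // forward pass for f(x1, x2) = x1 + x2^2, recording the local partial derivatives
    tape.push_back({3.0, {}});                    // u_1 = x_1
    tape.push_back({2.0, {}});                    // u_2 = x_2
    tape.push_back({tape[1].value * tape[1].value,
                    {{1, 2.0 * tape[1].value}}}); // u_3 = u_2^2
    tape.push_back({tape[0].value + tape[2].value,
                    {{0, 1.0}, {2, 1.0}}});       // u_4 = u_1 + u_3 = y_1

    // backward pass: seed the output adjoint with 1 and sweep the tape in reverse
    std::vector<double> adjoint(tape.size(), 0.0);
    adjoint.back() = 1.0;
    for (int k = static_cast<int>(tape.size()) - 1; k >= 0; --k)
        for (std::size_t p = 0; p < tape[k].partials.size(); ++p)
            adjoint[tape[k].partials[p].first] +=
                tape[k].partials[p].second * adjoint[k];

    std::cout << "df/dx1 = " << adjoint[0] << ", df/dx2 = " << adjoint[1]
              << std::endl; // prints 1 and 4
    return 0;
}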


Local Observers

Hello again. Today I am looking at the observable pattern. The pattern is simple at first sight. Its responsibility is to ensure that changes in values are propagated to all dependent objects.

For example, when a swap quote ticks, the Quote object that holds it should notify a YieldTermStructure that was built using this quote, which means that the zero rates should be re-bootstrapped using the new market data.

Here another pattern comes into play that is closely coupled with the observable pattern: the lazy object. It has an enjoyable name already and it is very useful. A yield term structure is actually a lazy object, which means it won’t re-bootstrap unless someone is really interested in the updated zero rates. So a lot of ticks might happen in the market quotes, but they only mark the dependent structures “dirty”, so that they know they have to recalculate when asked for some result later on (maybe this will never happen, and then they will never recalculate). Without the lazy object pattern, on the other hand, frequent changes in the input data are likely to freeze your CPU with unnecessary computations.

In Germany we have this silly rule that when you have a little crack in your windscreen due to a stone flying around on the Autobahn, and when this crack is in a defined area of your windscreen (which is almost surely the case, I can tell), you have to replace the whole thing, which costs around 500 EUR. However, you wouldn’t do that until you have to go to the official two-yearly routine checks for your car (which are supervised by our dreaded Technischer Überwachungsverein). So you just collect the cracks as they occur and do the repair once, just before the check. Lazy object pattern applied, saves you a lot of money.

But today it is about observables, so let’s have a look at how they work. A dependent object usually derives from Observer, which has a registerWith() method taking an observed object as its argument; the observed objects in turn derive from Observable. An observer will use the registerWith() method to listen to all objects it is interested in.

When an observable changes its state it fires a notification to all observers by invoking their update() method. This notification is triggered by the notifyObservers() method, that can be invoked from an observable instance whenever its creator feels that a change should send a notification to the interested observers. The update() method on the other hand is virtual and implemented in Observer instances to react appropriately to notifications. For example the LazyObject implements the update() method with the mark-dirty logic described above.
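As a minimal sketch of how these pieces fit together (TickingQuote and Curve below are made up for illustration; only Observable, Observer and LazyObject are the library’s classes, assumed to live in ql/patterns/ as in current QuantLib versions):

#include <ql/patterns/lazyobject.hpp>
#include <ql/patterns/observable.hpp>
#include <boost/make_shared.hpp>
#include <iostream>

using namespace QuantLib;

// a trivial observable holding one number, standing in for a market Quote
class TickingQuote : public Observable {
  public:
    TickingQuote() : value_(0.0) {}
    double value() const { return value_; }
    void setValue(double v) {
        value_ = v;
        notifyObservers(); // fire update() on all registered observers
    }
  private:
    double value_;
};

// a lazy consumer: update() (inherited from LazyObject) only marks it dirty,
// the expensive work happens in performCalculations() when a result is asked for
class Curve : public LazyObject {
  public:
    Curve(const boost::shared_ptr<TickingQuote> &q) : q_(q), rate_(0.0) {
        registerWith(q_); // listen to the quote
    }
    double rate() const {
        calculate(); // recomputes only if a notification marked us dirty
        return rate_;
    }
  private:
    void performCalculations() const {
        std::cout << "re-bootstrapping..." << std::endl;
        rate_ = q_->value() + 0.001; // stand-in for the real bootstrap
    }
    boost::shared_ptr<TickingQuote> q_;
    mutable double rate_;
};

int main() {
    boost::shared_ptr<TickingQuote> q = boost::make_shared<TickingQuote>();
    Curve curve(q);
    q->setValue(0.02);                      // tick: the curve is only marked dirty
    q->setValue(0.03);                      // another tick: still no computation
    std::cout << curve.rate() << std::endl; // now the bootstrap runs, exactly once
    return 0;
}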

This is nice; however, it can build complex and huge structures at runtime. For example a quote can be observed by a swap rate helper, which is observed by a yield term structure, which in turn is observed by another swap rate helper (using the yield term structure as an exogenous discounting curve), which is observed by a second yield term structure. This is observed by an Euribor index using the yield term structure as its estimation curve. The index is observed by an Euribor floating coupon and this again is observed by a swap instrument. This swap instrument is observed by a swaption instrument as its underlying. And so on … what we get is a directed acyclic (hopefully) graph describing the dependencies between the objects.

Sometimes the mechanics of the observable pattern are too coarse. Let’s take a specific example I recently ran into. The SwaptionVolCube1 class represents a SABR swaption cube. It registers with a lot of observables, which I just copy-paste here

        registerWith(atmVol_);
        ...
        registerWith(swapIndexBase_);
        registerWith(shortSwapIndexBase_);
        ...
        registerWith(volSpreads_[j*nSwapTenors_+k][i]);
        ...
        registerWith(Settings::instance().evaluationDate());
        ...
        registerWith(parametersGuessQuotes_[j+k*nOptionTenors_][i]);

All except the last statement are done in the base class SwaptionVolatilityCube. The last statement registers with the initial values for the SABR model which are user input.

Now imagine we build a SABR cube with some parameter guesses, say we set them all to \alpha=0.05, \beta=0.8, \nu=0.5, \rho=0.0 and let the class calibrate \alpha, \nu, \rho to the market smiles starting from these values. \beta is held fixed for the moment. The next step could be a calibration to the cms market using the respective helper classes CmsMarket and CmsMarketCalibration. They adjust the \beta parameters such that cms swap margins are matched.

The result of all this is a calibrated cube which we can use as a market data object for other calculations.

Now suppose the forward curves embedded in the swap indexes swapIndexBase and shortSwapIndexBase change due to a market tick. What happens is that the SwaptionVolCube1 object is marked dirty (one little crack, don’t panic – the lazy object trick) and once a volatility is queried from the cube, performCalculations() is invoked to update the cube. This update triggers a recalibration of the cube to the new atm level, which ultimately means that \alpha,\nu,\rho are recalibrated.

It is reasonable that the calibration of \beta is not part of this update, because it takes rather long. Also it seems ok to do the calibration to the cms market only once a day. Actually it may be good enough to recalibrate \alpha only to the real-time ticking atm volatility quotes and changing atm levels and update the whole set of SABR parameters only once a day using the quoted smiles and the cms market information.

Anyway, what happens (or happened before the fix of that issue) during the update of the cube is that the SABR parameters are initialized with the values in the associated parameter guess quotes, because the notification may have arisen from a change in one of these quotes. We just don’t know. I did not write the original logic, but I assume the rationale for letting the initial parameter guesses be quotes is that traders might want to feed the \beta directly into the cube, as an alternative to the calibration on the cms market. In this use case it is perfectly reasonable to observe these \beta values and recalibrate the cube when they change, setting the \beta’s to the prescribed values.

However, in our use case – when using the cms market calibration – this destroys the calibrated \beta’s, even when the parameter guess quotes remain unchanged and the notification actually came from quite another context (the yield term structure in our scenario here): the \beta’s would be overwritten with the initial parameter guesses, which were 0.8.

The problem is that the observable pattern just asks “did something change?” and reacts with “I will update everything”. It does not distinguish the sources of notification and does not narrow down the update process to the parts that are required by the specific source of notification.

A workaround for this is local observers, i.e. separate observer instances stored in the main class as member variables. In our case we can define

        class PrivateObserver : public Observer {
          public:
            PrivateObserver(SwaptionVolCube1x<Model> *v)
                : v_(v) {}
            void update() {
                v_->setParameterGuess();
                v_->update();
            }
          private:
            SwaptionVolCube1x<Model> *v_;
        };

       boost::shared_ptr<PrivateObserver> privateObserver_;

The member is initialized with

        privateObserver_ = boost::make_shared<PrivateObserver>(this);

in the cube’s constructor, and the registration with the parameter guess quotes takes place from this local observer like this

        for (Size i=0; i<4; i++)
            for (Size j=0; j<nOptionTenors_; j++)
                for (Size k=0; k<nSwapTenors_; k++)
                    privateObserver_->registerWith(parametersGuessQuotes_[j+k*nOptionTenors_][i]);

This way, the setParameterGuess() method (which reinitializes the parameter guesses) is only called from the local observer in case one of the input guesses actually changes. In addition the main (swaption cube 1) class’s update method is triggered afterwards as well, so that other updates depending on changes of the parameter guesses can take place. After all the swaption cube still observes the parameter guess quotes via this indirection, but the reaction to the change can be made more specific.

What remains to mention is that the call to setParameterGuess() from the local observer breaks the lazy object strategy of doing nothing but marking the object dirty until a request triggers a re-computation. This seems ok in this case, since the setting of the parameter guesses is very lightweight in terms of computational resources.

Under other circumstances one can also replace the local observer by a full local lazy object. However then one must implement the update() method in the local lazy object such that the main class is notified of the change immediately, too (otherwise it would remain in a “green” status despite the change in the – indirect – observable):

void update() {
    LazyObject::update();
    v_->update();
}

In addition the performCalculations() method in the main class must ensure a valid status of the local lazy object by doing

localLazyObject_->calculate();

If a notification marked the local lazy object dirty, this call will trigger its performCalculations() method. If the local lazy object is still “green” during the main class’s performCalculations(), it will do nothing. Just as it should be.


XVA for Bermudan Swaptions

Welcome back. Sorry for the hiatus, which was longer than planned. Anyway, and finally, here is the (presumably) last episode in the series of posts on proxy pricing of exotics for XVA simulations. In my last posts I wrote a bit on the general design principles I came up with and how I used them to implement the FX TaRF. Today I will describe how to do similar things for bermudan swaptions using the same design principles.

Some general thoughts beforehand though – reflections on what we are doing here at all. What we have is some model M in which we want to compute some XVA or potential exposure figure \Xi for our exotic deal I, possibly and probably together with a whole portfolio of other deals which interact in a non-linear way, i.e.

\Xi( \sum_i I_i ) \neq \sum_i \Xi( I_i)

To do this we need NPVs \nu of each I_i at some time t conditional on some model state m(t) in our model M:

\nu = \nu(I_i,t,m(t))

M might be operated in a real world measure (e.g. for exposure measurement) or a risk neutral measure (e.g. for CVA calculation), depending on the nature of the figure to compute. The model state m(t) implies a market data scenario like certain discount and forwarding term structures, swaption or cap volatility surfaces and the like.

Ideally the NPV \nu is computed in the same pricing model we use for our regular P&L computation or trading activities, calibrated to the market data scenario created by M in the way we would do it for usual pricing. That means in general we have an “outer” model M and an “inner” model N_i for each instrument I_i. Even if all instruments allow for the same type of model for the inner evaluation, these models will probably differ in their calibrated parameters:

For example, if we have a portfolio composed of single currency swaps, callable swaps and swaptions, we could choose the Hull White model to evaluate the callable swaps and swaptions and price the swaps directly on the yield curves. But each callable swap / swaption would in general require a different set of calibration instruments, so we would end up with a set of local, mutually inconsistent models, one for each individual swaption valuation. We could try to use a global model in which all instruments can be priced consistently and maybe use this model as the outer model for XVA calculation at the same time, provided that the XVA figure \Xi allows for risk neutral valuation at all.

This could be a Libor market model which is one of the most flexible models in terms of global calibration. The global calibration alone would be a delicate task however. If more than one currency is involved and maybe even other asset classes the task gets even more demanding. And what do you do if you want a real world exposure figure? This approach seems quite heavy, not impossible, but heavy.

So it seems reasonable to stick with the concept of separate outer and inner models in general. The question is then how to efficiently compute the inner single NPVs \nu(I_i,t,m(t)) conditional on market scenarios generated by the outer model. Full pricing is not an option even for relatively simple inner models like Hull White one factor models.

Another idea could be to compute some reference NPVs in time direction t assuming some arbitrary market data scenario (e.g. simply using today’s market data rolled forward), together with a “rich” set of sensitivities which allows estimating NPVs under arbitrary market scenarios by Taylor approximation and interpolation in time direction (or using a theta sensitivity). Maybe automatic differentiation comes into play here.

The idea I followed in the previous posts is different, and simpler, and goes like this. During an initial single deal pricing performed by a monte carlo simulation we collect path data and build a regression model of conditional NPVs on the model state, pricing time and possibly more ancillary variables (like the accumulated amount in the case of the FX TaRF), just as it is done in the Longstaff Schwartz method for early exercise pricing (or almost like this, see below). Then, when asked for an NPV at a pricing time t under a given market scenario m(t) (and possibly more conditions like the already accumulated amount of a target structure), we imply the model state belonging to the market scenario and use our regression model to produce the desired NPV very efficiently.

The limitation of this method is that the given market data can not be replicated in full beauty in general. In case of the FX TaRF we could replicate any given FX spot rate, but not its volatility which is fixed and implied by the original pricing model for example. In case of bermudan swaptions in the Hull White model we will see in this post that we can replicate one given reference rate, i.e. the general level of the yield curve, but not the shape of the curve, the spread to other yield curves involved and in particular not the volatility structure, which are all fixed again and implied by the model used for initial pricing.

Actually what can be replicated corresponds exactly to what is explicitly modelled as a stochastic quantity in the model: since for the TaRF we used a Black model, possibly with local volatility but without a stochastic factor for the volatility, the volatility is fixed. The FX spot on the other hand, as the explicitly modelled quantity, can be replicated to any desired value.

In case of the Hull White model we can imply the level of the yield curve, since this is modelled through the one factor in the model, but we can not match a given shape, since the model does not have the flexibility to change the shape of the curve (the shape does change in fact, but in a fixed manner, not driven by stochastic factors in the model). The same holds for the volatility structure; it is implied by the initial pricing model again.
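To see why one number pins down the whole curve, recall the standard affine bond price formula of the Hull White model (textbook notation, not code from the engine),

P(t,T) = A(t,T) \, e^{-B(t,T)\, x(t)}

Prescribing one continuously compounded zero yield y for a chosen maturity \tau therefore fixes the state via

x(t) = \frac{\ln A(t,t+\tau) + y\,\tau}{B(t,t+\tau)}

and with it every other point of the curve; the shape relative to the reference point is dictated by A and B, i.e. by the initial model.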

If one accepts these limitations, however, the method offers a very fast and handy way to approximate the desired scenario NPVs.

Let’s come to an example of how this works for bermudan swaptions. I’d like to explain this by going through an example I wrote. We start like this

// Original evaluation date and rate level

Real rateLevelOrig = 0.02;
Date refDateOrig(12, January, 2015);
Settings::instance().evaluationDate() = refDateOrig;

// the yield term structure for the original pricing
// this must _not_ be floating, see the warning in
// the proxy engine's header.

Handle<YieldTermStructure> ytsOrig(boost::make_shared<FlatForward>(
         refDateOrig, rateLevelOrig, Actual365Fixed()));

The original pricing date is 12-Jan-2015 and the yield term structure on this date is at 2\% flat. We set QuantLib’s original pricing date to this date and construct the yield term structure.

One important point is that the yield term structure has a fixed reference date, i.e. it does not change when the evaluation date changes. The reason is that this yield term structure will be linked to the initial pricing model which in turn will be used in the proxy pricing engine later on, which finally relies on the fact that the model does not change when shifting the evaluation date for scenario pricing.

It is like having a fixed frame, as seen from the outer model described above, within which we do local pricings at different times and under different market scenarios, relying on the frame surrounding it not to change. Likewise the rates describing the yield term structure should not change, nor should the initial pricing model be recalibrated or change its parameters during the scenario pricings. The pricing model will be a Hull White model

// the gsr model in T-forward measure, T=50 chosen arbitrary here
// the first model uses the fixed yts, used for the mc pricing
// generating the proxy

boost::shared_ptr<Gsr> gsrFixed = boost::make_shared<Gsr>(
            ytsOrig, stepDates, sigmas, reversion, 50.0);

The monte carlo pricing engine will be set up like this

boost::shared_ptr<PricingEngine> mcEngine =
     MakeMcGaussian1dNonstandardSwaptionEngine<>(gsrFixed)
          .withSteps(1) // the gsr model allows for large steps
          .withSamples(10000)
          .withSeed(42)
          .withCalibrationSamples(10000)
          .withProxy(true);

We do not need a fine time grid since the stochastic process belonging to the GSR model allows for exact evolution over arbitrarily large time intervals. We use 10000 paths to calibrate the Longstaff-Schwartz regression models and another 10000 for the final pricing. The seed is 42, why not. The last attribute says that we do not only want the usual pricing of the swaption, but also want to generate a proxy information object which can be used for scenario pricing later on. Since the generation consumes some additional time, it is optional.

Now we can do a good old pricing on our instrument swaption2 (I skipped how this was created).

  swaption2->setPricingEngine(mcEngine);
  Real npvOrigMc = swaption2->NPV();
  Real errorOrigMc = swaption2->errorEstimate();

What I also did not show is that I created a reference integral engine to verify the prices of the mc engine and later on of the proxy engine also. The reference engine relies on a floating yield term structure, both with respect to the reference date and the rate level. The same holds for the proxy engine, so we can move forward in time, change the rate level and compare the pricings in the two engines.

The result of the inital pricing is

Bermudan swaption proxy pricing example
Pricing results on the original reference date (January 12th, 2015):
Integral engine npv = 0.0467257 (timing: 97723mus)
MC       engine npv = 0.0459135 error estimate 0.000809767 (timing: 5.55552e+06mus)

The reference NPV from the integral engine is 467bp, slightly higher than the monte carlo price of 459bp. This is perfectly expected, since the regression-model-based approximate exercise decisions are sub-optimal, so the option NPV will be underestimated systematically. Together with the error estimate (one standard deviation) of 8bp this all looks quite ok.

One word about the computation times. The monte carlo simulation (10000 paths both for calibration and pricing) takes 5.5 seconds. This is somewhere in the expected region, and I am completely relying on the standard monte carlo framework of the library here. The integral engine on the other hand takes 97 milliseconds, which is quite slow. This can easily be brought down to 5 milliseconds just by using fewer integration points and covered standard deviations (here I used 129 points and 7 standard deviations), without losing too much accuracy. This seems quite ok, since neither the GSR model nor the integral engine are much optimized for speed currently.

To create the proxy engine we can say

  boost::shared_ptr<PricingEngine> proxyEngine =
         boost::make_shared<ProxyNonstandardSwaptionEngine>(
              swaption2->proxy(), rateLevelRef, maturityRef, 64, 7.0, true);

where we just pass the proxy result from the pricing above together with quotes representing the rate level of the scenario pricing and the maturity to which the rate level belongs. Remember that we can only prescribe the level, but not the shape of the yield curve in the scenario pricing. So I allow choosing a maturity, e.g. 10 years, and prescribing the (continuously compounded) zero yield w.r.t. this maturity. The proxy engine will then imply the Hull White model’s state such that this rate level is matched for the given maturity. The shape of the yield curve is implied by the initial model though and can not be changed. I am repeating myself, ain’t I? Do I? Repeat? God.

The next two parameters refer to the way we do the scenario pricing between two of the original structure’s exercise dates: we determine the next available exercise date (if any) and build a grid on this exercise date covering a number of standard deviations of the state variable around its mean (the 7.0 passed in the constructor above), all this conditional on the pricing time and the model state (which is implied by the rate level as explained above). We then compute the NPV on each of these grid points using the regression model from the initial monte carlo pricing. Finally, we interpolate between the points using cubic splines and integrate the resulting function against the state variable’s density (which is known in closed form, by the way).
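As a sketch of this step (plain C++, not the engine’s actual implementation: it evaluates a given regression function directly and uses a simple trapezoid rule instead of the cubic spline interpolation and closed-form integration):

#include <cmath>
#include <functional>
#include <iostream>

// integrate regressionNpv against a normal density with the given conditional
// mean and standard deviation, over mean +/- nStdDevs standard deviations
double proxyNpv(double mean, double stdDev, int points, double nStdDevs,
                const std::function<double(double)> &regressionNpv) {
    const double pi = 3.141592653589793;
    double a = mean - nStdDevs * stdDev;
    double h = 2.0 * nStdDevs * stdDev / (points - 1);
    double sum = 0.0;
    for (int i = 0; i < points; ++i) {
        double x = a + i * h;
        double z = (x - mean) / stdDev;
        double density = std::exp(-0.5 * z * z) / (stdDev * std::sqrt(2.0 * pi));
        double w = (i == 0 || i == points - 1) ? 0.5 : 1.0; // trapezoid weights
        sum += w * regressionNpv(x) * density * h;
    }
    return sum;
}

int main() {
    // example: a quadratic regression function, as produced by the calibration phase
    std::function<double(double)> f = [](double x) { return 0.02 + 0.1 * x + 0.5 * x * x; };
    std::cout << proxyNpv(0.0, 0.1, 64, 7.0, f) << std::endl; // approximately 0.025
    return 0;
}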

Let’s see what we get under some scenarios for the evaluation date and rates. One scenario is simply created by e.g. saying

 Settings::instance().evaluationDate() = Date(12, June, 2015);
 rateLevelRefQuote->setValue(0.02); 
 maturityRefQuote->setValue(10.5);  

which moves the evaluation date half a year ahead to 12-Jun-2015 and requires the 10.5y zero rate (which corresponds to the maturity of the swaption as the reference point) to be 2%. For the scenarios in the example we get

Pricing results on June 12th, 2015, reference rate 0.02 with maturity 10.5
Integral engine npv = 0.0444243
Proxy    engine npv = 0.0430997 (timing: 127mus)

Pricing results on June 11th, 2019, reference rate 0.025 with maturity 6.5
Integral engine npv = 0.0372685
Proxy    engine npv = 0.0361514 (timing: 170mus)

Pricing results on June 11th, 2019, reference rate 0.02 with maturity 6.5
Integral engine npv = 0.0223442
Proxy    engine npv = 0.0223565 (timing: 159mus)

Pricing results on June 11th, 2019, reference rate 0.015 with maturity 6.5
Integral engine npv = 0.0128876
Proxy    engine npv = 0.0141377 (timing: 169mus)

Pricing results on January 11th, 2020, reference rate 0.02 with maturity 6
Integral engine npv = 0.0193142
Proxy    engine npv = 0.0194446 (timing: 201mus)

Pricing results on January 11th, 2021, reference rate 0.02 with maturity 5
Integral engine npv = 0.0145542
Proxy    engine npv = 0.0137726 (timing: 208mus)

Pricing results on June 11th, 2024, reference rate 0.02 with maturity 1.5
Integral engine npv = 0.00222622
Proxy    engine npv = 0.00292998 (timing: 282mus)

which looks not too bad. The proxy engine’s computation time is also not bad, around 100-300 microseconds. Again, this can be made faster by using fewer integration points, but it already seems competitive enough for XVA simulations.

The proxy regression model deserves some comments. It is not just the Longstaff-Schwartz regression model, since that only looks at states where the exercise value is positive. This restriction is essential for a good fit of the regression function (which I chose to be simply a quadratic function), since left of the exercise point the option value (which equals the continuation value in this area) flattens out and would destroy the global fit.

What I did until now is just calibrate two separate models, one in the region corresponding to positive exercise values and another in the complementary region. This is similar to the regression model for the FX TaRF, where I also used two quadratic polynomials, if you remember (if not, you can read about it here or in more detail in the paper).

This solution is not perfect and can probably be improved. The following picture shows the regression models from the example. The different exercise dates are distinguished by color (0 denotes the first exercise date, in dark color, 1 the second, and so on until 9, which denotes the 10th exercise date, in bright color).

[Figure: lsswaption – regression models for each exercise date]

What you can see is the discontinuity at the cutoff point between the exercise and non-exercise areas. It is exactly this jump that one should work on, I guess. But not now.

In technical terms I relied on Klaus’ implementation of the Longstaff Schwartz algorithm in longstaffschwarzpathpricer.hpp. This needed some amendments to allow for non-american call rights (american exercise being what it was designed for, I believe). Apart from this I mainly inserted a hook in the pricer

virtual void post_processing(const Size i,
                             const std::vector<StateType> &state,
                             const std::vector<Real> &price,
                             const std::vector<Real> &exercise) {}

with an empty default implementation. This function is called during the calibration phase, passing the collected data to a potential consumer, which can do useful things with it like building a regression model for proxy pricing like in our use case here. To do this I derived from Klaus’ class as follows

class LongstaffSchwartzProxyPathPricer
    : public LongstaffSchwartzPathPricer<typename SingleVariate<>::path_type> {
  public:
    LongstaffSchwartzProxyPathPricer(
        const TimeGrid &times,
        const boost::shared_ptr<EarlyExercisePathPricer<PathType> > &pricer,
        const boost::shared_ptr<YieldTermStructure> &termStructure);

    const std::vector<boost::function1<Real, StateType> > basisSystem() const {
        return v_;
    }
    const std::vector<Array> coefficientsItm() const { return coeffItm_; }
    const std::vector<Array> coefficientsOtm() const { return coeffOtm_; }
    const StateType cutoff() const { return cutoff_; }

which offers inspectors for the coefficients of the two regression models, the cutoff point separating them and the underlying function system. The implementation of the hook function is simple

void LongstaffSchwartzProxyPathPricer::post_processing(
    const Size i, const std::vector<StateType> &state,
    const std::vector<Real> &price, const std::vector<Real> &exercise) {

    std::vector<StateType> x_itm, x_otm;
    std::vector<Real> y_itm, y_otm;

    cutoff_ = -QL_MAX_REAL;

    for (Size j = 0; j < state.size(); ++j) {
        if (exercise[j] > 0.0) {
            x_itm.push_back(state[j]);
            y_itm.push_back(price[j]);
        } else {
            x_otm.push_back(state[j]);
            y_otm.push_back(price[j]);
            if(state[j]>cutoff_)
                cutoff_ = state[j];
        }
    }

    if (v_.size() <= x_itm.size()) {
        coeffItm_[i - 1] =
            GeneralLinearLeastSquares(x_itm, y_itm, v_).coefficients();
    } else {
        // see longstaffschwartzpricer.hpp
        coeffItm_[i - 1] = Array(v_.size(), 0.0);
    }

    if (v_.size() <= x_otm.size()) {
        coeffOtm_[i - 1] =
            GeneralLinearLeastSquares(x_otm, y_otm, v_).coefficients();
    } else {
        // see longstaffschwartzpricer.hpp
        coeffOtm_[i - 1] = Array(v_.size(), 0.0);
    }
}

and does nothing more than setting up the two data sets on which we estimate the regression models.

The coefficients are read in the monte carlo engine and serve as the essential part in the proxy object which can then be passed to the proxy engine. And we are done. You can find the full example code here.


Pricing Engine Design for exotic XVA simulations (using the example of FX TARFs)

In my previous post I wrote about some ideas to efficiently approximate the value of a fx exotic (a fx tarf in fact). One main motivation is to use such a fast pricing in a XVA simulation.


This post is dedicated to the design I came up with to fit the idea as accurately as possible into the existing QuantLib architecture.

The next post will then present details about the approximation scheme itself, some numerical examples comparing the approximation with a full pricing under several market and time decay scenarios, and performance tests.

Good design is of utmost importance. Even in a prosperous neighbourhood a broken window, not swiftly repaired, will soon lead to a second and after a while a degenerate area. Luckily Luigi would not allow this to happen in QuantLib city of course.

These are my thoughts on the design in general:

  • we should have two pricing engines, one MC and one Proxy engine. We should not have only one engine with some additional methods to extract the approximate NPVs in a proprietary way
  • the proxy engine should behave just like any other engine, e.g. the time decay should consistently be deduced from the global evaluation date, not through some special parameter, and the relevant market data should be given by standard structures
  • the implementation of new instruments and pricing engines following the same idea should be easy and be based on a common interface
  • the end user interface used in client code should be easy to use and foolproof
  • XVA simulation is an application of the proxy engine, but there is no strict connection to this use case – you can also use the proxy engine “just” to compute npvs faster if high accuracy is not required

Or in short: There should be nothing special about the whole thing. Curb your enthusiasm, just do a plain solid job, don’t try to impress anyone. Okay. Let’s start with the instrument class.

FxTarf(const Schedule schedule, 
       const boost::shared_ptr<FxIndex> &index,
       const Real sourceNominal,
       const boost::shared_ptr<StrikedTypePayoff> &shortPositionPayoff,
       const boost::shared_ptr<StrikedTypePayoff> &longPositionPayoff,
       const Real target, 
       const CouponType couponType = capped,
       const Real shortPositionGearing = 1.0,
       const Real longPositionGearing = 1.0,
       const Handle<Quote> accumulatedAmount = Handle<Quote>(),
       const Handle<Quote> lastAmount = Handle<Quote>());

The constructor takes

  • the structure’s schedule (meaning the value dates),
  • the underlying fx index representing for example an ECB fx fixing (this is a new class too, because there doesn’t seem to be a fx index in QuantLib yet, but I do not go into details about that here),
  • the nominal of the structure (in foreign, asset or source currency, there are many names for that)
  • the counterparties’ payoff profiles, which are normally short puts and long calls, all sharing the same strike
  • the target level where the structure knocks out
  • the coupon type (see the previous post I linked above)
  • the gearing of the two sides of the deal

The last two parameters accumulatedAmount and lastAmount are optional. If not given, the FxTarf computes the already accumulated amount reading the index’s historic fixings.

If specified, on the other hand, historic fixings are ignored and the given accumulated amount is used. The lastAmount in this context is needed only in case the last fixing has already occurred but the associated payment is still in the future. The reason to introduce these somewhat redundant parameters is as follows. On one hand it may be handy not to set all historic fixings for the fx index, but to set the accumulated amount directly. Maybe you get the deal information from some source system and it already provides the current accumulated amount along with the deal data. More importantly, during an XVA simulation you might not want to set all fixings in the IndexManager. You can do that, if you like, but it may not be handy, because after each path you have to erase all fixings again, or maybe you do not even simulate each fixing date and just want to interpolate the accumulated fixings. In any case it is just a convenience parameter. Use it or just ignore it.

For a full pricing of the tarf we can use a monte carlo engine which is constructed in the usual way, for example

boost::shared_ptr<PricingEngine> mcEngine 
    = MakeMcFxTarfEngine<>(gkProcess)
      .withStepsPerYear(steps)
      .withSamples(samples)
      .withSeed(42)
      .withProxy();

The parameters here are the (generalized) Black Scholes process, the time steps to simulate per year, the number of samples and the seed for the RNG.

The last modifier .withProxy() is the only special thing here: By default the engine will just compute an npv (and error estimate) as any other mc engine does. If the engine is set up with the proxy flag on the other hand, additional information during the simulation is collected and analyzed to produce some proxy information object that can be used later for approximate pricings. We will see in a minute how.

We need this modifier because the simulation gets slower when creating the proxy, so we should be able to switch it off.

Now we set the engine and can compute the (full) npv:

tarf.setPricingEngine(mcEngine);
std::cout << "MCEngine NPV = " << tarf.NPV() << " error estimate = "
          << tarf.errorEstimate() << std::endl;

If we just want the proxy pricing, we can of course skip the full npv calculation, but it doesn’t hurt since the npv is always produced, even if only calculating the information for the proxy engine.

To use the proxy engine we have to start with an engine which can produce this piece of information. The proxy engine is then fed with the proxy object:

boost::shared_ptr<PricingEngine> proxyEngine =
    boost::make_shared<ProxyFxTarfEngine>(tarf.proxy(), 
                                          fxQuote, 
                                          gkProcess->riskFreeRate());
tarf.setPricingEngine(proxyEngine);
std::cout << "ProxyEngine NPV = " << tarf.NPV() << std::endl;

The proxy engine is constructed with

  • the proxy description produced by the mc engine, which can be retrieved from the instrument by .proxy() (this is a special result, which seems important enough not to bury it in the additional results heap — this is nothing innovative though, it’s like the bps or fair rate for swaps as an example)
  • the fx quote used for the pricing – i.e. the essential market data needed for the proxy pricing
  • the discount curve for the pricing (which is taken as the domestic rate curve from our black scholes process in our example client code)

In addition the global evaluation date will determine the reference date for the valuation, just as usual.

If the instrument does not provide a proxy object (e.g. because the mc engine was told so, see above), or if the proxy object is not suitable for the proxy engine to be constructed, an exception will be thrown.

What is going on behind the scenes: I added an interface for instruments that are capable of proxy pricing:

class ProxyInstrument {
  public:
    //! Base class proxy descriptions for approximate pricing engines
    struct ProxyDescription {
        // check if proxy description is valid
        virtual void validate() const = 0;
    };

    virtual boost::shared_ptr<ProxyDescription> proxy() const = 0;
};

The only method to implement is proxy(), which should return an object containing the information necessary for a compatible proxy engine to compute approximate npvs (see below for what compatible means). The information itself should be an object derived from ProxyInstrument::ProxyDescription, which has to provide a validate method checking the provided data for consistency.

How is the fx tarf instrument implemented w.r.t. this interface? Like this:

class FxTarf : public Instrument, public ProxyInstrument {
  public:
    //! proxy description
    struct Proxy : ProxyDescription {
        struct ProxyFunction {
            virtual Real operator()(const Real spot) const = 0;
        };
        // maximum number of open fixings
        Size maxNumberOpenFixings;
        // last payment date, the npvs are forward npvs w.r.t. this date
        Date lastPaymentDate;
        // buckets for accumulated amount, e.g.
        // 0.0, 0.1, 0.2, 0.3, 0.4 means
        // [0.0,0.1) has index 0
        // [0.1,0.2) has index 1
        // ...
        // [0.4,target] has index 4
        std::vector<Real> accBucketLimits;
        // proxy functions
        // first index is openFixings-1
        // second index is accAmountIndex
        // A function F should implement
        // operator()(Real spot) = npv
        std::vector<std::vector<boost::shared_ptr<ProxyFunction> > > functions;
        void validate() const {
            QL_REQUIRE(functions.size() == maxNumberOpenFixings,
                       "maximum number of open fixings ("
                           << maxNumberOpenFixings
                           << ") must be equal to function rows ("
                           << functions.size() << ")");
            for (Size i = 0; i < functions.size(); ++i) {
                QL_REQUIRE(functions[i].size() == accBucketLimits.size(),
                           "number of acc amount buckets ("
                               << accBucketLimits.size()
                               << ") must be equal to function columns ("
                               << functions[i].size() << ") in row " << i);
            }
        }
    };
/* ... */

This says that the specific (or one possible) proxy information of a fx tarf consists of some descriptive data, which is

  • the maximum number of open (future) fixings
  • the last payment date of the structure – see below
  • a bucketing of the accumulated amount

which together define a segmentation for the approximating function. On each segment the approximating function is then given by a Proxy::ProxyFunction, which at this level of abstraction is just an arbitrary function from Real to Real, returning the npv for a given fx spot. The npv is expressed on a forward basis as of the last payment date of the structure, so that the proxy engine can discount back from this latest possible pricing date to the evaluation date. Remember that this evaluation date is in general (and typically) different from the one used for the mc pricing.

The validate method checks if the function matrix is filled consistently with the segmentation information.
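Just to illustrate how a consumer of this description could work, here is a hypothetical helper (not the actual ProxyFxTarfEngine code; it assumes the FxTarf::Proxy struct from above, the usual QuantLib types and <algorithm> for std::upper_bound): it picks the function belonging to the number of open fixings and the accumulated-amount bucket, evaluates it at the scenario spot and discounts back from the structure’s last payment date.

Real approximateNpv(const FxTarf::Proxy &proxy, Size openFixings, Real accAmount,
                    Real spot, const Handle<YieldTermStructure> &discount) {
    Size row = openFixings - 1; // first index is openFixings - 1
    Size col = std::upper_bound(proxy.accBucketLimits.begin(),
                                proxy.accBucketLimits.end(), accAmount) -
               proxy.accBucketLimits.begin() - 1; // accumulated amount bucket
    Real forwardNpv = (*proxy.functions[row][col])(spot); // regression function at the spot
    return forwardNpv * discount->discount(proxy.lastPaymentDate);
}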

The proxy object is part of the results class for the instrument, which is again just using the standard formalism:

class FxTarf::results : public Instrument::results {
  public:
    void reset();
    boost::shared_ptr<FxTarf::Proxy> proxy;
};

The proxy engine expects a proxy object, which is checked in the constructor – using a dynamic downcast – to be one the engine can actually work with.

ProxyFxTarfEngine(
    boost::shared_ptr<ProxyInstrument::ProxyDescription> proxy,
    Handle<Quote> exchangeRate, Handle<YieldTermStructure> discount)
    : FxTarfEngine(discount), exchangeRate_(exchangeRate) {
    registerWith(exchangeRate_);
    proxy_ = boost::dynamic_pointer_cast<FxTarf::Proxy>(proxy);

    QL_REQUIRE(proxy_, "no FxTarf::Proxy given");

}

The third level of specialization is in the monte carlo engine, where the specific proxy object’s function type is implemented, derived from the definitions in the FxTarf class:

template <class RNG = PseudoRandom, class S = Statistics>
class McFxTarfEngine : public FxTarfEngine,
                       public McSimulation<SingleVariate, RNG, S> {
  public:
    /*! proxy function giving a function spot => npv for one segment
        (bucket accumulated amount, number of open fixings)
        the function is given by two quadratic polynomials on intervals
        (-\infty,cutoff] and (cutoff,\infty).
        Only the ascending (long calls) or descending (long puts) branch
        is used and then extrapolated flat.
    */
    class QuadraticProxyFunction : public FxTarf::Proxy::ProxyFunction {
      public:
        QuadraticProxyFunction(Option::Type, const Real cutoff, const Real a1,
                               const Real b1, const Real c1, const Real a2,
                               const Real b2, const Real c2);
        Real operator()(const Real spot) const;

      private:
        Option::Type type_;
        const Real a1_, b1_, c1_, a2_, b2_, c2_;
        const Real cutoff_;
        int flatExtrapolationType1_,
            flatExtrapolationType2_; // +1 = right, -1 = left
        Real extrapolationPoint1_, extrapolationPoint2_;
    };

At this place we finally fix the specific form of the approximating functions, which are in essence given by two quadratic polynomials. I will give more details on this and a motivation for it in the next post.

Finally it appeared useful to derive the mc and proxy engines from a common base engine, which handles some trivial boundary cases (like all fixings being done, or the determination of the npv of a fixed amount that is not yet settled), so we have the hierarchy

class FxTarfEngine : public FxTarf::engine {}

template <class RNG = PseudoRandom, class S = Statistics>
class McFxTarfEngine : public FxTarfEngine,
                       public McSimulation<SingleVariate, RNG, S> {}

class ProxyFxTarfEngine : public FxTarfEngine {}

If you are interested in the code you can look into my repository. It may already work, but I did not test everything yet. More on this next week.


Fast Pricing of FX TARFs for XVA Calculation

Let’s play a game. We flip a coin ten times (here is a particularly nice way to do this – you can take the Greek 2 Euro coin for example, it has Εὐρώπη (both the goddess and the continent) on it). If it is heads I pay you one Euro. If it is tails you pay me two. Oh, and if you should win more than three times while we are playing we just stop the whole thing, ok?

A fx tarf is a sequence of fx forward trades where our counterpart pays a strike rate. If a single forward is in favour of the counterpart, she or he executes it on the structure’s nominal (so she or he is long a call). If it is in our favour, we execute it on twice the nominal (so we are long a put on twice the nominal). And if the sum of the fixings in favour of the counterpart, \max( S_i - K, 0) with S_i denoting the fx fixing, exceeds a given target, the remaining forwards expire without further payments. In such structures there are several conventions for the coupon fixing which triggers the target: either the full amount for this fixing is paid, or only the part of the coupon necessary to reach the target, or no coupon at all.
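To make the payoff concrete, here is a small single-path sketch in plain C++ (illustrative only, not engine code; it works with undiscounted amounts, uses the full coupon convention and takes our point of view):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// cash flows of one fixing path: the counterpart is long calls on the nominal,
// we are long puts on leverage times the nominal; coupons in the counterpart's
// favour accumulate towards the target and knock the structure out (full coupon)
double tarfPathNpv(const std::vector<double> &fixings, double strike,
                   double target, double nominal, double leverage) {
    double accumulated = 0.0, npv = 0.0;
    for (std::size_t i = 0; i < fixings.size(); ++i) {
        double coupon = std::max(fixings[i] - strike, 0.0); // counterpart's call payoff per unit
        if (coupon > 0.0) {
            npv -= coupon * nominal; // we pay
            accumulated += coupon;
            if (accumulated >= target)
                break;               // knocked out, remaining fixings expire
        } else {
            npv += (strike - fixings[i]) * nominal * leverage; // our put pays
        }
    }
    return npv;
}

int main() {
    std::vector<double> fixings;
    fixings.push_back(1.14); fixings.push_back(1.05); fixings.push_back(1.18);
    // strike 1.10, target 0.10, nominal 100m, leverage 2: the third fixing hits the target
    std::cout << tarfPathNpv(fixings, 1.10, 0.10, 100000000.0, 2.0) << std::endl;
    return 0;
}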

The valuation of fx tarfs in general depends on the fx smiles for each component’s fixing. The whole smile is important here: both the strike of the trade and the target minus the accumulated amount are obviously critical points on the smile. Since the accumulated amount is itself a random quantity after the first fixing, the whole smile will affect the structure’s value. In addition, the intertemporal correlations of the fx spot on the fixing dates play an important role.

In this and probably one or more later posts I want to write about several things:

  • how a classic, fully fledged monte carlo pricing engine can be implemented for this structure
  • how an approximate npv for market scenarios and time decay assumptions can be calculated very quickly
  • how this can be implemented in a second pricing engine and how this is related to the first engine
  • how QuantLib’s lucent-transparent design can be retained when doing all this

Obviously fast pricing is useful to fill the famous npv cube which can then be used to calculate XVA numbers like CVA, DVA etc.

Today’s post is dedicated to some thoughts on the methodology for fast, approximate pricings. I am heavily inspired by a talk of some Murex colleagues here who implemented similar ideas in their platform for CVA and potential future exposure calculations. Moreover the idea is related to the very classic and simple, but brilliant paper by Longstaff and Schwartz Valuing American Options by Simulation: A Simple Least-Squares Approach, but has a slightly different flavour here.

Let’s fix a specific tarf termsheet. The structure has payment dates starting on 15-Nov-2014, then monthly until 15-Oct-2015. The fx fixing is taken to be the ECB fixing for EUR-USD two business days prior to each payment date. The nominal is 100 million Euro. Our counterpart holds the calls, we the puts and our puts are on 200 million Euro so leveraged by a factor of two. The strike is 1.10, so the counterpart’s calls were in the money at trade date.

The valuation date is 28-Apr-2015 and the remaining target is 0.10. The fx spot as of the valuation date is 1.10. The implied volatility for EUR-USD fx options is 20% (lognormal, not yet a problem in fx markets 😉 …), constant over time and flat, and we assume equal Euro and USD interest rates, so no drift in our underlying Garman-Kohlhagen process. The payoff mode is full coupon. Of course the assumptions on market data are only partly realistic.

The idea to approximate npvs efficiently is as follows. First we do a full monte carlo pricing in the usual way. Each path generates an npv. We store the following information on each grid point of each path

(# open fixings, fx spot, accumulated amount so far, npv of the remaining fixings)

The hope is then that we can do a regression analysis of the npv on these main price drivers, i.e. the fx spot, the already accumulated amount and the number of open fixings.

Note that this approach implies that the fx spot is taken from the “outside” XVA scenario set, but everything else (the interest rate curves and the volatility) is implied by the pricing model. This is slightly (or heavily) inconsistent with an XVA scenario set where the rate curves and maybe also the volatility structure are part of the scenarios.

Let’s fix the simplest case of only one open fixing left, i.e. we put ourselves at a point in time somewhere between the second to last and the last fixing. Also we set the target to +\infty (i.e. we ignore that feature) for the time being and assume a leverage of one. Our structure collapses to a single, vanilla fx forward. We do 250k monte carlo paths and plot the results (the npv is in percent here):

[Figure: tarf1]

Do you see what is happening? We get a point cloud that – for a fixed accumulated amount and conditioned on the spot – averages to a line representing the fx forward npv. See below, where I do this in 2d and where it gets clearer. What we note here as a first observation is that the position of the cloud depends on the accumulated amount: lower spots are connected with lower accumulated amounts and higher spots with higher accumulated amounts. This is quite plausible, but has an impact on the areas where we have enough data to do a regression.

The next picture shows the same data but projecting along the accumulated amount dimension.

[Figure: tarf2]

Furthermore I added a linear regression line which should be able to predict the npv given a spot value. To test this I added three more horizontal lines that estimate the npv for spot values of 1.0, 1.1 and 1.2 by averaging over all generated monte carlo data within buckets [0.99,1.01], [1.09,1.11] and [1.19,1.21] respectively. The hope is that the horizontal lines intersect with the regression line at x-values of 1.0, 1.1 and 1.2. This looks quite good here.

Let’s look at a real TARF now, i.e. setting the target to 0.15.

[Figure: tarf3]

What is new here is that the cloud is cut off at the target level, beyond which it simply collapses to a plane indicating a zero npv. Quite clear, because in this area the structure is terminated before the last fixing.

Otherwise this case is not too different from the case before since we assume a full coupon payment and only have one fixing left, so we have a fx forward that might be killed by the target trigger before. More challenging is the case where we pay a capped coupon. Excluding data where the target was triggered before, in this case we get

[Figure: tarf4]

We now want to approximate npvs for spots 1.0, 1.1 and 1.2 and an accumulated amount of 0.05 (bucketed from 0.04 to 0.06). The target feature introduces curvature into our cloud. I take this into account by fitting a quadratic polynomial instead of only a linear function.

Furthermore we see that the npvs are limited to 15 now and decreasing for higher spots. Why this latter thing? Actually, until now I only used the fixing times as simulation times (because only they are necessary for pricing and the process can take large steps due to its simplicity), so the spot is effectively always the previous fixing. And if this is above 1.1 it excludes the possibility of coupons higher than its difference to 1.1.

Let’s add more simulation times between the fixings (100 per year in total), as is likely to be the case with the external XVA scenarios asking for npvs in the end:

[Figure: tarf5]

The approximation works quite ok for spot 1.0, but not too well for 1.1 and 1.2 any more (in both cases above). Up to now we have not used the accumulated amount in our npv approximation. So let’s restrict ourselves to accumulated amounts of e.g. 0.02 to 0.08 (remember that we want a prediction conditioned on an accumulated amount of 0.05; I choose a bigger bucket for the regression though, to have “more” data and because I don’t want to compute too many regression functions on too little data in the end).

[Figure: tarf6]

Better. Let’s move to the original termsheet now (leverage 2, remaining target 0.1, full coupon) and to 5 open fixings instead of only 1, to see if all this breaks down in a more complex setting (my experience says: yes, that will happen). The accumulated amount we want to approximate for is now 0.01:

[Figure: tarf7]

Quite ok, phew. We see a new artefact now however: the quadratic regression function starts to fall again for spots bigger than 1.25. This is of course not sensible. So we not only need to compute different regression functions for different accumulated amounts (and different numbers of open fixings), but also for different spot regions. Let’s compute another quadratic regression for spots bigger than 1.2 for example (the blue graph):

[Figure: tarf8]

That would work for higher spots.
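For reference, the quadratic fits used above are tiny least-squares problems. Here is a self-contained sketch that solves the 3x3 normal equations directly (just a stand-in for a proper routine such as QuantLib’s GeneralLinearLeastSquares):

#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

// fit y ~ c0 + c1*x + c2*x^2 by ordinary least squares (normal equations,
// naive Gaussian elimination without pivoting -- fine for this illustration)
std::array<double, 3> fitQuadratic(const std::vector<double> &x,
                                   const std::vector<double> &y) {
    double A[3][4] = {}; // augmented normal equations
    for (std::size_t k = 0; k < x.size(); ++k) {
        double b[3] = {1.0, x[k], x[k] * x[k]};
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j)
                A[i][j] += b[i] * b[j];
            A[i][3] += b[i] * y[k];
        }
    }
    for (int i = 0; i < 3; ++i)    // forward elimination
        for (int r = i + 1; r < 3; ++r) {
            double f = A[r][i] / A[i][i];
            for (int c = i; c < 4; ++c)
                A[r][c] -= f * A[i][c];
        }
    std::array<double, 3> c = {{0.0, 0.0, 0.0}};
    for (int i = 2; i >= 0; --i) { // back substitution
        double s = A[i][3];
        for (int j = i + 1; j < 3; ++j)
            s -= A[i][j] * c[j];
        c[i] = s / A[i][i];
    }
    return c;
}

int main() {
    std::vector<double> x, y;
    for (int i = -2; i <= 2; ++i) {
        x.push_back(i);
        y.push_back(1.0 + 2.0 * i + 3.0 * i * i); // exact quadratic, no noise
    }
    std::array<double, 3> c = fitQuadratic(x, y);
    std::cout << c[0] << " " << c[1] << " " << c[2] << std::endl; // prints roughly 1 2 3
    return 0;
}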

To summarize the experiments, the approach seems sensible in general, but we have to keep in mind a few things:

The number of time steps in the simulation should be larger than for pure pricing purposes, possibly the grid’s step size should be comparable to the XVA simulation.

The regression function can be assumed to be quadratic, but not globally. Instead the domain has to be partitioned by

  • the number of open fixings, possibly even
  • the number of open fixings and the distance to the last fixing,
  • the already accumulated amount
  • the fx spot

The next task would be to think of an algorithm that finds a sensible partition automatically. One idea would be to require a certain minimum percentage of the data generated by the initial monte carlo pricing simulation to be available in each cell of the partition.
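A very rough sketch of that idea (everything here is hypothetical illustration code, not an actual algorithm from the engine): split a spot interval recursively, and only keep splitting while each half still contains at least the required minimum fraction of the pricing Monte Carlo data, so that every resulting cell supports its own regression.

#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<double, double> Cell; // a spot interval [lo, hi)

std::size_t countIn(const std::vector<double>& spots, double lo, double hi) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < spots.size(); ++i)
        if (spots[i] >= lo && spots[i] < hi)
            ++n;
    return n;
}

void partition(const std::vector<double>& spots, double lo, double hi,
               double minFraction, std::vector<Cell>& cells) {
    const double mid = 0.5 * (lo + hi);
    const std::size_t minCount =
        static_cast<std::size_t>(minFraction * spots.size());
    // split only if both halves still hold enough data for a stable regression
    if (countIn(spots, lo, mid) >= minCount && countIn(spots, mid, hi) >= minCount) {
        partition(spots, lo, mid, minFraction, cells);
        partition(spots, mid, hi, minFraction, cells);
    } else {
        cells.push_back(Cell(lo, hi)); // one regression function per cell
    }
}

The same splitting could be applied per accumulated-amount bucket and per number of open fixings.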

Next post will be on the implementation of the two pricing engines then !

Fast Pricing of FX TARFs for XVA Calculation

The Pippi Langstrumpf Principle

I received some complaints that my blog lacks seriousness here and there (female compilers), contains misleading references (Bourianochevsky and Jarulamparabalam formula) or presents non-quant-related topics (Brad Mehldau’s CDs). The title of the blog itself is offensive. Quant work is neither fun nor a game. I apologize for these incidents. That won’t happen again.

Pippi Langstrumpf has many things in common with us quants.

[Image: Pippi Longstocking]

She has superpowers. She is incredibly rich. And if she doesn’t like the real world as it is she just changes it. That’s what we do too. Pippi calls that “ich mach mir die Welt, widewide wie sie mir gefällt” (“I make the world the way I like it”).

For example, we do not like negative rates? No problem, we just pretend they are one percent (EUR swaptions as of April 2015) or three percent (now the shift for EUR caps!) higher than they actually are. That was the topic of one of my recent posts.

And we do other incredible things. We can change the probability of events. For example we can change the probability of winning the lottery from way-too-low to, say, 90% or 95% (we don’t actually do this, because we are notoriously overpaid and have enough money already). That’s pretty close to Pippi’s “zwei mal drei macht vier” (“two times three makes four”).

The only thing we can not do is turn impossible events into possible ones. All this is thanks to the Radon-Nikodym theorem, the Girsanov Theorem, the Novikov condition. All these names sound Russian, but Nikodym was Polish and Radon Austrian in fact.

This changing of measures is totally usual in pricing. But there are some places where the result of a computation still depends on the choice of measure. One such place is the computation of potential future exposure. Another is the extraction of probabilities for the exercise of calls.

I would like to illustrate the latter point. Let’s fix a setting and write

\nu = P(0,T) E^T \left( \frac{\Pi(t)}{P(t,T)} \right)

as a pricing formula under the so called T-forward measure for a payoff \Pi. What always happens in pricing is that you express the value of the payoff in terms of a numeraire, choose a measure that makes this expression a martingale, and then take the expectation to get the price. Again, this is a price in terms of the numeraire, so you have to multiply by today's numeraire value to get the price in the usual metric. The numeraire in the T-forward measure is the zero bond with maturity T.

Now assume \Pi represents a long option with expiry t. Sometimes people compute

p := P^T( \Pi(t) > 0)

and label this exercise probability for the option. Does that make sense? Example:

We consider the valuation date 30-04-2013 and price a European at the money payer swaption with expiry in 5 years and an underlying swap of 10 years.
The implied volatility is 20\% and the yield curve is flat forward at 2\% (continuously compounded rate). The mean reversion in a Hull White
model is set to 1\% and the volatility is calibrated to 0.00422515 in order to reprice the market swaption.

Now we compute p for different choices of T. We get 48% for T=16, 36% for T=60, 21% for T=250. Of course we use QuantLib for this purpose. I will not give code examples today…

Obviously the measures associated to each T are very different.

One can try to fix this. We define a measure-invariant exercise probability as

\pi := \frac{N(0)}{P(0,t)} E^N \left( \frac{1_{\Pi(t)>0}}{N(t)} \right)

which is the forward price of a digital triggered by \Pi(t) > 0. This formula holds for an arbitrary numeraire N. The naive exercise probability in this notation would be

p = p_N := E^N(1_{\Pi(t)>0})

on the other hand, a quantity dependent on N as denoted by the subscript. Actually both notions coincide for the special choice of the t-forward-measure:

\pi = E^t ( 1_{\Pi(t)>0} )
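Indeed, choosing the t-forward measure, i.e. N = P(\cdot,t), in the definition of \pi and using N(t) = P(t,t) = 1 and N(0) = P(0,t) gives

\pi = \frac{P(0,t)}{P(0,t)} E^t \left( \frac{1_{\Pi(t)>0}}{P(t,t)} \right) = E^t \left( 1_{\Pi(t)>0} \right)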

We compare \pi and p for different T-forward measures in the example above.

  T      \nu         p       \pi
 16  0.0290412  0.478684  0.481185 
 30  0.0290385  0.435909  0.484888 
 60  0.0290381  0.364131  0.480800 
100  0.0290412  0.30056   0.480948 
250  0.0290415  0.211921  0.479532

First of all, \nu is the same under all pricing measures up to numerical differences, as expected. The naive exercise probability p, on the other hand, is obviously not well defined. And different measures do not just imply slight differences in the results; they are completely different. The alternative definition \pi, in contrast, is measure independent.

The following picture shows the underlying npv as a function of the Hull White’s state variable (expressed in normalized form, i.e. with zero expectation and unit variance).

[Figure: underlying npv as a function of the normalized Hull White state variable z, for different T]

For bigger T the payoff is shifted to the right, and the option is exercised less often (only for bigger z’s).

How is that made up for in pricing ? This picture shows the factor that is multiplied with \Pi before taking the expectation to get the final NPV.

[Figure: the factor multiplied with \Pi before taking the expectation, for different T]

The correction is rather mild for T=16, i.e. the factor is around 1. For T=250, on the other hand, model states belonging to bigger z’s are weighted with much higher factors.

So can we take \pi instead of p as a definition for the exercise probability ? Not really, because for a bermudan option the exercise probabilities for the different expiries plus the probability of no exercise at all will not add up to one. This is evident as well, because we use a different measure for each expiry, namely the respective t-forward measure (using the representation of \pi as the t-forward p above).

Sure, we could fix that as well by renormalizing the vector of p’s again. But I will just stop here for the moment. Similarly, a potential future exposure defined as a quantile of an underlying npv distribution will not be well defined under pricing measures.

The Pippi Langstrumpf Principle

Adjoint Greeks IV – Exotics

Today I feature a sequel to the adjoint greeks drama (see my prior posts on this).

Before I start I would like to point you to a new and excellent blog authored by my colleague Matthias https://ipythonquant.wordpress.com/. You will want to follow his posts, I am certain about that.

I am still on my way to convert the essential parts of the library to a template version. Since this is boring work I created a small friend who helps me. No I did not go mad. It is a semi-intelligent emacs-lisp script that does some regular expression based search-and-replacements which speeds up the conversion a lot. Emacs is so cool.

Today I am going to calculate some derivatives of a bermudan swaption in the Gsr / Hull White model. This is the first post-vanilla application and admittedly I am glad that it works at last.

One point I have to make today is that AD is slow. Or to put it differently, doubles can be tremendously fast. Lookit here:

// Multiply two N x N matrices a and b (stored row-major) and accumulate the result in s.
// T is either plain double or an AD type like CppAD::AD<double>; N is a compile-time constant.
template <class T>
void multiply(T *a, T *b, T *s) {
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < N; ++k) {
            for (int j = 0; j < N; ++j) {
                s[i * N + j] += a[i * N + k] * b[k * N + j];
            }
        }
    }
}

This code multiplies two matrices a and b and accumulates the result in s. It is the same algorithm as implemented in QuantLib for the Matrix class.

When feeding two 1000 x 1000 double matrices, the code runs in 1.3s on my laptop if compiled with gcc 4.8.2 using -O1. With -O3 I get 950ms. With clang 3.7.0 (fresh from the trunk) I get 960ms (-O1) and 630ms (-O3). With T=CppAD::AD<double> on the other hand, the running time goes up to at least 10.9s (gcc, -O3) and 14.3s (clang, -O3). Code optimization seems to be a delicate business.

These timings refer to the situation where we use the AD type without taping, i.e. only as a wrapper for the double in it. This seems to indicate that at least for specific procedures it is not advisable to globally replace doubles by their active counterpart if one is not really interested in taping their operations and calculating the derivatives. Performance may break down dramatically.

What is the reason behind the huge difference ? I am not really the right person to analyze this in great detail. The only thing I spotted when looking into the generated assembler code is that with double there are SIMD (single instruction multiple data) instructions for adding and multiplying in the nested loops (like addpd and mulpd, it’s a long time since I programmed in assembler and it was on a 6510 so I am not really able to read a modern x86 assembler file).

With CppAD::AD<double> there don’t seem to be such instructions around. So part of the performance loss may be due to the inability to stream (vectorize) the AD calculations. For sure this is not the only point here.

The second point to make today is that AD has pitfalls that may in the end lead to wrong results, if one uses the AD framework blindly. Let’s come back to our specific example. The underlying source code can be found here if you are interested or want to run it by yourself.

It is a bermudan swaption, ten years with yearly exercise dates. The model for pricing will be the Gsr or Hull White model. We just want to compute the bucket vegas of the bermudan, i.e. the change in its NPV when the implied market volatility of the canonical european swaptions used for the model calibration is increased by one percent.

The rate level is set to 3\% and the strike is out of the money at 5\%. The volatilities are produced by SABR parameters \alpha=3\%, \beta=60\%, \nu=12\%, \rho=30\%. All of this is arbitrary, unrealistic and only for explanatory purposes …

The naive AD way of doing this would be to declare the input implied volatilities as our quantities of interest

   CppAD::Independent(inputVolAD);

and then go through the whole model calibration

   gsrAD->calibrateVolatilitiesIterative(
                  basketAD, methodAD, ecAD);

and pricing

   yAD[0] = swaptionAD->NPV();

to get the derivative d bermudan / d impliedVol

    // record the tape from the implied vols to the bermudan npv and run one reverse sweep
    CppAD::ADFun<Real> f(inputVolAD, yAD);
    std::vector<Real> vega(sigmasAD.size()), w(1, 1.0);
    vega = f.Reverse(1, w); // d bermudan / d impliedVol

When I first wrote about AD I found it extremely attractive and magical that you could compute your sensitivities like this. That you can actually differentiate a zero search (like in the case of yield curve bootstrapping) or an optimization (like the Levenberg-Marquardt algorithm which is used here).

However there are dark sides to this simplicity, too. Performance is one thing. The calibration step and the pricing including the gradient calculation take

AD model calibration = 1.32s
AD pricing+deltas    = 7.11s

We can also do it differently, namely calibrate the model in an ordinary way (using ordinary doubles), then compute the sensitivity of the bermudan to the model’s sigma and additionally compute the sensitivity of the calibration instruments to the model’s sigma. Putting everything together yields the bermudan’s bucketed vega again. I will demonstrate how below. First I report the computation time for this approach:

model calibration = 0.40s
AD pricing+deltas = 5.95s
additional stuff  = 0.97s

This leaves us with a performance gain of around 15 percent (7.32s vs 8.43s). This is not really dramatic, but still significant. And there is another good reason to separate the calibration from the greek calculation, which I will come to below.

Note also that the pricing, which takes 5.95s with AD (including derivatives), is much faster without AD, where it only consumes 0.073s. This is a factor of around 80, much worse than the theoretical factor of 4 to 5 mentioned in earlier posts (remember, we saw 4.5 for a plain vanilla interest rate swap npv and delta computation). This is again obviously due to optimization issues.

The background here is that the swaption pricing engine uses cubic spline interpolation and closed-form integration of the resulting cubic polynomials against the normal density for the roll back. Again a lot of elementary calculations, not separated by OO constructs that would keep the compiler from low level optimizations. You would surely need quite a number of sensitivities to still get a performance gain.

But let’s follow the path of computing the bucket vegas further. First I print the bucket vegas coming from the naive AD way above: direct differentiation by the input implied volatilities, with the operation sequence going through the whole model calibration and the pricing.

vega #0: 0.0153%
vega #1: 0.0238%
vega #2: 0.0263%
vega #3: 0.0207%
vega #4: 0.0209%
vega #5: 0.0185%
vega #6: 0.0140%
vega #7: 0.0124%
vega #8: 0.0087%

This looks plausible. We started the calibration from a constant sigma function at 1\%. This is actually far away from the target value (which is around 0.40\%), so the optimizer can walk around a bit before it settles at the minimum. But we could have started with a sigma very close to the optimal solution. What would happen then ? With the target value itself as initial guess (which you are unlikely to have in reality, sure) we get

vega #0: 0.0000%
vega #1: 0.0238%
vega #2: 0.0263%
vega #3: 0.0207%
vega #4: 0.0209%
vega #5: 0.0448%
vega #6: 0.0000%
vega #7: 0.0000%
vega #8: 0.0000%

Some vegas are zero now, and vega #5 is even completely different from the value before. This is a no-go, because in a production application you wouldn’t notice if some deals contributed a zero sensitivity or a false value.

What is happening here is that the function we differentiate depends on more input variables than only the primary variables of interest (the implied vol), like the initial guess for the optimization. These might alter the derivative drastically even if the function value (which is the model’s sigma function on an intermediate level, or ultimately the bermudan’s npv) stays the same.

For example, if the initial guess is so good that the optimizer’s tolerance is already satisfied by it, the output will stay the same, no matter whether the input is perturbed by an infinitesimal shift dx or not. Here perturbed does not really mean bumped, and dx really is infinitesimally small; it is only a concept to get a better intuition of what is going on during the process of automatic differentiation.

Another example is the bisection method for zero searching. This method will always yield zero AD derivatives in the following sense: if x is the input (e.g. a vector of swap quotes) and y is the output (e.g. a vector of zero rates), linked by a relation

f(x,y) = 0

then if x is perturbed by dx, the checks in the bisection algorithm will yield exactly the same results for x+dx as for x. Therefore the computed y will be exactly the same, and thus the derivative is zero.

I don’t know if this explanation is good, but this is how I picture it for myself. The bottom line seems to be: if what you do with AD feels too magical, you’d better not do it.
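To make this concrete, here is a tiny toy example (assuming CppAD; the function and the root search are made up for illustration, this is not code from the library): we solve y^2 = x for y with a taped bisection. The reverse sweep returns exactly zero, although analytically dy/dx = 1/(2\sqrt{x}).

#include <cppad/cppad.hpp>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    typedef CppAD::AD<double> Real;
    std::vector<Real> x(1, 2.0);
    CppAD::Independent(x); // start taping, x[0] is the independent variable

    // bisection for y with y*y = x[0] on [0, 10]
    Real a = 0.0, b = 10.0;
    for (int i = 0; i < 100; ++i) {
        Real m = 0.5 * (a + b);
        // the comparison uses only the current values; no derivative information flows through it
        if (m * m - x[0] < 0.0)
            a = m;
        else
            b = m;
    }
    std::vector<Real> y(1, 0.5 * (a + b));

    CppAD::ADFun<double> f(x, y); // stop taping
    std::vector<double> w(1, 1.0);
    std::vector<double> dydx = f.Reverse(1, w);
    std::cout << "AD dy/dx = " << dydx[0]                               // prints 0
              << ", analytic dy/dx = " << 1.0 / (2.0 * std::sqrt(2.0))  // ~0.3536
              << std::endl;
    return 0;
}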

What is the way out ? We just avoid all optimizations and zero searches, if possible. And it is possible for interest rate deltas and vegas: just calibrate your curve in the usual way, then apply AD to compute sensitivities to the zero rates (which does not involve a zero search any more).

If you want market rate deltas, additionally compute the Jacobian matrix of the market instruments in the curve by the zero rates, invert it and multiply it with the zero delta vector. Actually you do not even have to invert it; you only have to solve one linear equation system. We make this procedure explicit with our example above.

The first step would be to compute d bermudan NPV / d model’s sigma. This is done on the already calibrated model. Technically we separate the calibration step (done with the usual doubles) and the AD step (done on a copy of the model fed with the model parameters from the first model, but not itself calibrated again). This is what we get for d bermudan / d sigma

vega #0: 1.6991
vega #1: 1.2209
vega #2: 0.8478
vega #3: 0.5636
vega #4: 0.4005
vega #5: 0.2645
vega #6: 0.1573
vega #7: 0.0885
vega #8: 0.0347

Next we compute the Jacobian d helperNpv / d sigma

297.4%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
179.7%	183.3%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
126.9%	129.4%	132.2%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
90.8%	92.6%	94.6%	96.7%	0.0%	0.0%	0.0%	0.0%	0.0%
64.9%	66.2%	67.6%	69.1%	71.2%	0.0%	0.0%	0.0%	0.0%
47.5%	48.4%	49.5%	50.6%	52.1%	53.3%	0.0%	0.0%	0.0%
31.9%	32.6%	33.3%	34.0%	35.0%	35.9%	36.3%	0.0%	0.0%
19.2%	19.6%	20.0%	20.4%	21.0%	21.5%	21.8%	22.3%	0.0%
8.7%	8.9%	9.0%	9.2%	9.5%	9.8%	9.9%	10.1%	10.5%

We also use AD for this, CppAD comes with a driver routine that reads

helperVega = f.Jacobian(sigmas);

This is a lower triangular matrix, because the ith calibration helper depends only on the sigma function up to its expiry time. The inverse of this matrix is also interesting, although we wouldn’t need it in its full beauty for our vega calculation, representing d sigma / d helperNpv

33.6%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
-33.0%	54.6%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
0.0%	-53.4%	75.6%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
0.0%	0.0%	-74.0%	103.5%	0.0%	0.0%	0.0%	0.0%	0.0%
0.0%	0.0%	0.0%	-100.4%	140.4%	0.0%	0.0%	0.0%	0.0%
0.0%	0.0%	0.0%	0.0%	-137.2%	187.5%	0.0%	0.0%	0.0%
0.0%	0.0%	0.0%	0.0%	0.0%	-185.1%	275.3%	0.0%	0.0%
0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	-269.1%	448.1%	0.0%
0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	-431.6%	953.1%

This says by how much each sigma step goes up when the helper belonging to that step goes up (these are the positive diagonal elements), and by how much the next sigma step has to go down when the same helper goes up but the next helper stays the same. The counter-movement is roughly of the same size.

Next we need d bermudan / d sigma, which AD already delivered above:

vega #0: 1.6991
vega #1: 1.2209
vega #2: 0.8478
vega #3: 0.5636
vega #4: 0.4005
vega #5: 0.2645
vega #6: 0.1573
vega #7: 0.0885
vega #8: 0.0347

And the last ingredient would be the calibration helpers’ vegas, which can be computed analytically (this is the npv change when shifting the input market volatility by one percent up), d helperNpv / d marketVol

vega #0: 0.0904%
vega #1: 0.1117%
vega #2: 0.1175%
vega #3: 0.1143%
vega #4: 0.1047%
vega #5: 0.0903%
vega #6: 0.0720%
vega #7: 0.0504%
vega #8: 0.0263%

Now multiplying everything together gives

d bermudan / d impliedVol = d bermudan / d sigma x d sigma / d helperNpv x d helperNpv / d impliedVol

which is

vega #0: 0.0153%
vega #1: 0.0238%
vega #2: 0.0263%
vega #3: 0.0207%
vega #4: 0.0209%
vega #5: 0.0185%
vega #6: 0.0140%
vega #7: 0.0124%
vega #8: 0.0087%

same as above. Partly because I just copy-pasted it here. But it actually comes out of the code also, just try it.
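For completeness, here is a hedged sketch of that last multiplication using QuantLib's linear algebra (this is not the code that produced the numbers above; dBermDSigma, dHelperDSigma and dHelperDVol are hypothetical names for the three ingredients printed above, with dHelperDVol holding only the diagonal of d helperNpv / d marketVol, since each helper depends only on its own market volatility):

#include <ql/math/array.hpp>
#include <ql/math/matrix.hpp>
using namespace QuantLib;

Array bucketVegas(const Array& dBermDSigma,    // d bermudan / d sigma
                  const Matrix& dHelperDSigma, // d helperNpv / d sigma (lower triangular)
                  const Array& dHelperDVol) {  // d helperNpv / d marketVol (diagonal)
    // d bermudan / d helperNpv = (d helperNpv / d sigma)^{-T} (d bermudan / d sigma);
    // we invert for clarity, in practice solving the triangular system is enough
    Array dBermDHelper = transpose(inverse(dHelperDSigma)) * dBermDSigma;
    Array vegas(dBermDHelper.size());
    for (Size i = 0; i < vegas.size(); ++i)
        vegas[i] = dBermDHelper[i] * dHelperDVol[i]; // chain rule with the diagonal part
    return vegas;
}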

Adjoint Greeks IV – Exotics

Gaussian Models

This is going to be a guided tour through some example code I wrote to illustrate the usage of the Markov Functional and Gsr (a.k.a. Hull White) model implementations.

Hull White / Gsr is without any doubt the bread and butter model for rates. It calibrates to a given series of vanilla instruments, it has a parameter (the mean reversion) to control intertemporal correlations (which is important both for bermudan pricing and time travelling), but you can not alter its “factory settings” regarding the smile. At least it is not flat but a skew. Not unrealistic from a qualitative standpoint, but you would have to be lucky to match the market skew decently of course. Markov Functional on the other hand mimics any given smile termstructure exactly as long as it is arbitrage free.

Recently I added Hagan’s internal adjusters to the Gsr implementation, trying to make up for this comparative disadvantage. I will come back to them at the end of this article. Internal adjusters are named in distinction to external adjusters of course, which I am working on as well. More on those later.

Let’s delve into the example. I do not reproduce the whole code here, just the main points of interest. You can find the full example in one of the recent releases of QuantLib under Examples / Gaussian1dModels.

First we set the global evaluation date.

  Date refDate(30, April, 2014);
  Settings::instance().evaluationDate() = refDate;

The rate curves will be flat, but we assume basis spreads to demonstrate that the models can handle them in a decent way.

  Real forward6mLevel = 0.025;
  Real oisLevel = 0.02;

I will omit the code to set up the quotes and termstructures here. Swaption volatilities are chosen flat as well

  Real volLevel = 0.20;

Now we set up a deal that we can price later on. The underlying is a standard vanilla spot starting swap with 4\% fixed rate against Euribor 6M.

  Real strike = 0.04;
  boost::shared_ptr<NonstandardSwap> underlying =
    boost::make_shared<NonstandardSwap>(VanillaSwap(
      VanillaSwap::Payer, 1.0, fixedSchedule, strike, 
      Thirty360(),
      floatingSchedule, euribor6m, 0.00, Actual360()));

Of course there is a reason that we use a NonstandardSwap instead of a VanillaSwap. You will see later.

We define a bermudan swaption on that underlying with yearly exercise dates, where the notification of a call should be given two TARGET days before the next accrual start period.

 boost::shared_ptr<Exercise> exercise =
    boost::make_shared<BermudanExercise>(
                            exerciseDates, false);
 boost::shared_ptr<NonstandardSwaption> swaption =
    boost::make_shared<NonstandardSwaption>(
                            underlying, exercise);

To set up the Gsr model we need to define the grid on which the model volatility is piecewise constant. Since we want to match the market quotes for european calls later on, we choose the grid points identical to the exercise dates, except that we obviously do not need a step at the last exercise date. The initial model volatility is set to 1\%

        std::vector<Date> stepDates(exerciseDates.begin(),
                                    exerciseDates.end() - 1);
        std::vector<Real> sigmas(stepDates.size() + 1, 0.01);

The reversion speed is 1\% as well.

        Real reversion = 0.01;

And we are ready to define the model!

  boost::shared_ptr<Gsr> gsr = boost::make_shared<Gsr>(
        yts6m, stepDates, sigmas, reversion);

We will need a swaption engine for calibration

 boost::shared_ptr<PricingEngine> swaptionEngine =
     boost::make_shared<Gaussian1dSwaptionEngine>(
                                    gsr, 64, 7.0, true,
                                    false, ytsOis);

Normally it is enough to pass the model gsr. The 64 and 7.0 are the default parameters for the numerical integration scheme used by the engine; true and false indicate that the payoff should be extrapolated outside the integration domain in a non-flat manner (this is not really important here). The last parameter denotes the discounting curve that should be used for the swaption valuation. Note that this is different from the model’s “main” yield curve, which is the Euribor 6M forward curve (see above).

We set up a second engine for our instrument we want to price.

boost::shared_ptr<PricingEngine> 
 nonstandardSwaptionEngine =
  boost::make_shared<Gaussian1dNonstandardSwaptionEngine>(
       gsr, 64, 7.0, true, false, Handle<Quote>(), ytsOis);

On top of the parameters from above, we have an empty quote here. This can be used to introduce a flat credit termstructure into the pricing. We will see later how to use this in exotic bond valuations. For the moment it is just empty, so ignored.

Now we assign the engine to our bermudan swaption

  swaption->setPricingEngine(nonstandardSwaptionEngine);

How do we calibrate our Gsr model to price this swaption ? Actually there are some handy methods thanks to the fact that we chose an engine which implements the BasketGeneratingEngine interface, so we can just say

std::vector<boost::shared_ptr<CalibrationHelper> > basket =
     swaption->calibrationBasket(swapBase, *swaptionVol,
                             BasketGeneratingEngine::Naive);

to get a coterminal basket of at the money swaptions fitting the date schedules of our deal. The swapBase here encodes the conventions for standard market instruments. The last parameter Naive tells the engine just to take the exercise dates of the deal and the maturity date of the underlying and create at the money swaptions from it using the standard market conventions.

We can do more involved things and we will below: as soon as the deal specifics are not matching the standard market swaption conventions, we can choose an adjusted basket of calibration instruments! This can be a little thing like five instead of two notification dates for a call, different day count conventions on the legs, a non-yearly fixed leg payment frequency, or bigger things like a different Euribor index, an amortizing notional schedule and so on. I wrote a short note some time ago on this which you can get here if you are interested.

In any case the naive basket looks like this:

Expiry              Maturity            Nominal             Rate          Pay/Rec     Market ivol   
==================================================================================================
April 30th, 2015    May 6th, 2024       1.000000            0.025307      Receiver    0.200000      
May 3rd, 2016       May 6th, 2024       1.000000            0.025300      Receiver    0.200000      
May 3rd, 2017       May 6th, 2024       1.000000            0.025303      Receiver    0.200000      
May 3rd, 2018       May 6th, 2024       1.000000            0.025306      Receiver    0.200000      
May 2nd, 2019       May 6th, 2024       1.000000            0.025311      Receiver    0.200000      
April 30th, 2020    May 6th, 2024       1.000000            0.025300      Receiver    0.200000      
May 3rd, 2021       May 6th, 2024       1.000000            0.025306      Receiver    0.200000      
May 3rd, 2022       May 6th, 2024       1.000000            0.025318      Receiver    0.200000      
May 3rd, 2023       May 6th, 2024       1.000000            0.025353      Receiver    0.200000      

The calibration of the model to this basket is done via

   gsr->calibrateVolatilitiesIterative(basket, method, ec);

where method and ec are an optimization method (Levenberg-Marquardt in our case) and end criteria for the optimization. I should note that the calibration method is not the default one defined in CalibratedModel, which does a global optimization on all instruments, but a serialized version calibrating one step of the sigma function to one instrument at a time, which is much faster.

Here is the result of the calibration.

Expiry              Model sigma   Model price         market price        Model ivol    Market ivol   
====================================================================================================
April 30th, 2015    0.005178      0.016111            0.016111            0.199999      0.200000      
May 3rd, 2016       0.005156      0.020062            0.020062            0.200000      0.200000      
May 3rd, 2017       0.005149      0.021229            0.021229            0.200000      0.200000      
May 3rd, 2018       0.005129      0.020738            0.020738            0.200000      0.200000      
May 2nd, 2019       0.005132      0.019096            0.019096            0.200000      0.200000      
April 30th, 2020    0.005074      0.016537            0.016537            0.200000      0.200000      
May 3rd, 2021       0.005091      0.013253            0.013253            0.200000      0.200000      
May 3rd, 2022       0.005097      0.009342            0.009342            0.200000      0.200000      
May 3rd, 2023       0.005001      0.004910            0.004910            0.200000      0.200000      

and the price of our swaption, retrieved in the QuantLib standard way,

Real npv = swaption->NPV();

is around 38 basispoints.

Bermudan swaption NPV (ATM calibrated GSR) = 0.003808

Now let’s come back to what I mentioned above. Actually the european call rights do not exactly match the atm swaptions we used for calibration: our underlying swap is not atm, but has a fixed rate of 4\%. So we should use an adapted basket. Of course in this case you can guess what one should take, but I will use the general machinery to make it trustworthy. The adapted basket can be retrieved by

  basket = swaption->calibrationBasket(
       swapBase, *swaptionVol,
       BasketGeneratingEngine::MaturityStrikeByDeltaGamma);

with the parameter MaturityStrikeByDeltaGamma indicating that the market swaptions for calibration are chosen from the set of all possible market swaptions (defined by the swapBase, remember ?) by an optimization of the nominal, strike and maturity as the remaining free parameters such that the zeroth, first and second order derivatives of the exotic’s underlying by the model’s state variable (evaluated at some suitable central point) are matched.

To put it differently, per expiry we seek a market underlying that in all states of the world (here: for all values of the state variable of our model) has the same value as the exotic underlying we wish to price. To get as close as possible to this, we match the Taylor expansions up to order two of our exotic and market underlyings.
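In formulas (my notation): if U^{ex}(y) denotes the exotic underlying npv and U^{mkt}(y;n,K,T) a candidate market underlying, both as functions of the model state y at the expiry, then the nominal n, strike K and maturity T are chosen such that at the expansion point y_0

U^{mkt}(y_0) = U^{ex}(y_0), \quad \frac{\partial U^{mkt}}{\partial y}(y_0) = \frac{\partial U^{ex}}{\partial y}(y_0), \quad \frac{\partial^2 U^{mkt}}{\partial y^2}(y_0) = \frac{\partial^2 U^{ex}}{\partial y^2}(y_0)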

Let’s see what this gets us in our four percent strike swaption case. The calibration basket becomes:

Expiry              Maturity            Nominal             Rate          Pay/Rec     Market ivol   
==================================================================================================
April 30th, 2015    May 6th, 2024       0.999995            0.040000      Payer       0.200000      
May 3rd, 2016       May 6th, 2024       1.000009            0.040000      Payer       0.200000      
May 3rd, 2017       May 6th, 2024       1.000000            0.040000      Payer       0.200000      
May 3rd, 2018       May 7th, 2024       0.999953            0.040000      Payer       0.200000      
May 2nd, 2019       May 6th, 2024       0.999927            0.040000      Payer       0.200000      
April 30th, 2020    May 6th, 2024       0.999996            0.040000      Payer       0.200000      
May 3rd, 2021       May 6th, 2024       1.000003            0.040000      Payer       0.200000      
May 3rd, 2022       May 6th, 2024       0.999997            0.040000      Payer       0.200000      
May 3rd, 2023       May 6th, 2024       1.000002            0.040000      Payer       0.200000      

As you can see, the calibrated rate for the market swaptions is 4\%, as expected. What you can also see is that payer swaptions were generated. This is because out of the money options are always chosen as calibration instruments, for the usual reasons. The nominal is slightly different from 1.0, but practically unchanged; this is more a proof that some numerical procedure has actually been at work here.

Recalibrating the model to the new basket gives

Expiry              Model sigma   Model price         market price        Model ivol    Market ivol   
====================================================================================================
April 30th, 2015    0.006508      0.000191            0.000191            0.200000      0.200000      
May 3rd, 2016       0.006502      0.001412            0.001412            0.200000      0.200000      
May 3rd, 2017       0.006480      0.002905            0.002905            0.200000      0.200000      
May 3rd, 2018       0.006464      0.004091            0.004091            0.200000      0.200000      
May 2nd, 2019       0.006422      0.004766            0.004766            0.200000      0.200000      
April 30th, 2020    0.006445      0.004869            0.004869            0.200000      0.200000      
May 3rd, 2021       0.006433      0.004433            0.004433            0.200000      0.200000      
May 3rd, 2022       0.006332      0.003454            0.003454            0.200000      0.200000      
May 3rd, 2023       0.006295      0.001973            0.001973            0.200000      0.200000      

indeed different, and the option price

Bermudan swaption NPV (deal strike calibrated GSR) = 0.007627

almost doubled from 38 to 76 basispoints. Well actually it more than doubled. Whatever. Puzzle: We did not define a smile for our market swaption surface. So it shouldn’t matter which strike we choose for the calibration instrument, should it ?

There are other applications of the delta-gamma method. For example we can use an amortizing nominal going linearly from 1.0 down to 0.1. The calibration basket then becomes

Expiry              Maturity            Nominal             Rate          Pay/Rec     Market ivol   
==================================================================================================
April 30th, 2015    August 5th, 2021    0.719236            0.039997      Payer       0.200000      
May 3rd, 2016       December 6th, 2021  0.641966            0.040003      Payer       0.200000      
May 3rd, 2017       May 5th, 2022       0.564404            0.040005      Payer       0.200000      
May 3rd, 2018       September 7th, 2022 0.486534            0.040004      Payer       0.200000      
May 2nd, 2019       January 6th, 2023   0.409763            0.040008      Payer       0.200000      
April 30th, 2020    May 5th, 2023       0.334098            0.039994      Payer       0.200000      
May 3rd, 2021       September 5th, 2023 0.255759            0.039995      Payer       0.200000      
May 3rd, 2022       January 5th, 2024   0.177041            0.040031      Payer       0.200000      
May 3rd, 2023       May 6th, 2024       0.100000            0.040000      Payer       0.200000      

First of all, the nominal of each swaption is adjusted to the amortizing schedule, being some average over the respective remaining periods. Furthermore, the effective maturity is reduced.

As a side note: the nominal is of course not relevant at all for the calibration step. It does not matter whether you calibrate to a swaption with nominal 1.0 or 0.1 or 100000000.0. But it is a nice piece of information anyhow, as in the last example, to check whether what happens is plausible.

Now consider a callable bond. You can set this up as a swap, too, with one zero leg and a final notional exchange. The NonstandardSwap allows for all this. The exercise has to be extended to carry a rebate payment reflecting the notional reimbursement in case of exercise. This is handled by the RebatedExercise extension. The delta-gamma calibration basket now looks as follows.

Expiry              Maturity            Nominal             Rate          Pay/Rec     Market ivol   
==================================================================================================
April 30th, 2015    April 5th, 2024     0.984093            0.039952      Payer       0.200000      
May 3rd, 2016       April 5th, 2024     0.985539            0.039952      Payer       0.200000      
May 3rd, 2017       May 6th, 2024       0.987068            0.039952      Payer       0.200000      
May 3rd, 2018       May 7th, 2024       0.988455            0.039952      Payer       0.200000      
May 2nd, 2019       May 6th, 2024       0.990023            0.039952      Payer       0.200000      
April 30th, 2020    May 6th, 2024       0.991622            0.039951      Payer       0.200000      
May 3rd, 2021       May 6th, 2024       0.993111            0.039951      Payer       0.200000      
May 3rd, 2022       May 6th, 2024       0.994190            0.039952      Payer       0.200000      
May 3rd, 2023       May 6th, 2024       0.996715            0.039949      Payer       0.200000      

The notionals are slightly below 1.0 (and also the maturities and strikes do not exactly match the bermudan swaption case). This is expected, however, since the market swaptions are discounted on OIS level, while for the bond we chose to use the 6m curve as a benchmark discounting curve. The effect is small though. Put the 6m curve as discounting curve to cross check this.

What is more interesting is to assume a positive credit spread. Let’s set this to 100 basispoints for example. The spread is interpreted as an option adjusted spread, continuously compounded with Actual365Fixed day count convention. The calibration basket gets

Expiry              Maturity            Nominal             Rate          Pay/Rec     Market ivol   
==================================================================================================
April 30th, 2015    February 5th, 2024  0.961289            0.029608      Payer       0.200000      
May 3rd, 2016       March 5th, 2024     0.965356            0.029605      Payer       0.200000      
May 3rd, 2017       April 5th, 2024     0.969520            0.029608      Payer       0.200000      
May 3rd, 2018       April 8th, 2024     0.973629            0.029610      Payer       0.200000      
May 2nd, 2019       April 8th, 2024     0.978124            0.029608      Payer       0.200000      
April 30th, 2020    May 6th, 2024       0.982682            0.029612      Payer       0.200000      
May 3rd, 2021       May 6th, 2024       0.987316            0.029609      Payer       0.200000      
May 3rd, 2022       May 6th, 2024       0.991365            0.029603      Payer       0.200000      
May 3rd, 2023       May 6th, 2024       0.996646            0.029586      Payer       0.200000      

Look what the rate is doing. It is adjusted by roughly the credit spread. Again this is natural, since the hedge swaption for the bond’s call right would have roughly 100 basispoints margin on the float side. Here it is coming automatically out of our optimization procedure.

Let’s come to our final example. The underlying is a swap exchanging a CMS10y rate against Euribor 6M. To make the numbers a bit nicer I changed the original example code to include a 10 basispoint margin on the Euribor leg.

We start with the underlying price retrieved from a replication approach. I am using the LinearTsrPricer here, with the same mean reversion as for the Gsr model above. The pricing is

Underlying CMS     Swap NPV = 0.004447
           CMS     Leg  NPV = -0.231736
           Euribor Leg  NPV = 0.236183

so 44.5 basispoints. Now we consider a bermudan swaption (as above, with yearly exercises) on this underlying. A naively calibrated Gsr model yields

Float swaption NPV (GSR) = 0.004291
Float swap     NPV (GSR) = 0.005250

The npv of the option is 42.9 basispoints. The underlying price, which can be retrieved as an additional result from the engine as follows

swaption4->result<Real>("underlyingValue")

is 52.5 basispoints. Please note that the option has its first exercise in one year’s time, so the first year’s coupons are not included in the deal exercised into. This is why the underlying price is higher than the option value.

What do we see here: the Gsr model is not able to price the underlying swap correctly; the price is around 8 basispoints higher than in the analytical pricer. This is because of the missing smile fit (in our example the fit to a flat smile, which the Gsr cannot do). The Markov Functional model on the other hand can do exactly this. We can calibrate the numeraire of the model such that the market swaption surface is reproduced on the fixing dates of the CMS coupons for swaptions with 10y maturity. The goal is to get a better match with the replication price. Let’s go: the model is set up like this

  boost::shared_ptr<MarkovFunctional> markov =
      boost::make_shared<MarkovFunctional>(
          yts6m, reversion, markovStepDates, 
          markovSigmas, swaptionVol,
          cmsFixingDates, tenors, swapBase,
   MarkovFunctional::ModelSettings()
                        .withYGridPoints(16));

It is not that different from the Gsr model construction. We just have to provide the CMS coupons’ fixing dates and tenors (and the conventions of the swaptions behind), so that we can calibrate to the corresponding smiles. The last parameter is optional and overwrites some numerical parameter with a more relaxed value, so that the whole thing works a bit faster in our example. Ok, what does the Markov model spit out:

Float swaption NPV (Markov) = 0.003549
Float swap NPV (Markov)     = 0.004301

The underlying is now matched much better than in the Gsr model; it is accurate up to 1.5 basispoints. A perfect match is not expected from theory, because the dynamics of the linear TSR model are of course not the same as those of the Markov model.

The option price, accordingly, is around 7.5 basispoints lower compared to the Gsr model. This is around the same magnitude of the underlying mismatch in the Gsr model.

To complete the picture, the Markov model also has a volatility function that can be calibrated to a second instrument set, like coterminal swaptions, to approximate call rights. It is rather questionable whether a call right of a CMS versus Euribor swap is well approximated by a coterminal swaption. Actually I tried to use the delta-gamma method to search for any representation of such a call right in the Markov model. The following picture is from a different case (it is taken from the paper I mentioned above), but it shows what is going on in principle

[Figure: exotic underlying vs. matched market swaption underlying as functions of the model state y, taken from the paper mentioned above]

The exotic underlying is actually well matched by a market swaption’s underlying around the model’s state y=0, which is the expansion point for the method, so it does what it is supposed to do. But the global match is poor, so there does not seem to be a good reason to include additional swaptions into the calibration to represent call rights. They do not hurt, but do not specifically represent the call rights, so just add some more market information to the model.

Anyhow, we can do it, so we do it. If we just take atm coterminals and calibrate the Markov model’s volatility function to them, we get as a calibration result

Expiry              Model sigma   Model price         market price        Model ivol    Market ivol   
====================================================================================================
April 30th, 2015    0.010000      0.016111            0.016111            0.199996      0.200000      
May 3rd, 2016       0.012276      0.020062            0.020062            0.200002      0.200000      
May 3rd, 2017       0.010534      0.021229            0.021229            0.200001      0.200000      
May 3rd, 2018       0.010414      0.020738            0.020738            0.200001      0.200000      
May 2nd, 2019       0.010361      0.019096            0.019096            0.199998      0.200000      
April 30th, 2020    0.010339      0.016537            0.016537            0.200002      0.200000      
May 3rd, 2021       0.010365      0.013253            0.013253            0.199998      0.200000      
May 3rd, 2022       0.010382      0.009342            0.009342            0.200001      0.200000      
May 3rd, 2023       0.010392      0.004910            0.004910            0.200001      0.200000      
                    0.009959

I am not going into details about the volatility function here, but note that the first step is fixed (at its initial value of 1\%) and that the step after the last expiry date matters as well (that is the 0.009959 in the last row). In addition, a global calibration to all coterminals simultaneously is necessary; the iterative approach will not work for this model.

The pricing result for the underlying does not change that much; the fit is still as good as desired:

Float swap NPV (Markov) = 0.004331

There is one last thing I want to mention and which is not yet part of the library or the example code. Hagan introduced a technique called “internal adjusters” to make the Gsr model work in situations like the one we have here, namely that the underlying is not matched well. He mentions this approach in his paper on callable range accrual notes where he uses his LGM (same as Gsr or Hull White) model for pricing and observes that he does not calibrate to underlying Libor caplets or floorlets very well. He suggests to introduce an adjusting factor to be multiplied with the model volatility in case we are evaluating such a caplet or floorlet during the pricing of the exotic. So the missing model fit is compensated by using a modified model volatility “when needed” (and only then, i.e. when evaluating the exotic coupon we want to match).
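In formulas (my shorthand, not Hagan's notation): when the engine evaluates the i-th exotic coupon during the pricing of the exotic it replaces the calibrated model volatility by

\sigma_i^{adj}(t) = \lambda_i \, \sigma(t)

where the adjuster \lambda_i is calibrated such that the model reproduces the reference price of that coupon; everywhere else the unadjusted \sigma(t) is used. The \lambda_i are what is shown in the adjuster column of the table further below.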

This sounds like a dirty trick, destroying the model in a way and introducing arbitrage. On the other hand, mispricing the market vanillas introduces arbitrage in a much more obvious and undesirable way. So why not. If Pat Hagan says we can do it, it is safe I guess. We should say however that Hagan’s original application was restricted to adjusting Libor volatilities. Here we make up for a completely wrong model smile. And we could even go a step further and match e.g. market CMS spread coupon prices in a Gsr model, although the model does not even allow for rate decorrelation. So one should be careful how far one wants to go with this trick.

I added these adjusters to the Gsr model. If you don’t specify or calibrate them, they are not used though, so nothing changes from an end user perspective. In our example we would set up a calibration basket like this

 std::vector<boost::shared_ptr<CalibrationHelperBase> > 
                                            adjusterBasket;
 for (Size i = 0; i < leg0.size(); ++i) {
     boost::shared_ptr<CmsCoupon> coupon =
         boost::dynamic_pointer_cast<CmsCoupon>(leg0[i]);
     if (coupon->fixingDate() > refDate) {
         boost::shared_ptr<AdjusterHelper> tmp =
             boost::make_shared<AdjusterHelper>(
                 swapBase, coupon->fixingDate(), 
                 coupon->date());
         tmp->setCouponPricer(cmsPricer);
         tmp->setPricingEngine(floatSwaptionEngine);
         adjusterBasket.push_back(tmp);
     }
 }

The adjuster helpers created here correspond to the CMS coupons of our trade. We set the linear TSR pricer (cmsPricer) to produce the reference results, and the Gsr pricing engine in order to be able to calibrate the adjusters to match these reference prices. Now we can say

  gsr->calibrateAdjustersIterative(adjusterBasket,
                                   method, ec);

like before for the volatilities. What we get is

Expiry              Adjuster      Model price         Reference price     
================================================================================
April 30th, 2015    1.0032        2447560.9183        2447560.9183        
May 3rd, 2016       1.0353        2402631.1363        2402631.1363        
May 3rd, 2017       1.0640        2378624.8507        2378624.8507        
May 3rd, 2018       1.0955        2324333.7739        2324333.7739        
May 2nd, 2019       1.1239        2295880.1247        2295880.1247        
April 30th, 2020    1.1643        2261229.3425        2261229.3425        
May 3rd, 2021       1.1948        2228406.7519        2228406.7519        
May 3rd, 2022       1.2214        2196901.0808        2196901.0808        
May 3rd, 2023       1.2732        2177227.7967        2177227.7967        

The prices here are with reference to a notional of one hundred million. The adjuster values needed to match the reference prices from the replication pricer are not too far from one, which is good, because it means that the model is not bent too much in order to produce the CMS coupon prices.

The pricing in the new model is as follows

GSR (adjusted) option value     = 0.003519
GSR (adjusted) underlying value = 0.004452

The underlying match is (by construction) very good thanks to the adjusters. Also the option value is adjusted down as desired; we are now very close to the Markov model’s value. Needless to say, this does not work out that well all the time. Also, as an important disclaimer: an option on CMS10Y against Euribor6M has features of a spread option, which is highly correlation sensitive. Neither the (adjusted) Gsr nor the Markov model can produce correlations other than one between the spread components, so they will always underprice the option in this sense.

Enough for today, look at the example and play around with the code. Or read the paper.

Gaussian Models

Shifted SABR, CMS and Markov Functional

The other day I found this piece of code in QuantLib

QL_REQUIRE(values[i]>0.0,
  "non positive fixing (" << values[i] <<
  ") at date " << dates[i] << " not allowed");

Puzzle: Where is this snippet taken from ? Anyway, financial markets do not care. We currently have negative Euribor Fixings for 1w and 2w. Brokers started to quote zero strike floors on Euribor 3M and 6M already a year ago or so. Now, market participants start to assign significant premia to zero strike floors on the EUR CMS 10y rate.

The classic market model for options on Euribor or a swap annuity is the Black76 model

dF = F \sigma dW

implying zero probability for negative underlying values, so a zero strike floor is worth zero always. One way to get around this is the shifted Black76 model

dF = (F+\alpha) \sigma dW

with a positive (or of course zero) shift \alpha. For EUR caps and floors \alpha=1\% is a common value currently.

It is quite simple to give a solution of such a shifted model in terms of the original model: Set X:= F+\alpha, then you have the original model back, but in X instead of F. So to get an option price, plug in the shifted forward and the shifted strike into the original option pricing formula.
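A minimal sketch of this (assuming QuantLib's blackFormula, whose displacement parameter implements exactly this shift; forward, volatility and maturity follow the density example below, and the option is an undiscounted call at strike zero):

#include <ql/pricingengines/blackformula.hpp>
#include <cmath>
#include <iostream>
using namespace QuantLib;

int main() {
    Real forward = 0.005, strike = 0.0, shift = 0.01;
    Real sigma = 0.20, t = 10.0, stdDev = sigma * std::sqrt(t);

    // shifted model expressed through the plain formula: use F + shift and K + shift
    Real p1 = blackFormula(Option::Call, strike + shift, forward + shift, stdDev);
    // the same thing using the built-in displacement parameter
    Real p2 = blackFormula(Option::Call, strike, forward, stdDev, 1.0, shift);

    std::cout << p1 << " " << p2 << std::endl; // identical (undiscounted) prices
    return 0;
}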

This is the density of the underlying implied by this model at t=10 (years) with an initial forward level of 0.50\% and a volatility of \sigma=20\%:

[Figure shiftedln_density: density of the underlying in the shifted Black76 model]

The model sets the lower bound for the underlying rate to -1\%. One can ask what the classic (i.e. non-shifted) Black76 implied volatility for option prices produced by our shifted model is. Here you go.

[Figure implied_nonshifted_vol: non-shifted Black76 implied volatilities of the shifted model prices]

The shift parameter has produced a skew. Actually shifted underlyings have been a popular technique to produce skews, e.g. in the Libor market model, for a long time. Here our motivation is different, namely to produce negative rates.

You will have noticed that there is no implied volatility for strikes near zero, even if they are positive. That is because the call option price function in the shifted model looks like this (the red line)

[Figure shifted_call_price: call price as a function of strike in the shifted model (red) and maximum attainable non-shifted Black76 price (green)]

starting at 1.5\%, which is the sum of the forward and the shift, and crossing the y axis (not very prominent in this plot) above the forward level because of its convexity, thus producing prices that are not attainable in the non-shifted Black model. The green line marks the maximum price attainable in a non-shifted Black76 model.

I should say that I am always talking about non-discounted prices, i.e. the discount factor, or in the case of swaptions the annuity factor, is unity.

Next we need a shifted SABR model. It is constructed the same way as the shifted Black76 model:

dF = (F+d)^\beta \sigma dW \\  d\sigma = \sigma \nu dV

with \sigma(0) = \alpha and dWdV = \rho dt and our shift d\ge 0. Here is a density with same parameters as above and \alpha=0.012, \beta=0.30, \nu=0.15, \rho=-0.10.

[Figure shiftedsabr_density: density of the underlying in the shifted SABR model]

The negative density for strikes near -1\% is nothing special, we already know this defect from the classic non-shifted Hagan model (we are using the standard expansion from 2002 here). The point is again that the density now covers the region from -1\% to zero strike level.
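To make the construction concrete, here is a toy sketch (my own code, not part of the proposed changes): the shifted SABR smile is obtained by feeding the shifted forward and shifted strikes into the standard Hagan 2002 expansion sabrVolatility, and the resulting volatilities are turned into undiscounted call prices with the shifted Black76 formula. The parameters are those of the density above.

#include <ql/termstructures/volatility/sabr.hpp>
#include <ql/pricingengines/blackformula.hpp>
#include <cmath>
#include <iostream>
using namespace QuantLib;

int main() {
    Real forward = 0.005, shift = 0.01, t = 10.0;
    Real alpha = 0.012, beta = 0.30, nu = 0.15, rho = -0.10;
    for (Real strike = -0.009; strike < 0.03; strike += 0.001) {
        // shifted SABR = plain SABR in the shifted variables
        Real vol = sabrVolatility(strike + shift, forward + shift, t,
                                  alpha, beta, nu, rho);
        Real call = blackFormula(Option::Call, strike + shift, forward + shift,
                                 vol * std::sqrt(t));
        std::cout << strike << " " << vol << " " << call << "\n";
    }
    return 0;
}

The density shown above would then follow by differentiating these call prices twice with respect to the strike.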

It seems that brokers like ICAP changed their primary model for swaption smiles to this one. They publish an ATM matrix together with a shift matrix assigning an individual shift to each option tenor / swap length pair. The shifts are constant across option tenors, but different for the different underlyings. At the moment the shifts are 0.50\% for the 1y underlying, going down to 0.30\% for the 30y underlying. They also publish smile spreads in the usual way, but as differences of shifted lognormal itm or otm volatilities and the shifted atm volatility.

In QuantLib we need to do several things to adopt these new quotation standards. I will try to package the changes into a pull request soon. The highlights are as follows.

The SmileSection class needs to know what kind of volatility nature it represents.

enum Nature { ShiftedLognormal, Normal };
SmileSection(const Date& d,
             const DayCounter& dc = DayCounter(),
             const Date& referenceDate = Date(),
             const Nature nature = ShiftedLognormal,
             const Rate shift = 0.0);

As you can see we can also have normal smile sections. This is another standard for volatility quotation using the normal Black model

dF = \sigma dW

as a basis.

The ATM matrix now takes a shift matrix as an optional parameter.

SwaptionVolatilityMatrix::SwaptionVolatilityMatrix(
     const Calendar& cal,
     BusinessDayConvention bdc,
     const std::vector<Period>& optionT,
     const std::vector<Period>& swapT,
     const std::vector<std::vector<Handle<Quote> > >& vols,
     const DayCounter& dc,
     const bool flatExtrapolation,
     const std::vector<std::vector<Real> >& shifts)

The two swaption volatility cubes need to be adapted as well. Their interface does not change, but we have to use a shifted SABR model for the SABR cube for example. Also the moneyness definition for smile spread interpolation has to be adapted. And some other things I guess.

To do CMS pricing we need to get our hands on some CMS coupon pricer. My favourite one is the LinearTsrPricer (it’s fast, it respects the put call parity, it just does not disappoint you). A terminal swap rate model for CMS coupons does not depend on certain strike ranges or a particular model for vanilla swaption valuation. Such a model is given by a mapping \alpha

\alpha( S(t) ) = \frac{P(t,t_p)}{A(t)}

where t_p is the coupon payment date and A(t) the annuity of the underlying swap rate S. Then (integration by parts) the npv of a general CMS coupon

A(0) E^A( P(t,t_p) A(t)^{-1} g(S(t)) )

is given by

A(0)\alpha(S(0))g(S(0))+\int_{-\infty}^{S(0)} w(k)R(k)dk+\int_{S(0)}^\infty w(k)P(k)dk

with t being the fixing date of the coupon, R(k) and P(k) the prices of market receiver and payer swaptions with strike k, and weights w(s)=\{\alpha(s)g(s)\}''. This nice and concise derivation is taken from the three volume Piterbarg bible on interest rate derivatives.

The linear terminal swap rate model is given by a linear function \alpha(S) = aS+b. Here b is already determined by a no-arbitrage condition, while a is free and can for example be linked to a one factor mean reversion, or just be set directly as a user choice.
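The no-arbitrage condition simply says that the annuity-measure expectation of \alpha(S(t)) must reproduce today's value of the numeraire ratio; since S is a martingale under the annuity measure this pins down b:

E^A\left(\alpha(S(t))\right) = a\,S(0) + b = \frac{P(0,t_p)}{A(0)} \quad \Rightarrow \quad b = \frac{P(0,t_p)}{A(0)} - a\,S(0)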

The derivation says that you should integrate over the range where you have non zero put or call prices. For a shifted lognormal smile section input this means that we need to integrate over [-\alpha,\infty) for a plain CMS coupon, if \alpha is the shift. The shift can be read from the SmileSection which is input to the CMS coupon pricers and we are done.

The last thing I did so far was to make the Markov Functional model work with shifted smile sections. The result is a fully dynamic and consistent model which can, for example, reproduce densities given by marginal shifted SABR inputs. This way we stay consistent with the TSR CMS pricing described above.

Enough for today, I will go into more details about the Markov functional model (and the linear TSR pricer) in one of the next posts.

Shifted SABR, CMS and Markov Functional