The Chambers-Nawalkha Formula

This post is about implied volatility, which can for example be found as \sigma in the Black76 process

dF = \sigma F dW

with an underlying forward rate F and a Brownian motion W. It is this \sigma which is often used to express a vanilla option price, because it normalizes out the dependency on expiry and strike in a certain way.

It is the same \sigma that makes trouble for caps and swaptions in Euro nowadays because it also rules out negative forwards and tends to explode for low interest rate levels. But there are workarounds for this like normal volatilities (remove the F on the rhs of the equation above) or shifted lognormal volatilities (replace F by F+\alpha for a constant \alpha \geq 0). I will write more about this in a future post.
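Spelled out, these two workarounds correspond to the dynamics

dF = \sigma_N dW

for normal volatilities and

dF = \sigma_{SLN} (F+\alpha) dW

for shifted lognormal volatilities, where \sigma_N and \sigma_{SLN} are just my labels for the respective volatility.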

Today I focus on the implied aspect. This means you start with a price and ask for the \sigma giving this price in the Black76 model.

Why is that important ? Because implied volatilities are something that traders want to see. Or you want to use them for interpolation rather than directly the prices. Or your trading system accepts only them as market data input. Also there are useful models (or model resolutions) that work directly with volatilities, like the SABR-Hagan-2002 model or the SVI model. On the other hand there are sometimes only prices available in the first place. Could be market quotes. Could also be other models that produce premiums and not directly volatilities like other resolutions for the SABR model.

I had this issue several times already. An example is the KahaleSmileSection class which can be used to check a smile for arbitrage and replace the defective regions by an arbitrage free extrapolation. This class works in the call price space. You can retrieve implied volatilities from it, but for this I needed to use

try {
    Option::Type type = strike >= f_ ? Option::Call : Option::Put;
    // for puts convert the call price c via (undiscounted) put-call
    // parity: put = call + strike - forward
    vol = blackFormulaImpliedStdDev(
              type, strike, f_,
              type == Option::Put ? strike - f_ + c : c) /
          sqrt(exerciseTime());
} catch (...) { /* ... */ }

which has c, the call price, as an input and converts that to an implied volatility using the library’s function blackFormulaImpliedStdDev. This function uses a numerical zero search and the usual “forward” Black formula to find the implied volatility. With the usual downsides. It would be nicer to have a closed form solution for the implied volatility!

Another example is the No arbitrage SABR model by Paul Doust that also produces option prices and not directly volatilities. Or of course the ZABR model which we already had on this blog here and which has premium outputs at least for some of its resolutions.

On Wilmott forums someone said

there actually is a closed-form inverted Black-Scholes formula due to Bourianochevsky and Jarulamparabalam … but I’ve never seen it since it’s proprietary …

Hmm, googling the two names gives exactly one result (the Wilmott thread), although they sound familiar. But he said proprietary, didn’t he. And later, the same guy,

oh, and they also claim they have a closed-form formula for the American put option … again proprietary.

Also very cool. Seriously, on the same thread there is a reference to this nice paper: Can There Be an Explicit Formula for Implied Volatility?. A hint that things are difficult, to say the least.

What really counts at the end of the day is what is available in QuantLib. There is blackFormulaImpliedStdDevApproximation, which is also among the longest function names in the library. This uses (citing from the source code)

Brenner and Subrahmanyan (1988) and Feinstein (1988) approximation for at-the-money forward option, with the extended moneyness approximation by Corrado and Miller (1996)

Quite some contributors. However it does not work too well, at least not in all cases. Look at a SABR smile with \alpha=0.04, \beta=0.5, \nu=0.15, \rho=-0.5, a forward of 0.03 and time to expiry 30 years.

[Figure: blackimpliedstddev2 – blackFormulaImpliedStdDevApproximation on the SABR smile above]

It is surprisingly easy to improve this following a paper by Chambers and Nawalkha, “An improved Approach to Computing Implied Volatility” (The Financial Review 38, 2001, 89-100). The cost for the improvement is that one needs one more input, namely the atm price. But this is not too bad in general. If you have prices from a model that you want to convert into implied volatilities, you can produce the atm price, no problem then. Also if you have market quotes you will often have the atm quote available because this is usually the most liquid one.

What they do is

  • compute an approximate atm volatility from the given atm price
  • use this to reprice the option in question on the atm volatility level
  • recover the difference between the requested volatility and the atm volatility from the price difference, using a second order Taylor expansion of the price in the volatility

For the first step they do the same as in blackFormulaImpliedStdDevApproximation, which is to apply the so called Brenner and Subrahmanyan formula to get the implied atm volatility from the given atm option price.

This formula is very, very easy: freeze the forward at time zero in the Black dynamics to get a normally distributed forward at option expiry. Then integrate the option payoff, which can be done explicitly in this case using school math. This gives

E( (F(t)-K)^+ ) \approx F(0)\sigma\sqrt{t/(2\pi)}

so that the atm volatility can be easily computed from the atm option price. I hear some people find it “deep” that \pi appears in this equation relating option prices and the implied volatility. A sign of a transcendental link between option markets and math. I don’t find that deep, but who am I to judge.
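Just to make this first step concrete, here is a minimal sketch (not the library code) of backing out the atm volatility from a given atm price with this approximation:

#include <cmath>

// invert the Brenner-Subrahmanyan approximation
// c_atm ~ F(0) * sigma * sqrt(t / (2 pi))  for sigma
double atmVolApproximation(double atmPrice, double forward, double t) {
    const double pi = 3.14159265358979323846;
    return atmPrice * std::sqrt(2.0 * pi / t) / forward;
}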

The second step is easy, just an application of the forward Black formula.

For the third step we have (star denoting atm volatility, subscripts denoting derivatives)

c(K) - c^*(K) = c_\sigma(K) (\sigma - \sigma^*) + \frac{1}{2} c_{\sigma\sigma}(K) (\sigma - \sigma^*)^2 + ...

This is a quadratic equation which can readily be solved for (\sigma - \sigma^*).
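Putting the three steps together, here is a minimal standalone sketch of the idea (undiscounted Black76 prices, my own naming, and not the formulas exactly as in the paper or in the library):

#include <cmath>

namespace {
const double pi = 3.14159265358979323846;
// standard normal cdf and density
double N(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }
double n(double x) { return std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi); }
}

double impliedVolChambersNawalkhaSketch(double callPrice, double atmPrice,
                                        double forward, double strike,
                                        double t) {
    // step 1: atm volatility from the atm price (Brenner-Subrahmanyan)
    double sigmaAtm = atmPrice * std::sqrt(2.0 * pi / t) / forward;
    // step 2: reprice the option at strike K on the atm volatility level
    double s = sigmaAtm * std::sqrt(t);
    double d1 = std::log(forward / strike) / s + 0.5 * s;
    double d2 = d1 - s;
    double priceAtm = forward * N(d1) - strike * N(d2);
    // step 3: second order Taylor expansion of the price in the volatility
    // around the atm level, solved for sigma - sigmaAtm
    double vega = forward * n(d1) * std::sqrt(t);
    double volga = vega * d1 * d2 / sigmaAtm;
    double dc = callPrice - priceAtm;
    double discriminant = vega * vega + 2.0 * volga * dc;
    if (std::fabs(volga) < 1E-12 || discriminant < 0.0)
        return sigmaAtm + dc / vega; // fall back to the first order term
    return sigmaAtm + (-vega + std::sqrt(discriminant)) / volga;
}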

You can find working code in my master branch https://github.com/pcaspers/quantlib/. The name of the function is blackFormulaImpliedStdDevApproximationChambers and it is part of the file blackformula.cpp. Remember to add the compiler flag -fpermissive-overlong-identifiers during the configure step.

Let’s try it on the same data as above.

[Figure: blackimpliedstddev2 – the Chambers-Nawalkha approximation on the same SABR smile]

Better! Also the resulting smile shape stays natural, even in regions where it deviates a bit more from the original smile.

Have a very nice weekend all and a good new week.


Interlude: SFINAE tricks

Hello again. This week I want to write about a point which remained open in a previous post. Jan Ladislav Dussek already gave a solution in response to that post, thanks for that! However I believe the technology behind it is quite fancy, so it is worth a post in its own right.

What else is new? There is a second project adding AD capabilities to QuantLib, following a different approach than the one described earlier in this blog. The idea is to just replace the Real datatype by active AD types at compile time and use them throughout all calculations. This only requires minimal code changes compared to the template approach.

The downside is that you cannot mix AD and non-AD calculations at runtime, which might have an impact on the performance and / or memory footprint of your application.

My personal experiments show a 20 – 30 % slowdown in some simple examples, but maybe this can be optimized. Or maybe this isn’t too bad anyway and a small price to pay compared to the work needed to convert large parts of the codebase. Another question is whether the templated code can or should be merged into the official branch sooner or later. If not, it would be quite cumbersome to keep an adjoint branch up to date with current developments. But not impossible.

Anyway it is interesting to note that the work on the new approach was initiated by Alexander Sokol and his company CompatibL. So quite a competitor. Like running a marathon with Dennis Kimetto (he ran a new world record in 2:02:57 last year in Berlin, a time in which I can safely run half the way, well a bit more but not much). Why not. We are still on track, aren’t we Cheng (whom I thank for his tireless contributions). 🙂

Back to the main point of this post. I first restate the problem in a bit more abstract way. We have the following classes on stage

template <class T> class BaseCouponPricer {};
template <class T> class DerivedCouponPricer : 
                      public BaseCouponPricer<T> {};

where the base class in QuantLib is really FloatingRateCouponPricer and the derived class something like CmsCouponPricer or IborCouponPricer or the like.

Now we have a utility function assigning coupon pricers to coupons

template <class T>
void setCouponPricer(const boost::shared_ptr<BaseCouponPricer<T> > &) {
    return;
}

taking a shared pointer to some pricer derived from our base class. Without the template stuff this is not a problem, because there is an implicit conversion defined from a shared pointer to D to a shared pointer to B if D inherits from B. Although – and this is important to note – no inheritance relationship between the shared pointers themselves is implied. And this is the whole source of the problem, because user code like this

boost::shared_ptr<BaseCouponPricer<double> > basePricer;
boost::shared_ptr<DerivedCouponPricer<double> > derivedPricer;
setCouponPricer(basePricer);
setCouponPricer(derivedPricer);

will result in a compiler error like this

testTempl.cpp:35:34: error: no matching function for call to ‘setCouponPricer(boost::shared_ptr<DerivedCouponPricer<double> >&)’
     setCouponPricer(derivedPricer);
                                  ^
testTempl.cpp:35:34: note: candidate is:
testTempl.cpp:12:6: note: template<class T> void setCouponPricer(const boost::shared_ptr<BaseCouponPricer<T> >&)
 void setCouponPricer(const boost::shared_ptr<BaseCouponPricer<T> > &) {
      ^
testTempl.cpp:12:6: note:   template argument deduction/substitution failed:
testTempl.cpp:35:34: note:   mismatched types ‘BaseCouponPricer<T>’ and ‘DerivedCouponPricer<double>’
     setCouponPricer(derivedPricer);
                                  ^
testTempl.cpp:35:34: note:   ‘boost::shared_ptr<DerivedCouponPricer<double> >’ is not derived from ‘const boost::shared_ptr<BaseCouponPricer<T> >’

where the essential message is in the last line. This was produced with gcc, clang is a bit less chatty (which is a good quality in general)

testTempl.cpp:35:5: error: no matching function for call to 'setCouponPricer'
    setCouponPricer(derivedPricer);
    ^~~~~~~~~~~~~~~
testTempl.cpp:12:6: note: candidate template ignored: could not match 'BaseCouponPricer' against
      'DerivedCouponPricer'

The implicit conversion mentioned above does not help here because it is not taken into consideration by the compiler’s template substitution machinery.

The idea to solve this is to make the implementation more tolerant, like this

template <template<class> class S, class T>
void setCouponPricer(const boost::shared_ptr<S<T> > &) {
    return;
}

now taking a shared pointer to any type S which itself takes a template parameter T. This compiles without errors. But now the door is open for code that does not make sense, because S could be something completely different from a pricer.

What we need is some check at compile time for class inheritance. The boost type traits library (or since C++11 the std library itself) has something for this, namely we can write

bool isBase = boost::is_base_of<BaseCouponPricer<double>, 
                                DerivedCouponPricer<double> >::type::value;

Here we use generic template programming which is something like programming with types instead of values and where the function evaluation is done during compilation, not at run time. The function we use here is boost::is_base_of which takes two arguments and the return value of this function is itself a type which we can retrieve by ::type.

Since there is no such thing as a function taking type arguments and returning a type in C++ (and which in addition is evaluated at compile time), ordinary struct‘s are used for this purpose: they take the input values as template parameters and provide the return value as a typedef within the struct, assigning the label type to the actual return value which is, remember, a type.

In our specific case we are expecting a boolean as the return value, so the return type has to represent true or false, and it does. Actually there is a wormhole from the meta programming to the regular C++ space, yielding a usual bool, and this is via ::value which unpacks a bool from its meta space wrapper which stores it just as a static constant.

If on the other hand we write

bool isBase = boost::is_base_of<BaseCouponPricer<double>, 
                                UnrelatedClass<double> >::type::value;

for an unrelated class (i.e. a class not derived from BaseCouponPricer) we get isBase = false.

But we are not done, the best part is yet to come. I just begin with the end result, because I don’t have a clue how people come up with these kinds of solutions and am reluctant to think of a motivating sentence

template <template <class> class S, class T>
void setCouponPricer(
    const boost::shared_ptr<S<T> > &,
    typename boost::enable_if<
        boost::is_base_of<BaseCouponPricer<T>, S<T> > >::type *dummy = 0) {
    return;
}

What I usually do when I see code like this for the first time is sit there for a few hours and just stare at it. And then after a while you see the light:

We add a second parameter to our function which is defaulted (to 0, doesn’t matter) so that we do not need to specify it. The only reason for this second parameter is to make sense or not to make sense, in the latter case just removing the function from the set of candidates the compiler considers as a template instantiation for the client code. What I mean by not to make sense becomes clearer when looking at the implementation of boost::enable_if (I took this from the 1_57_0 distribution):

  template <bool B, class T = void>
  struct enable_if_c {
    typedef T type;
  };

  template <class T>
  struct enable_if_c<false, T> {};

  template <class Cond, class T = void> 
  struct enable_if : public enable_if_c<Cond::value, T> {};

The function (struct) enable_if takes one input parameter, namely Cond standing for condition. This parameter is a type since we are doing meta programming, but the implementation dereferences the value via ::value just as we did above for testing, and passes it to the implementing function enable_if_c. Passing means inheritance here, we are programming with struct‘s (with difficult topics I tend to repeat myself, bear with me please).

The second parameter T defaulted to void does not play any important role, it could just be any type (at least in our application here). The return value from enable_if_c is this type if the input parameter is true, or nothing if the input parameter is false. This is implemented by partial template specialization, returning (by declaring a typedef for) T = void in the general definition but nothing in the specialization for B = false.

Now if nothing is what we get as the return value, i.e. there is no ::type typedef at all, the expression for our second dummy parameter from above

typename boost::enable_if<
        boost::is_base_of<B,D> >::type *dummy = 0

does not make sense, since without a ::type we cannot declare a pointer to it. If type is void on the other hand the expression makes perfect sense. One could add that a non-existent ::type in the meta programming space is what void is in the regular programming space. Not very helpful and just to confuse everyone a bit.

The first case when ::type is void is clear now, the function is just taken by the compiler to instantiate the client code function. And this is the case if our earlier condition checking for the inheritance relationship is true.

If this is false, on the other hand, a senseless expression is born during the template substitution process and therefore discarded by the compiler for further use. This is also known as SFINAE, which stands for “substitution failure is not an error”; you can read more about that here.

The test code for the whole thing is as follows

boost::shared_ptr<UnrelatedClass<double> > noPricer;
boost::shared_ptr<BaseCouponPricer<double> > basePricer;
boost::shared_ptr<DerivedCouponPricer<double> > derivedPricer;

setCouponPricer(noPricer); // compile error
setCouponPricer(basePricer); // ok
setCouponPricer(derivedPricer); // ok
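
For reference, if C++11 is available the same mechanism can be written with the standard library type traits instead of boost – a self-contained sketch, not what is actually in QuantLib:

#include <memory>
#include <type_traits>

template <class T> class BaseCouponPricer {};
template <class T> class DerivedCouponPricer : public BaseCouponPricer<T> {};
template <class T> class UnrelatedClass {};

// the defaulted enable_if parameter removes this overload from the candidate
// set whenever S<T> is not derived from BaseCouponPricer<T> (SFINAE)
template <template <class> class S, class T>
void setCouponPricer(
    const std::shared_ptr<S<T> > &,
    typename std::enable_if<
        std::is_base_of<BaseCouponPricer<T>, S<T> >::value>::type *dummy = 0) {}

int main() {
    std::shared_ptr<UnrelatedClass<double> > noPricer;
    std::shared_ptr<BaseCouponPricer<double> > basePricer;
    std::shared_ptr<DerivedCouponPricer<double> > derivedPricer;
    setCouponPricer(basePricer);    // ok
    setCouponPricer(derivedPricer); // ok
    // setCouponPricer(noPricer);   // would not compile, as intended
    return 0;
}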

This is all a bit crazy and sounds like exploiting undocumented language features, not meant for those purposes. And probably it was in the beginning when meta programming was discovered. But it’s obviously widely used nowadays and even necessary to comply with the already classic C++11 standard.

Which we just discussed yesterday when talking about C++11 – compliant random number generators. Here is an interesting one by Thijs van den Berg. In chapter 26.5.1.4 / table 117 of the C++11 standard n3242 it says that you must provide two seed methods, one taking an integer and an additional one taking a sequence generator. Since the signatures of both methods overlap, the only way of telling whether you have an integer or a sequence generator is by using techniques like the ones above. And so did Thijs in his code. So better get used to it, I guess.
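
Thijs’ actual code is of course more involved; the basic dispatch can be sketched like this (a toy engine of my own, just to illustrate the two seed overloads):

#include <cstdint>
#include <type_traits>

class ToyEngine {
  public:
    typedef std::uint64_t result_type;
    // selected when the argument is an integral type
    template <class T>
    typename std::enable_if<std::is_integral<T>::value>::type seed(T value) {
        state_ = static_cast<result_type>(value);
    }
    // selected for everything else, e.g. std::seed_seq
    template <class Sseq>
    typename std::enable_if<!std::is_integral<Sseq>::value>::type
    seed(Sseq &seq) {
        std::uint32_t tmp[2];
        seq.generate(tmp, tmp + 2);
        state_ = (static_cast<result_type>(tmp[0]) << 32) | tmp[1];
    }
  private:
    result_type state_;
};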

By the way compile time calculations are quite popular. There is even a template programming raytracer. I might port some QuantLib pricing engines to compile time versions if the adjoint project should die for some reason.

The final implementation of setCouponPricers in the real code now looks as follows

template <template <class> class S, class T>
void setCouponPricers(
    const typename Leg_t<T>::Type &leg,
    const std::vector<boost::shared_ptr<S<T> > > &pricers,
    typename boost::enable_if<boost::is_base_of<FloatingRateCouponPricer_t<T>,
                                                S<T> > >::type *dummy = 0) {
/* ... */
}

Again a good example that templated code together with some dirty generic programming tricks does not necessarily lose readability.


Adjoint Greeks III – Vanilla Swaps

Time for a post, because next week I will be in Dublin over the weekend. Three days with my beloved wife, just the two of us. Without our four little ones, who will have a great time too, with their granny. But also without my laptop, which makes me a bit nervous to be honest. Anyway. Good news from my adjoint greeks project, I reached the first bigger milestone, which is to compute the delta vector for a plain vanilla swap.

Or rather for a whole portfolio of vanilla swaps. The setting should be realistic and allow for first performance checks, so I look at portfolios of 1, 10, 100, 1000, 5000 and 10000 swaps with random maturities in a horizon of 10, 20, 30, 50 and 70 years, giving 30 test cases for all combinations in total.

The underlying curve (only one, but it would work with separate discounting and forwarding curves as well) consists of 10 deposits, 5 forward rate agreements and as many swaps as the horizon suggests with yearly spacing, i.e. we have 25, 35, 45, 65 and 85 points on the curve for each of the above horizon scenarios respectively.

The goal is to

  • compute the NPV of the swap portfolio
  • compute bucket deltas with respect to all instruments in the underlying curve

both in the traditional way (shifting the market quotes of the curve instruments and revalue the portfolio on the new curve) and with AD.

You can see the whole example code here https://github.com/pcaspers/quantlib/blob/adjoint/QuantLib/Examples/AdjointSwapDeltas/adjoint.cpp. Note that no special engine is used to generate the adjoint deltas. We just do a vanilla swap pricing (almost) as always, collecting the market variables of interest in a vector xAD and the result (the NPV) in yAD and then say

CppAD::ADFun<Real> f(xAD, yAD);
std::vector<Real> deltasAD(xAD.size()), w(1, 1.0);
deltasAD = f.Reverse(1, w);
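
The only extra step compared to a plain double pricing (and part of what the “(almost)” above hints at) is that the input vector has to be declared as the vector of independent variables before the pricing is run, so that CppAD records the operation sequence, roughly like this (marketQuotes is a hypothetical name here, see the linked example for the real code):

std::vector<CppAD::AD<Real> > xAD(marketQuotes.begin(), marketQuotes.end());
CppAD::Independent(xAD); // start taping, xAD holds the independent variables
// ... build the curve and price the swap portfolio as usual, filling yAD ...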

Let’s start looking at one single swap. The delta pillars are numbered from 1 to 45 simply, meaning overnight, tomnext, spotnext, spotweek, 1m, … , 6m, 1m-7m fra, 2m-8m, … , 5m-11m, 1y swap, 2y, … , 30y. The random swap below apparently has a maturity between 23y and 24y from today. The notional of the swap is 100 million (in general the portfolio’s total notional is 100 million in all cases regardless of the number of swaps in it).

The finite difference deltas in the column labeled double are computed with a step size of 1E-6 here. The AD values are in the next column and the rightmost column displays the difference between these two. The CppAD framework once more does not disappoint us and gives reliable results. Note that the deltas are expressed w.r.t. a one basis point (1E-4) shift.

results:                double     AD<double>     difference
   NPV            -64037022.66   -64037022.66           0.00
   Delta #    1          53.94          53.93           0.00
   Delta #    2          17.98          17.98           0.00
   Delta #    3           0.00          -0.00           0.00
   Delta #    4           0.00           0.00          -0.00
   Delta #    5           0.00           0.00          -0.00
   Delta #    6           2.28           2.28          -0.00
   Delta #    7       -2439.17       -2439.17           0.00
   Delta #    8           0.00          -0.00           0.00
   Delta #    9           0.00           0.00          -0.00
   Delta #   10           0.00          -0.00           0.00
   Delta #   11           0.00           0.00          -0.00
   Delta #   12           7.13           7.13          -0.00
   Delta #   13         182.54         182.54          -0.00
   Delta #   14           0.00          -0.00           0.00
   Delta #   15           0.00           0.00          -0.00
   Delta #   16          36.26          36.26           0.00
   Delta #   17         495.51         495.51          -0.00
   Delta #   18         731.54         731.54          -0.00
   Delta #   19         994.67         994.68          -0.00
   Delta #   20        1238.63        1238.63          -0.00
   Delta #   21        1488.41        1488.42          -0.00
   Delta #   22        1745.32        1745.33          -0.01
   Delta #   23        2013.84        2013.85          -0.01
   Delta #   24        2234.96        2234.97          -0.01
   Delta #   25        2535.77        2535.79          -0.01
   Delta #   26        2769.90        2769.92          -0.02
   Delta #   27        3040.39        3040.41          -0.02
   Delta #   28        3321.43        3321.45          -0.02
   Delta #   29        3543.09        3543.12          -0.03
   Delta #   30        3861.57        3861.65          -0.08
   Delta #   31        4109.57        4109.51           0.06
   Delta #   32        4364.59        4364.68          -0.09
   Delta #   33        4659.00        4659.04          -0.05
   Delta #   34        4958.68        4958.73          -0.05
   Delta #   35        5166.84        5166.90          -0.06
   Delta #   36        5529.58        5529.64          -0.06
   Delta #   37        5745.63        5745.69          -0.07
   Delta #   38       60419.04       60419.23          -0.19
   Delta #   39      167048.46      167051.01          -2.55
   Delta #   40           0.00           0.00           0.00
   Delta #   41           0.00           0.00           0.00
   Delta #   42           0.00           0.00           0.00
   Delta #   43           0.00           0.00           0.00
   Delta #   44           0.00           0.00           0.00
   Delta #   45           0.00           0.00           0.00

Now let’s look at timings. In a bigger example, with 70y horizon (85 curve pillars) and 10000 swaps in the portfolio. On my laptop,

maximum maturity           70 years
portfolio size          10000 swaps
delta vector size          85 pillars

timings (ms)            double     AD<double>         factor  eff#NPVs
   pricing                1070           2690
   deltas                99030           2700
   total                100100           5390        18.5714   4.57692

This says that the pricing takes 1070ms with standard double computations and 2690ms with AD. Approximately, because you can not really trust single timings, in particular not for small problems.

The delta computation under the naive bump and revalue approach takes another fatty 99 seconds, while CppAD only consumes 2.7 seconds on top of the 2.7 seconds from the pricing. Obviously more is done in the pricing step compared to a double computation, later on used in the reverse sweep, so ultimately we have to look at the sum of pricing and delta computation. For this we get a speed up of 18.5x with AD.

The last number eff#NPVs is the effective number of NPV calculations during the AD run which is the total time of the AD calculation divided by the average time for one pricing with classic double‘s. From theory we know that this should be bounded by 5, here it is 4.6.

The magic of AD is that with bigger problems, the 4.6 stays constant, not the speed up factor. If we look at 115 pillars instead of the 85 for example,

maximum maturity          100 years
portfolio size          10000 swaps
delta vector size         115 pillars

timings (ms)            double     AD<double>         factor  eff#NPVs
   pricing                1610           4060
   deltas               204630           4100
   total                206240           8160        25.2745   4.55004

Note that swaps with longer term are generated for this example, so a single pricing takes longer than before. Otherwise we stay at 4.6 times one NPV computation for now 115 bucket deltas and get a speed up of over 25 !

It does not matter how many greeks you want, you just get them for free. I thought I had already understood that, but seeing it in reality is quite cool. Let your problem grow as it wants, your complexity stays constant. In other circumstances you are glad to be linear. Sometimes you are lucky-logarithmic. Here the complexity is constant. Crazy.

Running all the examples mentioned above and plotting the effective number of NPV calculations against the problem size (defined as the product of the delta vector size and the portfolio size) we get this picture.

[Figure: adjointeffnpv – effective number of NPV calculations plotted against the problem size]

The effective number of NPV calculations seems to stabilize slightly above 4.5. Cough, probably we should produce more points first … but with a sharp eye …

Next steps: Volatility term structures (which I already scratched involuntarily when converting the ibor coupon pricers), Black pricers, Hull White and Markov functional model. The latter both being slow and totally and utterly inaccessible for bump and revalue sensitivities. And of course filling in all the holes I left open so far. Still, help on the conversion process is highly appreciated (including not only pull requests you can send, but also expensive gifts, plain money (no bitcoins please), or a too-good-to-reject QuantLib job offer).


Tag Dispatching, ZABR and Where Do You Start

Recently I added an experimental implementation of Paul Doust’s No arbitrage SABR model to QuantLib. As the name suggests this is an arbitrage free approximation to the famous stochastic \alpha\beta\rho model proposed in 2002 by Patrick Hagan

dF = F^\beta \sigma dW \\  d\sigma = \nu \sigma dV \\  dW dV = \rho dt

for a forward rate F and a stochastic volatility \sigma starting at an initial value \alpha, with volatility of volatility \nu, CEV parameter \beta and correlation \rho, and – in Doust’s case – assuming an absorbing barrier at zero forward level.

I am not going to talk about low interest rates and possible remedies for your models here, it is the weekend after all, so let’s relax and not worry about the outside world too much – I think it was just yesterday that the Euro Sabr Volatility Cube in QuantLib ceased to work, telling me that the 1m into 1y forward swap rate against Euribor 3m is negative, so no further activity seems meaningful any more. Thursday was also quite interesting, thanks to our Swiss neighbours. I mean how are you supposed to stay focussed when everybody around you freaks out ? Really annoying.

Once the no arbitrage model core itself was implemented I needed to integrate it into the existing infrastructure. For the classic SABR implementation there are a lot of useful classes, like an interpolation class and associated factory, a smile section, a swaption cube (the one from above). The quick and dirty solution is too obvious: Copy’n’paste the existing SABR code, rename the classes and replace the SABR formula by the new Doust version. I think, ultimately, there are only three very simple criteria for good code

  • No redundant code – don’t do the same thing at two places in your program
  • Easy to maintain and extend – bug fixing and adding new functionality is easy
  • Understandable – the average colleague of yours is able to read and understand the code

The better approach for integrating the new SABR version seems therefore to take the existing code, extract everything specific to the classic implementation and use what is left to build both SABR versions on top of it. I called the common basis XABR, so for example there is now a class

template <class I1, class I2, typename Model>
class XABRInterpolationImpl : 
           public Interpolation::templateImpl<I1, I2>,
           public XABRCoeffHolder<Model> {
/* ... */

with a generic Model class giving the specification of Hagan’s 2002 SABR approximation, Doust’s 2012 no arbitrage SABR approximation, and possibly other models. Turns out that for example the SVI model also fits into the framework without any difficulties. So the label XABR is already not general enough to cover all use cases. Never mind, the main thing is that the code is generic. The Model class looks like this

    struct SABRSpecs {
        Size dimension() { return 4; }
        void defaultValues(std::vector<Real> &params, std::vector<bool> &,
                           const Real &forward, const Real expiryTime) {
        /* ... */
        }
        void guess(Array &values, const std::vector<bool> &paramIsFixed,
                   const Real &forward, const Real expiryTime,
                   const std::vector<Real> &r) {
        /* ... */
        }
        Real eps1() { return .0000001; }
        Real eps2() { return .9999; }
        Real dilationFactor() { return 0.001; }
        Array inverse(const Array &y, const std::vector<bool>&,
                      const std::vector<Real>&, const Real) {
        /* ... */
        }
        Array direct(const Array &x, const std::vector<bool>&,
                     const std::vector<Real>&, const Real) {
        /* ... */
        }
        typedef SABRWrapper type;
        boost::shared_ptr<type> instance(const Time t, const Real &forward,
                                         const std::vector<Real> &params) {
            return boost::make_shared<type>(t, forward, params);
        }
    };

telling you the number of parameters of the model, default values to be used as starting values for calibration (if you do not specify any), a guessing algorithm giving meaningful alternative starting values if the first calibration fails or runs into a local minimum, some internal constants (not part of the Model concept directly), and inverse and direct transformation methods that map the full real hypercube to the admissible parameter space to facilitate unconstrained optimization of the model. Last but not least, the instance method creates an object of the model that can be used to ask for implied volatilities.

This kind of generalization is also done for volatility cubes. Now there is a

    template<class Model>
    class SwaptionVolCube1x : public SwaptionVolatilityCube {
/* ... */

again taking a generic spec for the desired underlying model. The original Hagan cube and the new Doust cube are then retrieved as

    struct SwaptionVolCubeSabrModel {
        typedef SABRInterpolation Interpolation;
        typedef SabrSmileSection SmileSection;
    };

    typedef SwaptionVolCube1x<SwaptionVolCubeSabrModel> 
                                          SwaptionVolCube1;

    struct SwaptionVolCubeNoArbSabrModel {
        typedef NoArbSabrInterpolation Interpolation;
        typedef NoArbSabrSmileSection SmileSection;
    };

    typedef SwaptionVolCube1x<SwaptionVolCubeNoArbSabrModel> 
                                              SwaptionVolCube1a;

Pretty pretty pretty good. But what does that have to do with “Tag Dispatching” ? And is ZABR just a typo in the headline of this post ? No. I don’t know what the Z stands for (probably for nothing), but it labels a model invented by ingenious Jesper Andreasen aka Qwant Daddy at Danske Bank. I once had the luck to attend a presentation he gave. Very impressive. Anyway, the model is this

dF = F^\beta \sigma dW \\  d\sigma = \nu \sigma^\gamma dV \\  dW dV = \rho dt

so introducing a CEV dynamics for the volatility. With that you can control the wings of your smile in a very flexible way, which is nice if you are into CMS pricing for example. Although I have to mention that too many degrees of freedom are not always helpful. I make do with the classic SABR model for the Euro CMS market, and ZABR is currently not more (and not less) than an interesting bird of paradise.

I implemented the ZABR paper (it is called expansions for the masses for some reason similarly clear as the Z in ZABR) back in 2012 in our summer holidays on Sardinia. Brutally hot three weeks, forest fires nearby our house still burning just the day before we arrived. I called our landlord, he telling me everything is safe, guess then it is ok to go there with your wife and four little kids, ain’t it. Well after all it was nice on Sardinia and the worst thing that happened was that this little rocker of the L key on my beloved laptop broke. Heroically I finished the implementation nevertheless.

The special thing about the implementation is that the paper suggests more than one solution of the model. For a start there are short maturity expansions in lognormal or normal volatility, comparable in computational complexity to the classic Hagan expansions. Then there is a Dupire style local volatility approach producing an arbitrage free smile yet at the same time being fast to evaluate. And, not in the paper, I added a brute force precise full finite difference solution, which is slow but useful to benchmark the approximations.

Originally I used an enum parameter in the constructor of the model to specify the way of model evaluation. But that doesn’t fit in the above XABR framework. Extend the XABR concept to take an additional parameter that is not needed otherwise ? Not very nice. A light variant of Tag Dispatching can help ! The idea is to have a template parameter that denotes the evaluation and then write the spec for the model like this

template <typename Evaluation> struct ZabrSpecs {
    Size dimension() { return 5; }
    Real eps() { return 0.000001; }
    void defaultValues(std::vector<Real> &params,
                       std::vector<bool> &paramIsFixed, const Real &forward,
                       const Real expiryTime) {
    /* ... */
    }
    /* ... */
    typedef ZabrSmileSection<Evaluation> type;
    boost::shared_ptr<type> instance(const Time t, const Real &forward,
                                     const std::vector<Real> &params) {
        return boost::make_shared<type>(t, forward, params);
    }
/* ... */

The good thing about this is that the spec itself is templated by the evaluation, so that the evaluation does not need to become a parameter in the spec methods (e.g. the instance method), they can just stay the same as before. In this way this approach is not intrusive at all, fully respecting the existing code structure.

To mimic the functionality of an enum I chose the Evaluation parameter to be a tag meaning that it is just a label with no other deeper meaning in its simple and superficial yet useful (so I assume ultimately happy) existence. That is I declare empty classes

struct ZabrShortMaturityLognormal {};
struct ZabrShortMaturityNormal {};
struct ZabrLocalVolatility {};
struct ZabrFullFd {};

which can be plugged into the interpolation class quite naturally like this

ZabrInterpolation<ZabrLocalVolatility> myInterpolation(/*...*/);
/*...*/

But how is the implementation itself done ? For example let’s look at the volatility implementation of the ZABR smile section

    Volatility volatilityImpl(Rate strike) const {
        return volatilityImpl(strike, Evaluation());
    }

It invokes a method with the same name, but with a second parameter which is just an instance of our tag class. This parameter is not carrying any information, it is an instance of an empty class declaration. It is just used to make the compiler choose the overloaded version of volatilityImpl appropriate for the kind of evaluation specified. Thus we have several private overloaded methods

    Volatility volatilityImpl(Rate strike, ZabrShortMaturityLognormal) const;
    Volatility volatilityImpl(Rate strike, ZabrShortMaturityNormal) const;
    Volatility volatilityImpl(Rate strike, ZabrLocalVolatility) const;
    Volatility volatilityImpl(Rate strike, ZabrFullFd) const;

one for each evaluation tag. This produces more efficient code than switch statements dispatching the evaluation mode at runtime. Also the code is more readable because each evaluation is implemented in its own overloaded method – still retaining the possibility of shared code between the different kinds of evaluation of course. No redundant code. And new evaluation modes can be added in a most transparent way.
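
Stripped of all the ZABR context, the pattern boils down to something like this toy example (not library code):

#include <iostream>

// empty tag classes, their only purpose is overload selection at compile time
struct ZabrShortMaturityLognormal {};
struct ZabrLocalVolatility {};

template <typename Evaluation> class ToySmileSection {
  public:
    double volatility(double strike) const {
        // dispatch on a value-less instance of the tag
        return volatilityImpl(strike, Evaluation());
    }
  private:
    double volatilityImpl(double, ZabrShortMaturityLognormal) const {
        return 0.20; // placeholder for the short maturity expansion
    }
    double volatilityImpl(double, ZabrLocalVolatility) const {
        return 0.21; // placeholder for the local volatility evaluation
    }
};

int main() {
    ToySmileSection<ZabrLocalVolatility> smile;
    std::cout << smile.volatility(0.03) << std::endl; // prints 0.21
    return 0;
}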

If you are interested in the full code for the ZABR implementation you can have a look at https://github.com/lballabio/quantlib/pull/168/files. If you are interested in the model itself, there are some slides about that in my presentation at http://quantlib.org/qlws14.shtml. If you find it hard to read and understand Jesper’s and Brian’s original paper, I can send you a rewritten version I created for the masses (and in particular for myself), with all the intermediate steps that are not crystal clear unless you are a risk quant of the year.

As a final and totally unrelated remark. Even if you are not into Jazz music, here is a gem everyone should give a chance (and soon get addicted to): Wonderful Brad Mehldau’s album Where Do You Start http://en.wikipedia.org/wiki/Where_Do_You_Start. The fragile-pessimistic-yet-easy-funky Got Me Wrong, the probably coolest (sorry Jimi) version of Hey Joe I ever heard (though I did not listen to all of the apparently more than 1700 versions out there https://heyjoeversions.wordpress.com/hey-joe-versions/), the simply beautiful Samba e Amor evolving into Brad’s own Jam, the so sensitive and consoling Time Has Told Me. Pure art. Give it a try.


Adjoint Greeks II

Welcome back. I hope you all had some nice Christmas days. I surely had. Still some time to quantlib a bit. In my last post (https://quantlib.wordpress.com/2014/12/20/adjoint-greeks/) I wrote about automatic differentiation (AD) and applied it to delta and vega computation in the Black76 model. Since the results were promising I started to work more seriously on my adjoint branch (https://github.com/pcaspers/quantlib/tree/adjoint). It is still in a proof of concept state. The first milestone would be an engine that computes adjoint interest rate deltas for QuantLib::Swap instruments. And guess what, I am almost there. It is a hell of a lot of work though, and if someone is interested in helping, please contact me. You will get intimate with large parts of the QuantLib code and get your hands dirty at the same time, with a specific goal to reach.

The essential work is to replace all quantities w.r.t. which we potentially want sensitivities by a generic type T. Then we can plug in a double if we want to work in classic mode or the active type of an AD framework like the one we already know from the CppAD library

CppAD::AD<double>

in order to leverage automatic differentiation in QuantLib. By the way, CppAD’s author Brad Bell has replaced the error function with the one from the std:: namespace (it has become part of the C++11 standard). He also helped me with some cmake problems, very nice. At the moment I only work with CppAD, but the changes in the library are not specific to this framework, so other tools (at least those relying on operator overloading) could be used as well. An important point to keep in mind, don’t make yourself dependent on a single third party library. It is enough that we are all dependent on QuantLib, isn’t it. Anyway, all the CppAD specific things I added are currently collected in one small, single header file ql/qlcppad.hpp.

To get a flavour of what it means to template’ize QuantLib, look at the example of a simple quote. The public part now looks as follows

template <class T = Real> class SimpleQuote_t :
 public Quote_t<T> {
  public:
    SimpleQuote_t(T value = Null<Real>());
    T value() const;
    bool isValid() const;
    T setValue(T value = Null<Real>());
    void reset();
...

The templated version of the class is marked by an underscore t. Actually the old class does not exist any more, but is rather retrieved by a typedef

typedef SimpleQuote_t<Real> SimpleQuote;
...

to keep existing code compiling without any changes (well, not quite in all cases, unfortunately, but see below for this). On the other hand you can now instantiate a simple quote using an AD type with another typedef

typedef SimpleQuote_t<CppAD::AD<double>> SimpleQuoteAD;
...

We can then put this T – simple quote into a T – rate helper and create a T – piecewise yield curve, which will bootstrap the curve using a T – Brent solver, so that you can finally retrieve zero rates as AD doubles (using a T – linear interpolation, not to forget) and at the same time the sensitivity of the zero rates to the input market quotes.

This is what I am trying to demonstrate today. It is not exactly the milestone example from above, but technically very close. The many T’s from above already suggest that along the way you have to adapt a lot of code. And this is not always fun. If you have an error in your templated code, the compiler will often just vomit a big flood of long, unreadable error messages over your compiler window. If you are lucky, at some secret point in that ugly thick soup there is a tender hint of what might actually be the true source of the problem. And once the code compiles with double‘s, this does not at all mean that it does so with AD<double>‘s.

What is more, we have a lot of special things in QuantLib, like the Null type which for double‘s is implicitly converted to a certain value (namely std::numeric_limits<float>::max) to mark invalid values. This has to be made consistent with an AD double null type. Taking this example, we have to introduce a Null<CppAD::AD<double>>, which starts not too difficult with

template <class Base> class Null<CppAD::AD<Base> > {
  public:
    Null() {}
    operator CppAD::AD<Base>() const {
        return CppAD::AD<Base>(static_cast<Base>(Null<Base>()));
    }
...

but since inside CppAD conversion from any type T to CppAD::AD<Base> (replace Base by double if you want) is done like this

template <class Base>
template <class T>
inline AD<Base>& AD<Base>::operator=(const T &t)
{	return *this = Base(t); }

i.e. via the Base type, we also need a conversion from our new Null type above to Base, which can be done as follows

..
    // this is needed, because in ad_assign.hpp line 124ff
    // assignment from T to AD<Base> is done via conversion from T
    // to Base and then to AD<Base>. If for example
    // T = Null<CppAD::AD<double>> we need to be able to convert
    // to double so that then conversion to AD<double> from this
    // works.
    operator Base() const { return static_cast<Base>(Null<Base>()); }
...

and which does the trick in the end.

Some structural problems arise, because when templating you effectively turn parts of the library into a header-only version, that is, the implementation moves from the cpp to the hpp files. This is because the templated code can only be instantiated when compiling actual client code. Or you instantiate a list of specific types in the library itself, but then you tie yourself to exactly these implementations (e.g. support for CppAD in our context).

At least for iborcoupon.hpp, cmscoupon.hpp and couponpricer.hpp I arrived at circular references, which were not solvable by forward declarations. So I had to split these files into two parts each: one base part, that is included internally at some places in the library instead of the whole thing, and a second part with the same name as the whole thing, which in turn includes the base part, to stay backward compatible.

Another source of many headaches was the already highly generic territory around the piecewise yield curve. I really admire this part of the library, it is beautiful and incredibly robust. But it is challenging to adapt this piece of art to enable AD. In the end I still wanted to write something like

 
PiecewiseYieldCurve<ZeroYield, 
                    Linear,
                    IterativeBootstrap, 
                    CppAD::AD<double>> curve(referenceDate, 
                                             instruments, 
                                             Actual365Fixed());
...

to initialize a curve. The only change is the last template parameter, which can be set optionally to an AD double type (if not given, it is just assumed double, so replicating the old behaviour).

One real backward-incompatible change I introduced here is that factory classes for interpolations are now templated. The expression Linear therefore has a different meaning than before, namely taking a template type T that should be used for the calculations. The pro is that in generic classes like the piecewise yield curve we can still just write Linear, and behind the scenes the type T (taken from the last template parameter of PiecewiseYieldCurve) is plugged into the factory template so that what has to be done can be done. The con is that if the factory is explicitly used to create an interpolation object, we have to write

Linear<Real>

now instead of Linear alone. But assuming that the factory classes are mostly used as template parameters and not to actually instantiate classes in client code, this is hopefully not a huge change. Well, let’s see what Luigi has to say about this … 😉

Here is the new piecewise yield curve declaration, compared to the original version now an even nicer example of what you can do with templates in C++ …

template <template <class> class Traits, template <class> class Interpolator,
          template <class, class> class Bootstrap = IterativeBootstrap,
          class T = Real>
class PiecewiseYieldCurve
    : public Traits<T>::template curve<Interpolator>::type,
      public LazyObject {
...

Not only the Interpolator, but also the Traits (which are the Discount, Forward, ZeroYield kind of specifications) and the Bootstrap class (IterativeBootstrap or LocalBootstrap) now have a new template parameter.

But to summarize, from outside everything looks and is used as before. If you want to use an AD double type you just have to add a new template parameter to the PiecewiseYieldCurve construction, that’s all.

Another limitation I came across is that there are no templated typedefs in C++ (at least not before C++11’s alias templates). So for example the RateHelper typedef changes to

template <class T> struct RateHelper_t {
    typedef BootstrapHelper<YieldTermStructure_t<T>, T> Type;
};

and the final rate helper types (classic and AD’ized versions respectively) are retrieved via

typedef RateHelper_t<Real>::Type RateHelper;
typedef RateHelper_t<CppAD::AD<Real>>::Type RateHelperAD;
...

i.e. with an additional ::Type suffix.
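
As an aside, with C++11 an alias template would make the ::Type suffix unnecessary – a sketch mirroring the fragment above, not something that is in the library:

// hypothetical C++11 alias template replacing the RateHelper_t struct
template <class T>
using RateHelper_t = BootstrapHelper<YieldTermStructure_t<T>, T>;

// the final types would then simply be
typedef RateHelper_t<Real> RateHelper;
typedef RateHelper_t<CppAD::AD<Real>> RateHelperAD;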

Regarding templated functions, the type can generally be deduced from the function parameters, so that we can still write code like

Real z = 1.0;
if (close(z, 0.0)) {}

(an exceptionally dumb code example), although the declaration of close has changed to

template<class T> bool close(T x, T y);

However this is not totally backward compatible, because the expression close(z,0) was legal code before, and now the compiler goes “hey, I need to deduce a type T here and you are giving me an int and a double, I am confused, I can not work like this, please focus a bit on what you are doing, will you”. This was silently resolved before by an implicit conversion of 0 to a double. There was actually one line in the test suite (or the examples, I don’t remember) which had exactly this problem. Easy to solve (change 0 to 0.), but code that compiled before now doesn’t, which is always a major thing. We should say that the code was illegal all along and just accidentally compiled in the past.

A deeper and more interesting issue in the same direction is this one here:

template <class T = Real>
void setCouponPricer(
    const typename Leg_t<T>::Type &leg,
    const boost::shared_ptr<FloatingRateCouponPricer_t<T> > &pricer) {
...

A utility function which sets coupon pricers. You are not scared anymore by the first argument’s typename and ::Type stuff, are you ? Now look in the Gaussian1dModel.cpp example file, lines 549ff:

...
        const Leg &leg0 = underlying4->leg(0);
        const Leg &leg1 = underlying4->leg(1);
        boost::shared_ptr<CmsCouponPricer> cmsPricer =
            boost::make_shared<LinearTsrPricer>(swaptionVol, reversionQuote);
        boost::shared_ptr<IborCouponPricer> iborPricer =
            boost::make_shared<BlackIborCouponPricer>();

        setCouponPricer<Real>(leg0, cmsPricer);
        setCouponPricer<Real>(leg1, iborPricer);
...

Everything looks usual here, although Leg, CmsCouponPricer, IborCouponPricer, BlackIborCouponPricer are again all merely typedef‘s for the new template versions with T = Real as above. Maybe interesting to mention, LinearTsrPricer is not converted yet, so old code can be mixed with new one, even if both worlds are connected by an inheritance relationship. Which is good.

But why does setCouponPricer need the explicit specification of the double type ? This is because the compiler does not recognize that a boost::shared_ptr<U> can stand in for a boost::shared_ptr<V>, even if U (CmsCouponPricer) is derived from V (FloatingRateCouponPricer). He (or she ? … C++ compilers are more likely female creatures, aren’t they ?) doesn’t know that boost::shared_ptr is some kind of a pointer, eh, it is just some user defined class. You may wonder (at least I did) why this kind of code worked until now anyway in other, non-templated-function contexts. The reason is that implicit conversions are defined for these cases in the boost::shared_ptr class. But these are not even noticed by the compiler when inferring template types. Bad luck, she is a bit picky here. Actually this problem can be worked around by using a generic boost::shared_ptr<G> second argument in setCouponPricer (which accepts everything that is a boost shared pointer) and using some very fancy generic programming stuff to check at compile time that G is actually a FloatingRateCouponPricer_t. This is something for later thoughts and at least one blog post alone.

Time for the full code example. I bootstrap a toy curve from 3 deposit quotes with maturities 1m, 2m, 3m and compute the sensitivity of the 3m zero rate to the underlying deposits’ market quotes. Depending on the self documenting macro YES_I_WANT_TO_USE_AD either AD or a simple finite difference approximation will be used in our example.

#include <ql/quantlib.hpp>
#include <boost/assign/std/vector.hpp>

using namespace QuantLib;
using namespace boost::assign;

// comment or uncomment this macro
//#define YES_I_WANT_TO_USE_AD

// cppad utilities
#ifdef YES_I_WANT_TO_USE_AD
#include <ql/qlcppad.hpp>
#endif

int main() {

// define the double type to be used

#ifdef YES_I_WANT_TO_USE_AD
    std::cout << "Example with AD enabled" << std::endl;
    typedef CppAD::AD<double> dbltype;
#else
    std::cout << "Example with AD disabled, use finite differences"
              << std::endl;
    typedef double dbltype;
#endif

    // some typedefs to keep notation simple

    typedef RateHelper_t<dbltype>::Type RateHelperAD;
    typedef DepositRateHelper_t<dbltype> DepositRateHelperAD;
    typedef SimpleQuote_t<dbltype> SimpleQuoteAD;
    typedef Quote_t<dbltype> QuoteAD;

    // the reference date

    Date referenceDate(2, January, 2015);
    Settings::instance().evaluationDate() = referenceDate;

    // declare the independent variables (sample deposit quotes)
    std::vector<dbltype> x(3);
    x[0] = 0.0035;
    x[1] = 0.0090;
    x[2] = 0.0121;

#ifdef YES_I_WANT_TO_USE_AD
    CppAD::Independent(x);
#endif

    auto rate1m = boost::make_shared<SimpleQuoteAD>(x[0]);
    auto rate2m = boost::make_shared<SimpleQuoteAD>(x[1]);
    auto rate3m = boost::make_shared<SimpleQuoteAD>(x[2]);

#ifndef YES_I_WANT_TO_USE_AD
    Real h = 1e-4;
    auto rate1mp = boost::make_shared<SimpleQuoteAD>(x[0] + h);
    auto rate2mp = boost::make_shared<SimpleQuoteAD>(x[1] + h);
    auto rate3mp = boost::make_shared<SimpleQuoteAD>(x[2] + h);
#endif

    // build a piecewise curve

    RelinkableHandle<QuoteAD> quote1m(rate1m), quote2m(rate2m), quote3m(rate3m);

    auto dp1m = boost::make_shared<DepositRateHelperAD>(
        quote1m, 1 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    auto dp2m = boost::make_shared<DepositRateHelperAD>(
        quote2m, 2 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    auto dp3m = boost::make_shared<DepositRateHelperAD>(
        quote3m, 3 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    std::vector<boost::shared_ptr<RateHelperAD> > instruments;
    instruments += dp1m, dp2m, dp3m;

    PiecewiseYieldCurve<ZeroYield, Linear, IterativeBootstrap, dbltype> curve(
        referenceDate, instruments, Actual365Fixed());

    std::vector<dbltype> y(1);

    Real t3 = curve.timeFromReference(dp3m->latestDate());
    //Real t3 = 2.5 / 12.0;

    y[0] = curve.zeroRate(t3, Continuous).rate();

    std::cout << std::setprecision(16);
    std::cout << "zero rate = " << y[0] << std::endl;

#ifdef YES_I_WANT_TO_USE_AD
    // define the operation sequence
    CppAD::ADFun<Real> f(x, y);
    std::vector<Real> dw(3), w(1, 1.0);
    // gradient computation
    dw = f.Reverse(1, w);
    std::cout << "gradient = (" << dw[0] << "," << dw[1] << "," << dw[2] << ")"
              << std::endl;
#else
    // finite difference values
    quote1m.linkTo(rate1mp);
    Real y0_1 = curve.zeroRate(t3, Continuous).rate();
    quote1m.linkTo(rate1m);
    quote2m.linkTo(rate2mp);
    Real y0_2 = curve.zeroRate(t3, Continuous).rate();
    quote2m.linkTo(rate2m);
    quote3m.linkTo(rate3mp);
    Real y0_3 = curve.zeroRate(t3, Continuous).rate();
    quote3m.linkTo(rate3m);
    std::cout << "gradient = (" << (y0_1 - y[0]) / h << "," << (y0_2 - y[0]) / h
              << "," << (y0_3 - y[0]) / h << ")" << std::endl;
#endif

    return 0;
}

Note that the bootstrap process itself (which is a zero search using the Brent solver) is differentiated when using AD !

The output of this code with and without AD enabled is


Example with AD disabled, use finite differences
zero rate = 0.01188296345959088
gradient = (0.04267701227033543,-7.65533469948565e-11,0.9682251688560017)

Example with AD enabled
zero rate = 0.01188296345959088
gradient = (0.04267719604141054,1.036619161164663e-11,0.9682373688299127)

The interpretation is that the 3m (continuously compounded Actual/365Fixed) zero rate grows by 0.04, 0.0, 0.97 basis points if the 1m, 2m, 3m deposit market quotes (simply compounded, Actual/360) grow by 1 basis point respectively. Note that the sensitivity on the 1m quote is coming from the two fixing days involved in the deposits.

Let’s check if the linear interpolation is correctly taken into account in the derivatives, too. This is also something that really requires AD: while you could do it for linear interpolations by hand, you wouldn’t really want to differentiate the value of, say, a cubic spline at some arbitrary point by the spline’s values on its pillars, even if this is still realistic to write down. For our test, we replace t3 by two and a half months (see the commented line in the code). We get


Example with AD disabled, use finite differences
zero rate = 0.01003550346419732
gradient = (0.05551963176645552,0.5617050580845363,0.3946021819303639)

Example with AD enabled
zero rate = 0.01003550346419732
gradient = (0.05551987078094003,0.5617096566300429,0.3946071568278159)
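
The weights are roughly what you would expect from writing down the chain rule by hand for the linear zero interpolation: with pillar times t_2 (the 2m pillar) and t_3 (the 3m pillar) and an evaluation time t in between, the interpolated zero rate is

z(t) = \frac{t_3 - t}{t_3 - t_2} \, z(t_2) + \frac{t - t_2}{t_3 - t_2} \, z(t_3)

so the sensitivity is split between the two neighbouring pillars according to these weights, each multiplied by the sensitivity of the respective pillar zero rate to its own deposit quote; with the actual pillar times (deposit maturities including the two fixing days) this is broadly consistent with the 0.56 / 0.39 split above.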

CppAD is doing a very good job here, it seems. See you again in two weeks. Think about contributing to my adjoint branch !


Adjoint Greeks

In this second post I want to write about my very first steps in automatic differentiation (AD). Of course applied to quantitative finance, where derivatives are called greeks. And with QuantLib, of course. AD is a topic Ferdinando Ametrano mentioned during our dinner at the workshop in Düsseldorf a few weeks ago and indeed it sounded interesting: Take a swap, which can have hundreds of deltas nowadays, and compute all of them in just one sweep with the complexity of at most 4 (four, vier, quattro) present value calculations. Sounds like sorcery, but actually it is nothing more than an application of the chain rule of differentiation. This says

\frac{d}{dx} (f \circ g) = \left( \frac{d}{dy} f \circ g \right) \cdot \frac{d}{dx} g

which means that if you want to compute the derivative of the result of a chain of computations with respect to an input variable, you can do so by computing the derivatives of the single computations and combining the results appropriately. Note that the formula is more general than it might look at first sight. It is true for functions of several variables, possibly with several outputs as well. The derivatives are then matrices (Jacobi matrices), the derivative of f is evaluated at g(x), and the \cdot is a matrix multiplication.

Actually my analysis professor at university introduced the derivative as a bounded linear operator between Banach spaces (which can have infinite dimensions, i.e. infinitely many input and output variables, which do not even need to be countable, boom, brain overflow in the second year …) approximating the function in question up to an error of order o(\lVert h \rVert). Pure fun, only outperformed by his colleague who started the first lesson in linear algebra by defining what a semi-group is. It is only now, after working in banks for more than 15 years, that I really appreciate this kind of stuff and wish I had stayed in academia. Well, my problem, not yours.

Anyway. There are a lot of AD frameworks for C++ and a good starting point is surely the website http://www.autodiff.org/. What they all do is take your (C++) program, look at each of the operations and how they are chained together, and then compute derivatives using exact formulas in each step.

This is exactly what we all learned at school, namely how to differentiate sums, products, quotients of functions, what the derivative of x^n is, or how to differentiate more complicated functions like e^x or \sin(x) and so on. And how to put everything together using the chain rule! In AD language this is more precisely called forward mode differentiation. There is also a backward mode working from the outside to the inside of a chain of functions. This is a bit unusual and it is useful to work through some examples to get the idea, but in the end it is also nothing more than applying the chain rule. Which mode should be used depends on the dimensions n and m of the function

f: \mathbb{R}^n \rightarrow \mathbb{R}^m

If m is big and n is small, you should use the forward mode to compute the derivatives of the m output values with respect to the n input values in the most efficient way. If m is small and n is big, you should use the reverse mode. In our application above, computing hundreds of interest rate deltas for a swap, m is one and n is a few hundred, so this is a problem for the reverse mode.
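
Jumping a little ahead to the framework I will use below (CppAD, more on that in a minute), here is a toy example, nothing QuantLib specific and with a made-up function and made-up numbers, just to show the two modes side by side: the full gradient of a scalar function in one reverse sweep versus one directional derivative per forward sweep.

#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main() {

    using CppAD::AD;

    // f(x0,x1,x2) = x0*x1 + sin(x2), so n = 3, m = 1
    std::vector<AD<double>> x(3);
    x[0] = 1.0; x[1] = 2.0; x[2] = 3.0;
    CppAD::Independent(x); // start taping

    std::vector<AD<double>> y(1);
    y[0] = x[0] * x[1] + CppAD::sin(x[2]);
    CppAD::ADFun<double> f(x, y); // stop taping, store the operation sequence

    // reverse mode: the full gradient in a single sweep
    std::vector<double> w(1, 1.0);
    std::vector<double> grad = f.Reverse(1, w);

    // forward mode: one directional derivative per sweep, here d/dx0 only
    std::vector<double> dx(3, 0.0), dy(1);
    dx[0] = 1.0;
    dy = f.Forward(1, dx);

    std::cout << "reverse gradient = (" << grad[0] << "," << grad[1] << ","
              << grad[2] << "), forward d/dx0 = " << dy[0] << std::endl;

    return 0;
}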

There are two main procedures by which the frameworks do automatic differentiation: One way is source code transformation (SCT). I did not look into this, but as far as I understood the idea is that your source code is enriched in order to gather the necessary information for the derivatives computation. The other way is operator overloading (OO). This means that your standard numeric type, typically our beloved 53-bit

double

is replaced by a special type, say (notation stolen from the framework I will introduce below)

AD<double>

and each operation (like

+, -, *, / 

) is overloaded for this new type, so that during a computation with the original code the operation sequence can be taped and afterwards used for the derivatives computation. For the younger readers among you who do not know what “taping” means, this refers to this beautiful kind of device http://upload.wikimedia.org/wikipedia/commons/c/c7/TG1000.ogv (I could watch this movie for hours …). The red button at the lower right corner of the device is for recording, I guess. Usually you would have to press two buttons, “record” and “play”, at the same time to start recording. Only pressing the record button would not work. You had to press the record button a little bit earlier than the play button, because in play mode the record button was blocked.
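
To get a feeling for what operator overloading buys you, here is a tiny forward mode sketch. This is not how CppAD works internally (CppAD records a tape for the reverse mode) and the Dual class below is completely made up, but it shows how overloaded operators can drag a derivative along with each value.

#include <cmath>
#include <iostream>

struct Dual {
    double val; // function value
    double der; // derivative with respect to the chosen input
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der}; // product rule
}
Dual log(Dual a) { return {std::log(a.val), a.der / a.val}; } // chain rule

int main() {
    Dual x{2.0, 1.0};        // seed dx/dx = 1
    Dual y = log(x * x + x); // y = log(x^2 + x)
    // analytic check: dy/dx = (2x + 1) / (x^2 + x) = 5/6 at x = 2
    std::cout << "y = " << y.val << ", dy/dx = " << y.der << std::endl;
    return 0;
}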

Now the hard part of this post begins: trying to get some AD running for a relevant example. The framework of my choice is CppAD (http://www.coin-or.org/CppAD/). An exciting yet easy enough first example is probably the Black76 formula. This is implemented in blackformula.hpp and the interface looks as follows

/*! Black 1976 formula
    \warning instead of volatility it uses
             standard deviation,
             i.e. volatility*sqrt(timeToMaturity)
*/
Real blackFormula(Option::Type optionType, Real strike, 
                  Real forward,
                  Real stdDev, 
                  Real discount = 1.0,
                  Real displacement = 0.0);

The first step is to make room for our AD double type. A possible solution is to turn the original implementation into a templated one like this

/* ! Black 1976 formula, templated */
template <class T = Real>
T blackFormula(Option::Type optionType, T strike, 
                T forward, T stdDev,
                T discount = 1.0, 
                T displacement = 0.0) {
/* ... */
}

That’s not all, unfortunately. In the function body we have a line

T d1 = log(forward / strike) / stdDev + 0.5 * stdDev;

In order to have the logarithm differentiated correctly we have to make sure that if T is the AD type, the log function is taken to be the special implementation in the CppAD library (and the std implementation otherwise). To do so I made both implementations visible by importing them into the current namespace by

using std::log;
using CppAD::log;

Now depending on T being the standard double or the AD type, the appropriate implementation is used. First problem solved.
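
Just to make explicit where these using declarations live, here is the same pattern in isolation, with a made-up helper that is not part of the library: the using declarations sit inside the templated function body and the unqualified call then resolves to std::log for double and to CppAD::log for the AD type.

/* made-up helper, just to illustrate the pattern */
template <class T> T logMoneyness(T forward, T strike) {
    using std::log;   // used when T is double
    using CppAD::log; // used when T is CppAD::AD<double>
    return log(forward / strike); // unqualified call, overload picked via T
}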

There is another function, the cumulative normal, used in the Black formula, with no implementation in CppAD. Well, no problem, there is at least the error function, so I can just replace the cumulative normal with the error function. The first tests were disappointing. For a simple call I got slightly different premia (about 0.01 basis points difference), depending on whether I used the conventional double or the AD double. The reason became clear soon (after some hours of testing in the wrong direction): In the CppAD implementation, the error function uses an approximation, which is fast, but inaccurate (relative error of 10^{-4}).

Not that I very much care for super precise results in the context of finance, but the acceptance of the implementation would probably suffer a bit when Black prices are not matching reference results I guess …

Ok, I wrote an email to the author of CppAD and complained. He promised to do something about it. In the meantime I decided to help myself by converting the QuantLib implementation of the error function into a templated version as well and using this instead of the CppAD error function. The conversion process already felt a bit more routine now, so I guess the rest of the library will just go smoothly. The code snippet in the Black formula then looks as follows:

ErrorFunction<T> erf;
T nd1 = 0.5 + 0.5 * erf(optionType * d1 * M_SQRT1_2);
T nd2 = 0.5 + 0.5 * erf(optionType * d2 * M_SQRT1_2);
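
For reference, this replacement is just the standard identity

N(x) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right)

for the cumulative standard normal N, here applied at \pm d_{1,2} depending on the option type.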

Note the error function also gets the template parameter T, which will be double or AD double in the end. Now before we start to write some client code using the new Black formula there is a subtlety I want to mention. Functions are often not simple chains of operations but sometimes contain conditional expressions, like

z = x > y ? a : b;

The max function is another example of a (disguised) expression of this kind. Loops are too, but they are actually an even more complex variant.

The thing about these conditional expressions is the following: You can just leave them as they are under automatic differentiation, no problem. But as I already mentioned, the process has two steps: The first is to tape the operation sequence. This is done during an evaluation of the function with certain fixed input parameters. The second step is the derivative computation. The result is the derivative evaluated at the original input parameters.

But you can evaluate the derivative at a different set of input parameters without retaping the operation sequence! This is only possible if the function evaluation is completely captured by the CppAD tape recorder, including all possible paths taken depending on conditional expressions. This is supported, but you need to adapt your code and replace the conditionals by special functions. In the Black function we have for example

if (stdDev==0.0)
      return std::max((forward-strike)*optionType,
                      Real(0.0))*discount;

This needs to be replaced by

T result0a = max<T>((forward - strike) * optionType,
                     0.0) * discount;

(in particular, not just returning early with the value for the border case of zero standard deviation) and at the end of the function

T ret = CondExpEq(stdDev, T(0.0), result0a,
                  CondExpEq(T(strike), T(0.0), result0b,
                            result));
return ret;

Here another conditional is nested for the special case of zero strike. Finally, the max function above is implemented using the CppAD conditional CondExpGt, because it is not natively present in CppAD:

namespace CppAD {

    template<class Base> CppAD::AD<Base> 
           max(CppAD::AD<Base> x,CppAD::AD<Base> y) {
                return CppAD::CondExpGt(x,y,x,y);
           }

}

In order to keep the code running with plain double types we have to add implementations of CondExpGt and CondExpEq for regular doubles, like this

// used for cppad-ized function implementations
inline double CondExpGt(double x, double y, 
       double a, double b) { return x > y ? a : b; }
inline double CondExpEq(double x, double y, 
       double a, double b) { return x == y ? a : b; }

The nice thing about all this is that later on we could tape the operation sequence once and store it for reuse in all following computations of the same function.
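
Here is a minimal standalone illustration of why this matters (again plain CppAD, not library code, and the function is made up): with a plain if the branch chosen while taping is frozen into the operation sequence, while with CondExpGt both branches end up on the tape and the correct one is selected when the tape is re-evaluated at a new point.

#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main() {

    using CppAD::AD;

    std::vector<AD<double>> x(1);
    x[0] = 2.0; // tape at x = 2
    CppAD::Independent(x);

    std::vector<AD<double>> y(2);
    y[0] = x[0] > 0.0 ? x[0] * x[0] : -x[0]; // branch frozen at taping time
    y[1] = CppAD::CondExpGt(x[0], AD<double>(0.0),
                            x[0] * x[0], -x[0]); // both branches on the tape
    CppAD::ADFun<double> f(x, y);

    // re-evaluate the tape at x = -3 without retaping
    std::vector<double> xNew(1, -3.0);
    std::vector<double> yNew = f.Forward(0, xNew);

    std::cout << yNew[0] << " (plain if, still the x*x branch) vs " << yNew[1]
              << " (CondExpGt, correctly -x)" << std::endl;

    return 0;
}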

I just started trying to make QuantLib cooperate with CppAD, so the things above are initial ideas to keep QuantLib backward compatible on the one hand and avoid code duplication on the other hand. And it is a hell of a lot of work, I guess, with many more complications than can be seen so far in this little example, to be realistic.

But let’s try out what we have done so far. Here is a code example that works with my new adjoint branch https://github.com/pcaspers/quantlib/tree/adjoint

#include <iostream>
#include <vector>
#include <ql/quantlib.hpp>

#include <ql/qlcppad.hpp>

using namespace QuantLib;

int main(void) {

    // parameters

    Real tte = 10.0;
    Real sqrtt = std::sqrt(tte);
    Real strike = 0.03, forward = 0.03, volatility = 0.20;

    // declare the independent variables for which
    // we want derivatives

    std::vector<CppAD::AD<Real>> x(2);
    x[0] = forward;
    x[1] = volatility * sqrtt;
    CppAD::Independent(x);

    // tape the operation sequence

    std::vector<CppAD::AD<Real>> y(1);
    y[0] = blackFormula2<CppAD::AD<Real>>(Option::Call, 
                                        strike, x[0], x[1]);
    CppAD::ADFun<Real> f(x, y);

    // compute the partial derivatives using reverse mode

    std::vector<Real> dw(2), w(1, 1.0);
    dw = f.Reverse(1, w);

    // output the results
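    // dw[1] is dP/dstdDev; dP/dsigma = dw[1] * sqrt(tte), which is what
    // dw[1] / sqrtt * std::sqrt(100.0) amounts to below, since tte = 10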

    std::cout << "price = " << y[0] << " delta = " << dw[0]
              << " vega = " 
              << dw[1] / sqrtt * std::sqrt(100.0) 
              << std::endl;

    // cross check the results against the classic 
    // implementation

    BlackCalculator c(Option::Call, strike, forward, 
                      volatility * sqrtt);
    std::cout << "price = " << c.value() << " delta = " 
              << c.deltaForward()
              << " vega = " << c.vega(tte) << std::endl;

    return 0;
}

I kept the original blackFormula function as is for the moment (for testing purposes) and implemented the new, templated version as blackFormula2. I declared the forward and the standard deviation inputs as independent variables with respect to which I want partial derivatives (i.e. forward delta and vega). The strike input parameter (as well as the discounting and displacement, which are invisible here) is kept as a regular double variable. Note that in the Black formula implementation it is converted to the CppAD double type, but not included as an independent variable in the operation sequence. We could include it though, getting then the derivative of the call price with respect to the strike (which is useful also, because this is the negative digital price).
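
To make that last remark concrete: for an undiscounted Black76 call C = F N(d_1) - K N(d_2) the strike derivative is

\frac{\partial C}{\partial K} = -N(d_2)

i.e. minus the undiscounted price of a cash-or-nothing digital call, so the strike sensitivity from the tape would indeed hand us digital prices for free.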

The results are compared against the original implementation in BlackCalculator. The output of the test code is

price = 0.007445110978624521 delta = 0.6240851829770754 vega = 0.03600116845290408
price = 0.007445110978624521 delta = 0.6240851829770755 vega = 0.03600116845290408

Good, works. And we did not need to implement the analytical formulas for forward delta and vega to get exactly the same result up to machine epsilon. Sorcery. Needless to say that CppAD can compute higher order derivatives as well, so gamma, vanna, volga, speed, charm and color are just a stone’s throw away.
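
For instance, a possible way to pull some of the second order greeks off the very same tape (just a sketch, untested, reusing f, forward, volatility, sqrtt and tte from the example above) would be

    std::vector<Real> x0(2);
    x0[0] = forward;            // independent variable 0: F
    x0[1] = volatility * sqrtt; // independent variable 1: stdDev = sigma * sqrt(t)

    // Hessian of the single output w.r.t. (F, stdDev), returned row major as a 2x2 matrix
    std::vector<Real> h = f.Hessian(x0, 0);

    Real gamma = h[0];         // d^2 P / dF^2
    Real vanna = h[1] * sqrtt; // d^2 P / dF dsigma
    Real volga = h[3] * tte;   // d^2 P / dsigma^2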

Have a nice Christmas everyone, see you back healthy in 2015 !


Simulated Annealing Optimization

First of all, as a warning, if you want to read a good blog about QuantLib you should probably go to other places like this one http://implementingquantlib.blogspot.de. However I thought it could be fun to relax from coding all the time by also writing a bit about it. For this first post I take one of the “Erlkönige” from my presentation at the QuantLib User Meeting 2014 http://quantlib.org/qlws14.shtml (hey, why am I not on the picture ?), namely a new optimizer class https://github.com/lballabio/quantlib/blob/master/QuantLib/ql/experimental/math/simulatedannealing.hpp. (If you don’t know what an Erlkönig is, look at slide two of my presentation.) But let’s start with defining a target function. This is done by deriving from CostFunction and implementing (at least) the value and values methods (which are quite similar in our one dimensional example here)

    class target : public CostFunction {
      public:
        target() {}
        Real value(const Array &a) const {
            Real x = a[0];
            return x * x / 100.0 + 0.3 * sin(x) * sin(x);
        }
        Disposable<Array> values(const Array &x) const {
            Array res(1);
            res[0] = value(x);
            return res;
        }
    } aTarget;

This function looks like this (you should click on the image for a better view)

[image: simann_target]

It has many local minima, e.g. some near -6, -3, +3, +6, and one unique global minimum at 0. Many classic optimizers wouldn’t find this global minimum unless you give them a good start value which lies near the solution. Our new experimental global optimizer in QuantLib follows chapter 10.9 in Numerical Recipes in C. It is based on the Nelder-Mead algorithm (aka Simplex in the library, not to be confused with the linear programming thing) and adds exponentially distributed noise to the target function, which does the trick of finding the global minimum with probability one when applied carefully.
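
A quick sanity check on those locations: the derivative of the target is

f'(x) = \frac{x}{50} + 0.3 \sin(2x)

so away from zero the local minima sit close to the minima of \sin^2(x), i.e. near multiples of \pi (\pm 3.14, \pm 6.28, ...), slightly pulled towards zero by the quadratic term, while at x = 0 both terms vanish, which gives the unique global minimum.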

But let’s start with the deterministic version of the optimizer. You can create the optimizer with

SimulatedAnnealing<> sa(0.2,t0,decay,steps);

where I set t0=0.0, decay=0.0 and steps=1 (which just means: no noise). The first parameter is the Simplex lambda used to construct the vertices of the simplex, which in general are the start value of the optimization (see below), this value plus (1,0,0,…) times lambda, this value plus (0,1,0,…) times lambda and so on. Since we have a one dimensional problem our simplex consists of two vertices only.
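
Concretely, with lambda = 0.2, the start value 4.8 used below and one dimension, the two vertices should sit at 4.8 and 4.8 + 0.2 = 5.0 (if I read the construction correctly).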

I start the optimization at x=4.8, a bad start value as we will see soon.

    Array initial(1);
    initial[0] = 4.8;

I put no constraint on the problem

    NoConstraint nc;

allow for a lot of iterations (1000000, that’s 1 million, yes !), require 500 stationary state iterations before assuming convergence and set the epsilon for root, function value and gradient to 1E-8

    EndCriteria ec(1000000,500,1E-8,1E-8,1E-8);

As a side note, it is up to the actual optimizer which of these criteria (if any at all !) are used. The simulated annealing optimizer checks the maximum number of iterations and detects stationary states by comparing the simplex size with the root epsilon, ignoring the function and gradient epsilons. As another example, the Levenberg-Marquardt implementation takes no fewer than three tolerance parameters in its own constructor while only using max iterations and function epsilon from the end criteria, ignoring stationary state iterations and the other two tolerances.

Now create a problem out of the target function, the constraint and the start value

    Problem p(aTarget,nc,initial);

and we are ready to run the optimization

    EndCriteria::Type ret = sa.minimize(p,ec);

You can retrieve the best found value for the independent variable by p.currentValue() and the function value by p.functionValue(). The variable ret contains the status of the optimization which can be one of the following enum values defined in EndCriteria

        enum Type {None,
                   MaxIterations,
                   StationaryPoint,
                   StationaryFunctionValue,
                   StationaryFunctionAccuracy,
                   ZeroGradientNorm,
                   Unknown};

Self-explanatory in a way, aren’t they?

We visualize the optimization run by plotting one (the first) vertex of the simplex (plotting both wouldn’t be instructive) for each step in the optimization.

[image: simann_det]

I only plot the first 25 steps of the optimization because after that it gets boring (you see the pattern, don’t you). As I said, the starting value was bad, just a bit to the left of the top of the second hill to the right of the global minimum. Function values are colored here, the minimum (at 0) is black, the next best local minima blue (near ±3), the next purple (near ±6). The algorithm (the green dots) walks over the top of the hill and then runs into the valley to the right of +6, converging nicely there, but only to a local minimum.

The green dots lie exactly on the surface of the target function, indicating that the function value without any add-on is used for the optimization.

Now the idea of simulated annealing comes into play. Instead of zero noise we start with a temperature of 0.05. This means “noise” is added to the target function value during the optimization. The noise is defined to be exponentially distributed with parameter 1 / temperature, i.e. with expectation and standard deviation equal to the temperature.
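
As an aside, one standard way to produce such noise from a U(0,1) draw u (which is presumably why the generator only needs a next() method, see below) is the inverse transform

-T \ln(u)

which is exponentially distributed with mean and standard deviation equal to the temperature T.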

This lifts our green dots from the surface in a random way and in particular helps them get out of valleys that might turn out to be local ones only. During the optimization the temperature is decreased, converging to zero, so that the deeper valleys become visible to the algorithm while the more shallow ones are still filled up by the noise. If we start with a temperature high enough and decrease it slowly enough we can hope to find the global minimum this way. Of course the art here is to find a good cooling scheme and to balance it against the number of necessary function evaluations.

Back to our first try with starting temperature 0.05. We choose to decrease this temperature by a factor of 1 - decay with decay = 0.0001 in every step (steps = 1). We plot the first 2000 steps of the algorithm’s run (wow, note how many steps are performed compared to the deterministic case).

[image: simann_1]

After around 500 steps, thanks to the noise, the dots jump over to the main valley and stay there, thus finding the global minimum ! The light blue grid floating over the target function surface visualizes the expected noise added to the original target function.

I should mention here how you can explicitly specify the random number generator used to generate the noise (it has to be one with a next() method giving a U(0,1) random variable).

    SimulatedAnnealing<> sa(0.2,t0,decay,steps,MersenneTwisterUniformRng(42));

would choose the Mersenne twister rng (which is also the default if nothing is given), with seed 42. The default seed is 0 meaning that the system timer is used for initialization, which is bad if you want to reproduce results.

Let’s do the same again, but now cooling down the whole thing a bit faster (decay = 0.0003). The result is as follows

[image: simann_2]

Oops. The dots jump over again, but only to the next valley, again giving only a local minimum. We cooled down too fast to find the global minimum in this case.

In practice one would probably start with a much higher temperature, like 0.25. With that and the same decay rate of 0.0001 per step we get

[image: simann_3]

We are converging to the global minimum again ! However we already have 30000 steps plotted here and we need even more to actually converge (like 400000 with these parameters !). We should surely fine-tune our parameters before using them in a real-life application.

Looking at the same picture more from a bird’s eye view we see the structure more clearly

[image: simann_4]

While all 5 valleys are covered in the first few hundred steps, after a while only the three inner ones are visited and finally the algorithm stays in the inner, best valley.

That’s all for today, it was fun to write this. Anyone read until here ?

Appendix. Here is the full code example, if you want to try it by yourself.

#include <ql/quantlib.hpp>

using namespace QuantLib;

int main(int, char *[]) {

    class target : public CostFunction {
      public:
        target() {}
        Real value(const Array &a) const {
            Real x = a[0];
            return x * x / 100.0 + 0.3 * sin(x) * sin(x);
        }
        Disposable<Array> values(const Array &x) const {
            Array res(1);
            res[0] = value(x);
            return res;
        }
    };

    target aTarget;

    double t0 = 0.25;
    double dec = 0.0001;
    int steps = 1;

    SimulatedAnnealing<> sa(0.2, t0, dec, steps, MersenneTwisterUniformRng(42));

    Array initial(1);
    initial[0] = 4.8;

    NoConstraint nc;
    EndCriteria ec(1000000, 500, 1E-8, 1E-8, 1E-8);

    Problem p(aTarget, nc, initial);

    EndCriteria::Type ret = sa.minimize(p, ec);

    std::cout << "found x=" << p.currentValue()[0]
              << " / y=" << p.functionValue() << " with ret code " << ret
              << std::endl;

    return 0;
}

The output of running this should look like this

found x=-7.10543e-14 / y=1.5651e-27 with ret code StationaryPoint