Poor Man’s Knot Reduction

There are many great implementations of unit-root tests in packages such as R, SAS, etc.  For various reasons I need to implement this in Java.   Probably the best known test of this sort is the Augmented Dickey Fuller test.   This is most often used in determining whether a process is non-stationary or rejecting in favor of stationarity.

Loosely speaking, to say a process is stationary implies that all of its moments are largely time invariant.   So for example the mean and variance of a stationary process would be (largely) constant across time.

We know that AR(1) processes with AR coefficient α = 1 have unbounded variance that grows with time.  This can be seen by evaluating its variance:

The ADF test therefore tests the hypothesis that a given series expressed as a AR(p) process has a coefficient of 1 (ie unit root).   The ADF expands upon the original Dickey Fuller test to allow for additional AR lags:

The null hypothesis is that β = 0 (non-stationary process) and alternative hypothesis (may be stationary) otherwise.   Note that  β = 0 in this differenced equation is equivalent to α = 1 in the non-differenced AR(1) equation.

Samples (i.e. the difference between the observed and null hypothesis value) in hypothesis testing often follow a student-t distribution.   In the case of the Dickey Fuller test the distribution is not student-t and therefore presents a problem.  To calculate the p-value for the ADF hypothesis, one must refer to special Dickey-Fuller tables, as there is no known closed form method to calculate this (that I am aware of).

The p-value, we know to be the probability of obtaining a result at least as extreme (far away from the hypothesis value) as the one obtained in our sample.  Basically this involves calculating:

Then to determine the p-value for the ADF, we need to determine the distribution of values around hypothesis β = 0.  The tables provide precalculated mappings between test values and the cumulative probability.

Problem
The Dickey-Fuller tables are sparse, particularly at the tails.   For my problem I was interested in the relative “stationarity” of many different series.   For this, I the needed to calculate the Dickey-Fuller pdf on my own with a Monte-Carlo simulation.

Here is one configuration of the pdf for the Dickey-Fuller hypothesis (produced from a monte-carlo simulation):

Of course monte-carlo simulations are not practical on each evaluation of the ADF test.   The results from the MC simulation generate a high resolution density function as a series of (x,y) points.   I could generate distributions for a number of configurations and store a very large 3D table of function values, but is impractical and unnecessary.

I started thinking about an approximating function for the distribution.   Aside from finding functional form for the distribution through trial and error, a general automated approach is to use splines.   A natural spline will fit the function and be piecewise continuous.

Splines, however, do not reduce the dimensionality of the problem.  A natural spline requires the complete set of knot points to reconstitute the function.   I then started thinking about knot reduction techniques.   With knot reduction the idea is: what is the minimum set of knots required to reproduce the target function within some (small) error range.   So if my function currently has 2000 (x,y) points, perhaps I could reduce it to ~10 points?

There have been many papers written on this topic, often involving very complex algorithms.   As I wanted to keep this simple, decided to come up with a “poor man’s algorithm” for knot reduction.

Algorithm
The algorithm starts with a spline consisting of the 2 end-points  and gradually adding knots at specific positions, iteratively reducing the error between the “reduced spline” and the target function.   The new knots are chosen based on the maximum distance from the target function.

The algorithm is not guaranteed to be optimal in that it may have more knot points than the smallest possible reproducing set.  However, it is very fast and is often close to optimal, depending on the function.   The algorithm is as follows:

  1. start with a spline consisting of 2 points (the start and end point)
  2. loop while R^2 < (1 – eps)
    1. add knot at point with maximum distance from target function
    2. compute new spline and R^2 fit metric
  3. at end of loop, set of knots now matches function within epsilon

The effectiveness of the approach can be enhanced by using a better basis function than the hermite cubic and/or by adjusting tensions in addition to knots.

Here is an example of the function to epsilon = 1e-8.  I’ve slowed this down to 1 iteration per second so you can see how it works.   In practice this involves a few micro-seconds:

9 Comments

Filed under strategies

Theory of Computation, Quantum Computing, etc

I’d like to point to a very interesting set of lectures by Scott Aaronson entitled “Quantum Computing Since Democritus”, here. The lectures have a lot of thought provoking content in addition to being entertaining — a style reminiscent of Richard Feynman.

Leave a comment

Filed under strategies

Algorithmic Differentiation

In the previous post I outlined a portfolio utility function with a penalty for risk:

One of the components of this function, E[r], is specified as a weighted geometric average.   By itself I could transform to use a log sum but does not simplify with the added risk term:

I need to construct a gradient and hessian for the above function:

It is not that the 2nd partial derivative is intractable, indeed it just requires the application of the chain rule.  Unfortunately it expands to an enormously complicated expression that grows quadratically with the number of variables.

Computing the derivative via finite differencing works for some applications, but presents problems where the “perturbations” need to be small.   The 2nd derivative using a central weighing scheme:

One can see that the differenced approach is susceptible underflow.   Choosing a value of h < 1e-7 will produce a value which exceeds the precision of a double (consider the denominator).   Once the precision of double is exceeded, results will be meaningless.

Algorithmic Approach
I began to think of a recursive approach to evaluating the nth partial derivative, as I had trouble coming up with a pattern that fit into a single expression for generalized number of periods and dimension.   It turns out that many others have thought along these lines.   This approach is called “Algorithmic Differentiation”.

The above equation can be simplified with respect to just 2 of its N variables for the purposes of calculating 1 coordinate in the hessian as:

We understand how to differentiate the above by means of the chain rule, which states that compositional functions, such as f(g(x)) can be differentiated as f'(g(x)) g'(x).   This implies a series of basic rules:

We can decompose an expression into an (implicit or explicit) expression tree and use the rules for derivatives to recombine to an ultimate result.   For instance, here is a tree for f(x,y) = x y sin(x^2).   (the graph for my function would be much too large to render here):

There are a number of libraries (for C++ and Java) which allow one to express a function in normal-looking code, but implicitly calculate the first or second derivatives at the same time.   A “dual” representing f(x) and f'(x) is tracked through the evaluation of the expression.

Unfortunately, the size of the expression (tree) for my utility function is enormous, particularly for the 2nd derivative.   I will avoid computing the hessian and use the BFGS minimizer for part of my optimization, as will end up being cheaper than evaluating the hessian.

Leave a comment

Filed under strategies

Strategy Allocation

I’m interested in allocating capital across a series of strategies or assets in a way that balances between maximizing return and minimizing drawdown.   I’ve not spent much time thinking about this previously, so this is a work in progress.

Mean-Variance Approach
It has long been traditional to use the mean-variance approach for portfolio allocation.    A problem with the mean-variance approach is that it penalizes excess returns on the “right” (positive) side of the distribution in addition to the “left” (negative).   This is one reason why many consider the Sharpe ratio to be a flawed measure with respect to how investors see risk.

The mean-variance approach penalizes for sections D and C of the distribution.   An investor is generally happy with excess returns (i.e. D), and unhappy with drawdowns in the distribution (namely areas A and C).  We would prefer to accept the reward of the whole right side of the distribution and attempt to minimize the left.

Lower-Moment Frameworks
To remedy this we can use an asymmetric measure of variance to focus on the part of the distribution that represents our risk.   The family of such moment functions are called the “Lower Moments”.    Lower moment functions allow us to focus on sections A and C in the distribution (ie the drawdowns) and not consider the right side as a contribution towards risk.

Lower moment functions are expressed as follows:

The idea, simply, is that the moment (degree m) is computed on the region of the distribution below some threshold (say 0).

For a risk measure, we may want to penalize returns < 0 or prehaps returns < risk free rate.   We would choose τ according to where we want to penalize.

Application to portfolio
Ok, so the lower-moment looks like a good measure of downside risk.  How do we incorporate it?  Let’s set up the problem:

  1. let R represent a matrix of returns
    One row for each period, number of columns = number of assets or strategies.
  2. let w represent the vector of weights for each asset / strategy
  3. let Σd represent the lower-moment covariance of returns with respect to the set of assets or strategies

Assumptions:

  1. let us assume that past performance is indicative of future performance  (we can relax this later)
  2. let us assume an eliptical (i.e. multivariate normal) distribution for negative  returns
  3. let us assume that the drift grows in proportion to variance x time

Ok, those are pretty aggressive assumptions, but they allow a first look at putting this together.

We need to create a utility function that blends the upside and downside, subject to various constraints.  Here f(w) represents the portion of a convex utility function rewarding positive return, and g(w) a function describing the risk penalty.

Let’s formulate the utility as the expected return for the next period (given the prior returns) and the expected negative return component (our risk) over the period:

The first part of our utility E[r] represents the average observed return and the second part the negative drift (our downside risk).    E[r] can be computed in any number of ways, for instance:

  • as a time weighted average of past returns
  • as a regression model with seasonality
  • from a stochastic model calibrated to past returns

The risk penalty is just ½ E[(τ – <w,r>)^2 | r < τ) · Δt, which approximates the expected downside drift over the period represented by Δt.

Now the above is a bit ad-hoc and I may be double counting the upside and downside with E[r].   I’ll have to think on the implications for mixing a lower moment measure and a full moment measure.

There may be some adjustments to make, but have outlined a first pass at penalizing returns and hopefully arriving at a more balanced weight selection.

Addendum
A more generalized approach could be to compose the utility function from a parameterized sum of upper and lower moments (Farinelli and Tibiletti 2002).   I’ve adjusted this a bit to fit my problem:

The number of moments for the lower portion can be different for the upper portion (or simply let the coefficients of some moments be 0).    What I came up with in the prior section is essentially this function for 2 moments with coefficients {1,0} and {0,1/2}.

Addendum 2
The utility function should have been (drift in proportion to sqrt), my bad:

Leave a comment

Filed under portfolio, statistics, strategies

Approximating |x|

I am changing my portfolio strategy to use an interior point optimizer for non-linear problems with constraints.  The approach solves nonlinear problems of the form:

where f(x) is the primary function to minimize and g(x) represents one or more constraints (equalities or inequalities).   There are a number of good papers on interior point methods for solving such problems so will not go into the method here.

In my case one of the constraints is as follows:

As simple as this looks it presents a problem because the interior points algorithm requires that the constraints be twice differentiable.   |x| is continuous but not differentiable at 0.   In formulating the solution I need to construct the gradient (jacobian) and hessian matrices of the constraints.

I began to think of other ways to describe the constraints.  Basically I want to allow my portfolio to allocate both long and short positions between trading periods.  The above constraint treats long and short capital as symmetrical (which suits my purposes).   I.e., if I have just two assets in a portfolio and allocate -0.75% short to one asset and 0.25% long to the other asset, I consider that to be fully allocated (from a risk capital point of view).

I did a few thought experiments, considered variations of:

But this is clearly wrong, in that will allow for allocation > 100%.   I began to think about the derivative function of |x|.   See (in blue |x| and red the derivative):

It occurred to me that the derivative can be asymptotically approximated by the sigmoid function with a shift and accelerated exponent:

By making β large enough we can approach the derivative of |x| to arbitrary resolution.

Unfortunately the integral of the sigmoid function has no solution in R, and only has a solution on the complex plane, meaning that if we are to deploy this approach will have to evaluate g(x) with Σ|x| and the derivatives with the scaled and translated sigmoid function.

Unfortunately, a hybrid approach with |x| for constraint evaluation and sigmoid for the derivatives will lead to instabilities.   One needs to have an approximation to the derivative that has an integral.

Addendum
It turns out this is a common problem in machine learning and optimisation.   Chen and Mangasarian defined an approximation to the sigmoid integral.    The problem can be reformulated as follows.   Break the problem into two pieces, x ≥ 0 and  x ≤ 0.    If we can solve it for x ≥ 0 then we also have a solution for x ≤ 0:

Based on an approximation to the sigmoid integral, they determined (x)+ to be:

As with the sigmoid function, increasing β allows us to get asymptotically close to |x|.   We now have a function for |x| with derivatives defined around 0.

3 Comments

Filed under strategies

Equity Clusters 2

The correlations between daily returns show, sometimes, surprising relationships.  Looking at relationships on 5-day cumulative returns produced clusters more in line with the traditional view.

This is not surprising as weekly returns will have less “noise” than daily.    A more robust measure than correlation needs to be used to determine whether relationships seen on daily or higher-frequency are bona fide.

3 Comments

Filed under strategies

Equity Clusters

I am putting together some portfolios to be auto-traded using a dynamic portfolio asset allocation algo.   I had put together a maximum spanning tree (shown in a previous posting) to observe relationships between securities.

In this iteration have gone further:

  1. Heatmap colors to indicate average volatility levels for a given security relative to others
  2. Maximum spanning tree clusters to reveal which diversification group a given asset belongs to
  3. Edge thickness indicates strength of relationship

I am interested in both the diversity and the volatility profile of the asset pool.   I will pick the majority of assets from the set with mid-range volatility (oranges), as opposed to low vol (reds), and high vol (yellows).    Classifying the asset set into clusters based on correlations provides an automated way of observing the diversification group the asset belongs to.

Algorithm
The algorithm is loosely as follows:

  1. calculate lower-triangular correlation matrix of returns for, say, s&p 500 stocks
  2. sort in descending order by correlation
  3. set up graph structure
  4. loop through correlations selecting pairs of assets
    1. if neither in graph add as new cluster pair
    2. if one in graph and other not, attach new to existing
    3. if both in graph but size of clusters < min cluster size, merge clusters
  5. repeat until all assets accounted for and all clusters have size >= min cluster size
  6. annotate & plot

Clusters (daily returns)
Here are the clusters the algorithm produced:

Of course this can be applied to any asset set.  Thought is a useful visualization, though there are many other dimensions of interest.

4 Comments

Filed under strategies

End of a Decade

It is tempting to say that the last decade has been an interesting time to live in on Wall Street.   But that would belie the observation that the 80s, and the 90s also came with dramatic changes to the markets and for practitioners.

I was not on the street during the 80s, but the 80s were really the start of the acceleration point towards model and technology driven trading.   The 70s brought us the famous Black-Scholes model, the 80s a variety of synthetic instruments such as swaps, swaptions, CMOs, etc.    The 80s really had not fully ushered in the leveraging of technology as it would do in the 90s and the ultimate step-up in the 2000’s.

The 90s
Arguably without technology we would not have gotten much farther than basic option modeling.   The 90s saw the rise of quantitative modeling and financial application of technology unlike any time previous.   To be a quant or quantitative developer in those days was exciting and rewarding.

I remember joining Lehman Brothers in the early 90s.  Swaps and swaptions though about 10 years old at that point still were traded on the back of HP calculators or maybe lotus 1-2-3 spreadsheets.   The spreads were wide, in the 10s of basis points, today in fractional basis points.

With the full embrace of technology in investment banks it was a matter of time before we saw an explosion in complexity in exotic derivatives.   Over time many of these exotics would become mainstream “vanilla” products.   The interest rate markets moved in the direction of more volume in vanillas and more complexity in exotics.

Of course the equity markets also developed out derivative products, but more interestingly were “quietly” building out increasing sophistication on the “program trading” front (as it was called in those days).   This was mostly unique to the equities markets, whereas the interest rate and FX markets continued to be largely OTC.

Program Trading
Program trading is the grandfather of all we know as “Algo Trading” today.    The NYSE introduced DOT in the 80s as a means to provide automated clearing and semi-automated execution (order routing) to the manned floor.

Program trading facilitated basic execution algo, index arbitrage, and proprietary strategies.    Index arbitrage involved buying or selling index futures against executing a basket of the same or similar components.    The game then as it is now was speed.

In some of the foreign equity markets electronic execution was prohibited, throttled, or limited in various ways.   For instance I remember that the Japanese MOF would not allow electronic execution to the exchange or at least required keying of orders by humans placed in Osaka, perhaps partially to protect such jobs, but maybe also to protect companies with less sophistication.

Morgan Stanley, Goldman, and Lehman had each, individually, hacked the serial lines from such terminals so as to simulate typing orders.   Technically they employed people at the exchange to lend some credibility to the idea that they were following the rules, but was pretty much common knowledge that this was going on.   They just happened to have very fast typists 😉

The “equity guys” pioneered many of the execution strategies that we use today in the 90s on the back of the growing program trading business.

The 90s (and perhaps late 80s) saw a number of (now) well known hedge funds spawn from this environment such as D.E. Shaw, Citadel, Renaissance Technologies, etc.

The 2000s
Thinking about it the 2000s encompass a period with a number of large failures in the market:

  • the end of the internet bubble
  • the explosion and implosion of credit markets
  • the failure of major wall street firms and life-support for the designated survivors

Much has been written on the above, so I would like to focus on innovation and future direction.     Here are some thoughts on what has characterized the last 10 years:

  1. commoditization of derivatives
  2. program trading (now “algo trading”) crossover into other asset classes
  3. automated market making
  4. automated trading via statistical or rule driven strategies
  5. faster, faster, faster;)

What I really read from the above is that the days of the instinct driven prop-trader or market maker are numbered.   A raft of traders are being replaced by teams of quant / traders, usually more on the quant / CS side than time spent in trading.

The new setup is often a group of quant / dev / traders that develop trading strategies and a much smaller number of “execution traders” that manage the strategies day to day.    Now *that* is really exciting, but perhaps shows my bias being in the former group.

I think the big days of derivatives are over (mostly).   Regulations and standardization will push for more commoditization, automated clearing, and eventual exchanges.   The new areas of innovation in the medium term for derivatives need to be in risk management IMO.

The Next Decade
It would be hard to predict the next decade in the markets given the rapid pace and surprises of the last 3 decades.    The markets are a function not only of the practitioners but of global events, regulation, governments, and seen or unforseen technological advancements.

Here are some predictions:

  1. quantum computing goes mainstream in automated trading
  2. fewer traders more quants
  3. algo dominates all asset classes
  4. hard to find job as derivatives quant

My strategies are market ambivalent, however there are some changes in progress:

  1. US$ devaluation, possible move to another cross currency (maybe in 20 yrs)
  2. Dominance of china in market and world politics

Wishful thinking:

  1. Dramatic changes on Wall Street in terms of Risk Practices
  2. Wall Street takes long term investment focus
  3. Transportation economy largely moves to electric by end of decade (more like 20yrs).  End of oil domination.
  4. Reset of compensation and fees on Wall Street; Incentivize more CS / Physics / Mathematicians to do one and the same 😉

Beyond
We note that the street is moving increasingly into, say, automated market making.   Now firms engage in this for their own benefit, but it serves an important function in providing liquidity to the market, a useful function beyond speculation.

A real AI will be achieved, but doubtful in the next couple of decades.  It may take another 50 years or more to achieve.  I suspect the first AI will be achieved as a neural mapping of a human brain to a machine.   Whether it is achieved this way or through an evolution of our machine learning and knowledge algorithms, accelerated with quantum computing, is immaterial.

Such a development will have dramatic consequences not only for mankind but also for the markets.

The financial services as they stand eventually will be coordinated by an AI.   An AI will be developed by a research organization and be deployed at some point in the market.  More than one may be deployed, at which point the game will become so tight that in the end the AIs will effectively be running the market.

Beyond market making, broader responsibilities could be given to manage money supplies, handle capital allocation / investment (now via loans, public stock offerings, etc).    Anything a human can do could eventually be done more efficiently and at magnitudes lower cost by an AI.

This would effectively put a whole industry out of business.   Would it not be better for great minds to be deployed in the pursuit of knowledge, sciences, etc?    Today it is hard to make a living that way.   I can only hope that there would be more balance in the direction of the sciences and progressive commerce.

Leave a comment

Filed under strategies

Asset Selection

I have a portfolio-based strategy that uses a probabalistic model to determine the “optimal” portfolio allocation vector for each trading period.    This can be applied either intra-day, daily, or across longer ranges of time. The dynamics and composition of the portfolio may change depending on holding period.

The algorithm uses an online learning and optimization approach.   Because this is computationally expensive, want to do some up-front analysis to determine the composition of the portfolio, i.e. what set of assets will provide a good degree of balance and information for weighting decisions.

So for instance, if we chose the S&P 500 stocks as the pool from which to select, which of these stocks should be selected?   There are many approaches to this and it really depends on the characteristic of your strategy.   A “buy & hold” style mutual fund manager is going to pick based on fundamentals with a view on growth over a medium – long term period and in keeping portfolio volatility within some limit.

In the case of this strategy, I am market neutral and look to have both appreciating and declining assets (provided there is sufficient liquidity).   However, as with buy & hold style portfolio managers I am also looking to provide a certain amount of diversification and balance in the portfolio.

Maximum Spanning Trees
A nice way to look at the basic relationships amongst assets is via a maximum spanning tree on the correlations between assets.   The idea is that such a graph spans all assets but each asset connects to only its strongest relationship.

One starts with a “seed” asset and determines the strongest relationship with another from the set and traverses recursively.   Additionally only include assets with relationships above a given correlation threshold weeding out assets with weak relationships.

10 Years in FX
With a small number of assets, a full graph is reasonably intelligable, though carrying less information than a spanning tree in terms of relationship strength.

Here is a spanning tree showing the relationships for a selection of currencies on daily returns (with a correlation threshold of 0.4).  In the case of daily returns one does not see strong connections between european currencies and south america currencies:

The same analysis on weekly cumulative returns shows stronger relationships:

10 Years of the S&P 500
Here are the spanning relationships for the daily returns on S&P 500 stocks with absolute correlations > 0.7:


3 Comments

Filed under strategies

Time Dilation

Many measures work best in a homoscedastic volatility regime.   This is not a big secret.    Most regressors, the simplest of which are the ever popular moving averages, are especially biased in the context of a heteroscedastic series.

Probably the best way of normalizing a heteroscedastic series into one with near constant variance is to observe the following.   If we assume our process is roughly a SDE with normally distributed innovations (or alternatively a Hurst constant close to 1/2), we know that:

As a rough measure, we can remove much of the vol of vol by scaling our time axis in proportion to the variance.   I use a duration based local volatility measure with smoothing or alternatively for daily data an EWMA based evaluation of:

We can then change measure:

where ψ(t) is a smoothing / scaling function.   An example of such a scaling (with the red curve in the upper pane indicating the degree of scale from the baseline):

8 Comments

Filed under technical-analysis, volatility