Tr8dr

March 24, 2010 · 2:12 pm

Market Regime

I tend to design strategies that are largely market agnostic. However, I do have a strategy whose behavior is conditioned on what I will loosely call “market regime”. I am not using “regime” as the basis of a strategy, but rather as a means to distinguish modality in the distributions used in the strategy. Market regime may include the following:

medium – strong trend upward
medium – strong trend downward
consolidating / range bound
gapping

I’m starting to dust off some work I had done a couple of years ago in detecting this sort of thing. The trick, as always, is in detecting with a minimum of lag, and in choosing the window of analysis and the technique.

Timeseries Approaches
I’ve implemented or considered a number of approaches to detecting the above:

signal decomposition
The signal is decomposed into wavelet-like bases. The power of the bases is evaluated to determine which base carries the most power. The most powerful base is the most representative of the market direction and mode. This approach is sensitive to the window. It may work well with a window size scaled inversely to volatility.
filter bank
A bank of filters targeting periods of 2^n is evaluated against the series (EWMAs, for example). The number of filters the series sits above or below determines the degree of trend. Oscillation and period can also be measured.
technical indicators?
I haven’t done much here; perhaps some readers have some ideas. Observing the average width and direction of a Bollinger band can give indications about consolidation, etc. How does one adjust or choose the MA window?
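To make the filter bank idea concrete, here is a minimal sketch. The function names, the EWMA span convention (alpha = 2 / (period + 1)) and the mapping of the count onto [-1, 1] are my own assumptions for illustration, not the implementation used in the strategy.

```python
def ewma(series, period):
    """Exponentially weighted moving average with span `period` (assumed convention)."""
    alpha = 2.0 / (period + 1.0)
    out, acc = [], series[0]
    for value in series:
        acc += alpha * (value - acc)
        out.append(acc)
    return out


def trend_score(series, max_power=6):
    """Share of 2^n-period filters the last price sits above, mapped onto [-1, 1]."""
    periods = [2 ** n for n in range(1, max_power + 1)]
    last = series[-1]
    above = sum(1 for p in periods if last > ewma(series, p)[-1])
    return 2.0 * above / len(periods) - 1.0


# Hypothetical data: a steady climb and its mirror image.
rising = [100.0 + 0.5 * i for i in range(200)]
falling = rising[::-1]
print(trend_score(rising), trend_score(falling))
```

A score near +1 means the price sits above every filter (strong uptrend), near -1 the opposite, and values around 0 suggest a range-bound market.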

Statistical Approach
Now the above is an explicit “technicals” view on market modes. The idea in the technicals approach is that each of the states implies a different distribution, so depending on what mode you are in, one selects the appropriate distribution (or a blend thereof for a fuzzy approach).

It occurred to me that a more statistical approach requiring no prior assumptions would be to use an algorithm to determine the modes (or categories) within the multivariate distribution.

I need to do this on high-dimensional multivariate distributions. If the distribution is dense or continuous this is relatively straightforward to do. For a dense empirical distribution one could project a smoothing kernel and solve for maximum points by setting the derivative to 0.

Simple 1D case
On a 1 dimensional distribution, this is easy to see:

Even in the simple case, determining the degree of smoothing and hence the number of local maxima requires either a judgement call and/or cross validation against model performance.
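As a rough illustration of the 1D case, the sketch below smooths a sample with a Gaussian kernel and picks out local maxima on a grid rather than solving for a zero derivative analytically. The kernel choice, the grid search and all names are assumptions of mine; the bandwidth plays the role of the "degree of smoothing".

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def find_modes(samples, bandwidth, grid_points=400):
    """Locate local maxima of the smoothed density on an evenly spaced grid."""
    lo = min(samples) - 3.0 * bandwidth
    hi = max(samples) + 3.0 * bandwidth
    step = (hi - lo) / (grid_points - 1)
    xs = [lo + i * step for i in range(grid_points)]
    density = gaussian_kde(samples, bandwidth)
    ys = [density(x) for x in xs]
    # '<' on the left and '>=' on the right counts a flat two-point peak once.
    return [xs[i] for i in range(1, grid_points - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]


# Hypothetical sample with two clumps, around -2 and around 3.
samples = [-2.0 + 0.1 * k for k in range(-5, 6)] + [3.0 + 0.1 * k for k in range(-5, 6)]
print(len(find_modes(samples, bandwidth=0.5)), len(find_modes(samples, bandwidth=5.0)))
```

With a narrow bandwidth the two clumps show up as separate modes; with a very wide one they merge into a single mode, which is exactly the judgement call described above.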

High Dimensional Sparse Case
Attempting to determine the modes in a high dimensional multivariate distribution is much more difficult. It occurred to me that one of the machine-learning clustering algorithms may be the best approach here. K-means is one such algorithm. I have not yet investigated the efficacy of the approach for my requirements; I will post more later when I circle back on this one.
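For reference, a bare-bones version of K-means (Lloyd's algorithm) looks roughly like the sketch below. This is a generic textbook form, not something I have validated for regime detection; the caller supplies the starting centers, and the number of clusters is an input, much like the smoothing level in the 1D case.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (squared distance).
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels[idx] = dists.index(min(dists))
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical 2D feature vectors forming two obvious groups.
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, labels = kmeans(points, centers=[points[0], points[3]])
print(labels)
```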

Tentative Conclusion
I believe the statistical approach may be superior, particularly as it can capture modes one may not even be aware of. The statistical approach, however, does not attach any meaning to the modes. In other words, it is an opaque model. Perhaps a mix makes more sense: determine which factors give rise to the modes via a clustering approach, then codify them.

3 Comments

Filed under strategies

March 12, 2010 · 10:57 am

Preparing Strategy For Go-Live

I’ve been too busy to post for the last few weeks because I am trying to get things together to launch the first strategy for trading (as part of my trading startup). I have 4 strategies right now but decided to start with the simplest of the lot. It operates on a basket of equities at medium/low frequency and has an excellent return and drawdown profile.

I won’t jinx it by indicating the performance profile (which is pretty exciting). Will post some results after it has a trading track record. Right now I have just done the backtesting and some trial trading (effected manually).

The next steps are negotiating commission with broker(s) and connectivity. Details at this point, but will take time.

9 Comments

Filed under strategies

February 21, 2010 · 3:36 pm

Model Parsimony

Given the complexity of the markets, it is easy to fall into the trap of overfitting when creating a parametric model of some sort. There are a variety of approaches to avoiding this, some heuristic and others theoretical, such as:

cross-validation (in-sample, out-of-sample)
likelihood weighted information criteria

I want to briefly look at information criteria today. The Kullback-Leibler divergence metric is probably the seminal work in this area. Since then there have been a number of measures such as the Akaike Information Criterion, Hannan-Quinn, etc. More recently (well, 2000) Hamparsum Bozdogan developed another measure (ICOMP) which has more appeal for me in terms of what it captures. Conceptually the measure weighs the following:

likelihood of the model given the parameters (lack or degree of fit)
the complexity of model parameters (lack of parsimony)
the complexity of model errors (profusion of complexity)

A simplified form of the measure is as follows:

where Σθ and Σε are the parameter and residuals covariance matrices respectively. λ’s are the eigenvalues of each respective matrix.

We look to minimize the above function (effectively maximizing likelihood against the countervailing complexity measures):

The measure of complexity is explained as follows from his paper:

Complexity of a system (of any type) is a measure of the degree of interdependency between the whole system and a simple enumerative composition of its subsystems or parts.

The contribution of the complexity of the model covariance structure is that it provides a numerical measure to assess parameter redundancy and stability uniquely all in one measure. When the parameters are stable, this implies that the covariance matrix should be approximately a diagonal matrix.

In general, large values of complexity indicate a high interaction between the variables, and a low degree of complexity represents less interaction between the variables. The minimum of Complexity(Σ) corresponds to the least complex structure. In other words:

Complexity(Σ) → 0 as Σ→ I

This establishes a plausible relation between information-theoretic complexity and computational effort. Furthermore, what this means is that the identity matrix is the least complex matrix. To put it in statistical terms, orthogonal designs, or linear models with no colinearity, are the least complex, or most informative, and the identity matrix is the only matrix for which the complexity vanishes. Otherwise, Complexity(Σ) > 0, necessarily.
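To get a feel for the complexity term, here is a small sketch of Bozdogan's first-order complexity, C1(Σ) = (s/2) ln(tr(Σ)/s) − (1/2) ln|Σ|, which is equivalent to (s/2) times the log of the ratio of the arithmetic to the geometric mean of the eigenvalues. I am assuming this is the form behind Complexity(Σ) here; the helper names are mine, and the log-determinant is taken from a hand-rolled Cholesky factor to keep the example dependency-free.

```python
import math


def cholesky(matrix):
    """Lower-triangular L with L L' = matrix (symmetric positive definite input)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def c1_complexity(cov):
    """C1 = (s/2) ln(trace/s) - (1/2) ln det(cov)."""
    s = len(cov)
    trace = sum(cov[i][i] for i in range(s))
    L = cholesky(cov)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(s))
    return 0.5 * s * math.log(trace / s) - 0.5 * log_det


identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.9], [0.9, 1.0]]  # hypothetical, strongly interacting pair
print(c1_complexity(identity), c1_complexity(correlated))
```

The identity scores zero, and the score rises as the off-diagonal interaction grows, which matches the description above.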

Why bother? Well, the most commonly used criterion (the AIC) does not adequately capture complexity and is known to be biased for some model systems. The approach also has greater intuitive appeal.

5 Comments

Filed under strategies

February 20, 2010 · 12:10 pm

Impulse Response

This is just a quick note on deriving an impulse response function for a VECM system. Basically we want to get the system into a form where we can take the partial derivatives at various lags. Starting with a simplified VECM:

Convert this into a form expressing in terms of X instead of ΔX:

We change variable to simplify the form:

Via Pesaran and Shin (1996) we transform this into the following recursive expression:

We determine the partial derivative ∂vi / ∂vk (i.e. the impact of a change in the kth variable on the ith) after n time periods (t+n) to be:

where Si is a selection vector with 1 at the ith position and 0 elsewhere.

Normally the Cholesky decomposition is used to orthogonalize the covariance (U U’ = Σ); however, other decompositions can be used, providing different measures of response, such as the Bernanke-Sims approach.
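For readers who want the chain of steps written out, here is the standard textbook version for a VECM with a single lagged difference. The notation is assumed rather than taken from the derivation above: Π is the error-correction matrix, Γ the short-run matrix, X plays the role of the variable v, and u_t is the orthogonalized shock with ε_t = U u_t.

```latex
\begin{aligned}
\Delta X_t &= \Pi X_{t-1} + \Gamma\,\Delta X_{t-1} + \varepsilon_t,
  \qquad \operatorname{Cov}(\varepsilon_t) = \Sigma \\
X_t &= A_1 X_{t-1} + A_2 X_{t-2} + \varepsilon_t,
  \qquad A_1 = I + \Pi + \Gamma,\quad A_2 = -\Gamma \\
\Phi_0 &= I, \qquad \Phi_n = A_1 \Phi_{n-1} + A_2 \Phi_{n-2}
  \quad (\Phi_{-1} = 0) \\
\frac{\partial X_{i,t+n}}{\partial u_{k,t}} &= S_i^{\top}\,\Phi_n\,U\,S_k,
  \qquad U U^{\top} = \Sigma
\end{aligned}
```

The second line follows from substituting ΔX_t = X_t − X_{t−1} into the first; the third is the usual moving-average recursion for a VAR in levels.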

Chinese Bubble

Just a quick note. Wanted to point to this presentation on the Chinese bubble: “China the Mother of All Black Swans”. His conclusions on commodity prices and interest rates are right on the mark, stating:

interest rates will go up
Ok this is a no-brainer for a number of reasons, one being that US interest rates are about as low as they can go. Also, given the large amount of new debt issue and increasing stockpiles in China, Japan, and elsewhere, would expect reduced appetite at current interest levels. That said, in uncertain markets, where else should China or Japan park their $ inflows?
commodity prices will revert
Well, most commodity prices already reverted to historical levels in 2008 during the market crisis. This can be seen in a number of commodity indices. That said, there are specific commodities that are at premiums to historical levels (ones that China has been buying).

Here we can see that commodities, as a broad index, have fallen back to historical levels and stayed there since 2009, however industrial commodities have increased in price by 30%:

I’m going to ignore Gold at the moment, as I don’t believe China is the major driver behind its rise (that said, gold would appear to be ripe for a huge reset, similar to the buildup and reset in the 80s). Copper on the other hand, has had a 50% rise since 2009 that can be attributed to demand from China:

Apparently China is buying copper not only for internal demand, but as a way to invest its huge inflow of $s as opposed to investing entirely in US treasuries. So really we have to be looking at significantly less US consumption of Chinese products or a change in policy around stockpiling industrial commodities, before we see these commodities reverting to historic levels.

4 Comments

Filed under strategies

February 16, 2010 · 11:46 pm

Network Model

I’ve been thinking about the relationships amongst a network of assets. Supposing I have a network of hundreds of assets, what sort of measurements can be made that allow for statements about the future state with a measurable degree of confidence?

Here are some “standard” approaches to looking at the relationship between assets:

Covariance
Covariance is a linear measure; normalized by the variance of one series, it is literally the slope of a least-squares regression line through paired data. It has issues with lagged series and assumes linearity, and it only uniquely specifies elliptical distributions.
Cointegration
Cointegration is a measure of relationship between series using an autoregressive error correction model. It avoids many of the issues of covariance; however, like covariance, it is not sufficient to uniquely specify the joint distribution. The VECM model and Johansen method give robust estimates of this.
Distribution Estimating Models
Models that estimate the distribution (i.e. provide the most probable price movement in the next sample period based on an estimated high-dimensional distribution); works well on certain portfolios.
SDEs
SDEs that impose a structure / mechanism of price movement, implying future price movements. These models often combine points 1 and 3. I, like most other people in this space, have a collection of these, some better than others.

Given that I already have models in categories 3 and 4, am interested in a new model based on cointegration — not cointegration for pairs trading, but using the strong error-correcting relationships in a network of assets to determine likely next period moves.

Amongst a number of approaches for determining error-correcting relationships, have found the eigenvectors implied by the Johansen maximum likelihood estimate of the VECM to be the most stable as compared to other alternatives:

heuristic zero crossings maximization
beta estimates from rolling OLS regressor
Various Ornstein-Uhlenbeck models (though with a particle filter the degree of noise can be reduced significantly)

I’m not going to state what I am doing right now, but may write up parts of it along the way.

3 Comments

Filed under strategies

February 7, 2010 · 8:11 pm

Adaptive Regressor

Regression is an important tool in trading (witness the number of traders that rely on moving averages of various sorts). I don’t directly use regressors to generate trading signals, but I do find them useful in denoising signal output.

Aside from the obvious about past predicting the future, there are other issues with regressors:

lag: denoising necessarily involves averaging of some sort, resulting in lag relative to the underlier
parameterization: what parameter settings bring out the features of interest

The simplest regressors are ARMA based FIR or IIR filters. Lag is easy to quantify as phase delay in those systems and harder in others. Rather than focusing on lag, I want to consider the parameterization.

Parameterization
To illustrate the problem of parameterization, consider a simple exponential MA in two market scenarios:

market with strong trends
Long windows mask tradeable market movements. A shorter window (or “tau”) is needed to capture market movements of interest.
market trading sideways
Short windowed MA oscillates on small movements. Long window needed to reduce or eliminate noise that is not tradeable.

While I don’t use MAs for trade entry, the general problem of adapting a regressor to features of interest is important.

Penalized Least Squares
The penalized least-squares spline is known to be the “best linear unbiased predictor” for series that can be modeled by:

Where, f(x) is typically a polynomial based function (typically a high dimensional basis function). Characteristic of the penalized family of splines is the balance between least-squares fit and curvature penalty:
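In the usual notation for penalized smoothing splines (assumed here, with y the observed series, ε the noise and λ the penalty weight), the model and the objective being minimized read:

```latex
\begin{aligned}
y_i &= f(x_i) + \varepsilon_i \\
\hat{f} &= \arg\min_{f}\; \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
  \;+\; \lambda \int \bigl(f''(x)\bigr)^2 \, dx
\end{aligned}
```

The first term rewards fit; the second penalizes curvature, so a larger λ gives a stiffer curve.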

This minimization can be constructed into a matrix based system using the basis design matrix. I’m not going to go into this here, but you can find many papers on this. The formulation is straightforward, but it is very easy to run into numerical instabilities with straightforward solutions (trust me, I’ve tried), so the best bet is to use one of the tried and tested implementations (such as de Boor’s).

Ok, the problem with the above is that the parameter λ is a free variable (i.e. an input into the minimization). λ allows us to control the degree of curvature or oscillating behavior. Here is the same series with 4 different levels of λ (underlier in black):

Flexibility is great. Now how do I choose λ appropriately? And how do I define appropriate?

Criteria
As mentioned above, an incorrect choice of regression parameters results in a regressor that is either too noisy or misses features.

Now before I explain the criteria (heuristics really) that I came up with, let me point to some literature tackling the general concept. Tatyana Krivobokova, Ciprian M. Crainiceanu, and Goran Kauermann, “Fast Adaptive Penalized Splines” (2007). Their approach produces an evolving λ, one for each of the truncated basis functions through time, chosen so as to reduce the local error while keeping enough error to be optimally cross-validated.

Though the above is interesting, and indeed produces some amazing results for certain data sets, the “smoothness criteria” are fundamentally different from what I am looking for.

I decided that my criteria are as follows:

the amplitudes between min/maxima in the spline must meet some minimum amplitude-time
the energy of the spline must be “close” to the energy of the original series

The rationale for the 1st point is that we do not want small oscillations in the spline (signifying that we need to tune for less noise). The second point tunes in the other direction, that is, if the spline is too stiff, missing many features, the energy of the spline will be too low relative to the original series.

Algorithm
The two above criteria break down into:

the integral between a maximum and minimum ≥ threshold
the integral of f(x)^2, where f(x) is the spline, is “close” to the same integral taken over the original series

As I did not see an easy way of building this into a system of equations, I took the “poor man’s algorithm” approach, namely:

binary-style search between low and high values for λ
if amplitude/area < threshold choose higher lambda else lower
repeat until some granularity

Works well!
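The search itself can be sketched in a few lines. To keep the example self-contained I have swapped in a forward-backward EMA as a stand-in smoother (not a penalized spline), only the first criterion (minimum swing area) is checked, and every name and default below is hypothetical. Note that the stand-in is not guaranteed to behave monotonically at very large λ, which is where the energy criterion would take over.

```python
import math


def ema_smooth(series, lam):
    """Stand-in smoother (NOT a spline): forward-backward EMA, larger lam = stiffer."""
    alpha = 1.0 / (1.0 + lam)
    fwd, acc = [], series[0]
    for v in series:
        acc += alpha * (v - acc)
        fwd.append(acc)
    back, acc = [], fwd[-1]
    for v in reversed(fwd):
        acc += alpha * (v - acc)
        back.append(acc)
    return back[::-1]


def min_swing_area(curve):
    """Smallest amplitude-time area swept between consecutive local extrema."""
    ext = [i for i in range(1, len(curve) - 1)
           if (curve[i] - curve[i - 1]) * (curve[i + 1] - curve[i]) < 0]
    if len(ext) < 2:
        return math.inf  # no oscillation left to object to
    return min(sum(abs(curve[i] - curve[a]) for i in range(a, b + 1))
               for a, b in zip(ext, ext[1:]))


def tune_lambda(series, threshold, smooth=ema_smooth, lo=0.01, hi=100.0, rel_tol=0.05):
    """Binary-style (geometric) search for a lambda whose swings meet the threshold."""
    while hi / lo > 1.0 + rel_tol:
        mid = math.sqrt(lo * hi)
        if min_swing_area(smooth(series, mid)) < threshold:
            lo = mid  # oscillations too small: smooth harder
        else:
            hi = mid  # swings are big enough: try a lower lambda
    return hi


# Hypothetical series: a slow cycle plus fast, untradeable wiggle.
series = [math.sin(2 * math.pi * i / 50) + 0.2 * math.sin(2 * math.pi * i / 3.3)
          for i in range(300)]
lam = tune_lambda(series, threshold=1.0)
print(lam)
```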

3 Comments

Filed under strategies

February 1, 2010 · 2:23 pm

Commissions

It is quite frustrating that commissions arrangements are so opaque. Basically professional trading firms have to negotiate with the venues (if you are big enough to go direct) or with your prime broker.

I was previously on the “sell side” so have a pretty good idea of what commissions were in the FX & Rates markets. I don’t have much idea about this on the equity side though.

I have a new strategy in the equities markets and am trying to determine what sort of commissions would be involved. It would be good to know what the sell side and hedge funds “see” in terms of fees as a point of reference. What sort of upside would one have in terms of commissions should one structure as a well capitalized fund?

So I did a bit of digging on the web. The only information out there that I can find is for retail. On the retail side (I use IB at the moment):

I’ve run across a number of articles like this, indicating commission costs of ~ 2 cents / share for institutional trading (this must be a typo or I am misunderstanding the article).

I’m sure fees on the equity venues are much less than the bid/ask spread (which for liquid issues is 1 cent or less). In fact some venues provide rebates (give you money) if you are providing liquidity rather than aggressing. Now for a buy-side firm with a prime-brokerage arrangement, whose goal is not equities market making, the costs must be quite a bit higher than going direct. I would guess they would still be less than 1/100th of retail.

In any case, I am trying to get a better handle on what the true costs are for different situations. If anyone has some indicative #s for commissions on the buy side would be appreciated.

3 Comments

Filed under strategies

January 27, 2010 · 9:36 pm

The ideal Quant Environment

R is a wonderful tool in that one can prototype and test ideas very quickly. If you’re not using R and are doing most of your work in C++ or another low-level language, you’re missing a lot. Development in R is 10-50x faster than in C++ / Java / C#.

R definitely has its warts as a language and environment. I am not a huge fan of its matrices, index operators, or lazy expression parsing. More crippling is the fact that R is slow and has memory issues for large data sets. I would estimate that R is 100x slower than Java or C++, depending on what you are doing.

My current environment is a combination of R and a number of lower level languages. For much of my post-exploration work I feel compelled to write in a lower level language due to the performance issues with R. My preference is to be using a functional language, as they are generally very concise and elegant.

Ideal Environment
Here is what an ideal environment for me would be:

breadth and depth of R
clean functional language design
concise operations (as close to the math as possible)
excellent rendering facilities
real-time performance
ability to work with large data-sets (memory efficient)
concurrency (I do a lot of parallel evaluation)

Candidates
Here are candidate environments that I’ve used or explored:

python
Cleaner and generally faster than R, but with very little in the way of statistics and poor integration with R. No real concurrency, as the interpreter is locked (the global interpreter lock).
Ocaml
Beautifully concise language, INRIA implementation does not support concurrency
F# variant of Ocaml
Solves Ocaml’s problems but bound to MS platform
Scala
Excellent performance, a bit bleeding-edge, much more complex than Ocaml, but on the JVM.

The Special Blend
It is impractical to consider reimplementing even a subset of R into another language environment. A hybrid approach makes sense. With python you have Rpy and with Java JRI. Neither of these have first class interaction with the language though.

I would like to have the power of a production functional language, but with the same development ergonomics and interaction that R has. It occurred to me that I could do the following:

dump function templates of all R functions in one’s environment into a Scala module
create implicit conversions from R fundamental types into Scala objects (matrices, vectors, data frames, etc)
create specialized mappings between my Scala-side timeseries and R zoo / ts objects
create some wrappings for unusual usage patterns such as the ggplot operators

Scala has an interactive mode where functions, classes, etc. can be created on the fly. Because we’ve dumped a proxy function for each R function, we also have near first class access to R functions. There would be some differences, in that one would not be able to replicate the lazy evaluation aspect of some of the R functions. Functions that use expression() would have to be wrapped specially.

Because scala allows a lot of syntactic magic, the environment would look very much like R and have complete access to R, but with the huge upside of the more powerful Scala. One can write code that is “production ready” from the get-go and/or do very compute intensive operations otherwise prohibitive in R.

Now I need to find the time to put this together …

Addendum
I have not decided on which language / environment to base this on. Scala’s main sell is that it is on the JVM. The other functional language contender on the JVM with above-scripting-level performance is Clojure.

Clojure is basically a dialect of Lisp and there is already a project called Incanter that provides a statistical environment within Clojure. It looks interesting, if early. Clojure does not yet have a performance profile that is close enough to the metal. I would expect to see improvements over time, but due to Lisp’s lack of static typing or type inference, I am doubtful that we will see Clojure at the level of statically typed languages.

Since writing this post and having some conversations, I’ve begun to think that F# may be the best choice. My language preference has been to use Ocaml, but the INRIA Ocaml implementation is handicapped. F# is closely related to Ocaml and therefore may be a fit.

F# on the MS .NET platform has been shown to be as performant as C#. From benchmarking C# a couple years ago, was clear that the CLR is pretty close to the JVM in terms of performance. Given my cross platform constraints, the question has been how viable is F# with Mono from a performance point of view?

It seems like the Mono performance is being addressed. The mono LLVM experiment improved mono benchmarks significantly. I have not been able to test Mono with this extension. Will have to experiment.

To wed this to R would require writing the equivalent of JRI for .NET / R.

36 Comments

Filed under strategies

January 22, 2010 · 5:25 pm

Cointegration Clusters

Previously I had written a simple clustering algorithm using correlations to look at rough “relationships” between equities, whether real or “accidental”. I had evaluated this on the S&P 500.

I decided to do a much wider evaluation on all US equities with average volume > 200K and market cap > 500M + the ETFs. This subset of equities results in about 2800 assets or 3,780,000 pairs to be evaluated. Evaluating this in R was impractical, so I evaluated it in Java. The evaluation time was in seconds rather than the days R might have required.

To discover stronger relationships evaluated both the correlation and Augmented Dickey-Fuller test on all pairs, keeping pairs with adf p-value < 0.05. Additionally did cross-validation against prior years, throwing away pairs that did not fit in the cross-validation period.

I then used my clustering algorithm (outlined in a previous post) to determine networks of maximally related assets.

This resulted in ~1000 pairs (of the original ~4 million pairs). Some portion of these pairs “make sense” and others are complete surprises. That said, given the large number of series it would not be a surprise to find “unrelated” assets that are relatively cointegrated over the test period.

It generated ~50 clusters, here is one at random:

Trading The Pairs
The beauty of cointegrated series is that they are much easier to model than series with heteroscedasticity or a trending mean. There are a number of approaches to trading pairs (or larger cointegrated portfolios). Before getting to this, I first want to illustrate a non-cointegrating spread:

We can see that the spread between NI and ACG grows with time away from the axis. The ideal spread is one that oscillates around a constant mean (generally 0). The above can be traded, but doing so would involve first taking a view that there is a long run beta differential of ~0.25, and then weighting the basket appropriately.

Many look at the relative “beta” (i.e. the slope of the long run cumulative returns) for each asset and determine weights based on a linear regression. That approach works well if trends follow a near linear path over the observation period.

A better approach is to find the weights such that the spread “spends as much time above the origin as below the origin” (ok, it’s a rough heuristic I came up with). This can be expressed as:

Basically the above is “saying”: find the integral of some weighting of the spread function such that the area is as close as possible to 0 (i.e. we have balanced sweep above and below the origin). The constraints make sure that the weights don’t go to 0.
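One simple reading of the heuristic, under assumptions of my own: take cumulative-return series a and b, pin the weight on a at 1 (one way of keeping the weights away from 0), and define the spread as a − w·b. Zero net signed area then has a closed form, w = Σa / Σb. The function names and data are hypothetical.

```python
def balanced_weight(a, b):
    """Weight w such that the spread a - w*b has zero net (signed) area."""
    total_b = sum(b)
    if total_b == 0:
        raise ValueError("second series has zero net area; weight is undefined")
    return sum(a) / total_b


def net_area(a, b, w):
    """Signed area of the spread a - w*b, as a plain sum over samples."""
    return sum(x - w * y for x, y in zip(a, b))


# Hypothetical cumulative returns for two assets.
a = [0.0, 0.01, 0.03, 0.02, 0.05]
b = [0.0, 0.02, 0.05, 0.05, 0.08]
w = balanced_weight(a, b)
print(w, net_area(a, b, w))
```

This balances area rather than strictly "time above versus time below", so it is only an approximation of the idea; a constrained optimizer would be needed for more general weightings.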

If it is Cointegrated
In theory this is a much simpler scenario where we can choose equal and offsetting weights (-1, 1) and then analyse the resulting spread for entry and exit (technically one may still adjust the weights to account for drift, depending on the MR period one is focusing on).

Next, we want to look for mean-reversion patterns or at least identify levels likely to mean revert conditioned on the past. Here is a pair that is cointegrated for this period:

The typical approach is to normalize the spread to standard deviations and enter reverting trade when 2 SD or another suitable threshold is realized. Some basic observations of momentum and vol can be used to decide precisely when to enter.
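A bare-bones version of the standard-deviation entry rule might look like the sketch below. It is illustrative only: the mean and standard deviation are taken over the whole sample, which looks ahead, so a real implementation would use a rolling or expanding window. The names and the sign convention are assumptions.

```python
from statistics import mean, pstdev


def zscore_signals(spread, entry=2.0):
    """+1 = buy the spread, -1 = sell the spread, 0 = no entry signal."""
    mu, sigma = mean(spread), pstdev(spread)
    if sigma == 0:
        return [0] * len(spread)  # flat spread: nothing to trade
    signals = []
    for value in spread:
        z = (value - mu) / sigma
        if z >= entry:
            signals.append(-1)  # spread is rich: sell, expecting reversion down
        elif z <= -entry:
            signals.append(1)   # spread is cheap: buy, expecting reversion up
        else:
            signals.append(0)
    return signals


# Hypothetical spread: flat with one spike up and one spike down.
spread = [0.0] * 20 + [5.0] + [0.0] * 20 + [-5.0]
signals = zscore_signals(spread)
print([i for i, s in enumerate(signals) if s != 0])
```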

Another approach used is to calibrate some descendant of the Ornstein-Uhlenbeck MR model to the desired level of MR and use it as a driver for entry. I’m not trading pairs at this time, so I’m not sure whether it is worth adopting an MR model. From past experience with these models, they are hard to calibrate and require significant modification to match empirical behavior, even loosely.

Beyond Pairs
We use pairs to provide a more desirable process statistically, more amenable to MR analysis. There is a much wider universe of possibilities present in “spread baskets”. By “spread baskets” I am referring to collections of more than 2 assets that are fractionally long or short, producing a tightly cointegrated return.

Determining such baskets is very complex for a number of reasons:

size of search grows at roughly O(N^k), where k is the size of basket and N the number of assets
one needs to determine optimal weights (expensive NLP)
optimal weights need to be tested in cross-validation

Mitigating the worst case:

can throw away assets with low correlations

To give an example, if we consider the 3-asset case on 2000 stocks, the worst case search would involve roughly 1.3 billion combinations (2000 choose 3) to check. Filtering on the correlation matrix may well make this viable, however.

6 Comments

Filed under strategies
