Transfer Entropy

I am revisiting spanning trees and clusters that express relationships among assets. I am also interested in a related problem: reducing dimensionality in a high-dimensional distribution of asset returns.

Linear Approaches
The most naive approach is to look at the correlations amongst time-series.   Correlations have a number of well-known problems for this purpose:

  1. correlation ≠ causality
  2. correlations can occur between completely unrelated variables for arbitrary sample periods
  3. correlations are a linear measure of similarity

The VECM model offers significant improvement over the correlation approach, at least in terms of identifying causality. For those unfamiliar with VECM, it is similar to an autoregressive (AR) model, but extended to include lags from the other time series (plus an error-correction term when the series are cointegrated). For 2 variables, a lag-p VECM would be set up as:
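
In one common notation (the symbols α, β, a, b and ε are assumptions here, not taken from the original post), the Δx equation of a two-variable, lag-p VECM can be written as:

$$
\Delta x_t = \alpha_x \,(x_{t-1} - \beta\, y_{t-1}) + \sum_{i=1}^{p} a_i \,\Delta x_{t-i} + \sum_{i=1}^{p} b_i \,\Delta y_{t-i} + \varepsilon_{x,t}
$$

with a symmetric equation for Δy. The first term is the error-correction term; dropping it leaves a plain VAR in differences.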

That is, Δx is described in terms of lagged Δx’s and lagged Δy’s. Solving for this (usually in matrix form), one arrives at coefficients which (assuming statistical significance) describe, on average, the interactions between the series X and Y.

Taking this a step further, one can do the “Granger Causality test”: an F-test to determine whether the model for Δx with cross-lags produces significantly less error variance than the model for Δx without cross-lags. This is performed for various lags to determine the minimum number of lags for which there is “causality” (or none at all).
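
A sketch of the usual test statistic, assuming the Δx regression has an intercept and p lags of each series and is fitted on T observations. The symbols RSS_r and RSS_u (residual sums of squares without and with the p cross-lags) are my notation, not the original post’s:

$$
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
$$

Under the null hypothesis that all p cross-lag coefficients are zero, F is approximately distributed as F(p, T − 2p − 1). An error-correction term, if included, would use up one more degree of freedom in the denominator.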

This is not a bad approach for normally distributed returns, but is flawed for data with non-linearities.

Information Theory
It turns out information theory provides a powerful tool in analyzing causality (or at least temporal flows of information from one market to another).

Shannon measured the information for a particular event “e” as:
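
In standard notation (assuming base-2 logarithms, so that information is measured in bits):

$$
I(e) = -\log_2 p(e)
$$

where p(e) is the probability of the event e.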

Let us associate a symbol with each possible distinct event in a system {A, B, C, … }. A sampling of these events across time will lead to a sequence of symbols (for example: ABAAABBBBABABAA). If the symbol B occurs with p(B) = 1 (i.e. BBBBBBBB), the sequence can carry no information, as it can only represent one state. Note that I(B) would be 0 in this case.

Shannon went on to define the entropy of the system as the expected information content:
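
Written out in the usual form (again assuming base-2 logarithms; X is my label for the symbol source):

$$
H(X) = \sum_{x} p(x)\, I(x) = -\sum_{x} p(x) \log_2 p(x)
$$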

This can be extended to look at the joint entropy for 2 or more “symbol” generators such as:
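
For two sources X and Y (labels assumed), the standard definition is:

$$
H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(x, y)
$$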

Observing that if X and Y are independent, p(x,y) = p(x)p(y), we can determine how much information has been introduced into the joint event space versus the amount of information were the two sequences independent as:
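
In the standard form (same assumed notation as the entropy definitions), this comparison is:

$$
I(X; Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)} = H(X) + H(Y) - H(X, Y)
$$

It is zero exactly when p(x,y) = p(x)p(y) for every pair, i.e. when the two sequences are independent.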

The above is called the “Mutual Information” measure.   This measure does not differentiate between information X provides Y or Y provides X.   In the context of finance it is useful to know more about the directional flow of information than that they simply share information.
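
As a rough illustration of these measures on symbol sequences, here is a minimal Python sketch using plug-in (frequency-count) estimates. The function names are hypothetical, and the second sequence is made up purely for the example.

```python
import math
from collections import Counter

def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    # p * log2(1/p), written with n/c so a single-symbol sequence gives 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(symbols).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

constant = "BBBBBBBB"               # p(B) = 1
mixed = "ABAAABBBBABABAA"           # the example sequence from the text
shifted = mixed[1:] + mixed[:1]     # made-up second sequence (rotation of the first)

h_constant = entropy(constant)      # a single state carries no information
h_mixed = entropy(mixed)
mi_self = mutual_information(mixed, mixed)      # a sequence shares all its entropy with itself
mi_shifted = mutual_information(mixed, shifted)
```

Swapping the two arguments of `mutual_information` gives the same value, which is the symmetry noted above: the measure cannot say which sequence informs the other.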

Transfer Entropy
Transfer Entropy is a more precise measure than Mutual Information in that it captures both the direction of information flow and the temporal relationship. The Transfer Entropy approach is a nearly 1:1 analog of Granger causality, except that it is applicable to a wider range of systems (as it turns out, Granger causality and transfer entropy have been shown to be equivalent for data with normally distributed noise).

Like Granger Causality (GC), we look at the entropy (or in the case of GC: error variance) with and without an explanatory variable from the other series.   For a single lag, this results in the following measure:
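
In the usual single-lag form (notation assumed, with the sum running over all observed combinations of x[t+1], x[t] and y[t]):

$$
T_{Y \to X} = \sum p(x_{t+1}, x_t, y_t) \log_2 \frac{p(x_{t+1} \mid x_t, y_t)}{p(x_{t+1} \mid x_t)}
$$

The ratio compares the prediction of x[t+1] with and without y[t] as an explanatory variable, mirroring the with/without cross-lags comparison in GC.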

The above expresses the transfer entropy from y[t] to x[t+1], i.e. how much impact y[t] has on x[t+1]. Changing the conditional probabilities to express p(y[t+1] | x[t], y[t]) would allow us to explore the other direction. Of course this can be evaluated for more lags (the above is just for 1 lag).

Finally, one needs to consider the level of significance for a given transfer entropy to understand at which point there is no further relationship when looking at past lags or other variables. The approach taken is to measure the baseline entropy in a shuffled series (one that removes the correlations but maintains the symbols and marginal frequencies).
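
A minimal Python sketch of the single-lag measure and the shuffle baseline, assuming the returns have already been discretized into symbols. It uses plain frequency counts. The function names, the toy data (x is simply y delayed by one step) and the choice to shuffle only the source series are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def transfer_entropy(target, source):
    """Single-lag plug-in estimate (bits) of information flow source[t] -> target[t+1]."""
    n = len(target) - 1
    nxt, cur, src = target[1:], target[:-1], source[:-1]
    triples = Counter(zip(nxt, cur, src))     # (x[t+1], x[t], y[t])
    cur_src = Counter(zip(cur, src))          # (x[t], y[t])
    nxt_cur = Counter(zip(nxt, cur))          # (x[t+1], x[t])
    cur_only = Counter(cur)                   # x[t]
    te = 0.0
    for (x1, x0, y0), count in triples.items():
        p_joint = count / n
        p_with = count / cur_src[(x0, y0)]                 # p(x[t+1] | x[t], y[t])
        p_without = nxt_cur[(x1, x0)] / cur_only[x0]       # p(x[t+1] | x[t])
        te += p_joint * math.log2(p_with / p_without)
    return te

def shuffled_baseline(target, source, trials=100, seed=0):
    """Transfer entropy after shuffling the source: same symbols and
    marginal frequencies, but the temporal relationship is destroyed."""
    rng = random.Random(seed)
    shuffled = list(source)
    values = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        values.append(transfer_entropy(target, shuffled))
    return values

# Toy data: y is a random bit stream and x[t+1] = y[t]
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(2000)]
x = [0] + y[:-1]

te_y_to_x = transfer_entropy(x, y)    # y drives x, so this should be large
te_x_to_y = transfer_entropy(y, x)    # no flow in this direction
baseline = shuffled_baseline(x, y)
```

A measured value would be treated as significant only if it sits well above the shuffled values, for example above their maximum or a high percentile. Which cutoff to use is a judgment call that the sketch does not settle.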

This approach is much more robust than Granger causality if the data set one is working with has non-linearities.

Filed under strategies

Serial Correlation of Winning and Losing Trades

I came across a comment on something the Turtle traders had done, which was to stay out of the market in the next trading period following a 2x size win or loss.   Interesting idea.   They must have made the following observations for their strategy:

  1. unusually large winning trades were followed by losing trades (negative serial correlation or mean reversion for big winners)
  2. unusually large losing trades were often followed by another losing trade (positive serial correlation for large losing trades)

My main strategy is multi-asset so probably doesn’t lend itself as well to a “rule” like this.   Interesting thought though.   I should do some analysis on the pattern of winners and losers and see whether there is a consistent pattern.
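
A minimal sketch of that analysis in Python, assuming a list of per-trade P&L values. I am reading “2x size” as twice the average absolute P&L, which is my interpretation rather than the Turtles’ rule, and the function name and sample numbers are made up.

```python
from statistics import mean

def next_trade_after_extremes(pnl, multiple=2.0):
    """Collect the P&L of the trade that follows each unusually large win or loss.

    "Unusually large" is taken to mean at least `multiple` times the average
    absolute P&L (an assumed definition).
    """
    scale = mean(abs(p) for p in pnl)
    after_big_win = [pnl[i + 1] for i in range(len(pnl) - 1) if pnl[i] >= multiple * scale]
    after_big_loss = [pnl[i + 1] for i in range(len(pnl) - 1) if pnl[i] <= -multiple * scale]
    return after_big_win, after_big_loss

# Made-up trade history; the average absolute P&L is 2, so the threshold is 4
pnl = [1, -1, 5, -2, 1, -6, -1, 1, 1, -1]
after_big_win, after_big_loss = next_trade_after_extremes(pnl)
```

On a real trade history, a negative average in both lists would match the two observations above. With only a handful of extreme trades the averages will be noisy.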

Certainly for a single asset, it is not unusual to see mean-reversion following a ramp up in price.

Serial Correlation

One of my strategies uses an ML technique to find patterns in the distribution of returns across a portfolio. Conditioned on the pattern is a highly skewed marginal distribution for next period returns. The positive skew is important and a very good thing, pointing to many more positive returns than negative returns.

I had a theory that for this particular pattern, I would see higher negative serial correlation in the bigger winners. If true, this would allow further amplification of winners or better selection among them. Indeed, it did work out that more negative serial correlation produced higher next period returns on average.
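
For reference, a minimal sketch of the lag-1 serial correlation used as the feature here, in Python. The function name is hypothetical and the two series are toy inputs, not strategy data.

```python
def lag1_autocorrelation(returns):
    """Sample lag-1 autocorrelation of a return series."""
    n = len(returns)
    mu = sum(returns) / n
    dev = [r - mu for r in returns]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant series: treat as no serial correlation
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom

mean_reverting = [1, -1] * 5          # alternates sign every period
trending = [1, 2, 3, 4, 5, 6]         # steadily rising

rho_reverting = lag1_autocorrelation(mean_reverting)   # strongly negative
rho_trending = lag1_autocorrelation(trending)          # positive
```

Computing this per asset over a trailing window gives one number per candidate, which can then be compared against next period returns.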

Further, there was another factor that appeared to be relevant in the mean returns.  Was easy to visualize / examine with the rgl package in R:

There is clustering quite visible in one corner. This is good. I’m sorry, but I can’t go into the background of what this is conditional on. Thought I’d give a plug for rgl and also note that autocorrelation can be a useful tool in predicting return bias.

New Quantitative Finance Community

There is a new Quantitative Finance community in incubation at stackexchange.com. I’m a big fan of mathoverflow and stackoverflow, having found both to be the best sources of information and problem solving within their respective areas. In order for the community to have a permanent and free home, it must show a sizable interest base. Please visit the link above and provide your email address in support of the community.

Here is a quote from the founding member which best describes the motivations for creating this community:

I’m a researcher at a long running multi-billion quantitative hedge fund. I previously worked at Goldman Sachs, which due to its size was able to foster a strong internal community. As a researcher in a hedge fund, I have fairly limited opportunities for collaboration when it comes to learning more about the investment industry and finance as a whole. In short, there are no good, free learning resources for someone involved in finance.

The best available option has been Wilmott. The Wilmott forums are home to a very large community of scholars and professionals. Unfortunately, I find that it can be very difficult to monitor the forums, and even more difficult to find answers to questions that were asked there in the past. Moreover, the forums frequently include controversial or offensive discussions that can detract from everything else.

As such, I am seeking to create a new community that specifically addresses the needs of finance professionals and academics on the StackExchange network. StackExchange is a network built by the creators of StackOverflow.com.  If you haven’t used StackOverflow, I strongly recommend it for programming questions. It’s much more effective than any other format at posing questions and getting answers in very specific technical areas.  If you have any programming questions, you should give it a try and see how quickly the correct answer is given.

This new Quantitative Finance site would be more akin to MathOverflow, where mathematicians gather to ask and answer specific math questions. The beauty of the MathOverflow is that it is a site for expert, by experts. Any non-specific, ill-formatted, or otherwise offensive question gets closed by the community almost immediately. So the attention can always be focused on interesting math questions.

In short, visit the incubating Quantitative Finance community and lend your support.  Thanks!

Fat Finger

I’ve been doing some manual trades now and then.  Was at the gym this morning and thought to buy an equity.

Dumb thing to do: Sent in the request from my iPhone, accidentally multiplying the amount by 10x. I ended up long with a $600K position that I did not want. For whatever reason I could not manage to put in the sell from the phone and was getting worried that I could not properly monitor the appropriate exit. Had to leave mid-exercise and handle it.

Luckily I exited with a few thousand in profit, but it could have gone the other way.

Scientific Computing on the Cloud

I’ve been keeping a close eye on the costs of the various clouds versus the costs of internal cpu farms.   Amazon EC2 pricing for high CPU map-reduce appears to be evenly priced with my costs to host internally.

I calculated this based on depreciating Core i7 920s over a 3-year period and accounting for $0.14 / kWh @ 150 watts continuously. I arrived at a lower cost than Amazon’s; however, when adjusting for CPU performance, performance per dollar ties out or is bettered by the Amazon proposition.

Amazon’s rates for calculations using map-reduce are about 1/5th the cost of a normal instance. I’m estimating that the high-CPU 8-core instance is performing at a SPECfp_rate2006 of approximately 150, twice that of the Core i7 920. The cost is $0.12 / hr versus a non-map-reduce instance cost of $0.68 / hr.
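
The arithmetic can be sketched as below in Python. The hardware price is a placeholder I have made up, since the post does not state it. The i7 score of 75 is just half of the estimated 150, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

def internal_hourly_cost(hardware_usd, years=3, watts=150, usd_per_kwh=0.14):
    """Straight-line depreciation plus continuous power draw, in $ per hour."""
    depreciation = hardware_usd / (years * HOURS_PER_YEAR)
    power = (watts / 1000) * usd_per_kwh
    return depreciation + power

def cost_per_performance(hourly_usd, spec_rate):
    """$ per hour per unit of SPECfp_rate2006 throughput."""
    return hourly_usd / spec_rate

HARDWARE_USD = 1000.0   # hypothetical purchase price of one i7 920 box
I7_SPEC = 75.0          # half the estimated EC2 figure, per the text
EC2_SPEC = 150.0        # the post's estimate for the high-CPU 8-core instance

internal = cost_per_performance(internal_hourly_cost(HARDWARE_USD), I7_SPEC)
ec2_mapreduce = cost_per_performance(0.12, EC2_SPEC)
ec2_normal = cost_per_performance(0.68, EC2_SPEC)
```

The power term alone comes to 0.15 kW × $0.14 = $0.021 per hour, so the comparison is driven mostly by the hardware price and the assumed performance ratio.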

This is great news for those doing transient large scale scientific computations (such as myself).   I now need to look to map my machine learning and strategy evaluation algorithms to map-reduce.

Blending Strategies

In a number of my systematic strategies I evaluate hundreds or thousands of sub-strategies, using a blend of these to decide on the next period trade.

Assume we are trading from a portfolio of possible assets, going fractionally long or short on a sparse subset of possible assets in the portfolio.   For example I may be working with an equity portfolio of 2300 possible equities.   From period to period I may trade anywhere from 0-40 of these.

At time t we want to know the a priori “optimal” weights for time t+1 for the 2300 assets. Each of our sub-strategies predicts the best weights (trades) for time t+1. We know the cumulative return, the average return, and risk adjusted return for each of these “paper” strategies.

Approaches
Here were some approaches:

  1. use the top N strategies by highest cumulative historical return, risk adjusted
    This works reasonably well if each strategy is trading with the same frequency. However, with one of my strategies, some “paper” strategies trade 85% of the time and others maybe 5% of the time. The cumulative returns of the less frequently traded will be much lower.
  2. use the top N strategies by average return, risk adjusted
    This removes the trading frequency problem; however, the low frequency strategies can dominate the selection if they have much higher average returns. The problem is that the aggregate strategy will then rarely trade, because the top N are dominated by infrequently traded strategies.
  3. use the top N non-zero weights strategies by average return, risk adjusted
    This avoids the dominating non-traded strategies above.   However there may be periods where the best trade is no trade at all.
  4. use the top N non-zero weights strategies with average return in the top X percent
    This fixes the above problem by just evaluating the top X percent at most in search of the top N trading strategies.
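
Approach 4 can be sketched roughly as follows in Python. The `PaperStrategy` record, the function names and the equal-weight averaging in `blend` are all assumptions for illustration; the post does not say how the selected strategies are combined.

```python
import math
from dataclasses import dataclass

@dataclass
class PaperStrategy:
    name: str
    avg_return: float   # risk adjusted average return, assumed precomputed
    weights: dict       # asset -> proposed weight for t+1; empty means "no trade"

def select(strategies, n, top_pct):
    """Top-n strategies that want to trade, searched only within the top X percent."""
    ranked = sorted(strategies, key=lambda s: s.avg_return, reverse=True)
    cutoff = max(1, math.ceil(len(ranked) * top_pct / 100))
    candidates = ranked[:cutoff]
    active = [s for s in candidates if any(w != 0 for w in s.weights.values())]
    return active[:n]

def blend(selected):
    """Equal-weight average of the proposed weights (an assumed blending rule)."""
    combined = {}
    for s in selected:
        for asset, w in s.weights.items():
            combined[asset] = combined.get(asset, 0.0) + w / len(selected)
    return combined

strategies = [
    PaperStrategy("A", 0.05, {}),                      # best performer, not trading now
    PaperStrategy("B", 0.04, {"X": 1.0}),
    PaperStrategy("C", 0.03, {"X": -1.0, "Y": 1.0}),
    PaperStrategy("D", 0.01, {"Z": 1.0}),
]

picked = select(strategies, n=2, top_pct=50)           # searches A and B only
no_trade = blend(select(strategies, n=2, top_pct=25))  # only A qualifies, so no trade
```

An empty result from `blend` is the “best trade is no trade” case. Widening `top_pct` trades that restraint for more frequent activity.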

A Complication
One of my strategies has the following complication. Many of the “paper” strategies are non-orthogonal in the sense that some of the strategies have overlapping stimuli and responses. For example, if 2 strategies are very similar, they may respond to the same events 80% of the time and have similar performance. Including both of the strategies in the aggregate response will double count.

To avoid this, I have resorted to using a modified blending rule that discards strategies that are subsumed by higher performers in the selection.
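
One possible form of such a rule, sketched in Python. The overlap measure (share of a strategy’s trigger periods also covered by a better strategy), the 80% threshold and the dictionary layout are assumptions, not the rule actually used.

```python
def overlap(events, other):
    """Fraction of `events` (a set of trigger periods) that also appear in `other`."""
    return len(events & other) / len(events) if events else 0.0

def drop_subsumed(strategies, max_overlap=0.8):
    """Walk strategies from best to worst score, keeping one only if no
    already-kept (better) strategy covers max_overlap or more of its events."""
    kept = []
    for s in sorted(strategies, key=lambda s: s["score"], reverse=True):
        if all(overlap(s["events"], k["events"]) < max_overlap for k in kept):
            kept.append(s)
    return kept

strategies = [
    {"name": "A", "score": 3.0, "events": set(range(1, 11))},
    {"name": "B", "score": 2.0, "events": set(range(1, 9)) | {11, 12}},  # 8 of 10 shared with A
    {"name": "C", "score": 1.0, "events": {20, 21}},
]

survivors = [s["name"] for s in drop_subsumed(strategies)]
```

Because the pass is greedy, a strategy is only ever compared against better performers that survived, which keeps the rule cheap but order dependent.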

Thursday’s Drop: Manipulation

There is a lot of nervousness over both the bullish market recovery of recent months and credit issues in the eurozone.    Due to this lack of confidence, the market is easily manipulated on the downside.   That is not to say that the market does not require correction (I think it does), but yesterday’s drop appears to be more of a manufactured event.

I was watching a number of stocks (such as AAPL) yesterday. Saw a 16% drop on the back of no additional news. It appears that multiple 1 cent sell orders for 100 shares in a basket of popular stocks were put in. Naturally this knocked over many algos, prompting dramatic selling for a short period.

The claim is that this was a fat-finger error, but I think it could just as easily be an extreme case of the sort of manipulation that occurs every day in enticing algorithms and traders to react to small price shocks, revealing their hand or pushing the stock in a given direction.

The timing during one of the least liquid periods of the day made the orders all the more effective.    Algos beware 😉

Addendum
It appears that there were greater than 30 sell orders for 0.0001 on at least 1 stock and probably a similar pattern on the others.   This would clean out the order book and as separate orders push the orderbook lower with each order.   See this link.    The author speculates that dramatic yen buying was the trigger.   It is possible that an algo reacted to strong yen buying and put in limit orders to liquidate a basket of equities (though a very poorly thought out exit strategy).

Errant algo or manipulation for a buying opportunity? Hard to say; either is equally likely.

Goldman and the future of Investment Banking

Some of the dealing on Wall Street falls under the category of being legal but unscrupulous. That said, it is little different from the buying and selling that a used car dealer does. The dealer’s products have to be approached with a buyer-beware mentality, for better or worse.

The dealer is looking to buy cheap, repackage or create, and sell at a higher price.   Necessarily the dealer cannot reveal what she perceives to be the real value in buying merchandise and likewise will sell the merchandise for the highest price possible.

Goldman fell a bit over 12% after the news of the SEC civil case, removing about $9B in market value. My view is that this reaction is much overdone (hence I am long at the -12% level). Beyond the civil case there are concerns about possible regulations such as:

  • disallowing retail banks to deal in derivatives (more or less)
    The details of how this would be applied are not clear.  But firms that choose to deal in the restricted products will not have access to depositors insurance and fed funds apparently.
  • rules about leverage (i.e. capital requirements)
    Nothing clear here, but I would surely expect regulation and/or unwillingness on the part of government lenders to provide capital to over-leveraged firms.
  • move many OTC derivatives to exchanges
    This will introduce more transparency to the derivatives market.   At the same time, margins for these trades will drop significantly.    It will, however, open up new algo markets.

With the exception of reduced leverage, I expect the above, in whatever form they appear, will disadvantage retail banks and less sophisticated players, and hand more of the market to investment banks such as Goldman. It will take a long time to play out, but I see Goldman winning here.

Personally I welcome the new algo opportunities!

Distribution Estimation

I’ve been travelling for the last 3 weeks so have not had much time to post.   During the trip, I’ve been thinking further about the following problem:

  1. We are interested in determining a representative distribution for some financial factor
  2. suppose we start with a “universal” high-dimensional joint distribution of all random variables that might possibly have relevance to the probability of some financial factor
  3. suppose that the number of variables in the universal joint distribution is high enough that the distribution is sparse relative to the sampled data.  We need to reduce the number of degrees of freedom.
  4. Is there a smaller marginal distribution (a subset of variables) that provides representative modes and distribution shape?
  5. How do we determine it?

Some observations:

  1. We should expect clustering around 1 or more modes
  2. Variables with little impact will not introduce new modes or substantially alter the shape of the distribution
  3. Adding a new low-impact variable (dimension) should just stretch the existing modes uniformly along the new dimension

This brings to mind a brute-force approach that would involve:

  1. observing all possible marginal distributions
  2. applying a measure to each, determining the degree of information within, and selecting the one that maximizes information while penalizing for the number of variables

The first step can possibly be shortcut by reducing incrementally, but may not find the global optimum.   The notion of “information” also needs to be defined.   I’ll post more later on this.
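
To make the brute-force idea concrete, here is a minimal Python sketch under heavy assumptions. It stands in for the undefined “information” with the mutual information between a discretized variable subset and the discretized target factor, and uses a fixed linear penalty per variable. Both choices, the function names and the toy data are mine.

```python
import itertools
import math
from collections import Counter

def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in Counter(symbols).values())

def mutual_information(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def best_subset(columns, target, penalty=0.05):
    """Score every non-empty subset of variables and return (score, subset).

    score = I(subset; target) - penalty * number_of_variables
    """
    best = (float("-inf"), ())
    for k in range(1, len(columns) + 1):
        for subset in itertools.combinations(columns, k):
            joint = list(zip(*(columns[name] for name in subset)))  # one tuple per sample
            score = mutual_information(joint, target) - penalty * k
            if score > best[0]:
                best = (score, subset)
    return best

# Toy discretized data: the target is driven entirely by "a"; "b" is unrelated
columns = {
    "a": [0, 1, 0, 1, 1, 0, 1, 0],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
}
target = list(columns["a"])

score, subset = best_subset(columns, target)
```

The number of subsets grows as 2^n, so this is only workable for a handful of variables, which is where the incremental shortcut comes in.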

Another approach would be to formulate as an expectation maximisation problem, but I have not worked out how this would be done.
