Category Archives: volatility

Time Dilation

Many measures work best in a homoscedastic volatility regime. This is not a big secret. Most regressors, the simplest of which are the ever-popular moving averages, are especially biased in the context of a heteroscedastic series.

Probably the best way of normalizing a heteroscedastic series into one with near-constant variance is to observe the following. If we assume our process is roughly an SDE with normally distributed innovations (or, alternatively, one with a Hurst exponent close to 1/2), we know that:

As a rough measure, we can remove much of the vol of vol by scaling our time axis in proportion to the variance. I use a duration-based local volatility measure with smoothing or, alternatively for daily data, an EWMA-based evaluation of:

We can then change measure:

where ψ(t) is a smoothing / scaling function.   An example of such a scaling (with the red curve in the upper pane indicating the degree of scale from the baseline):
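A minimal Python sketch of the time change, assuming the new clock is simply the cumulative EWMA variance of returns and that observations are resampled each time that clock advances by a fixed step. The names ewma_variance, variance_clock, resample_on_clock, alpha and step are hypothetical, and the post's smoothing/scaling function ψ(t) is not reproduced.

```python
def ewma_variance(returns, alpha, init=None):
    """EWMA of squared returns: v[i] = (1 - alpha) * v[i-1] + alpha * r[i]**2."""
    v = returns[0] ** 2 if init is None else init
    out = []
    for r in returns:
        v = (1.0 - alpha) * v + alpha * r * r
        out.append(v)
    return out


def variance_clock(returns, alpha):
    """Cumulative EWMA variance: a time axis that runs faster when volatility is high."""
    clock, total = [], 0.0
    for v in ewma_variance(returns, alpha):
        total += v
        clock.append(total)
    return clock


def resample_on_clock(clock, step):
    """Indices at which the variance clock first reaches each multiple of `step`."""
    idx, target = [], step
    for i, c in enumerate(clock):
        while c >= target:  # a burst of variance can cross several targets at once
            if not idx or idx[-1] != i:
                idx.append(i)
            target += step
    return idx
```

Sampling the price at the returned indices gives observations that are roughly equally spaced in variance rather than in clock time.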

Filed under technical-analysis, volatility

Learning a Sequence

I had been looking at predicting durations (or the intensity) to model price behavior and estimate variance. As mentioned previously, the prevalent ACD models in the literature do poorly. Before moving on to another topic, I wanted to revisit this with an idea for a future approach.

Here is a sample of durations for a high-frequency price series:

9.30, 0.26, 0.28, 4.21, 0.04, 0.21, 3.23, 0.04, 2.28, ...

Rather than trying to regress on specific durations, which can (theoretically) take an infinite number of values, I decided to transform them into a set of symbols so that there is a finite number of states, say:

S1, S2, S3, ...

where S1 might represent durations in [0, 0.25], S8 durations in [3, 3.5], etc.   The sequence of states for the above durations might look like:

12 → 1 → 1 → 9 → 1 → 1 → 8 → 1 → 7 → 6 → 6 → 6 ...

This turned out to be useful.
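The symbolization step can be sketched as simple binning with bisect. The bin edges below are hypothetical (the post only gives S1 and S8 as examples), and the bins are taken as half-open intervals [lo, hi), so a duration exactly on an edge falls into the higher state.

```python
import bisect


def to_symbols(durations, edges):
    """Map each duration to a 1-based state index.

    State k covers [edges[k-1], edges[k]); durations beyond the last edge
    are lumped into the final state.
    """
    n_states = len(edges) - 1
    states = []
    for d in durations:
        k = bisect.bisect_right(edges, d)  # number of edges <= d
        states.append(min(max(k, 1), n_states))
    return states


# Hypothetical edges in seconds (12 states); the post does not list its exact bins.
EDGES = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 7.0, 10.0]
```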

SVM
An SVM with a radial basis function (RBF) kernel did a much better job of predicting the next symbol (duration) in a sequence than the ACD models. It was still not a suitable level of prediction, however.

The problem with SVM and related approaches in general is that you need a problem that can easily be categorized in a high-dimensional linear vector space. A big part of this is finding the kernel that will map your (usually) non-linear vectors into a linearly separable space. Also, SVM is arguably better suited to binary classification than to multiclass classification.

ANNs
In theory, an ANN with enough neurons can approximate any continuous function arbitrarily well. There are many problems in arriving at a general solution, though:

  1. Calibration
    Standard backpropagation (essentially gradient descent) converges to a local optimum, which depends on the starting configuration. Meta-heuristic approaches such as GAs search more globally, but at significant computational cost and without a guarantee of finding the global optimum.
  2. Overfitting
    It is very difficult to come up with networks that generalize.   Part of the success in doing this involves choosing training sets and configurations carefully.

Nevertheless, this may be an approach worth exploring.

Probabilistic Graph Models
As our duration pattern is essentially a sequence of transitions from one state to the next, modeling it as a probabilistic finite state machine is appealing. The idea with such an approach would be:

  1. empirically observe all chains of length ≤ some maximum
  2. determine the frequency of chains
  3. factorize into the smallest graph that reproduces those chains within some error
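Steps 1 and 2 amount to n-gram counting over the state sequence. A minimal sketch, assuming states are hashable symbols such as the integers above; step 3 (factorizing into the smallest reproducing graph) is the hard part and is not attempted here. count_chains and max_len are hypothetical names.

```python
from collections import Counter


def count_chains(states, max_len):
    """Count every contiguous chain (n-gram) of length 1..max_len and its relative frequency.

    The frequency of a chain is its count divided by the number of windows of that length.
    """
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(states) - n + 1):
            counts[tuple(states[i:i + n])] += 1
    freqs = {chain: c / (len(states) - len(chain) + 1) for chain, c in counts.items()}
    return counts, freqs
```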

The chains, for instance:

A first approach to this problem is to consider whether the sequence can be modeled as a Markovian state system. It is, however, doubtful that the states {S1, S2, S3, …} can be modeled in a strictly Markovian setting without the use of additional states.

For instance, is P(S1|S2) the same as P(S1|S2, {prior states})? The duration data shows dependence beyond the immediate prior state. Therefore, we have to expect that P(S1|S2, {S5,S1}) will differ from P(S1|S2, {S2,S3}), whereas in a Markovian model the probability of S1 is conditioned purely on the prior state.
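One quick empirical check of the Markov assumption is to compare first-order estimates P(next | previous) with second-order estimates P(next | previous two) on the same sequence. A sketch, with conditional_probs as a hypothetical helper; on real data each context would need enough observations for the comparison to mean anything.

```python
from collections import Counter, defaultdict


def conditional_probs(states, order):
    """Empirical P(next state | previous `order` states) as {context: {next: prob}}."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        counts[tuple(states[i - order:i])][states[i]] += 1
    return {ctx: {s: c / sum(nxt.values()) for s, c in nxt.items()}
            for ctx, nxt in counts.items()}


# Toy sequence: P(next | 2) is 50/50, but the pair before it decides the outcome.
seq = [1, 2, 1, 3, 2, 3, 1, 2, 1, 3, 2, 3]
first_order = conditional_probs(seq, 1)    # first_order[(2,)]    -> {1: 0.5, 3: 0.5}
second_order = conditional_probs(seq, 2)   # second_order[(1, 2)] -> {1: 1.0}
```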

Such a Markovian system might look like:

The HMM (Hidden Markov Model) addresses this limitation by assuming that there is a hidden Markov process (usually with more states than the observed state system). One can show that an HMM of infinite size can exactly model all possible state chains (sequences) over a finite set of states. Of course, we are interested in a much smaller model that can reproduce most of the observed chains with limited error.
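For reference, the likelihood of an observed symbol chain under a given HMM is computed with the standard forward algorithm. A minimal sketch, assuming discrete emissions and parameters supplied as nested lists (the names are illustrative); it does not rescale, so long chains would need log-space or scaled probabilities to avoid underflow.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(observed symbol chain) under a discrete HMM, via the forward algorithm.

    start[i]    = P(first hidden state is i)
    trans[i][j] = P(next hidden state is j | current is i)
    emit[i][o]  = P(observed symbol o | hidden state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)
```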

Here is a sample structure, where the black lines are edges between hidden states and the red edges indicate correspondence between hidden state and observed state.   The red edges are not traversed:

Aliasing Issues
Remember that we have arbitrarily subdivided durations (which are continuous) into N discrete states. The idea was that the difference between, say, 0.25 seconds and 0.22 seconds is not important for our purposes. One would think that coarser states would make the state sequence easier to model.

The problem is that we are dividing these discretely, so we run into an aliasing problem where a specific duration partially belongs to both the set represented by S(i) and that represented by S(i+1). For instance, for a sequence of length 3 we have 4 possible true state paths, each with an associated probability. Without compensating for aliasing, we see the states (naively):

With aliasing we have the following possibilities:

As our path length approaches N, we will have 2^(N-1) possible paths. One possible implementation is to train with the M highest-probability paths.
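A sketch of the top-M idea, assuming (hypothetically) that each duration's membership is split between the two nearest bin centres by linear interpolation and that memberships are independent along the chain. It enumerates every aliased path, which grows exponentially with chain length, so it only suits short chains; soft_membership, top_paths and centers are illustrative names, and states here are 0-based.

```python
import heapq
from itertools import product


def soft_membership(d, centers):
    """Split duration d between the two nearest bin centres by linear interpolation.

    Returns [(state_index, weight), ...] with 0-based states and weights summing to 1.
    `centers` must be sorted in increasing order.
    """
    if d <= centers[0]:
        return [(0, 1.0)]
    if d >= centers[-1]:
        return [(len(centers) - 1, 1.0)]
    i = 0
    while not centers[i] <= d <= centers[i + 1]:
        i += 1
    w = (d - centers[i]) / (centers[i + 1] - centers[i])
    return [(i, 1.0 - w), (i + 1, w)]


def top_paths(durations, centers, m):
    """Enumerate every aliased state path and keep the m most probable as (prob, states)."""
    options = [soft_membership(d, centers) for d in durations]
    paths = []
    for combo in product(*options):
        prob = 1.0
        for _, w in combo:
            prob *= w
        paths.append((prob, tuple(s for s, _ in combo)))
    return heapq.nlargest(m, paths)
```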

Fuzzy HMM
Aliasing is a kind of fuzzy set membership. There are a number of reasons why we should consider fuzzy state membership:

  1. The data may be noisy, obscuring the pattern
  2. Discretisation error (aliasing)

Not surprisingly, other people have thought of fuzzy state membership in the context of HMMs, and there are multiple fuzzy HMM models. To be investigated …

Filed under machine-learning, volatility

Mean in the context of Mean-Reversion

I want a running mean estimator that tracks the mode through mean-reversion cycles of a target amplitude or frequency. The key characteristics should be:

  • adaptation to local volatility
    • determination of diffusion-related squared return
    • determination of jump-related squared return
    • determination as to how much of the jump should be absorbed into the mean
  • model of mean reversion
    • calibrated to a desired long-run rate of reversion
    • allowance for changes in reversion constant and reversion to long run
  • model of mean
    • autoregressive
    • innovations scaled by sigma term (with MR component and jumps removed)
  • recursive backward estimation of ML
    • implicitly decide how innovation is distributed amongst mean, mean-reversion, and noise

A SDE-based Approach
The model is an expanded variant of the familiar Ornstein-Uhlenbeck process, with specialized mean-reversion, mean, and volatility processes. It also attempts to correct for jumps. Let’s start with the following SDEs (in continuous time):

Variance
There are many approaches to modeling volatility (all with issues). Initially I had thought to use a predictive model based on:

  • intensity process (based on “first exit” duration)
    This is a very complex process. First approximations have been to use ACD, a family of AR models for duration. ACD models perform very poorly on HF data, however. It seems that a Markov chain model recognizing the patterns will be most appropriate.
  • amplitude process
    The amplitudes of squared returns seem to follow a largely AR process.   This seems fairly well behaved.

Before fully committing to a complex volatility model, I thought it made sense to first try a non-predictive measure of realized variance. I will use:

The choice of α determines the degree of smoothing with previous values based on how local (and noisy) we want this function to be.   For example, here is the estimate with a smoothing factor of 60 and a threshold of 3e-5:

Discretising
Using Ito’s lemma we discretise the processes as follows:

Simplifying the volatility term in S(t), we first determine the variance of the SDE:

We reorganize as follows:
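The post's model is an extended variant, but its building block is the plain Ornstein-Uhlenbeck process dX = θ(μ − X)dt + σ dW, whose one-step transition is known in closed form. A minimal sketch of that exact discretization only (not the full mean / mean-reversion / jump model above), assuming θ > 0; ou_step and ou_simulate are illustrative names.

```python
import math
import random


def ou_step(x, mu, theta, sigma, dt, z):
    """Exact transition of dX = theta * (mu - X) dt + sigma dW over dt, given z ~ N(0, 1)."""
    decay = math.exp(-theta * dt)
    mean = mu + (x - mu) * decay
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * theta)   # requires theta > 0
    return mean + math.sqrt(var) * z


def ou_simulate(x0, mu, theta, sigma, dt, n, seed=0):
    """Simulate n exact OU steps starting from x0."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        path.append(ou_step(path[-1], mu, theta, sigma, dt, rng.gauss(0.0, 1.0)))
    return path
```

For small Δt the conditional variance σ²(1 − e^(−2θΔt))/(2θ) tends to σ²Δt, the Euler-scheme value, and for large Δt it tends to the stationary variance σ²/(2θ).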

Putting it together
We can now model this discretely as a state-space filter, searching for parameters that fit an a-posteriori idealized view of the mode and mean-reversion process. Once parameterized, the process can be used in real time to provide an estimate of the mode.
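As a much-simplified illustration of the state-space filtering step, here is a scalar Kalman filter for a local-level model (a random-walk mean observed with noise). It omits the mean-reversion and jump components described above, and the noise variances q and r are hypothetical inputs that would come from the parameter search.

```python
def local_level_filter(obs, q, r, m0=None, p0=1e6):
    """Scalar Kalman filter for a local-level model.

    state:       m[t] = m[t-1] + w,  w ~ N(0, q)
    observation: y[t] = m[t] + v,    v ~ N(0, r)
    Returns the filtered estimate of m after each observation.
    """
    m = obs[0] if m0 is None else m0
    p = p0                      # large initial variance = weak prior on m
    estimates = []
    for y in obs:
        p = p + q               # predict: the random-walk state gains variance q
        k = p / (p + r)         # Kalman gain
        m = m + k * (y - m)     # update with the innovation y - m
        p = (1.0 - k) * p
        estimates.append(m)
    return estimates
```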

Final Notes
As you may have seen, I took a (useful) 2-3 week diversion before coming back to the SDE-based approach. This is not a final model by any means, but I think it is a solid starting point. The purpose of the above is to serve as one of a number of factors in a multi-factor strategy that I want to optimize further.

Filed under mean, state-space-models, statistics, stochatistic, volatility

Duration Estimation

In a prior post I mentioned that for intra-day variance prediction it made sense to separate variance into 2 processes:

  1. intensity process
    When is the next event going to occur? Let's call this Tprior + Δt. This is the more complex process of the two to predict.
  2. power process
    What is the amplitude of the event at time Tnow + Δt? The power or amplitude process seems to be fairly well behaved. An ARMA-style process seems like a likely candidate.

Towards this end, I have been exploring models for the intensity process.   Very often this is modeled in terms of duration.   Below is a summary of some results:

ACD Models
ACD processes make overreaching assumptions. In particular, ACD models assume a constant AR decay and innovation contribution across time. Unfortunately, this is not supported by empirical observations. Here are some results for the best-fitting Weibull ACD model on HF data:

The R^2 level of 0.0091 does not inspire confidence.
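For reference, the conditional-mean recursion behind an ACD(1,1) model (Engle and Russell) is sketched below, together with the R^2 measure quoted above. This is only the recursion; a Weibull ACD would additionally fit the Weibull shape parameter by maximum likelihood, which is not shown. acd_expected_durations and r_squared are illustrative names, and the parameters are assumed to satisfy ω > 0, α, β ≥ 0 and α + β < 1.

```python
def acd_expected_durations(x, omega, alpha, beta, psi0=None):
    """ACD(1,1) conditional expected durations:
    psi[i] = omega + alpha * x[i-1] + beta * psi[i-1].
    psi[0] defaults to the unconditional mean omega / (1 - alpha - beta)."""
    psi = [omega / (1.0 - alpha - beta) if psi0 is None else psi0]
    for i in range(1, len(x)):
        psi.append(omega + alpha * x[i - 1] + beta * psi[i - 1])
    return psi


def r_squared(actual, predicted):
    """Coefficient of determination of predicted against actual values."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot
```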

SVR Model
I used an iterative non-parametric machine learning approach (SVR) with a training set of 20 prior observations and, as the input vector, a lagged series of the derivatives of the prior 20 durations. Training across the entire series, one gets an in-sample prediction R^2 of 0.9980. Unfortunately, incremental out-of-sample prediction does not fare as well:
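A sketch of the input construction described above, assuming "derivatives" means first differences of consecutive durations and that the model is refit on a rolling window of the previous 20 samples before each prediction. Any regressor (for example an SVR) can be plugged in; the function names are hypothetical and the fitting itself is left out so the example has no library dependencies.

```python
def lagged_diff_features(durations, n_lags=20):
    """Input vectors of the last n_lags first differences; the target is the next duration.

    diffs[j] = durations[j + 1] - durations[j], so to predict durations[t] only
    differences up to diffs[t - 2] (which use durations[:t]) are included.
    """
    diffs = [durations[j + 1] - durations[j] for j in range(len(durations) - 1)]
    X, y = [], []
    for t in range(n_lags + 1, len(durations)):
        X.append(diffs[t - 1 - n_lags:t - 1])
        y.append(durations[t])
    return X, y


def walk_forward_splits(n_samples, train_size=20):
    """Yield (train_indices, test_index): fit on the previous train_size samples, predict the next."""
    for t in range(train_size, n_samples):
        yield list(range(t - train_size, t)), t
```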

Distribution of Durations
Here are 2 views on the distribution of durations:

Alternative Models
Some possibilities:

  1. Markov chain (probabilistic state system)
    We model the patterns by categorizing the durations into K separate levels. To train, we observe the chain of states, say {K1, K8, K1, K1, K1, K4}, and determine a graph describing the approximate event chains, factorizing and assigning probabilities.
  2. ANN
    Use a simple feed-forward network, trained with a GA or DE.   This is easy to implement but subject to a variety of problems such as overfitting.

As the ANN is easy to compose, I will start there.

Filed under genetic programming, machine-learning, neural networks, statistics, volatility

Durations on Intraday Price Series

As mentioned in a previous post, I intend to model quadratic variation in terms of multiple pairings of intensity (duration) and return-level processes. At a minimum, I want one pairing for “non-jump” related returns and one for “jump” related returns.

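To do this, it is necessary to partition returns into these categories based on a threshold. We may further want to disregard price movements below a certain level unless they cumulatively add up to a significant return within a period. Towards this end, my duration measurement function uses a threshold to determine whether a return is to be considered an event or not. In R (assuming series, threshold, and a hypothetical maximum cumulation window max.window are defined, and that times() returns numeric timestamps in seconds):

r <- c(0, diff(log(series)))    # log returns; r[i] is the return ending at t[i]
t <- times(series)              # numeric timestamps in seconds
durations <- c()
last.event <- 1                 # index of the last accepted event (start of series)
for (i in seq_along(r)[-1])
{
    # cumulative return since the last event, limited to max.window observations
    first <- max(last.event + 1, i - max.window + 1)
    cumr <- sum(r[first:i])

    # record a duration when a qualifying event has occurred
    if (abs(cumr) >= threshold || abs(r[i]) >= threshold)
    {
        durations <- c(durations, t[i] - t[last.event])
        last.event <- i
    }
}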

For the diffusion portion of the process, in this 2-second-sampled data set (EUR/USD, low-liquidity period), a threshold of 3e-5 (equivalent to about 1/2 pip) seems to work well:

The threshold for the jump portion of the process should be set so as to capture the desired jump features and not much more. Here I show it with a threshold of 2e-4 (equivalent to about 3 pips):

Filed under statistics, volatility

Anatomy of a Spike

Spikes manifest frequently in intra-day markets. These are often short-lived and, particularly in low-liquidity periods, more often associated with buying/selling programs than with changes in fundamental factors. In evaluating duration and variance measures, I was trying to determine reasonable jump thresholds.

Below is a price series demonstrating a variety of spiking behavior:

Taking a look at the region around the jump at high frequency, we note that the jump did not occur in one trade, but rather over multiple trades within a short space of time:

From a duration perspective, if we want to capture the spike as one event of a given magnitude, we need either to consider the cumulative return over a given window or to sample with a longer period. Here is the same with a 2-second sample:

Here is the 2-second sampling of the original range. With the longer sample period, the spikes in return are more evident (compare to the first graph in the post):
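Sampling with a longer period can be sketched as taking the last traded price in each fixed-width bucket and carrying the previous price forward through empty buckets. A minimal sketch, assuming tick times in seconds; resample_last is a hypothetical helper, not a library function.

```python
def resample_last(times, prices, period):
    """Last price in each bucket [start + k*period, start + (k+1)*period).

    Returns (bucket_end_times, sampled_prices); an empty bucket repeats the
    previous price.
    """
    if not times:
        return [], []
    start = (times[0] // period) * period
    n_buckets = int((times[-1] - start) // period) + 1
    out_t, out_p = [], []
    i, last = 0, prices[0]
    for k in range(n_buckets):
        end = start + (k + 1) * period
        while i < len(times) and times[i] < end:
            last = prices[i]
            i += 1
        out_t.append(end)
        out_p.append(last)
    return out_t, out_p
```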

Filed under statistics, volatility

Misadventures in Variance Estimation

I’ve been looking at a number of different volatility models for both daily and intra-day variance. In the process, I have looked at GARCH(1,1) for daily data and have developed a number of duration-based variance (DV) models for intra-day data.

Daily Returns
I had concluded early on that GARCH(1,1) generally fits daily data well but fails miserably with intra-day data. Unfortunately, for some markets, GARCH performs miserably on daily data as well.

I was looking at Canadian 2Y CMT yields over the last 10 years.   There are significant spikes in return for any given year.    For example in 2003, on the raw daily returns, GARCH does poorly (squared returns: black, sigma squared: red):

As an experiment, I used (very) lightly smoothed prices as a basis for GARCH testing. The smoothing gave continuity to the price function and to its 1st derivative (returns). GARCH(1,1) calibrated on the same data set, but with minimal smoothing, yields a much higher degree of fit (25x):

Of course we are now projecting sigma^2 on a lightly smoothed price series and not the original series.   Note that the magnitude of returns has reduced as well, as the smoothing allows for more gradual transitions.

Let’s look at a smaller section of returns on the raw price series:

Visually, there appears to be an autoregressive component in squared returns, but it involves jumps followed by periods of near-zero return. Subsequent jumps appear to follow a near-linear decay in magnitude. You will note that the last set appears to be a poor fit; however, there is a cluster of low jumps that, when considered in intensity space, would have a higher combined value (i.e., consider the cumulative return over a small window).

This behavior would point to modeling this as an intensity function (equivalent to duration).

Intra-day Squared Returns
Autocorrelation plots of liquid periods and illiquid periods both show significant autocorrelation out to 10+ seconds, indicating that we should be looking at an AR process at some level with long memory:
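The sample autocorrelation behind such plots can be computed directly. A sketch, assuming evenly sampled returns (so a lag of k samples is k seconds for 1-second data); apply it to the squared returns, e.g. autocorrelation([r * r for r in returns], 30).

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (the biased estimator used in most ACF plots)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]
```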

Thus far the most successful model is a duration-based Hawkes process, which is “infinitely” autoregressive with exponential decay.
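For reference, the standard exponential-kernel Hawkes intensity, and the O(n) recursion that makes it "infinitely" autoregressive, can be sketched as follows. This is the textbook time-based form; how the post maps it onto durations is not shown, and mu, alpha and beta are generic parameter names.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)


def hawkes_intensity_at_events(event_times, mu, alpha, beta):
    """Intensity just before each event via the O(n) recursion
    A[i] = exp(-beta * (t[i] - t[i-1])) * (1 + A[i-1]),  lambda = mu + alpha * A[i].
    Event times must be sorted."""
    out, a = [], 0.0
    for i, t in enumerate(event_times):
        if i > 0:
            a = math.exp(-beta * (t - event_times[i - 1])) * (1.0 + a)
        out.append(mu + alpha * a)
    return out
```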

Filed under statistics, volatility

Variance Estimators Revisited

I need a reasonably accurate measure of intra-day variance for a strategy I am working on. In a previous post I indicated that while GARCH(1,1) fits daily returns well, it fits intra-day returns poorly; in fact, so poorly that it was not possible to arrive at a non-zero maximum likelihood.

The failure of the GARCH model on intra-day data has several causes. Intraday (squared) returns exhibit:

  1. random large jumps followed by sharp drop-off
  2. microstructure noise
  3. apparent gaps in return as a smooth AR process

Garch Revisited
I decided to take another look at GARCH; perhaps with some modification it could be made useful as a rough measure, short of a more sophisticated model.

The “jumps” present in intra-day data have near-zero probability under a Gaussian error model. To combat this, I added a component to the GARCH model to absorb jump shocks:

The added component is J, which absorbs the excess return above a threshold.   The above basically says that the jump has no predictive value for subsequent measures of the squared return (though this is not strictly what is observed).   More likely, the jumps should be considered as a largely independent random process contributing to overall variance.
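A minimal sketch of the jump-absorbing idea, assuming a symmetric cap: the part of each return beyond ±threshold is moved into J, and only the capped part drives the GARCH(1,1) recursion. The post's exact formulation and threshold rule are not reproduced here; split_jump and garch11_truncated are hypothetical names.

```python
def split_jump(r, threshold):
    """Split a return into a part capped at +/- threshold and the excess J."""
    capped = max(-threshold, min(r, threshold))
    return capped, r - capped


def garch11_truncated(returns, omega, alpha, beta, threshold):
    """GARCH(1,1) variance driven only by the capped part of each return.

    s2[t] is the variance for returns[t]; the last element is the one-step-ahead
    forecast. The excess J of each return is returned separately.
    """
    s2, jumps = [omega / (1.0 - alpha - beta)], []   # start at the unconditional variance
    for r in returns:
        capped, j = split_jump(r, threshold)
        jumps.append(j)
        s2.append(omega + alpha * capped * capped + beta * s2[-1])
    return s2, jumps
```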

The threshold was determined heuristically as:

With the above formulation, I was able to calibrate to a relatively high likelihood and an R^2 of around 0.90 in sample and 0.85 out of sample (with jumps truncated).

Duration Based Approach
For a Brownian motion based process, we know that increments Δr over time interval Δt scale as follows, and that one implies the other:
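For reference, the standard Brownian scaling, written for a driftless process with volatility σ, together with the related textbook result that the expected time for the process to first move by ±Δr is (Δr)²/σ²:

```latex
dr_t = \sigma\, dW_t
\;\Rightarrow\;
\mathbb{E}\!\left[(\Delta r)^2\right] = \sigma^2\, \Delta t
\;\Longleftrightarrow\;
\Delta t = \frac{\mathbb{E}\!\left[(\Delta r)^2\right]}{\sigma^2},
\qquad
\mathbb{E}\!\left[\tau_{\Delta r}\right] = \frac{(\Delta r)^2}{\sigma^2},
\quad
\tau_{\Delta r} = \inf\{t \ge 0 : |r_t - r_0| \ge \Delta r\}
```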

This implies that we can estimate the amount of time it will take for Δr to be realized in the price (returns) process. Engle and Russell developed a well-regarded model for price duration, the Autoregressive Conditional Duration (ACD) model. The model expresses price movements in terms of duration (or, alternatively, intensity), though it does not explicitly provide a variance estimate.

For the purposes of this model, let us assume a granularity constant Δr. We then let D[t] denote the effective span of time it took for the process to move by Δr over [t-1, t]. This is calculated (approximately) as Δr divided by the change in return, scaled by the length of the interval. For example, if Δr is the return represented by 1 pip, and a 3-pip return is seen over a period of 1 second, D[t] would be 1/3 of a second.
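The calculation of D[t] can be sketched as follows, assuming the move within the interval happened at a constant rate; the function name is illustrative, and a zero move is mapped to an infinite duration.

```python
def effective_duration(dt, dr, delta_return):
    """Time the process would need to move by the granularity dr, assuming the
    move of delta_return over an interval of length dt happened at a constant rate."""
    if delta_return == 0:
        return float("inf")   # no movement: the granularity was never reached
    return dt * dr / abs(delta_return)


# 1-pip granularity, 3-pip move over 1 second -> roughly 1/3 of a second
d = effective_duration(1.0, 1.0, 3.0)
```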

In terms of intensity we can model this as:

We would like to get variance into a form where it can be estimated in terms of intensity. Given that variance is expressed as the expectation of the squared return, we can formulate this probabilistically:

We can use the Poisson pmf to model p(k jumps | λ), and evaluate the discrete sum up to some maximum jump size m.
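A sketch of that sum, assuming each of the k events moves the price by Δr in the same direction (so the return is kΔr); if events had independent random signs, the second moment would instead be λ(Δr)². The Poisson probabilities are built recursively to avoid large factorials, and expected_squared_return is a hypothetical name.

```python
import math


def expected_squared_return(lam, dr, m):
    """Truncated sum over k = 0..m of P(k | lam) * (k * dr)**2, with k ~ Poisson(lam)."""
    p = math.exp(-lam)        # P(0 | lam); the k = 0 term contributes nothing
    total = 0.0
    for k in range(1, m + 1):
        p *= lam / k          # P(k | lam) from P(k - 1 | lam)
        total += p * (k * dr) ** 2
    return total
```

As m grows, the sum approaches (Δr)²(λ + λ²), the second moment of kΔr.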

Handling Jumps
Empirically, it would seem that jumps follow a different process from non-jump price movements.   To address this I’ve added a separate process for jumps, with a single factor guiding exponential decay.   The two processes are combined when the expectation is calculated:

The messy bit is deciding, given a return, whether to treat it as a jump J or as a standard return D (both expressed as durations). We can decide that:

D = max (price duration for [t-1,t], jump threshold)
J = max (price duration for [t-1,t] – D, 0)

or decide to categorize the whole return as either a jump or a normal return.

Seasonality
We might also want to adjust the constant to different levels depending on the prevalent level for a given period of the trading day. Durations will be short (variance high) at the opens and closes, for instance. This would require building a seasonality curve for the trading day.
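Building the seasonality curve can be sketched as averaging durations within time-of-day buckets and then dividing each duration by its bucket's average. A minimal sketch, assuming event times in seconds since midnight and a hypothetical 30-minute bucket width.

```python
from collections import defaultdict


def seasonality_curve(event_times, durations, bucket_seconds=1800):
    """Mean duration per time-of-day bucket; times are seconds since midnight."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t, d in zip(event_times, durations):
        b = int(t % 86400 // bucket_seconds)
        sums[b] += d
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


def deseasonalize(event_times, durations, curve, bucket_seconds=1800):
    """Divide each duration by the average duration for its time-of-day bucket."""
    return [d / curve[int(t % 86400 // bucket_seconds)]
            for t, d in zip(event_times, durations)]
```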

In the FX markets, different trading sessions offer varying degrees of liquidity. With reduced liquidity we can see periods with very little price action and large (often temporary) jumps in price. The autocorrelation of intensities is likely to differ across these sessions, not to mention the base intensity. A Markov-switching model may be the way to specialize for the different market conditions.

Results
I am now implementing this and will see how it compares to the modified GARCH process. I may also look at some variants for the jump process.

Filed under statistics, strategies, volatility

Trade Exit Strategy

I’ve not put all that much focus on trade exit strategies, as I have usually relied on the trading signal to determine when to reverse a position (in addition to various risk-related parameters). I now have a situation where I need to determine the exit independently of the signal, so I will need to make the exit decision based on other criteria.

As the saying goes, “cut your losses short and let your profits run”. As simple as this sounds, it is not necessarily simple to implement. Any given price path will have “retracements”, i.e. price will not move in a monotonic fashion. So how does one set the “trigger” to exit the trade such that one can ride over “retracement” noise towards more profit?

Standard Practice
Standard practice is to use a variety of indicators to determine when to exit a profitable trade:

  1. retracement as % of “volatility” (recent trading range)
    Using recent trading ranges (a proxy for variance) to scale the amount of retracement can work well in certain conditions.
  2. arbitrary limits: max profit, maximum trade time
    We may have a view on the maximum time or profit opportunity for a strategy, allowing us to exit without waiting for a later drawdown to signal it.
  3. various technical indicators
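Rule 1 can be sketched as a trailing stop for a long position whose allowed retracement is a multiple k of the recent trading range. A minimal illustration only; the range series, k and the function name are assumptions.

```python
def trailing_stop_exit(prices, ranges, k):
    """Exit index for a long position entered at prices[0], or None if never triggered.

    The trade is closed when price retraces more than k * ranges[i] below its
    running peak, where ranges[i] is the recent trading range known at bar i.
    """
    peak = prices[0]
    for i in range(1, len(prices)):
        peak = max(peak, prices[i])
        if peak - prices[i] > k * ranges[i]:
            return i
    return None
```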

Some Problems
When trading intra-day and observing the market tick by tick, the above rules do not work well. High frequency data has many short-lived high-amplitude spikes (high relative to typical price movements). This is particularly evident in periods of the trading day with lower liquidity.

Whereas recent trading ranges usually provide a reasonable view on the current period’s noise level for medium-low frequency data, the same is not generally true for high frequency.

Optimal Approach
An optimal approach to the problem is to use a price path model from which we can determine the probability of a retracement swinging back in the direction of profit or, for that matter, the probability of subsequent price action retracing prior to any significant retracement. Such a model cannot possibly be right all of the time but, done successfully, it will have a significant edge over even odds.

The price path model provides the probability of a price going through a level at time t. Given that, we can ask:

  • what is the probability of the price going through a level within a time period
  • what is the probability of the price remaining within a corridor within a time period

These two probabilities allow us to make optimal exit decisions based on risk considerations (the corridor) and likely (or unlikely) movement in the positive direction.
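As a concrete, much-simplified instance of both questions, assume the price follows driftless Brownian motion with volatility σ (a stand-in for the post's SDE, not the author's model). The level-crossing probability then has a closed form from the reflection principle, while the corridor probability is estimated here by Monte Carlo; the function names and defaults are hypothetical.

```python
import math
import random


def prob_hit_within(a, sigma, T):
    """P(max of the path over [0, T] >= a) for driftless Brownian motion started at 0.

    Reflection principle: 2 * (1 - Phi(a / (sigma * sqrt(T)))) = erfc(a / (sigma * sqrt(2 T))).
    """
    if a <= 0:
        return 1.0
    return math.erfc(a / (sigma * math.sqrt(2.0 * T)))


def prob_stay_in_corridor(lower, upper, sigma, T, n_steps=100, n_paths=5000, seed=1):
    """Monte Carlo estimate that a driftless Brownian path from 0 stays inside (lower, upper) up to T.

    The path is only checked at n_steps points, which slightly overstates the
    continuous-time probability.
    """
    rng = random.Random(seed)
    sd = sigma * math.sqrt(T / n_steps)
    inside = 0
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sd)
            if not lower < x < upper:
                break
        else:  # loop finished without leaving the corridor
            inside += 1
    return inside / n_paths
```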

Embedded in such a model, though, is an accurate view of the price process within a near window, which is a view (and a strategy) in its own right.

I developed a general calibration and prediction framework for the price path model; however, the price process SDE needs more work (although it shows good predictive behavior in certain market conditions, it does not yet handle all of them well). Are there better alternatives for the short term?

Mean-Reversion Collar
Short of a semi-predictive probabilistic model, the next best thing might be to make use of our recently developed mean-reversion envelope. The envelope can be tuned to various cycle periods and amplitudes.

Noise manifests as a mean-reverting process around some evolving mean. We can tune the envelope to encompass the level of noise we wish to ignore. The projected mean therefore indicates the overall direction of price on average. We can then use this to determine whether or not to carry the trade forward through a retracement.

Filed under money management, probability, strategies, volatility

Intraday volatility prediction and estimation

GARCH has been shown to be a reasonable estimator of variance for daily or longer-period returns. Some have adapted GARCH to use intraday returns to improve daily variance estimates. GARCH does very poorly in estimating intra-day variance, however.

The GARCH model is based on the empirical observation that there is strong autocorrelation in the square of returns for lower frequencies (such as daily). This can be easily seen by observing clustering and “smooth” decay of squared returns on daily returns for many assets.

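$$\sigma_t^2 = \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2$$
$$(\hat\omega, \hat\alpha, \hat\beta) = \arg\max_{\omega, \alpha, \beta}\; -\frac{1}{2} \sum_{t} \left[ \ln\!\left(2\pi\sigma_t^2\right) + \frac{r_t^2}{\sigma_t^2} \right]$$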

where the second equation is the ML optimization for the parameters.
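A minimal Python sketch of the GARCH(1,1) variance recursion and its Gaussian log-likelihood, assuming zero-mean returns and a start at the unconditional variance ω/(1 − α − β). Calibration would maximize garch11_loglik over (ω, α, β) with any numerical optimizer, subject to ω > 0, α, β ≥ 0 and α + β < 1; the function names are illustrative and this is not the author's R code.

```python
import math


def garch11_variance(returns, omega, alpha, beta):
    """sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].

    sigma2[0] is the unconditional variance omega / (1 - alpha - beta); the last
    element is the forecast for the next, not yet observed, period.
    """
    s2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        s2.append(omega + alpha * r * r + beta * s2[-1])
    return s2


def garch11_loglik(returns, omega, alpha, beta):
    """Gaussian log-likelihood of zero-mean returns under GARCH(1,1)."""
    s2 = garch11_variance(returns, omega, alpha, beta)
    return -0.5 * sum(math.log(2.0 * math.pi * v) + r * r / v
                      for r, v in zip(returns, s2))
```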

Here is an example for the daily Canadian 2Y CMT yield. The red is the GARCH(1,1) variance, the black is the series, the grey is the log return, and the green circle is the predicted variance for the next period:

Contact me if you would like to get the R source code for the above.

Intra-day squared returns, however, have many jumps, with little in the way of an autocorrelated decay pattern. Looking at the EUR/USD series, the squared returns have jumps that reduce the likelihood to the point where the GARCH parameterization does not converge. There does appear to be a longer-term pattern, though, allowing for a model, just not GARCH.

With expanded processing power and general access to tick data, research has begun to focus on intra-day variance estimation. In particular, expressing variance in terms of price duration has become an emergent theme. Andersen, Dobrev, and Schaumburg are among a growing community developing this in a new direction.

At this point I have disqualified GARCH as a useful measure for my intra-day strategies, but I am planning to use it for a daily strategy. I am investigating a formulation of a duration-based measure for intra-day volatility.

Filed under statistics, volatility