Category Archives: regression

November 6, 2009 · 8:02 am

Mode of the Signal Envelope

One thing that struck me as clever about the HHT (Hilbert-Huang Transform) was the use of splines projected across the minima and maxima for a given harmonic. In effect this defines the envelope of the series for a given harmonic (level of decomposition). A posteriori, the mean or mode should be more or less equivalent to the average of the envelope splines. Interesting!

This is a very appropriate way to model the mean within the context of mean reversion (i.e. oscillations around the mode within an envelope). Instead of trying to model the mean directly as a stochastic process, why not model the envelope? This is more appropriate, as the envelope fits naturally into our view of mean reversion.
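As a rough sketch of the envelope idea (not the code behind the charts in this post), the following finds the local maxima and minima of a series, joins each set into an upper and a lower envelope, and averages the two. To stay dependency-free it substitutes piecewise-linear interpolation for the cubic splines described above, and it holds the envelopes flat beyond the first and last extremum; both are simplifying assumptions, and the function names are made up for illustration.

```python
from bisect import bisect_right


def local_extrema(y):
    """Return (maxima, minima) index lists for strict interior turning points."""
    maxima, minima = [], []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] > y[i + 1]:
            maxima.append(i)
        elif y[i - 1] > y[i] < y[i + 1]:
            minima.append(i)
    return maxima, minima


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation, held flat outside the knot range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def envelope_mean(y):
    """Return the upper envelope, lower envelope, and their pointwise average."""
    maxima, minima = local_extrema(y)
    if not maxima or not minima:
        raise ValueError("need at least one maximum and one minimum")
    points = range(len(y))
    upper = [interpolate(maxima, [y[i] for i in maxima], x) for x in points]
    lower = [interpolate(minima, [y[i] for i in minima], x) for x in points]
    mean = [(u + l) / 2.0 for u, l in zip(upper, lower)]
    return upper, lower, mean
```

For a trending oscillation such as 0.05 * t + sin(0.5 * t), the averaged envelope should track the trend closely away from the edges. The edges are where the flat extrapolation is weakest, and a real implementation would need to treat them more carefully.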

Version 1
I used a regressor to estimate the mean and connected the minima and maxima with a spline for the envelope. The approach has issues (such as what sort of bias the mean regressor has with respect to the data). Some of the issues are shown below:

Version 2
I took a different approach, estimating the inflection points with a regressing “oscillator” (in green) and determining the midpoints between minima and maxima to produce a spline representing the mode (blue). So far it looks good. Edge cases, consolidation, and jumps need to be considered:

Strategy Discovery

Today I want to discuss the process of building or discovering a strategy. Generally, medium- to high-frequency models fall into one of the following categories:

set of rules / heuristics on top of statistical observations
analysis of price signal
evolving state-based model of prices
spread or portfolio based relationships
technical indicators
some combination of these within a Bayesian or amplification framework

These models share a common problem in that they are just crude approximations. They attempt to accurately determine behavior on a macro level.

The market is the emergent behavior of the trades and order activity of all of its participants. The perfect model is one that would have to be able to predict the behavior of each individual participant and be aware of all external stimuli affecting their behavior. This is at worst unknowable and at best would require something akin to an omniscient AI.

The best we can do is have a view or views around how to model market behavior. We can choose one of three approaches to modelling:

create models that rationalize some statistical or behavioral aspect of the market
create models using an evolved program or regression, without a preconceived rationalization
create models that embody a combination of the above two approaches

Preconceived models have the advantage of being explainable, whereas generated models often are not. That said, it is intriguing to pursue evolution and/or program generation as a means of discovering strategies in an automated fashion.

Rationale
Manual model development and testing is very time consuming. One will start with a conjecture or skeleton idea for a new strategy. The parameter space or variants of the idea may be large. Each has to be tested, optimized, retested.

Many of my strategies start out as models that digest raw prices and produce some form of “hidden state”. This hidden state is designed to tell us something useful with less noise than the original signal. This state may be multidimensional and may require further regression to map to buy and sell signals.

Obtaining optimal strategies points towards a multivariate numerical or codified regression approach. The testing and discovery of parameters and model variations would best be automated.

Tools
There are a number of approaches used in optimization, regression, or discovery problems:

Regression
ANN (Artificial Neural Nets), SVM (Support Vector Machine), RL (Reinforcement Learning)
Optimization
GA (Genetic Algorithms), Gradient Descent, Quadratic Optimization, etc
Discovery
GP (Genetic Programming), perhaps ANN as well

Strategy Discovery
Thus far I have mostly used tools in the Regression and Optimization categories to calibrate models. Genetic Programming represents an interesting alternative, where we generate “programs” or strategies, testing for viability and performance.

The “program” represents a random combination from an algebra of possible operations that operates on a set of inputs to produce an output. In our case, our inputs will be the digested information that our models produce. The program will map this into something that can be used to generate buy/sell/out signals.

Thousands of such programs are generated and evaluated against a fitness function. The fittest programs replicate, perform crossover, and mutate. This can be repeated for thousands of generations until programs with strong trading performance are found.
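The generate, evaluate, and breed loop can be sketched in a few dozen lines. This is a toy illustration under stated assumptions, not a description of any production system: the algebra is limited to add, sub, and mul, programs are nested tuples, the 0.5 thresholds that turn an output into buy/sell/out are arbitrary, and the fitness function (position times the return that followed) is a placeholder for a real performance measure. All names are hypothetical.

```python
import random

# Hypothetical algebra: three binary operations over inputs and constants.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def random_tree(rng, n_inputs, depth):
    """Grow a random program as a nested tuple."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_inputs))  # read one input
        return ("const", rng.uniform(-1.0, 1.0))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, n_inputs, depth - 1),
            random_tree(rng, n_inputs, depth - 1))


def evaluate(tree, inputs):
    if tree[0] == "x":
        return inputs[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], inputs), evaluate(tree[2], inputs))


def depth(tree):
    if tree[0] not in OPS:
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def paths(tree, path=()):
    """Yield the path (child indices) to every node."""
    yield path
    if tree[0] in OPS:
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def get_at(tree, path):
    for step in path:
        tree = tree[step]
    return tree


def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    """Graft a random subtree of b onto a random node of a."""
    target = rng.choice(list(paths(a)))
    donor = get_at(b, rng.choice(list(paths(b))))
    return replace_at(a, target, donor)


def mutate(rng, tree, n_inputs):
    """Replace a random node with a freshly grown subtree."""
    target = rng.choice(list(paths(tree)))
    return replace_at(tree, target, random_tree(rng, n_inputs, 2))


def signal(tree, inputs):
    """Map the program output to buy (+1), sell (-1), or out (0)."""
    value = evaluate(tree, inputs)
    return 1 if value > 0.5 else -1 if value < -0.5 else 0


def fitness(tree, rows, returns):
    """Toy fitness: sum of position times the return that followed."""
    return sum(signal(tree, row) * r for row, r in zip(rows, returns))


def evolve(rows, returns, pop_size=60, generations=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    n_inputs = len(rows[0])
    pop = [random_tree(rng, n_inputs, 3) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation

    def score(tree):
        return fitness(tree, rows, returns)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        parents = pop[: pop_size // 4]  # the fittest quarter replicates
        children = [pop[0]]  # elitism keeps the current best
        while len(children) < pop_size:
            child = crossover(rng, rng.choice(parents), rng.choice(parents))
            if rng.random() < 0.2:
                child = mutate(rng, child, n_inputs)
            if depth(child) <= max_depth:  # crude guard against bloat
                children.append(child)
        pop = children
    return max(pop, key=score), history
```

Here rows would hold the digested model outputs for each bar and returns the return realized afterwards. Fitness measured in-sample like this will overfit readily, so any serious use would need out-of-sample evaluation.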

An alternative and perhaps simpler approach is to use an ANN coupled with a GA. The GA generates weights / connections between neurons to produce a model between inputs and outputs.

Questions Under Consideration
ANNs and GPs differ in a number of important ways. Need to think further on the following:

ANNs and GPs can represent an infinite number of functions
ANNs accomplish this, though, at the cost of numerous neurons
ANNs and GPs may have a very different search space in terms of volume
We want to choose an approach that will converge more quickly (i.e. have a smaller search space)
How should we constrain the algebra or permutations to affect convergence?
There are many “programs” which are equivalent; there may also be certain permutations we do not want to allow.
What sort of inputs are useful, and how do we detect those that are not?
Inputs that are not useful should ultimately have very little trace through the model. Will have to determine how to detect and prune these.

More thought needed …

Feed Forward NN in "real life"

Turns out that the nematode Caenorhabditis elegans has a nervous system that is similar to a feed forward network. A feed forward network is one where neurons receive no feedback from neurons “downsignal” (i.e. the neurons and synapses can be arranged as a directed acyclic graph). This is closely analogous to the feed forward network first envisaged for Artificial Neural Networks.
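The directed-acyclic-graph condition is easy to check mechanically. A minimal sketch, assuming the network is given as a list of (pre, post) synapse pairs and that Python 3.9+ is available for the standard graphlib module; the function name and neuron labels are invented for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def feed_forward_order(synapses):
    """Return an evaluation order for the neurons, or None if there is feedback.

    synapses: iterable of (pre, post) pairs, one per directed connection.
    """
    predecessors = {}
    for pre, post in synapses:
        predecessors.setdefault(pre, set())
        predecessors.setdefault(post, set()).add(pre)
    try:
        # A topological order exists only when the graph has no cycles.
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None
```

A chain such as sensor to interneuron to motor neuron yields an order in which every neuron appears after the neurons that feed it; adding a connection back from the motor neuron to the sensor makes the function return None.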

The worm has exactly 302 neurons (in the adult hermaphrodite) and ~5000 synapses, with little variation in connections between one worm and another. This implies on average fewer than 20 synaptic connections per neuron (5000 / 302 ≈ 16.6). This is in contrast to the mammalian brain, where most neurons have a feedback loop back from other neurons downstream of the signal.

I am very enthusiastic about this area of research, as it brings us step by step closer to mapping an organism's brain onto a machine substrate. The nematode is quite tractable because of its fixed and very finite number of neurons.

ANNs are no longer in vogue, but I use feed forward ANNs for some regression problems. Of course my activation function is likely to be quite different from the biological equivalent. ANNs are not a very active area of research given their limitations, but one does find them convenient for massive multivariate regression problems where one does not understand the dynamics.

The regressions that I solve have only sparse {X,Y} pairs, if any, and can only be evaluated as a utility function across the whole data set. This precludes the various standard incremental “learning” approaches. Instead I use a genetic algorithm to find the synapse matrix that maximizes the utility function.
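A minimal sketch of that arrangement, under several assumptions of my choosing: a single tanh hidden layer, weights stored as one flat list, uniform crossover with Gaussian mutation, and a toy utility (net output times the return that followed, summed over the whole sample) standing in for the real one. None of the names or settings below come from an actual trading system.

```python
import math
import random


def n_weights(n_in, n_hidden):
    """Number of weights and biases for one hidden layer and one output."""
    return n_hidden * (n_in + 1) + n_hidden + 1


def forward(w, x, n_hidden):
    """Feed forward pass: tanh hidden layer, tanh output in [-1, 1]."""
    n_in = len(x)
    hidden = []
    for h in range(n_hidden):
        base = h * (n_in + 1)
        z = w[base + n_in] + sum(w[base + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(z))
    base = n_hidden * (n_in + 1)
    z = w[base + n_hidden] + sum(w[base + h] * hidden[h] for h in range(n_hidden))
    return math.tanh(z)


def utility(w, rows, returns, n_hidden):
    """Toy whole-sample utility: position (net output) times realized return."""
    return sum(forward(w, row, n_hidden) * r for row, r in zip(rows, returns))


def evolve_weights(rows, returns, n_hidden=3, pop_size=40, generations=30, seed=0):
    rng = random.Random(seed)
    size = n_weights(len(rows[0]), n_hidden)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(pop_size)]
    history = []  # best utility seen in each generation

    def score(w):
        return utility(w, rows, returns, n_hidden)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        parents = pop[: pop_size // 4]
        children = [pop[0]]  # elitism: the best weight set survives unchanged
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < 0.1 else g
                     for g in child]  # occasional Gaussian mutation
            children.append(child)
        pop = children
    return max(pop, key=score), history
```

Because the utility is only ever evaluated over the full data set, no per-sample error signal is needed, which is the point of using a GA here instead of an incremental training rule.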

SVM is more likely to be used in this era than ANNs for regression. Its drawback is that it requires much trial and error to determine an appropriate kernel (basis) function, one that transforms a nonlinear data set into a reasonably linear one in another space.
