A Statistical Method for Evaluating Trading Systems
Much of technical analysis,
particularly as applied to futures trading, is ad hoc. Most technical
indicators, while written in the language of mathematics, are
essentially someone's view of how the markets should work. The stochastic
oscillator, for example, is based on the idea that in a rising market, the
closes tend to be near the top of the recent high-low range. If prices hover
near the top of this range, then presumably the market is 'overbought' and
likely to fall. This reasoning has no more (or less) mathematical justification than
the algebra on which the stochastic is based. This is not to dismiss
technical analysis, some of which can be quite useful.
However, it probably pays not to be overly impressed with
complicated-looking technical formulations, most of which have only
anecdotal justification for their intended use.
There are exceptions, of
course. I would put statistical methods into this category. Anytime you can find
a statistical method that applies to trading, it's probably worth looking into.
In fact, I tend to think of even simple statistical methods, like the one I'm
about to present, as a level above most technical
analysis.
One question that can be
addressed by the use of statistics is whether a trading system is inherently
profitable. We can approach this problem using confidence intervals for the
average trade. If we have a sample of, say, 100 trades from a trading system, we
can compute the average trade, T. Of course, we expect T to be greater than
zero, indicating that the system has been profitable on average. However, if we
took a different sample of 100 trades, we would, in general, find a
different average trade, T. If the variation among the trades is large
enough, it's possible that some of these averages could be less than zero,
indicating that the system was not profitable on average for those trades.
By computing
the confidence intervals for the average, T, we can determine whether it's
likely that the average will be greater than zero. The confidence intervals
specify upper and lower bounds for the average. The true average lies within
those bounds with some specified probability or confidence level, such as 95%.
The equation for the confidence intervals is as follows:
CI = t * SD/sqrt(N)
where t is the Student's t
statistic, SD is the standard deviation of the trades, N is the number of
trades, and sqrt represents "square root." The average trade is likely to lie
between T - CI and T + CI. For the system to be profitable at our specified
confidence level, the lower bound, T - CI, must be greater than zero; that is,
we need T > CI.
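The calculation can be sketched in a few lines of Python. This is only an illustration: the function name, the made-up trade list, and the use of the sample standard deviation (n - 1 in the denominator) are assumptions, not something specified in the article. The t value is supplied by the caller, as it would be from a t table.

```python
import math
import statistics


def confidence_interval(trades, t):
    """Return (average, ci) for a list of trade results, with ci = t * SD / sqrt(N)."""
    n = len(trades)
    if n < 2:
        raise ValueError("need at least two trades to estimate the standard deviation")
    average = statistics.mean(trades)
    sd = statistics.stdev(trades)  # sample SD; assumed, the article does not say which SD
    ci = t * sd / math.sqrt(n)
    return average, ci


# Hypothetical per-trade profit/loss figures, for illustration only.
sample_trades = [120.0, -80.0, 300.0, -50.0, 210.0, -140.0, 90.0, 400.0]

# 2.365 is the two-tailed 95% t value for 7 degrees of freedom (8 trades).
average, ci = confidence_interval(sample_trades, t=2.365)
profitable = average > ci  # True only if the lower bound, average - ci, is above zero
print(f"average = {average:.2f}, CI = {ci:.2f}, profitable at 95%: {profitable}")
```

With so few trades the interval is wide, which is the point of the N in the denominator.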
The value of t depends on the
specified confidence level and the number of trades, N. The exact value can be
found in a statistics table for the t distribution or calculated in software,
for example with the TINV function in Excel. However, provided we have a
reasonably large number of trades, the exact value is not necessary. If N = 60,
the t value for 95% confidence is t = 2.00. For larger values of N, t will get
slightly smaller, dropping to 1.96 for very large N. To be conservative, then,
we can take t = 2.00 as long as we have at least 60 trades. If our actual value
of N is larger than 60, we will have slightly larger intervals than if we used
the exact value of t.
Under this assumption, then, we have
CI = 2 * SD/sqrt(N); N >= 60, 95% confidence.
As an example, I calculated the confidence intervals for MiniMax II on the
E-mini S&P. Going back to 1998, there are 241 trades. The average
trade is $248 with a standard deviation of $990. This gives us a
confidence interval of CI = 2 * 990/sqrt(241), which is equal to $128. So, we
can expect the average trade to lie somewhere between 248 - 128 and 248 + 128 or
between $120 and $376. The lower number, $120, is greater than zero, so the
system should be profitable.
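The arithmetic for this example can be reproduced directly from the summary numbers quoted above. The variable names are my own; the figures (248, 990, 241, and t = 2) come from the text.

```python
import math

average_trade = 248.0  # T, in dollars
sd = 990.0             # standard deviation of the trades, in dollars
n_trades = 241         # N
t = 2.0                # conservative 95% value for N >= 60

ci = t * sd / math.sqrt(n_trades)
lower = average_trade - ci
upper = average_trade + ci

# Prints: CI = 128, range = 120 to 376
print(f"CI = {ci:.0f}, range = {lower:.0f} to {upper:.0f}")
```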
Because the square root of N is
in the denominator, all other things being equal, the more trades we have, the
smaller our confidence intervals will be. This is a way to quantify what most of
us already know from intuition and/or experience: if you want to know
whether a trading system is profitable, the more history the better. In fact, we
can rewrite the condition T > CI, with CI = 2 * SD/sqrt(N), to tell us how
large N needs to be in order to demonstrate profitability:
N > 4 * (SD/T)^2
where the ^2 indicates "square."
This assumes we have a good estimate for the standard deviation and average
trade.
As an example of this equation,
let's take the numbers from above. We had an average trade, T = 248, and a
standard deviation, SD = 990. Plugging these into the equation for N, we get N
> 63.7, so we need at least 64 trades. In other words, with these average trade
and standard deviation numbers, we can be 95% confident that the true average
trade is profitable (i.e., greater than zero) provided we have at least 64
trades in our sample.
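As a rough sketch, the same calculation can be written as a small Python function. The function name and the optional t argument are my own additions; with the default t = 2 it is the equation for N given above, which assumes the answer comes out to at least 60 trades.

```python
import math


def min_trades(average_trade, sd, t=2.0):
    """Smallest whole number of trades N satisfying N > (t * SD / T)^2."""
    if average_trade <= 0:
        raise ValueError("the average trade must be positive to show profitability")
    threshold = (t * sd / average_trade) ** 2
    return math.floor(threshold) + 1  # smallest integer strictly above the threshold


required = min_trades(248.0, 990.0)
print(required)  # 64
```

Note how sensitive the result is to the ratio SD/T: halving the average trade while keeping the same standard deviation quadruples the number of trades needed.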
What are the limitations of this
approach? As usual with statistics, we have to pay careful attention to the
underlying assumptions. On a positive note, we don't have to worry much about
normality provided we have at least 30 trades. The central limit theorem of
statistics says that the distribution of the averages approaches normal as the
number of "observations" (i.e., trades in our case) grows, even if the trades
themselves are not normally distributed. A common rule of thumb is that 30 or
more observations are enough for this approximation to be reasonable.
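The central limit theorem claim can be checked informally with a small simulation. Everything here is hypothetical: the payoff distribution (lose $100 with probability 0.7, win $400 with probability 0.3), the sample size of 100 trades, and the number of repetitions are chosen only for illustration. If the averages behave roughly normally, about 95% of them should fall within 2 * SD/sqrt(N) of the true average.

```python
import math
import random

random.seed(12345)  # fixed seed so the run is repeatable


def draw_trade():
    # Hypothetical, strongly non-normal payoff: many small losses, a few large wins.
    return 400.0 if random.random() < 0.3 else -100.0


TRUE_MEAN = 0.3 * 400.0 + 0.7 * -100.0
TRUE_SD = math.sqrt(0.3 * 400.0 ** 2 + 0.7 * 100.0 ** 2 - TRUE_MEAN ** 2)

N = 100         # trades per sample
SAMPLES = 2000  # number of independent samples

averages = []
for _ in range(SAMPLES):
    trades = [draw_trade() for _ in range(N)]
    averages.append(sum(trades) / N)

half_width = 2.0 * TRUE_SD / math.sqrt(N)
inside = sum(1 for a in averages if abs(a - TRUE_MEAN) <= half_width)
coverage = inside / SAMPLES
print(f"fraction of averages within 2 * SD/sqrt(N) of the true mean: {coverage:.3f}")
```

The individual trades take only two values, yet the fraction printed should land close to 0.95.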
The one concern is that the
accuracy of the confidence intervals is dependent on the distribution of trades
remaining the same. In statistics, this is called "stationarity." If the true
average and standard deviation change over time, the confidence intervals may
not be accurate, depending on how much things change. For example, if the trades
are based on a system developed by "over-fitting" the system to the market, then
the system may not hold up well in the future. This will be reflected in a
change in the average and standard deviation of trades. This confidence
interval technique can't tell us anything about how robust a system is or if
it's been over-fit to the market.
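One crude way to look for a change in the distribution, which is not part of the method described here and is offered only as an assumption-laden sketch, is to split the trade history in half chronologically and compare the average and standard deviation of each half. The function name and the example trades are hypothetical, and a difference between the halves is a prompt for closer inspection rather than a formal test.

```python
import statistics


def compare_halves(trades):
    """Return (mean, SD) for the first and second halves of a chronological trade list."""
    mid = len(trades) // 2
    first, second = trades[:mid], trades[mid:]
    if len(first) < 2 or len(second) < 2:
        raise ValueError("need at least two trades in each half")
    return {
        "first": (statistics.mean(first), statistics.stdev(first)),
        "second": (statistics.mean(second), statistics.stdev(second)),
    }


# Hypothetical trade history in which results deteriorate over time.
history = [100.0, 200.0, 300.0, -100.0, -200.0, -300.0]
summary = compare_halves(history)
print(summary)
```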
Next month, I'm going to draw an
analogy between least-squares curve-fitting in mathematics and trading system
optimization. This can tell us something about how many trades we need when we
optimize the parameters of a trading system to avoid the over-fitting
problem.