Breakout Bulletin - April 2003

The following article was originally published in the April 2003 issue of The Breakout Bulletin.

A Statistical Method for Evaluating Trading Systems

Much of technical analysis, particularly as applied to futures trading, is ad hoc. Most technical indicators, while written in the language of mathematics, are essentially someone's view of how the markets should work. The stochastic oscillator, for example, is based on the idea that in a rising market, the closes tend to be near the top of the recent high-low range. If prices hover near the top of this range, then presumably the market is 'overbought' and likely to fall. This idea has no more (or less) mathematical justification than the algebra on which the stochastic is based. This is not to dismiss technical analysis, some of which can be quite useful. However, it probably pays not to be overly impressed with complicated-looking technical formulations, most of which have only anecdotal justification for their intended use.

There are exceptions, of course. I would put statistical methods into this category. Anytime you can find a statistical method that applies to trading, it's probably worth looking into. In fact, I tend to think of even simple statistical methods, like the one I'm about to present, as a level above most technical analysis.

One question that can be addressed by the use of statistics is whether a trading system is inherently profitable. We can approach this problem using a confidence interval for the average trade. If we have a sample of, say, 100 trades from a trading system, we can compute the average trade, T. We would hope to find T greater than zero, indicating that the system has been profitable on average. However, if we took a different sample of 100 trades, we would, in general, find a different value of T. If the variation among the trades is large enough, some of these averages could be less than zero, indicating that the system was not profitable on average for those trades.

By computing the confidence interval for the average, T, we can determine whether the true average trade is likely to be greater than zero. The confidence interval specifies upper and lower bounds for the average, and the true average is expected to lie within those bounds at some specified confidence level, such as 95%. The equation for the half-width of the confidence interval, CI, is as follows:

CI = t * SD/sqrt(N)

where t is the Student's t statistic, SD is the standard deviation of the trades, N is the number of trades, and sqrt represents "square root." The true average trade is likely to lie between T - CI and T + CI. For the system to be profitable at our specified confidence level, the lower bound must be above zero, T - CI > 0, or equivalently T > CI.

The value of t depends on the specified confidence level and the number of trades, N. The exact value can be found in a statistics table for the t distribution or calculated in software, such as from the TINV function in Excel. However, provided we have a reasonably large number of trades, the exact value is not necessary. If N = 60, the t value for 95% confidence is t = 2.00. For larger values of N, t will get slightly smaller, dropping to 1.96 for very large N. To be conservative, then, we can take t = 2.00 as long as we have at least 60 trades. If our actual value of N is larger than 60, we will have slightly larger intervals than if we used the exact value of t.
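If neither a statistics table nor Excel is handy, the exact t values can be computed directly. The following is a minimal Python sketch using only the standard library, assuming the usual convention of N - 1 degrees of freedom for a sample of N trades; the function names are hypothetical. It integrates the t density numerically with Simpson's rule and inverts the cumulative distribution by bisection, so it is meant to show how the values behave, not to replace a tested statistics package.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: Simpson's rule on the t density over [0, x], plus 0.5."""
    if x == 0:
        return 0.5
    # Normalizing constant of Student's t density, computed once per call.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    power = -(df + 1) / 2

    def pdf(u: float) -> float:
        return c * (1 + u * u / df) ** power

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value t such that P(-t < T < t) = confidence."""
    target = 0.5 + confidence / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 20.0  # assumes the answer is below 20 (fails only for tiny df)
    for _ in range(50):  # bisection works because the CDF increases with t
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n_trades in (30, 60, 241, 10_000):
    # N trades -> N - 1 degrees of freedom
    print(n_trades, round(t_critical(0.95, n_trades - 1), 3))
```

For 60 trades (59 degrees of freedom) the result rounds to 2.00, and it drifts down toward 1.96 as N grows, which is why t = 2.00 is the conservative choice for N of 60 or more.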

Under this assumption, then, we have

CI = 2 * SD/sqrt(N); N >= 60, 95% confidence.

As an example, I calculated the confidence interval for MiniMax II on the E-mini S&P. Going back to 1998, there are 241 trades. The average trade is $248 with a standard deviation of $990. This gives a confidence interval of CI = 2 * 990/sqrt(241), which is equal to $128. So, we can expect the average trade to lie somewhere between 248 - 128 and 248 + 128, or between $120 and $376. The lower bound, $120, is greater than zero, so the system should be profitable.
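The arithmetic in this example is easy to script. Below is a minimal Python sketch, assuming t = 2 as above; the function names, and the option of passing in a raw list of per-trade profits and losses, are illustrative and not part of the original analysis, which reports only the summary figures. Note that `statistics.stdev` is the sample standard deviation (it divides by N - 1).

```python
import math
import statistics


def confidence_interval(avg_trade: float, sd: float, n: int, t: float = 2.0):
    """Return (CI, lower, upper) using CI = t * SD / sqrt(N).

    t = 2.0 is the conservative 95% value for N >= 60 discussed in the text.
    """
    ci = t * sd / math.sqrt(n)
    return ci, avg_trade - ci, avg_trade + ci


def confidence_interval_from_trades(trades, t: float = 2.0):
    """Same calculation, starting from a list of per-trade profits/losses."""
    avg = statistics.mean(trades)
    sd = statistics.stdev(trades)  # sample standard deviation (divides by N - 1)
    return confidence_interval(avg, sd, len(trades), t)


# Summary figures for MiniMax II on the E-mini S&P quoted in the text.
ci, lower, upper = confidence_interval(248.0, 990.0, 241)
print(f"CI = ${ci:.0f}; average trade between ${lower:.0f} and ${upper:.0f}")
print("profitable at ~95% confidence" if lower > 0 else "profitability not shown")
```

Run on the MiniMax II figures, this reproduces the $128 half-width and the $120 to $376 range above.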

Because the square root of N is in the denominator, all other things being equal, the more trades we have, the smaller our confidence interval will be. This is a way to quantify what most of us already know from intuition and/or experience: if you want to know whether a trading system is profitable, the more history the better. In fact, we can rewrite the condition T > CI to tell us how large N needs to be in order to demonstrate profitability. Starting from T > 2 * SD/sqrt(N) with T > 0, multiplying both sides by sqrt(N)/T gives sqrt(N) > 2 * SD/T, and squaring both sides gives:

N > 4 * (SD/T)^2

where ^2 means "squared." This assumes we have good estimates of the standard deviation and average trade.

As an example of this equation, let's take the numbers from above: an average trade T = 248 and a standard deviation SD = 990. Plugging these into the equation for N, we get N > 63.7. In other words, with this average trade and standard deviation, a sample of more than 63 trades (i.e., at least 64) would be enough to show, at 95% confidence, that the average trade is profitable (i.e., greater than zero).
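The same rearrangement can be scripted to answer "how many trades do I need?" Here is a minimal sketch, assuming t = 2 and a positive average trade; the function name is hypothetical. Because the inequality is strict, the sketch returns the next whole number above the threshold.

```python
import math


def min_trades_needed(avg_trade: float, sd: float, t: float = 2.0) -> int:
    """Smallest whole number of trades N with N > (t * SD / T)^2.

    With t = 2 this is N > 4 * (SD/T)^2. Assumes avg_trade > 0 and that SD
    and T are good estimates of the true values.
    """
    if avg_trade <= 0:
        raise ValueError("average trade must be positive")
    threshold = (t * sd / avg_trade) ** 2
    return math.floor(threshold) + 1  # strictly greater than the threshold


print(min_trades_needed(248.0, 990.0))
```

With the MiniMax II figures this returns 64. Keep in mind that t = 2 was justified only for N of about 60 or more; if the result comes out well below 60, the exact t value is larger than 2 and this shortcut understates the number of trades needed.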

What are the limitations of this approach? As usual with statistics, we have to pay careful attention to the underlying assumptions. On a positive note, we don't have to worry much about normality provided we have at least 30 trades. The central limit theorem of statistics says that the distribution of the averages approaches a normal distribution as the number of "observations" (i.e., trades in our case) grows, even if the trades themselves are not normally distributed. At least 30 observations is the usual rule of thumb for this approximation to be good enough, although strongly skewed trade distributions may need more.
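A quick simulation shows what the central limit theorem buys us. The sketch below is illustrative only: it assumes a hypothetical, strongly right-skewed trade distribution (an exponential distribution with mean $500, shifted down by $300, so many small losses and a few large wins), draws many samples of 30 trades, and compares the skewness of the individual trades with the skewness of the 30-trade averages. It also counts how often an average lands within 2 * SD/sqrt(N) of the true mean, which should be close to 95% if the averages are close to normal.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness (third standardized moment); 0 for a symmetric shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


def simulate(n_trades=30, n_samples=5000, seed=1):
    """Average many samples of n_trades hypothetical skewed trades.

    Assumed trade model: exponential with mean 500, shifted down by 300,
    giving a true average trade of 200 and a true SD of 500.
    """
    rng = random.Random(seed)
    true_mean, true_sd = 200.0, 500.0
    all_trades, averages = [], []
    for _ in range(n_samples):
        sample = [rng.expovariate(1 / 500) - 300 for _ in range(n_trades)]
        all_trades.extend(sample)
        averages.append(statistics.fmean(sample))
    half_width = 2 * true_sd / n_trades ** 0.5
    inside = sum(abs(a - true_mean) <= half_width for a in averages)
    return skewness(all_trades), skewness(averages), inside / n_samples


trade_skew, avg_skew, coverage = simulate()
print(f"skew of trades {trade_skew:.2f}, skew of averages {avg_skew:.2f}")
print(f"averages within 2 SD/sqrt(N) of the true mean: {coverage:.1%}")
```

The individual trades should show a skewness near 2, while the averages should be much closer to the zero skewness of a normal distribution. They are not perfectly normal, which is why 30 trades is best treated as a minimum rather than a guarantee.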

The one concern is that the accuracy of the confidence intervals is dependent on the distribution of trades remaining the same. In statistics, this is called "stationarity." If the true average and standard deviation change over time, the confidence intervals may not be accurate, depending on how much things change. For example, if the trades are based on a system developed by "over-fitting" the system to the market, then the system may not hold up well in the future. This will be reflected in a change in the average and standard deviation of trades. This confidence interval technique can't tell us anything about how robust a system is or if it's been over-fit to the market.

Next month, I'm going to draw an analogy between least-squares curve-fitting in mathematics and trading system optimization. This can tell us something about how many trades we need when we optimize the parameters of a trading system to avoid the over-fitting problem.

That's all for now. Good luck with your trading.