Learning statistical analysis: Step by Step

January 2nd, 2018, 08:17 PM

Have you wondered how people generate those numbers on market data, like 56% of Mondays closed up in 2010? Yea, me too.

One of my goals this year is to learn statistical analysis techniques to see if I can use statistics to support my discretionary trades. While I have a general concept of probability, I don't know much about specific statistics or how to go about generating that data. So in this thread, I will document my attempts to understand both statistical concepts and techniques in a step-wise fashion, which will hopefully be beneficial to others as well.

Any comments, feedback, help, or guidance is appreciated!

Happy 2018!

January 2nd, 2018, 08:33 PM

I read Adam Grimes's book, "Quantitative Analysis of Market Data: A Primer." It's a very short book, but it has a great overview of market math for beginners like me. One concept he talks about is standardizing the price of securities to themselves, in order to really figure out how they moved. There are two similar ways to do that:

1. Percent returns
2. Log returns

Percent returns are calculated as:
(Price today / Price yesterday) - 1

Log returns are calculated as:
ln (price today / price yesterday), where ln is the natural log

One of the main differences between the two is how you can manipulate them. Percent returns cannot simply be added, because a 10% loss followed by a 10% gain does not net out to 0. However, log returns can be added together to get an overall picture of true returns (and the effects of volatility), according to my understanding.
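To make the difference concrete, here is a minimal sketch in Python with made-up prices (100, then 90, then 99, i.e. a 10% loss followed by a 10% gain). The function names are just for illustration.

```python
import math

def percent_return(prev_price, price):
    # (price today / price yesterday) - 1
    return price / prev_price - 1

def log_return(prev_price, price):
    # natural log of (price today / price yesterday)
    return math.log(price / prev_price)

prices = [100.0, 90.0, 99.0]  # made-up closes: -10%, then +10%
pairs = list(zip(prices, prices[1:]))

pct = [percent_return(a, b) for a, b in pairs]
logs = [log_return(a, b) for a, b in pairs]

total_pct = prices[-1] / prices[0] - 1   # actual change over both days: -1%
sum_pct = sum(pct)                       # roughly 0, which is misleading
recovered = math.exp(sum(logs)) - 1      # summed log returns converted back: -1%

print(f"sum of percent returns: {sum_pct:.4f}")
print(f"actual total return:    {total_pct:.4f}")
print(f"from summed log returns:{recovered:.4f}")
```

The summed percent returns suggest you broke even, while the summed log returns, converted back with exp(), recover the actual 1% loss.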

To learn how to do this, I:
1. Downloaded daily data for SPY for the year 2010 into Notepad.
2. Pasted the comma-delimited data into Excel. Excel split it into columns automatically when I selected the correct delimiter (a comma).
3. Created two columns, using closing prices for the calculations. The first column is percent returns, and the second column is log returns.

For percent returns, I compared the closing price on the last day of the year with the closing price on the first day of the year (maybe I should have used the opening price on the first day of the year). For log returns, I summed all the log returns in the column; using the closing prices of the first day and last day also works and gives the same result. What's interesting is that from close to close, SPY gained 10.96%. However, the log-return calculation came out to only 4.52%, which I took to be the effect of volatility (i.e. negative returns). If these calculations are incorrect, I'd appreciate any feedback.
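A quick check of those two numbers, as a hedged sketch. Summed daily log returns telescope to the log of (last close / first close), so they cannot differ from the close-to-close figure because of volatility alone. With natural logs, a 10.96% gain corresponds to a log return of about 10.40%. The 4.52% figure matches a base-10 log instead. One possible explanation (an assumption, since the post does not show the formula used) is that the column used Excel's LOG function, which defaults to base 10, rather than LN. The daily closes below are made up so that the total gain comes to 10.96%.

```python
import math

pct_gain = 0.1096  # close-to-close gain reported in the post

ln_ret = math.log(1 + pct_gain)        # natural log return
log10_ret = math.log10(1 + pct_gain)   # base-10 log of the same ratio

print(f"natural log: {ln_ret:.4f}")     # about 0.1040
print(f"base-10 log: {log10_ret:.4f}")  # about 0.0452

# Made-up closes: the path in between does not change the sum of log returns.
closes = [100.0, 104.0, 97.0, 110.96]
sum_daily = sum(math.log(b / a) for a, b in zip(closes, closes[1:]))
first_to_last = math.log(closes[-1] / closes[0])

print(f"sum of daily log returns: {sum_daily:.4f}")
print(f"log(last / first):        {first_to_last:.4f}")
```

To turn a summed natural-log return back into a percent return, use exp(sum) - 1.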

January 2nd, 2018, 09:09 PM

It's so much easier to do that in charting software that has coding facilities. Write the code once and you can apply it to any data set easily. That said, I do use Excel for stock balance sheet analysis; it's a very handy tool, and if you've got VBA skills, they carry across to many charting packages as well. I think if you don't do at least some quantitative analysis, you are in danger of going the way of the dinosaurs in this game. There's lots of chatter about AI and machine learning, but I still think the human brain will lead the way for a few more years. I just recently got into machine learning, and I think a lot of it is advanced curve fitting with minimal, if any, real reflection of price action. We've still got a ways to go before thinking is redundant, imo.

January 2nd, 2018, 10:59 PM

Statistics are a great tool, and there are a ton of ways to use them to help improve your trading. I work in a quantitative field by trade and use advanced statistics all the time, but in terms of how I have applied them to trading, I found that what really helped me was a quite simple application: calculating the house edge vs. my edge.

Here is how it works: Start with the house edge calculation.

1. You give up the spread if you use market orders. So if nothing else is in play, your odds of losing are 2 to 1.
2. If you use a limit order for your profit target and a market order for your stop loss, your odds of the stop loss filling on first touch are 100%, while your odds of the profit target filling on first touch are very low, so it may take 2-3 touches on average. But the further away your profit target is, the longer your order will have been waiting in the queue, and thus the higher the probability of a fill on touch. Depending on your profit target, you can figure out a good baseline model. For a simple example, it may look something like this:

1 tick PT: Full pass through most likely
2 tick PT: 3 to 4 touches or full pass through
3 tick PT: 2 to 3 touches or full pass through
etc. Eventually, when you get to a really high target, you will be closer to the front of the queue. But you have to factor this in.

Here is a simple statistical model to illustrate this type of built-in house edge against you. If you set a PT of 5 ticks and a SL of 5 ticks, use limit orders for entries, and assume that you somehow get 100% perfect fills on your entries, what are your odds of winning based only on your exits? Some would say 50% / 50%, but in reality this is more like a 30% probability of winning, because every stop loss gets hit and filled immediately, whereas your PT needs multiple touches or a pass through.
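For anyone who wants to experiment with this, here is a rough simulation sketch. The assumptions are mine, not taken from the post: price is a coin flip that moves one tick up or down each step, the stop fills on the first touch, and the profit target fills either when price trades through it or on the Nth touch (the hypothetical touches_needed parameter). In this toy model the handicap only takes the win rate from about 50% (fill on first touch) down to about 5/11, or roughly 45% (fill only on pass through), so it should not be expected to reproduce the 30% figure above, which depends on real queue behavior that a coin flip does not capture.

```python
import random

def win_rate(pt=5, sl=5, touches_needed=1, trials=20000, seed=1):
    """Estimate P(profit target fills before the stop) on a +/-1 tick walk.

    Assumptions (illustrative only): the stop fills on the first touch of
    -sl; the target fills when price trades through +pt, or on the
    touches_needed-th touch of +pt, whichever comes first.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        price, touches = 0, 0
        while True:
            price += rng.choice((-1, 1))
            if price == -sl:
                break                     # stopped out on first touch
            if price == pt:
                touches += 1              # resting limit order gets touched
                if touches >= touches_needed:
                    wins += 1
                    break
            elif price > pt:
                wins += 1                 # traded through the target
                break
    return wins / trials

first_touch = win_rate(touches_needed=1)
pass_through_only = win_rate(touches_needed=10**9)
print(f"fill on first touch:    {first_touch:.3f}")
print(f"fill only on pass-thru: {pass_through_only:.3f}")
```

Changing pt, sl, and touches_needed gives a feel for how much each fill assumption moves the baseline.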

With this simple example in mind, you can extend it to different PT vs. SL combinations and try to calculate your odds. You will find that there is a whole science to setting the house edge, and every trader needs to know this.

So let's talk about quantifying your own edge.... In order to determine if something has an edge to it you need two samples to test it. One with the edge in place and one without it as a control. When you see an improvement from your edge test vs. your control test, you can calculate how much this helped. There are a number of ways to measure it, but most of these are very simple.

a. Did you increase your win / loss ratio?
b. Did you decrease your drawdowns?
c. Did you increase the number of trades that got filled?
d. Did you increase your average profit per trade?

You can easily quantify all of these, with every variable you add or subtract from your strategy.
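As a sketch of what that comparison could look like, the snippet below summarizes a control sample and an edge sample of per-trade results. The P&L lists are invented for illustration (in ticks), and the summarize helper is hypothetical; the win / loss ratio here is simply the count of winning trades divided by the count of losing trades.

```python
def summarize(pnl):
    """Basic stats for a list of per-trade P&L values (filled trades only)."""
    wins = [p for p in pnl if p > 0]
    losses = [p for p in pnl if p < 0]
    equity = peak = max_dd = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)             # highest equity seen so far
        max_dd = max(max_dd, peak - equity)  # deepest drop from that peak
    return {
        "trades": len(pnl),
        "win_loss_ratio": len(wins) / len(losses) if losses else float("inf"),
        "avg_profit": sum(pnl) / len(pnl),
        "max_drawdown": max_dd,
    }

# Invented samples, in ticks: same setup without and with the tested variable.
control = [2, -2, -2, 2, -2, 2, -2, -2]
with_edge = [2, -2, 2, 2, -2, 2, 2, -2]

for name, sample in (("control", control), ("with edge", with_edge)):
    print(name, summarize(sample))
```

With samples this small the difference means little; the point is only that each of the four questions maps to a number you can recompute every time you add or remove a variable.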

You can test various entry systems to determine:

a. Did I get filled on touch or pass through if this was a limit order.
b. Did I end the first bar of the trade flat up or down
c. Am I filtering too many trades out, thus not taking enough trades in a day to hit my goal?

You can test various exit systems to figure out:

a. Did my PT vs. SL have a good expectancy
b. How much slippage did my system have
c. Did I cut my trade short instead of capturing additional profits.
d. What was my MAE / MFE for my trades
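Two of these are easy to put into code. The sketch below uses the usual definitions as I understand them: expectancy is win rate times average win minus loss rate times average loss, MAE (maximum adverse excursion) is the worst move against the position while it was open, and MFE (maximum favorable excursion) is the best move in its favor. The function names and the price path are made up for illustration.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Average result per trade; avg_loss is given as a positive number."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

def mae_mfe(entry, prices, long=True):
    """Worst and best excursion, in price units, while the trade was open."""
    moves = [(p - entry) if long else (entry - p) for p in prices]
    mae = max(0.0, -min(moves))   # largest move against the position
    mfe = max(0.0, max(moves))    # largest move in favor of the position
    return mae, mfe

# 5-tick PT and 5-tick SL at two different win rates
print(expectancy(0.50, 5, 5))   # break-even before costs
print(expectancy(0.30, 5, 5))   # negative expectancy

# Made-up prices seen during a long trade entered at 100.00
print(mae_mfe(100.0, [99.5, 100.75, 101.0, 100.25]))
```

Slippage can be handled the same way, by subtracting it from avg_win and adding it to avg_loss before computing expectancy.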

So these are the key questions you should focus on, and everything is measurable and quantifiable. Eventually you can start to blend various parts of your system together to calculate your cumulative edge vs. the market's house edge to see if it is positive or negative overall. Once you find something that is logically and mathematically viable, then from there it is all about infrastructure and execution.

I could give further information if you have any questions, but I hope this gives you some generic ideas to get started with.

Good Luck!

Ian

January 3rd, 2018, 03:25 AM

This is something that I decided to look into deeper in the last month or so too.

I have been following https://metricsmaestro.wordpress.com/ for quite a while, he tracks statistics mainly for the ES and SPY. Some very interesting statistics there. What I also like about the way he tracks statistics is he shows the practical application for the stats too: https://metricsmaestro.wordpress.com/category/playbook/.

The approach I decided to take is to start with one instrument and build a base from there. I will definitely contribute to the thread when I have something useful.

January 3rd, 2018, 11:38 AM

@Ozquant - thanks for your input. I trade using Sierrachart, so it's programmable. But I don't know how to mathematically ask the questions I have, and then how to statistically analyze the answers. I'm really trying to build from Step 1 at this point...which is how to transform my question into an equation I can program into Sierrachart. I'm using excel because it simulates SC and is less cumbersome.

@Popsicle - thanks for the link. I'll save that info to look at. It'd be great if he could show how he actually got those numbers; that's really what I'm trying to figure out at this point...not really the results, but the mechanics.
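For reference, here is a minimal sketch of the mechanics behind a number like "X% of Mondays closed up". It assumes "closed up" means the close was above the previous bar's close (it could also be defined as close above open), and the dates and prices are invented. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]

def up_close_rate_by_weekday(bars):
    """bars: list of (date, close) sorted by date.

    Returns the fraction of bars, per weekday, that closed above the
    previous bar's close.
    """
    ups = defaultdict(int)
    totals = defaultdict(int)
    for (_, prev_close), (day, close) in zip(bars, bars[1:]):
        name = WEEKDAYS[day.weekday()]
        totals[name] += 1
        if close > prev_close:
            ups[name] += 1
    return {name: ups[name] / totals[name] for name in totals}

# Invented closes for a handful of 2010 dates
bars = [
    (date(2010, 1, 8), 100.0),   # Friday
    (date(2010, 1, 11), 99.0),   # Monday, down
    (date(2010, 1, 12), 99.5),   # Tuesday, up
    (date(2010, 1, 22), 98.0),   # Friday, down
    (date(2010, 1, 25), 98.6),   # Monday, up
    (date(2010, 1, 29), 97.0),   # Friday, down
    (date(2010, 2, 1), 98.2),    # Monday, up
]

rates = up_close_rate_by_weekday(bars)
print(f"Mondays closed up: {rates['Monday']:.0%}")
```

The same pattern (filter the bars by a condition, then count how often an outcome happened) covers most statistics of this kind.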

@iantg - thanks for your extensive write-up. I'm going to go through it step by step. I have a lot of questions for you, and I will ask them as I work through your list.

So for the House Edge calculation:

iantg

1. You give up the spread if you use market orders. So if nothing else is in play your odds of losing are 2 to 1.

How do I figure this out? I drew a simulation of buying with a market order. It seems that there are 3 ways to lose money and one way to win money, which would put the odds against you at 3:1.

What do you think?

January 3rd, 2018, 12:02 PM

Jackbravo,

Thanks for following up, and looking into this. I think there are a number of fair ways to get to this, and your method is certainly not incorrect by any means. The way that I build my edge calculations is to run each variable independently. So, regarding my assessment that you have 2 to 1 odds of losing: it is just setting the bet line with the following assumptions. (I didn't mention these previously.)

Here are the assumptions and the betting line.

Profit = 2 ticks
Loss = 2 ticks

                           Odds of Winning   Odds of Losing
Flat Entry                 50%               50%
Entry with 1 Tick Profit   75%               25%
Entry with 1 Tick Loss     25%               75%

So because you give up the spread, you typically end up with 1 tick against you from the start. So your odds of hitting a 2 tick loss before you hit a 2 tick profit are 2 to 1. Now if your PT / SL targets were different, you would see different odds / betting lines.

Every other aspect of the house edge, such as the commissions that hit you regardless of whether you win or lose, is applicable in calculating the overall house edge, but I keep each part separate, because if I change the PT / SL targets, for example, it only changes this aspect while the other parts stay constant.

There is no real right or wrong way to do this, but I typically try to break everything down as granular as possible, so that if and when I change things I can see the impact of the one change independently of everything else.

Hope it helps.

January 4th, 2018, 06:54 AM

iantg

Profit = 2 ticks
Loss = 2 ticks

                           Odds of Winning   Odds of Losing
Flat Entry                 50%               50%
Entry with 1 Tick Profit   75%               25%
Entry with 1 Tick Loss     25%               75%

I guess I don't know how the odds are calculated in the first place. I drew a table of my understanding so far. I get that it's a 50/50 shot from flat to +2/-2, but how do you calculate the odds at -1 tick starting? Thanks for any help!
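One common way to get numbers like these (an assumption here, not necessarily how iantg derived them) is to treat price as a coin flip that moves one tick up or down with equal probability. Starting 1 tick down, the next move either takes you to -2 (stopped out, 0% chance of winning) or back to flat (50% chance of winning), so the chance of winning is 0.5 x 0% + 0.5 x 50% = 25%. In general, for this kind of symmetric walk, the probability of reaching the target first is (distance from the stop) / (distance between stop and target). A small sketch, with a hypothetical function name:

```python
from fractions import Fraction

def prob_target_first(start, target, stop):
    """P(reach +target before -stop) for a symmetric one-tick random walk.

    start is the current position in ticks relative to a flat entry;
    target and stop are positive tick distances from the flat entry.
    """
    return Fraction(start + stop, target + stop)

for label, start in (("Flat entry", 0),
                     ("Entry with 1 tick profit", 1),
                     ("Entry with 1 tick loss", -1)):
    win = prob_target_first(start, target=2, stop=2)
    print(f"{label}: win {float(win):.0%}, lose {float(1 - win):.0%}")
```

This reproduces the 50/50, 75/25, and 25/75 rows of the table. Note that 25% vs. 75% works out to odds of 3 to 1 against, not 2 to 1.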

January 4th, 2018, 09:55 AM

Are you trying to develop a statistical trading model? I have/had one that I liked but I am no longer pointed at black/grey box and my time frame has expanded. BUT I'd revive it if you wanted to work on it. I think I can find the documents.

Dan

Discussion in Traders Hideout