I appreciate and respect the reconciliation work you put in. You definitely understand this way better than most people.
To your question about the anomalies: as you can imagine, un-bundling and classifying the data isn't a straightforward linear task. Within NinjaTrader there are plenty of cases where the Level 2 feed and Level 1 feed get badly out of sync, and I have built heuristics into my extraction program that backfill some of the gaps with a one-size-fits-all approach based on how the standard case likely plays out. But there are certainly outliers that will be completely off, because I didn't capture every permutation.
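To make the idea concrete, here is a minimal sketch of what a "standard case" trade-classification heuristic with a staleness fallback might look like. The field names, the `max_lag` threshold, and the classification rules are all my own illustrative assumptions, not the actual extraction program:

```python
# Hypothetical sketch: classify each Level 1 trade against the last known
# Level 2 best bid/ask, falling back to "unknown" when the quote is stale
# (i.e., the feeds are out of sync and backfill logic would take over).
# All field names and the staleness threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Trade:
    ts: float      # trade timestamp, seconds
    price: float

@dataclass
class Quote:
    ts: float      # quote timestamp, seconds
    bid: float
    ask: float

def classify(trade: Trade, last_quote: Quote, max_lag: float = 0.5) -> str:
    """Label the trade's aggressor side, or 'unknown' if the quote is stale."""
    if trade.ts - last_quote.ts > max_lag:
        return "unknown"           # feeds out of sync: needs backfill logic
    if trade.price >= last_quote.ask:
        return "buy"               # lifted the offer
    if trade.price <= last_quote.bid:
        return "sell"              # hit the bid
    return "midpoint"              # traded inside the spread

print(classify(Trade(10.0, 4500.25), Quote(9.9, 4500.00, 4500.25)))  # buy
print(classify(Trade(12.0, 4500.00), Quote(9.9, 4500.00, 4500.25)))  # unknown
```

The "unknown" branch is where the one-size-fits-all backfill described above would kick in.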
But as you have seen, the majority of the cases reconcile to the raw feed file I provided. I will also add that the actual extraction program uses a slightly different data model than the raw file I posted, so there will be some differences there as well. This was more of a high-level example of how everything ties together.
I won't claim to be the first to ever try to un-bundle volume data and do this type of analysis, but I am perhaps the first to have shared it publicly in this way.
But in general, the gaps you found are all backfilled with logic in my program, and in the end I think I am accurately capturing about 95% of rows the correct way. The remaining 5% won't be material to the analysis I am doing, and I think I get roughly 50% of those right. The rest are likely off by just a small margin, because I didn't go through every possible whammy that could happen.
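Taking those figures at face value, the implied overall row accuracy is easy to work out:

```python
# Back-of-the-envelope accuracy implied by the numbers above: 95% of rows
# handled correctly by the standard-case logic, plus roughly half of the
# remaining 5% still landing right.
standard_case = 0.95
outlier_hit_rate = 0.50
overall = standard_case + (1 - standard_case) * outlier_hit_rate
print(f"{overall:.1%}")  # 97.5%
```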
I also have the MBO feeds, which are far larger and far more accurate and detailed. I did a recon against these just to get an idea of how well this approach was working, and it was within an acceptable margin of error for me.
In the analytical world there is no such thing as art; there is only the science you know and the science you don't know. Characterizing the science you don't know as "art" is a fool's game.
More or less. The extraction program uses the Level 1 and Level 2 feeds in a specific sequencing. What I dumped into Excel was just to illustrate, at a high level, how the raw data works. Some of it can be reconciled and some of it can't, but I hope seeing how everything relates is useful. There aren't many people who have ever tried to un-bundle volume data using retail tools for this type of analysis, but I think it's very useful for modeling and testing.
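One way to picture the "specific sequencing" of the two feeds: merge the Level 1 (trade) and Level 2 (depth) event streams into a single time-ordered stream before un-bundling. This is my own sketch, not the actual program; the tuple layout is illustrative:

```python
# Merge two already-sorted event streams (trades and depth updates) into one
# time-ordered stream using heapq.merge. Event tuples here are made up:
# (timestamp, kind, price, size).
import heapq

level1 = [(10.00, "trade", 4500.25, 3), (10.05, "trade", 4500.25, 2)]
level2 = [(9.98, "depth", 4500.25, 40), (10.02, "depth", 4500.25, 37)]

merged = list(heapq.merge(level1, level2, key=lambda e: e[0]))
print([e[1] for e in merged])  # ['depth', 'trade', 'depth', 'trade']
```

With a single ordered stream, each trade can be interpreted against the most recent depth state that preceded it.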
I figured out everything I presented in this thread on my own, just by looking at the data. I am a data scientist, among other things, by trade, so building this type of taxonomy and analyzing the data this way wasn't too much of a leap for me.
I discovered this thread months ago and it sent me down a path researching microstructure and developing tools to try to understand it. Thanks for this info, it truly was eye opening for me!
I hope this isn't too off topic, but I'm curious how the concepts you describe in this thread would apply to penny-tick, small-spread assets such as stocks. Do you have any thoughts on that? The relatively large tick size of futures makes natural price levels. Would you instead use small bars (possibly tick or volume bars) to set the range for price levels, and consider a "price level" broken on the breakout of one of these bars? Or would you look at it exactly the same way as the ES and just accept the penny tick size? Would greatly appreciate any thoughts on this.
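To make the volume-bar idea concrete, here is a minimal sketch: aggregate trades into fixed-volume bars and treat each bar's high/low as a synthetic "price level" range, with a break defined as trading beyond the prior bar's range. The bar size and sample data are made up for illustration:

```python
# Build fixed-volume bars from a trade tape and return each bar's (high, low),
# which could serve as synthetic price levels on a penny-tick instrument.
# Bar size and prices are illustrative assumptions.
def volume_bars(trades, bar_volume):
    """trades: list of (price, size); returns list of (high, low) per bar."""
    bars, hi, lo, vol = [], None, None, 0
    for price, size in trades:
        hi = price if hi is None else max(hi, price)
        lo = price if lo is None else min(lo, price)
        vol += size
        if vol >= bar_volume:
            bars.append((hi, lo))
            hi, lo, vol = None, None, 0
    return bars

trades = [(10.01, 300), (10.03, 400), (10.02, 350),   # fills bar 1
          (10.05, 600), (10.07, 500)]                 # fills bar 2
print(volume_bars(trades, 1000))  # [(10.03, 10.01), (10.07, 10.05)]
```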
If you really want to get into this sort of thing, the book Trades, Quotes and Prices: Financial Markets Under the Microscope covers these subjects quite thoroughly.
Good question. I've tried the NQ and YM before; they are both a bit thinner than the ES. The concepts are largely the same, but there are more gaps in the data, with multiple price levels getting swept often, so reconstructing a halfway decent model of what happened is a bit more challenging. The signals you get from thinner books tend to be less helpful in terms of finding alpha.
But on the flip side there are other opportunities with thinner markets too... You just have to know what to look for.
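As an illustration of the "multiple price levels swept" case: a sweep shows up as a single aggressive order printing at more than one price in the same instant. Grouping by identical timestamp, as below, is a simplification of my own; real feeds may require sequence numbers instead:

```python
# Flag timestamps where trade prints hit more than one price level at once,
# a rough proxy for a sweep. The (timestamp, price) tape is made up.
from itertools import groupby

prints = [(10.00, 4500.25), (10.00, 4500.50), (10.00, 4500.75),  # sweep
          (10.05, 4500.75)]

sweeps = [
    ts for ts, grp in groupby(prints, key=lambda p: p[0])
    if len({price for _, price in grp}) > 1
]
print(sweeps)  # [10.0]
```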
99% of this stuff won't help the average retail trader a bit. I might post a few things more applicable to the majority of users on this site at some point, though.
In thinking about this a bit more, perhaps the more general concept is that best bids/offers tend to mean revert. A retracement to a "weak side" is a reversion to the mean. When the tick size is large, this plays out as a bouncing between the recent levels. The degree of bounciness is probably related to the volatility of the thing being traded, but a large tick size makes for smoother movement because each tick movement has a relatively large barrier to break through. I think I'll look into using some moving averages of tick bars. Perhaps an "average movement per tick" will correspond to some rough price levels.
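A quick sketch of the "average movement per tick bar" idea: build N-trade tick bars from a price series and take a moving average of each bar's absolute close-to-close move. The bar size, window, and prices below are all illustrative assumptions:

```python
# Build tick bars of fixed trade count, then compute a simple moving average
# of the absolute close-to-close move per bar. All numbers are made up.
def tick_bar_closes(prices, ticks_per_bar):
    """Close of each complete N-trade tick bar."""
    return [prices[i] for i in range(ticks_per_bar - 1, len(prices), ticks_per_bar)]

def avg_abs_move(closes, window):
    """Simple moving average of |close[i] - close[i-1]| over `window` moves."""
    moves = [abs(b - a) for a, b in zip(closes, closes[1:])]
    return [sum(moves[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(moves))]

prices = [100.00, 100.01, 100.03, 100.02, 100.05, 100.04, 100.08, 100.07]
closes = tick_bar_closes(prices, 2)      # every second trade's price
print(avg_abs_move(closes, 2))           # rolling average bar-to-bar move
```

The rolling values could then be compared against the instrument's tick size to decide how coarse a synthetic "level" grid should be.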
Did you need to sort the streamed data by time? I just started messing with the NT OnMarketDepth() event, and when I streamed a data sample directly to a text file I noticed that the records are somewhat out of chronological order, sometimes by as much as 30 seconds. Is that expected?
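For illustration, one way to restore time order before analysis is a stable sort by timestamp, which also preserves arrival order among records with equal timestamps (Python's sort is stable). The record layout below is made up; real OnMarketDepth events carry more fields:

```python
# Re-order out-of-sequence depth records by timestamp with a stable sort.
# Tuple layout (timestamp, side, price, size) is an illustrative assumption.
records = [
    (10.30, "ask", 4500.50, 12),
    (10.00, "bid", 4500.25, 40),   # arrived late relative to stream order
    (10.30, "bid", 4500.25, 38),
]
records.sort(key=lambda r: r[0])   # stable: ties keep arrival order
print([r[0] for r in records])  # [10.0, 10.3, 10.3]
```

Note that sorting only helps offline; for live processing you would need to buffer records over a window at least as long as the worst observed lag.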