I appreciate and respect the reconciliation work you put in. You definitely understand this way better than most people.
To your question about the anomalies: as you can imagine, un-bundling and classifying the data isn't a straightforward linear task. Within NinjaTrader there are plenty of cases where the Level 2 feed and Level 1 feed get badly out of sync, and I have built heuristics into my extraction program that backfill some of the gaps with a one-size-fits-all approach based on how the standard case likely plays out. But there are certainly outliers that will be completely off, because I didn't capture every permutation.
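To make the idea concrete, here is a minimal sketch of what a "standard case" trade-classification heuristic with a staleness fallback might look like. The field names, the `max_lag` threshold, and the classification rules are all my own illustrative assumptions, not the actual extraction program:

```python
# Hypothetical sketch: classify each Level 1 trade against the last known
# Level 2 best bid/ask, falling back to "unknown" when the quote is stale
# (i.e., the feeds are out of sync and backfill logic would take over).
# All field names and the staleness threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Trade:
    ts: float      # trade timestamp, seconds
    price: float

@dataclass
class Quote:
    ts: float      # quote timestamp, seconds
    bid: float
    ask: float

def classify(trade: Trade, last_quote: Quote, max_lag: float = 0.5) -> str:
    """Label the trade's aggressor side, or 'unknown' if the quote is stale."""
    if trade.ts - last_quote.ts > max_lag:
        return "unknown"           # feeds out of sync: needs backfill logic
    if trade.price >= last_quote.ask:
        return "buy"               # lifted the offer
    if trade.price <= last_quote.bid:
        return "sell"              # hit the bid
    return "midpoint"              # traded inside the spread

print(classify(Trade(10.0, 4500.25), Quote(9.9, 4500.00, 4500.25)))  # buy
print(classify(Trade(12.0, 4500.00), Quote(9.9, 4500.00, 4500.25)))  # unknown
```

The "unknown" branch is where the one-size-fits-all backfill described above would kick in.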
But as you have seen, the majority of the cases reconcile to the raw feed file I provided. I will also add that the actual extraction program uses a slightly different data model than the raw file I posted, so there will be some differences there as well. This was more of a high-level example of how everything ties together.
I won't claim to be the first to ever try to un-bundle volume data and do this type of analysis, but I am perhaps the first to have shared it publicly in this way.
But in general, the gaps you found are all backfilled with logic in my program, and in the end I think I am accurately capturing about 95% of rows the correct way. The remaining 5% won't be material to the analysis I am doing, and I think I get roughly 50% of those right. The rest are likely off by just a small margin, because I didn't go through every possible whammy that could happen.
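Taking those figures at face value, the implied overall row accuracy is easy to work out:

```python
# Back-of-the-envelope accuracy implied by the numbers above: 95% of rows
# handled correctly by the standard-case logic, plus roughly half of the
# remaining 5% still landing right.
standard_case = 0.95
outlier_hit_rate = 0.50
overall = standard_case + (1 - standard_case) * outlier_hit_rate
print(f"{overall:.1%}")  # 97.5%
```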
I also have the MBO feeds, which are far larger and far more accurate and detailed. I did a recon against these just to get an idea of how well this approach was working, and it was within an acceptable margin of error for me.
In the analytical world there is no such thing as art; there is only the science you know and the science you don't know. Characterizing the science you don't know as "art" is a fool's game.
More or less. The extraction program uses the Level 1 and Level 2 feeds in a specific sequencing. What I dumped into Excel was just to illustrate, at a high level, how the raw data works. Some of it can be reconciled and some of it can't, but I hope seeing how everything relates is useful. There aren't many people who have ever tried to un-bundle volume data using retail tools for this type of analysis, but I think it's very useful for modeling and testing.
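One way to picture the "specific sequencing" of the two feeds: merge the Level 1 (trade) and Level 2 (depth) event streams into a single time-ordered stream before un-bundling. This is my own sketch, not the actual program; the tuple layout is illustrative:

```python
# Merge two already-sorted event streams (trades and depth updates) into one
# time-ordered stream using heapq.merge. Event tuples here are made up:
# (timestamp, kind, price, size).
import heapq

level1 = [(10.00, "trade", 4500.25, 3), (10.05, "trade", 4500.25, 2)]
level2 = [(9.98, "depth", 4500.25, 40), (10.02, "depth", 4500.25, 37)]

merged = list(heapq.merge(level1, level2, key=lambda e: e[0]))
print([e[1] for e in merged])  # ['depth', 'trade', 'depth', 'trade']
```

With a single ordered stream, each trade can be interpreted against the most recent depth state that preceded it.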
I figured out everything I presented in this thread on my own, just by looking at the data. I am a data scientist, among other things, by trade, so building this type of taxonomy and analyzing the data this way wasn't too much of a leap for me.
I discovered this thread months ago and it sent me down a path researching microstructure and developing tools to try to understand it. Thanks for this info, it truly was eye opening for me!
I hope this isn't too off topic, but I'm curious how the concepts you describe in this thread would apply to penny-tick, small-spread assets such as stocks. Do you have any thoughts on that? The relatively large tick size of futures makes natural price levels. Would you instead use small bars (possibly tick or volume bars) to set the range for price levels, and consider a "price level" broken on the breakout of one of these bars? Or would you look at it exactly the same way as the ES and just accept the penny tick size? Would greatly appreciate any thoughts on this.
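To make the volume-bar idea concrete, here is a minimal sketch: aggregate trades into fixed-volume bars and treat each bar's high/low as a synthetic "price level" range, with a break defined as trading beyond the prior bar's range. The bar size and sample data are made up for illustration:

```python
# Build fixed-volume bars from a trade tape and return each bar's (high, low),
# which could serve as synthetic price levels on a penny-tick instrument.
# Bar size and prices are illustrative assumptions.
def volume_bars(trades, bar_volume):
    """trades: list of (price, size); returns list of (high, low) per bar."""
    bars, hi, lo, vol = [], None, None, 0
    for price, size in trades:
        hi = price if hi is None else max(hi, price)
        lo = price if lo is None else min(lo, price)
        vol += size
        if vol >= bar_volume:
            bars.append((hi, lo))
            hi, lo, vol = None, None, 0
    return bars

trades = [(10.01, 300), (10.03, 400), (10.02, 350),   # fills bar 1
          (10.05, 600), (10.07, 500)]                 # fills bar 2
print(volume_bars(trades, 1000))  # [(10.03, 10.01), (10.07, 10.05)]
```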
If you really want to get into this sort of thing, the book Trades, Quotes and Prices: Financial Markets Under the Microscope covers these subjects quite thoroughly.
Good question. I've tried the NQ and YM before; they are both a bit thinner than the ES. The concepts are largely the same, but there are more gaps in the data, with multiple price levels getting swept often, so reconstructing a halfway decent model of what happened is a bit more challenging. The signals you get from thinner books tend to be less helpful in terms of finding alpha.
But on the flip side there are other opportunities with thinner markets too... You just have to know what to look for.
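As an illustration of the "multiple price levels swept" case: a sweep shows up as a single aggressive order printing at more than one price in the same instant. Grouping by identical timestamp, as below, is a simplification of my own; real feeds may require sequence numbers instead:

```python
# Flag timestamps where trade prints hit more than one price level at once,
# a rough proxy for a sweep. The (timestamp, price) tape is made up.
from itertools import groupby

prints = [(10.00, 4500.25), (10.00, 4500.50), (10.00, 4500.75),  # sweep
          (10.05, 4500.75)]

sweeps = [
    ts for ts, grp in groupby(prints, key=lambda p: p[0])
    if len({price for _, price in grp}) > 1
]
print(sweeps)  # [10.0]
```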
99% of this stuff won't help the average retail trader a bit. I might post a few things more applicable to the majority of users on this site at some point, though.
In thinking about this a bit more, perhaps the more general concept is that best bids/offers tend to mean revert. A retracement to a "weak side" is a reversion to the mean. When the tick size is large, this plays out as a bouncing between the recent levels. The degree of bounciness is probably related to the volatility of the thing being traded, but a large tick size makes for smoother movement because each tick movement has a relatively large barrier to break through. I think I'll look into using some moving averages of tick bars. Perhaps an "average movement per tick" will correspond to some rough price levels.
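A quick sketch of the "average movement per tick bar" idea: build N-trade tick bars from a price series and take a moving average of each bar's absolute close-to-close move. The bar size, window, and prices below are all illustrative assumptions:

```python
# Build tick bars of fixed trade count, then compute a simple moving average
# of the absolute close-to-close move per bar. All numbers are made up.
def tick_bar_closes(prices, ticks_per_bar):
    """Close of each complete N-trade tick bar."""
    return [prices[i] for i in range(ticks_per_bar - 1, len(prices), ticks_per_bar)]

def avg_abs_move(closes, window):
    """Simple moving average of |close[i] - close[i-1]| over `window` moves."""
    moves = [abs(b - a) for a, b in zip(closes, closes[1:])]
    return [sum(moves[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(moves))]

prices = [100.00, 100.01, 100.03, 100.02, 100.05, 100.04, 100.08, 100.07]
closes = tick_bar_closes(prices, 2)      # every second trade's price
print(avg_abs_move(closes, 2))           # rolling average bar-to-bar move
```

The rolling values could then be compared against the instrument's tick size to decide how coarse a synthetic "level" grid should be.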
Did you need to sort the streamed data by time? I just started messing with the NT OnMarketDepth() event, and when I streamed a data sample directly to a text file I noticed that the records are somewhat out of chronological order, sometimes by as much as 30 seconds. Is that expected?
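For illustration, one way to restore time order before analysis is a stable sort by timestamp, which also preserves arrival order among records with equal timestamps (Python's sort is stable). The record layout below is made up; real OnMarketDepth events carry more fields:

```python
# Re-order out-of-sequence depth records by timestamp with a stable sort.
# Tuple layout (timestamp, side, price, size) is an illustrative assumption.
records = [
    (10.30, "ask", 4500.50, 12),
    (10.00, "bid", 4500.25, 40),   # arrived late relative to stream order
    (10.30, "bid", 4500.25, 38),
]
records.sort(key=lambda r: r[0])   # stable: ties keep arrival order
print([r[0] for r in records])  # [10.0, 10.3, 10.3]
```

Note that sorting only helps offline; for live processing you would need to buffer records over a window at least as long as the worst observed lag.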