Tick Database Storage

January 27th, 2010, 12:43 AM

Have any of you gone through the process of building a tick database? The deeper I look into it, the harder it seems to implement as a one-man show.

I basically need the database to update from Ninja, IQFeed, or CSV files, and then make that data available for use back in Ninja, Excel, R, etc. Over time that data becomes valuable.

It sounds simple enough until you try to do something like take the raw tick data from two instruments and create 5-minute OHLC "bars" to read into R for a covariance test or whatever... and then try 3-, 7-, and 10-minute OHLC data for comparison. I know Matlab has a package called Tick2Bar or something like that, but much of this already makes my head hurt (I'm not a Matlab user. Yet.).
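For what it's worth, the bar-building step itself is small. Below is a minimal Python sketch that turns time-ordered ticks into n-minute OHLC bars, so the same tick list can produce the 3-, 5-, 7- and 10-minute series. Python is just a neutral choice here, not something anyone in the thread uses. The `Tick`/`Bar` types and function names are hypothetical, and aligning buckets to midnight is an assumption; your charting platform may align bars differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Tick:
    time: datetime
    price: float
    size: int


@dataclass
class Bar:
    start: datetime  # left edge of the n-minute bucket
    open: float
    high: float
    low: float
    close: float
    volume: int


def bucket_start(t: datetime, minutes: int) -> datetime:
    """Floor a timestamp to the start of its n-minute bucket, counted from midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=minutes)
    # timedelta // timedelta gives an int number of whole buckets since midnight
    return midnight + ((t - midnight) // step) * step


def ticks_to_bars(ticks: List[Tick], minutes: int) -> List[Bar]:
    """Aggregate time-ordered ticks into n-minute OHLC bars; empty buckets produce no bar."""
    bars: List[Bar] = []
    for tick in ticks:
        start = bucket_start(tick.time, minutes)
        if bars and bars[-1].start == start:
            # Same bucket as the current bar: extend high/low, move close, add volume
            bar = bars[-1]
            bar.high = max(bar.high, tick.price)
            bar.low = min(bar.low, tick.price)
            bar.close = tick.price
            bar.volume += tick.size
        else:
            # First tick of a new bucket opens a new bar
            bars.append(Bar(start, tick.price, tick.price, tick.price, tick.price, tick.size))
    return bars
```

Usage, assuming `ticks` is a time-sorted list of `Tick`: `{m: ticks_to_bars(ticks, m) for m in (3, 5, 7, 10)}`. Because 7 does not divide 1440 evenly, the 7-minute grid in this sketch simply restarts at midnight.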

Is there a simple way you guys and gals do this? Obviously I want to automate these processes as much as possible.

January 27th, 2010, 03:48 AM

I've done it.

I built a Ninja indicator that reads/writes ticks to a MySQL database. I just looked and couldn't find the .cs file, but I will check for it in another spot later.

Mike

January 27th, 2010, 06:26 AM

I'd love to see that code, Mike. I take it that it doesn't pull data out of the NT database but writes from OnBarUpdate? Does that tax your system?

I've just tried a third-party data conversion utility to export the NT7 db (MS SQL CE) to MySQL, without success... but I admit I'm out of my depth. The file structure thing is confusing to the uninitiated.

I'm beginning to appreciate how many resources some people commit to getting their data managed properly.

January 27th, 2010, 06:50 AM

MXASJ

Right, it records from OnMarketData and similar event handlers; it does not convert the existing database.

I used to run it on a separate box, a Quad Core Q6600. It would come close to maxing out NT 6.5, which is single-core/single-threaded; it would be much better to do it on NT7. The SQL database server should also be local if possible, to reduce latency. I was recording a half dozen instruments.

Mike

January 27th, 2010, 07:27 AM

Creating n-minute bar data (or indeed daily data etc.) from tick data is quite straightforward. However, there are complications. The first arises from deciding what to do with missing data. For example, what if the market had no transactions in a particular period: do you create dummy bars to fill in the gaps? The answer depends on what you want to do with the data in your analysis. You do not want dummy bars if you are doing moving-average type studies, but you will need them if you are doing time-related studies.
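To make the dummy-bar choice concrete, here is a hedged Python sketch of the gap-filling option: for each n-minute bucket with no trades, it inserts a flat bar at the previous close with zero volume. The `Bar` tuple and `fill_gaps` name are hypothetical, and carrying the last close forward is only one common convention.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Bar(NamedTuple):
    start: datetime
    open: float
    high: float
    low: float
    close: float
    volume: int


def fill_gaps(bars: List[Bar], minutes: int) -> List[Bar]:
    """Insert flat, zero-volume dummy bars for buckets that had no trades.

    Assumes `bars` is sorted by start time and every start lies on the
    n-minute grid. Each dummy bar repeats the previous close for O/H/L/C.
    """
    step = timedelta(minutes=minutes)
    filled: List[Bar] = []
    for bar in bars:
        if filled:
            expected = filled[-1].start + step
            last_close = filled[-1].close
            while expected < bar.start:
                filled.append(Bar(expected, last_close, last_close, last_close, last_close, 0))
                expected += step
        filled.append(bar)
    return filled
```

Applied on its own, this would also fill overnight and weekend gaps, so for time-related studies it probably belongs inside a market-hours filter.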

The second complication is to do with market hours. If you are creating a generalised routine, it will need to know the open and close times for each market, and any holiday dates, half-days etc.
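A minimal sketch of the market-hours check, assuming timestamps are already in exchange-local time. The session times, the holiday and the half-day below are illustrative placeholders, not a real exchange calendar; a generalised routine would load these per market.

```python
from datetime import date, datetime, time

# Hypothetical calendar for one market; real hours, holidays and
# half-days must come from the exchange's published schedule.
REGULAR_OPEN = time(9, 30)
REGULAR_CLOSE = time(16, 0)
HOLIDAYS = {date(2010, 1, 18)}                 # illustrative full closure
HALF_DAYS = {date(2010, 11, 26): time(13, 0)}  # illustrative early close


def in_session(ts: datetime) -> bool:
    """Return True if the exchange-local timestamp falls inside the trading session."""
    d = ts.date()
    if d.weekday() >= 5 or d in HOLIDAYS:  # Saturday/Sunday or holiday
        return False
    close = HALF_DAYS.get(d, REGULAR_CLOSE)
    # Session is treated as [open, close): a tick stamped exactly at the close is excluded
    return REGULAR_OPEN <= ts.time() < close
```

Filtering ticks (or bars) through a check like this before aggregating keeps dummy bars from being created for hours when the market is closed.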

Something else to consider is database size. If you are storing actual tick data, this can create huge databases. These will have to be planned carefully because they will need to be efficient and you will need to be able to back them up somewhere.
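One way to get a feel for database size is to fix a record layout and multiply. The sketch below assumes a hypothetical 20-byte tick record (64-bit nanosecond timestamp, 64-bit float price, 32-bit size) packed with Python's `struct` module. Real databases add indexes and per-row overhead on top of this raw figure.

```python
import struct

# Hypothetical fixed-width tick record: int64 timestamp (ns since epoch),
# float64 price, uint32 size. "<" = little-endian, standard sizes, no padding.
TICK_FORMAT = struct.Struct("<qdI")


def pack_tick(ts_ns: int, price: float, size: int) -> bytes:
    return TICK_FORMAT.pack(ts_ns, price, size)


def unpack_tick(raw: bytes):
    return TICK_FORMAT.unpack(raw)


def estimated_bytes(ticks_per_day: int, days: int) -> int:
    """Raw payload only; indexes, row headers and logs come on top."""
    return TICK_FORMAT.size * ticks_per_day * days
```

For example, at a hypothetical 1,000,000 ticks a day for 250 trading days, `estimated_bytes(1_000_000, 250)` is 20 × 1,000,000 × 250 = 5,000,000,000 bytes, roughly 5 GB of raw payload for one instrument before any overhead.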

A final thought relates to multiple inputs to the database - you will need to decide which is the primary source and what to do when the data overlaps.
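As an illustration of a primary-source rule, this sketch keeps every tick from the primary feed and takes ticks from the secondary feed only for minutes where the primary has nothing. The per-minute granularity and the `(timestamp, price, size)` tuple format are assumptions; other policies (per session, per day, or reconciling prices) are equally valid.

```python
from datetime import datetime
from typing import List, Tuple

TickRow = Tuple[datetime, float, int]  # (timestamp, price, size)


def minute_key(ts: datetime) -> datetime:
    """Truncate a timestamp to its minute."""
    return ts.replace(second=0, microsecond=0)


def merge_sources(primary: List[TickRow], secondary: List[TickRow]) -> List[TickRow]:
    """Primary feed wins wherever it has data; secondary only backfills empty minutes."""
    covered = {minute_key(ts) for ts, _, _ in primary}
    backfill = [t for t in secondary if minute_key(t[0]) not in covered]
    return sorted(list(primary) + backfill, key=lambda t: t[0])
```

Working at minute granularity avoids trying to match individual ticks across feeds, which rarely share identical timestamps.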

Just a few things to think about. It is something I want to do myself one day.

January 27th, 2010, 10:16 AM

Big Mike

Ouch. I've been wanting to do something like this; it bugs me having all this data streaming to my machine and not saving it, but 6 instruments is pretty low for the amount of work involved.
Mike, do you know where the bottleneck is? If it's SQL, I wonder how much further this could be pushed using a time-series database instead.

January 27th, 2010, 10:17 AM

The bottleneck is NT 6.5, not MySQL.

I haven't tried it with NT 7. Also, I wrote it a year ago, when my coding technique was not nearly as good; a newer version written from the ground up could be far more efficient, I am sure.

For instance, I remember I was storing each tick as it arrived (one query per tick). Duh, that is stupid. You can easily implement some buffering, but a year ago I didn't know how.

Mike
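The buffering described above can be as simple as collecting ticks in memory and writing them in batches. This is a Python sketch, not the original indicator: it uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server, and the `ticks` table layout and `batch_size` are assumptions. DB-API cursors offer the same `executemany()` call for other databases.

```python
import sqlite3


class TickWriter:
    """Buffers ticks in memory and writes them to the database in batches."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 500):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ticks "
            "(symbol TEXT, ts_ns INTEGER, price REAL, size INTEGER)"
        )

    def on_tick(self, symbol: str, ts_ns: int, price: float, size: int) -> None:
        # Called once per tick, but only touches the database every batch_size ticks
        self.buffer.append((symbol, ts_ns, price, size))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # One statement and one commit per batch instead of one per tick
        self.conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", self.buffer)
        self.conn.commit()
        self.buffer.clear()
```

Call `flush()` on shutdown (or on a timer) so the last partial batch is not lost.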

January 27th, 2010, 11:46 AM

Ahh, that is cool, but I really do wonder what the upper limit would be with SQL...
Have you ever messed with Berkeley DB?

Here is a post from a guy doing this with some insane specs:
BerkeleyDB: High Volume database hitting limits : Sleepycat, BerkeleyDB, C++ API
"I am using Berkeley DB to store real time ticks for market data. To give you an idea of the amount of data...
- I get roughly 15000 ticks/sec on an average
- by the end of the day the database file grows upto 50GB."

At the bottom he says he is subscribed to 10,000 instruments. That's way overkill for my purposes, but to me storing fewer than the 500 in the S&P would not really be worth the effort.

Of course, I'm not sure how Ninja would handle anything close to that. There are C# bindings for it, though:
Berkeley DB C# Bindings | Dinosaur Technology and Trading
"To give you an idea of why this is important, take a look at their white paper from 2006 on performance. In a transacted environment they achieved 125,486 single record writes per second. With modern 2009 hardware, and multiple CPU / Solid State Disk systems, this could readily record every single tick coming off of the NYSE and NASDAQ (multiple million per second)."
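With a key-value store like Berkeley DB, much of the design work is in the key. A common trick, sketched here in Python as an assumption rather than anything from the linked posts, is to build keys whose byte order matches (symbol, timestamp) order, so a B-tree that compares keys byte by byte keeps each symbol's ticks in time order and a time-range query becomes one contiguous scan. The 8-byte symbol width and the `seq` tie-breaker are hypothetical choices.

```python
import struct

SYMBOL_WIDTH = 8  # hypothetical fixed width; longer symbols are rejected


def tick_key(symbol: str, ts_ns: int, seq: int = 0) -> bytes:
    """Build a key whose byte order equals (symbol, timestamp, seq) order.

    Big-endian unsigned integers compare correctly byte by byte, so plain
    lexicographic comparison of these keys sorts by symbol, then time.
    `seq` separates multiple ticks that share the same timestamp.
    """
    raw = symbol.encode("ascii")
    if len(raw) > SYMBOL_WIDTH:
        raise ValueError("symbol too long for fixed-width key")
    return raw.ljust(SYMBOL_WIDTH, b"\0") + struct.pack(">QI", ts_ns, seq)
```

Padding the symbol to a fixed width keeps "ES" keys from interleaving with keys for a longer symbol that starts with the same letters.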

MXASJ, I'm pretty much in the same boat: Ninja, IQFeed... I think that even though Matlab is expensive, it would be worth it over R in terms of time saved and the tools available.
Matlab supports Berkeley DB, so I wouldn't think it would be too bad once you got data streaming into Matlab.
The problem is that with the low-cost data vendors you are basically on your own. From what I've found, eSignal doesn't sound worth the effort with Matlab, and for DTN you can't really find much at all on this...
The CQG API supports Matlab... I've kind of taken the position that it's probably not worth doing this project until you're ready to pony up for the next level of data-feed expense. Unless someone comes up with some mind-blowing solution, but then again I'm just not sure enough people are looking to do this stuff for DTN/eSignal to spend their resources on it.

January 27th, 2010, 12:44 PM

swandro

I agree with you, but I'm not so pessimistic...

https://www.google.com/codesearch/p?hl=de#hAxBGal4fIY/matlabcentral/files/3398/Tick2Bar.m&q=tick2bar&sa=N&cd=1&ct=rc

Create bar data, and then smooth it (maybe with Heikin-Ashi)... but you are right, there are too many approaches (forex, stocks, futures). I look at forex, and there is no real problem doing this there, but it also depends on your feed.
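For the smoothing step, the standard Heikin-Ashi formulas are easy to apply once you have regular bars. A minimal Python sketch follows; the `OHLC` tuple is a hypothetical container, and seeding the first HA open with (O + C) / 2 is one common convention that other charting packages may handle differently.

```python
from typing import List, NamedTuple


class OHLC(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def heikin_ashi(bars: List[OHLC]) -> List[OHLC]:
    """Convert regular OHLC bars to Heikin-Ashi bars.

    HA close = (O + H + L + C) / 4
    HA open  = (previous HA open + previous HA close) / 2, seeded with (O + C) / 2
    HA high  = max(H, HA open, HA close)
    HA low   = min(L, HA open, HA close)
    """
    out: List[OHLC] = []
    for bar in bars:
        ha_close = (bar.open + bar.high + bar.low + bar.close) / 4
        if out:
            ha_open = (out[-1].open + out[-1].close) / 2
        else:
            ha_open = (bar.open + bar.close) / 2
        out.append(OHLC(ha_open,
                        max(bar.high, ha_open, ha_close),
                        min(bar.low, ha_open, ha_close),
                        ha_close))
    return out
```

Because each HA open depends on the previous HA bar, the output has to be computed in time order over the whole series, not bar by bar in isolation.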

January 28th, 2010, 08:43 PM

I just had a silly thought that will probably consume me for the rest of today.

A strategy has access to BarsArray[0], which is the OHLC data for the strategy's type, periodicity, and days loaded (say ES 03-10, type Minute, periodicity 5, 20 days back).

How easy or difficult would it be to write that "old" BarsArray data to a CSV file with timestamps?

Is this me not seeing the forest for the trees, or is it something hard to do? I've always been thinking of the Historical Data Manager export function, which exports ticks, but if a BarsArray can be written to a file... that could be enormously helpful for a lot of things.

Thoughts? Ideas? I'll go read up on FileRead/Write, Stream, and other similar things I've never tried in NT.
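The file side of this is straightforward once you have the bars. NinjaScript is C#, so this Python sketch only illustrates the output format: a header row, then one timestamped OHLCV row per bar. The column names and timestamp format are assumptions; in NinjaScript the same loop would presumably write each bar's time and values with .NET's `StreamWriter`.

```python
import csv
import io
from datetime import datetime


def write_bars_csv(bars, out) -> None:
    """Write (timestamp, open, high, low, close, volume) rows as CSV.

    `bars` is any iterable of 6-tuples whose first item is a datetime;
    `out` is a text stream (for a real file, open it with newline="").
    """
    writer = csv.writer(out)
    writer.writerow(["timestamp", "open", "high", "low", "close", "volume"])
    for ts, o, h, l, c, v in bars:
        writer.writerow([ts.strftime("%Y-%m-%d %H:%M:%S"), o, h, l, c, v])


if __name__ == "__main__":
    # Example: one 5-minute bar written to an in-memory buffer
    buf = io.StringIO()
    write_bars_csv([(datetime(2010, 1, 28, 9, 35), 1090.5, 1091.0, 1090.25, 1090.75, 1200)], buf)
    print(buf.getvalue(), end="")
```

A plain timestamped CSV like this reads directly into R with `read.csv` or into Excel, which covers the "use it back in other tools" part of the original question.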

Discussion in Platforms and Indicators