Another idea is to use a NoSQL database. Cloud providers like Google, Amazon, and Microsoft Azure don't use SQL servers for their own data; it's just too much overhead.
I might one day try to set up a tick database on my Azure account; could be fun.
If you take a look at how the Table service is organized (there are also Blob and Queue services), it's:
PartitionKey - RowKey - Object (max 64 kB; for anything larger, store a link to a Blob object)
This structure is optimized for storing billions of rows; in fact, your account can grow up to 100 TB of data, and as long as you never do a full scan and only query on PartitionKey/RowKey, performance is good.
So if you set PartitionKey = instrument name and RowKey = tick timestamp, you can easily build a database that will store billions of ticks, and on which you can run very speedy queries like "give me all the ticks on instrument xyz between startdate and enddate".
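That PartitionKey/RowKey layout can be sketched in plain Python. This is an in-memory stand-in for the Table service, not the actual Azure SDK; the instrument name "ES" and the timestamp strings are made-up examples:

```python
import bisect

class TickTable:
    """In-memory sketch of a PartitionKey/RowKey table."""

    def __init__(self):
        # One sorted list of (RowKey, tick) per PartitionKey (instrument).
        self.partitions = {}

    def insert(self, partition_key, row_key, tick):
        rows = self.partitions.setdefault(partition_key, [])
        bisect.insort(rows, (row_key, tick))

    def query_range(self, partition_key, start, end):
        """All ticks on one instrument with start <= RowKey <= end."""
        rows = self.partitions.get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_right(rows, (end, chr(0x10FFFF)))
        return [tick for _, tick in rows[lo:hi]]

table = TickTable()
table.insert("ES", "2024-01-02T09:30:00.001", {"price": 4770.25, "size": 3})
table.insert("ES", "2024-01-02T09:30:00.250", {"price": 4770.50, "size": 1})
table.insert("ES", "2024-01-03T09:30:00.000", {"price": 4781.00, "size": 2})

# "~" sorts after every character used in the timestamps, so this bounds
# the RowKey range to one day without knowing the exact last timestamp.
ticks = table.query_range("ES", "2024-01-02", "2024-01-02~")
```

Because each partition is kept sorted by RowKey, the range query is a binary search plus a slice, which mirrors why "query on PartitionKey/RowKey only" stays fast in the real service.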
I'm pretty sure there are also desktop implementations of NoSQL data servers; you can check NOSQL Databases.
Trust me on this: for tick history, SQL is the better way. My team and I once built a system that needed to analyse historical tick data, and for our purposes SQL Server was the best fit (even better than Oracle).
I'm a software developer and found this forum thread via Google. I'd like to share my knowledge, and I hope to also learn about other tick database systems and best practices. I'm currently developing a system that is intended to be commercial, so the plan is to buy the live streams as well as historical data from stock data vendors and pay license and service fees. My knowledge of handling stock data in IT systems is currently very limited, but I have a strong background in software architecture (including conventional relational database use cases).
Current plan:
Use a relational database for everything. The data is split into two separate tables: one key table and one tick table. The key table contains references to other tables (ids such as stock id, stock exchange id, and currency id) and provides a unique id that is referenced from the tick table. The tick table contains the price, bid, ask, and time; its PK is the reference to the key table plus the time field. The key table exists only to reduce the number of fields, and therefore the storage needed per tick (5NF).
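A minimal sketch of that two-table layout, here in SQLite for illustration (the table and column names are my own, not from any vendor schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Key table: factors out the repeating ids so each tick row stays small.
CREATE TABLE tick_key (
    key_id      INTEGER PRIMARY KEY,
    stock_id    INTEGER NOT NULL,
    exchange_id INTEGER NOT NULL,
    currency_id INTEGER NOT NULL,
    UNIQUE (stock_id, exchange_id, currency_id)
);

-- Tick table: PK is the key reference plus the time field, as described.
CREATE TABLE tick (
    key_id INTEGER NOT NULL REFERENCES tick_key(key_id),
    time   INTEGER NOT NULL,   -- e.g. epoch microseconds
    price  REAL NOT NULL,
    bid    REAL,
    ask    REAL,
    PRIMARY KEY (key_id, time)
) WITHOUT ROWID;               -- clustered on (key_id, time)
""")

conn.execute(
    "INSERT INTO tick_key (stock_id, exchange_id, currency_id) VALUES (1, 2, 3)"
)
key_id = conn.execute("SELECT key_id FROM tick_key").fetchone()[0]
conn.execute(
    "INSERT INTO tick VALUES (?, ?, ?, ?, ?)",
    (key_id, 1700000000000000, 101.5, 101.4, 101.6),
)
row = conn.execute("SELECT price FROM tick WHERE key_id = ?", (key_id,)).fetchone()
```

The composite (key_id, time) primary key also gives you the clustered ordering that range queries over one instrument's ticks benefit from.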
All the analyses should be made on time frames like 1m, 5m, 1h, 1 day, and so on, not on a tick basis. For better performance, end-of-day jobs aggregate the data and store the aggregates in separate tables. Views are used to combine the current day's data with the data from the historical time-frame tables. The tick table is cleaned during the end-of-day job (which runs when the referenced stock's market closes) and the raw ticks are archived as blobs. I think this would improve performance a lot and keep the table space as small as possible, without having to implement blob handling in the analysis path.
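The end-of-day aggregation step might look like this in SQL (again SQLite for illustration; I've rolled ticks up into 1-minute high/low/volume bars, and all names are made up, with time held as integer milliseconds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tick (key_id INTEGER, time INTEGER, price REAL, size INTEGER);
CREATE TABLE bar_1m (
    key_id INTEGER, minute INTEGER, high REAL, low REAL, volume INTEGER,
    PRIMARY KEY (key_id, minute)
);
""")

# Three sample ticks: two inside minute 1, one inside minute 2.
ticks = [
    (1, 60_005, 100.0, 2),
    (1, 60_030, 101.0, 1),
    (1, 120_001, 99.5, 3),
]
conn.executemany("INSERT INTO tick VALUES (?, ?, ?, ?)", ticks)

# Roll up: one row per (instrument, minute); integer division buckets
# millisecond timestamps into minutes.
conn.execute("""
INSERT INTO bar_1m
SELECT key_id, time / 60000, MAX(price), MIN(price), SUM(size)
FROM tick
GROUP BY key_id, time / 60000
""")
bars = conn.execute("SELECT * FROM bar_1m ORDER BY minute").fetchall()
```

After the job commits the bars, the matching rows can be deleted from the tick table and archived, which is what keeps the hot table small.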
For the stream handling I would buffer the incoming ticks. The buffer is flushed to the database once 500 (or up to 5000) ticks have accumulated, or once 500 ms have elapsed. This reduces the transaction overhead.
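A sketch of that buffering strategy: flush when the batch reaches max_size or when flush_interval has elapsed, whichever comes first. The 500-tick / 500 ms thresholds come from the post above; everything else (class and parameter names, the sink callable) is illustrative:

```python
import time

class TickBuffer:
    def __init__(self, sink, max_size=500, flush_interval=0.5):
        self.sink = sink                  # callable doing the batched INSERT
        self.max_size = max_size
        self.flush_interval = flush_interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, tick):
        self.buffer.append(tick)
        if (len(self.buffer) >= self.max_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)        # one transaction for the whole batch
            self.buffer = []
        self.last_flush = time.monotonic()

# Tiny demo with a size threshold of 3 and a long interval, so only the
# size trigger fires here.
batches = []
buf = TickBuffer(batches.append, max_size=3, flush_interval=60)
for price in (100.0, 100.5, 101.0, 101.5):
    buf.add(price)
# Three ticks triggered a size-based flush; the fourth is still buffered.
```

In a live feed you would also want a timer (or a periodic call to `flush()`) so a quiet market still gets its half-second flush; the `add`-driven time check above only fires when the next tick arrives.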
@Mike
Don't use an ID per tick. I think performance would be very poor with that strategy, because the key-generation overhead is too high, and after years (or even weeks) you would have to wrap the keys around once the limit is reached. Besides, it's also unnecessary.
I would be deeply grateful for feedback from experienced users or developers.
Tick based analysis is a requirement. The system needs to be good enough to capture the live data from the market, which is tick increment + bid/ask size.
I guess there is a misunderstanding. Why do you need an increment? It's unnecessary, because the time, together with the stock, stock exchange, and currency, should be the PK. The total volume is also unnecessary, because it can easily be calculated from the quote sizes. (All my comments refer to your last screenshot.)
OK, if tick-based analysis is your requirement (it isn't mine), you will have to handle this with a big system. But to reduce the required storage and improve performance, I would increase the level of normalization and also work with blobs for historical data.
I had it working over 2 years ago with NinjaTrader. I posted the code somewhere on the NT forums, but it was long ago and probably easier to start over.
What I was trying to ask is whether you are still working on a more performant solution that would scale, or whether you decided not to move forward with the project at full scale?