Another idea is to use a NoSQL database. Cloud providers like Google, Amazon, and Microsoft Azure don't use SQL servers for their own data; it's just too much overhead.
I might one day try to set up a tick database on my Azure account; could be fun.
If you take a look at how the Table service is organized (there are also Blob and Queue services), it's:
PartitionKey - RowKey - Object (max 64 kB; for anything larger, store a link to a Blob object)
This structure is optimized for storing billions of rows; in fact, your account can grow up to 100 TB of data, and as long as you never do a full scan and only query on PartitionKey/RowKey, performance is good.
So if you set PartitionKey = instrument name and RowKey = tick timestamp, you can easily build a database that will store billions of ticks, and on which you can run very speedy queries like "give me all the ticks on instrument xyz between startdate and enddate".
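That PartitionKey/RowKey layout can be sketched in plain Python. This is an in-memory stand-in for the Table service, not the actual Azure SDK; the instrument name "ES" and the timestamp strings are made-up examples:

```python
import bisect

class TickTable:
    """In-memory sketch of a PartitionKey/RowKey table."""

    def __init__(self):
        # One sorted list of (RowKey, tick) per PartitionKey (instrument).
        self.partitions = {}

    def insert(self, partition_key, row_key, tick):
        rows = self.partitions.setdefault(partition_key, [])
        bisect.insort(rows, (row_key, tick))

    def query_range(self, partition_key, start, end):
        """All ticks on one instrument with start <= RowKey <= end."""
        rows = self.partitions.get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_right(rows, (end, chr(0x10FFFF)))
        return [tick for _, tick in rows[lo:hi]]

table = TickTable()
table.insert("ES", "2024-01-02T09:30:00.001", {"price": 4770.25, "size": 3})
table.insert("ES", "2024-01-02T09:30:00.250", {"price": 4770.50, "size": 1})
table.insert("ES", "2024-01-03T09:30:00.000", {"price": 4781.00, "size": 2})

# "~" sorts after every character used in the timestamps, so this bounds
# the RowKey range to one day without knowing the exact last timestamp.
ticks = table.query_range("ES", "2024-01-02", "2024-01-02~")
```

Because each partition is kept sorted by RowKey, the range query is a binary search plus a slice, which mirrors why "query on PartitionKey/RowKey only" stays fast in the real service.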
I'm pretty sure there are also desktop implementations of NoSQL data servers; you can check NOSQL Databases.
Trust me on this: for tick history, SQL is the better way. My team and I once built a system that needed to analyse historical tick data, and for our purposes SQL Server was the best fit (even better than Oracle).
I'm a software developer and found this forum thread via Google. I'd like to share my knowledge, and I hope to also learn about other tick database systems and best practices. I'm currently developing a system that is intended to be commercial, so the plan is to buy the live streams as well as historical data from stock data vendors and pay license and service fees. My knowledge of handling stock data in IT systems is currently very limited, but I have a strong background in software architecture (including conventional relational database use cases).
Current plan:
Use a relational database for everything. The data is split into two separate tables: one key table and one tick table. The key table contains references to other tables (ids such as stock id, stock exchange id, and currency id) and provides a unique id that is referenced from the tick table. The tick table contains the price, bid, ask, and time; its PK is the reference to the key table plus the time field. The key table exists only to reduce the number of fields, and therefore the storage needed per tick (5NF).
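A minimal sketch of that two-table layout, here in SQLite for illustration (the table and column names are my own, not from any vendor schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Key table: factors out the repeating ids so each tick row stays small.
CREATE TABLE tick_key (
    key_id      INTEGER PRIMARY KEY,
    stock_id    INTEGER NOT NULL,
    exchange_id INTEGER NOT NULL,
    currency_id INTEGER NOT NULL,
    UNIQUE (stock_id, exchange_id, currency_id)
);

-- Tick table: PK is the key reference plus the time field, as described.
CREATE TABLE tick (
    key_id INTEGER NOT NULL REFERENCES tick_key(key_id),
    time   INTEGER NOT NULL,   -- e.g. epoch microseconds
    price  REAL NOT NULL,
    bid    REAL,
    ask    REAL,
    PRIMARY KEY (key_id, time)
) WITHOUT ROWID;               -- clustered on (key_id, time)
""")

conn.execute(
    "INSERT INTO tick_key (stock_id, exchange_id, currency_id) VALUES (1, 2, 3)"
)
key_id = conn.execute("SELECT key_id FROM tick_key").fetchone()[0]
conn.execute(
    "INSERT INTO tick VALUES (?, ?, ?, ?, ?)",
    (key_id, 1700000000000000, 101.5, 101.4, 101.6),
)
row = conn.execute("SELECT price FROM tick WHERE key_id = ?", (key_id,)).fetchone()
```

The composite (key_id, time) primary key also gives you the clustered ordering that range queries over one instrument's ticks benefit from.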
All the analyses should be made on time frames like 1m, 5m, 1h, 1 day, and so on, not on a tick basis. For better performance, end-of-day jobs aggregate the data and store the aggregates in separate tables. Views are used to combine the current day's data with the data from the historical time-frame tables. The tick table is cleaned during the end-of-day job (which runs when the referenced stock's market closes) and the raw ticks are archived as blobs. I think this would improve performance a lot and keep the table space as small as possible, without having to implement blob handling in the analysis path.
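The end-of-day aggregation step might look like this in SQL (again SQLite for illustration; I've rolled ticks up into 1-minute high/low/volume bars, and all names are made up, with time held as integer milliseconds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tick (key_id INTEGER, time INTEGER, price REAL, size INTEGER);
CREATE TABLE bar_1m (
    key_id INTEGER, minute INTEGER, high REAL, low REAL, volume INTEGER,
    PRIMARY KEY (key_id, minute)
);
""")

# Three sample ticks: two inside minute 1, one inside minute 2.
ticks = [
    (1, 60_005, 100.0, 2),
    (1, 60_030, 101.0, 1),
    (1, 120_001, 99.5, 3),
]
conn.executemany("INSERT INTO tick VALUES (?, ?, ?, ?)", ticks)

# Roll up: one row per (instrument, minute); integer division buckets
# millisecond timestamps into minutes.
conn.execute("""
INSERT INTO bar_1m
SELECT key_id, time / 60000, MAX(price), MIN(price), SUM(size)
FROM tick
GROUP BY key_id, time / 60000
""")
bars = conn.execute("SELECT * FROM bar_1m ORDER BY minute").fetchall()
```

After the job commits the bars, the matching rows can be deleted from the tick table and archived, which is what keeps the hot table small.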
For the stream handling I would buffer the incoming ticks. The buffer is flushed to the database once 500 (or up to 5000) ticks have accumulated, or once 500 ms have elapsed. This reduces the transaction overhead.
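A sketch of that buffering strategy: flush when the batch reaches max_size or when flush_interval has elapsed, whichever comes first. The 500-tick / 500 ms thresholds come from the post above; everything else (class and parameter names, the sink callable) is illustrative:

```python
import time

class TickBuffer:
    def __init__(self, sink, max_size=500, flush_interval=0.5):
        self.sink = sink                  # callable doing the batched INSERT
        self.max_size = max_size
        self.flush_interval = flush_interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, tick):
        self.buffer.append(tick)
        if (len(self.buffer) >= self.max_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)        # one transaction for the whole batch
            self.buffer = []
        self.last_flush = time.monotonic()

# Tiny demo with a size threshold of 3 and a long interval, so only the
# size trigger fires here.
batches = []
buf = TickBuffer(batches.append, max_size=3, flush_interval=60)
for price in (100.0, 100.5, 101.0, 101.5):
    buf.add(price)
# Three ticks triggered a size-based flush; the fourth is still buffered.
```

In a live feed you would also want a timer (or a periodic call to `flush()`) so a quiet market still gets its half-second flush; the `add`-driven time check above only fires when the next tick arrives.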
@Mike
Don't use an ID per tick. I think performance would be very poor with that strategy, because the key-generation overhead is too high, and after years (or even weeks) you would have to wrap the keys around once the limit is reached. Besides, it's also unnecessary.
I would be deeply grateful for feedback from experienced users or developers.
Tick based analysis is a requirement. The system needs to be good enough to capture the live data from the market, which is tick increment + bid/ask size.
I guess there is a misunderstanding. Why do you need an increment? It's unnecessary, because the time, together with the stock, stock exchange, and currency, should be the PK. The total volume is also unnecessary, because it can easily be calculated from the quote sizes. (All my comments refer to your last screenshot.)
OK, if tick-based analysis is your requirement (it isn't mine), you will have to handle this with a big system. But to reduce the required storage and improve performance, I would increase the level of normalization and also work with blobs for historical data.
I had it working over 2 years ago with NinjaTrader. I posted the code somewhere on the NT forums, but it was long ago and probably easier to start over.
What I was trying to ask is whether you are still working on a more performant solution that would scale, or whether you decided not to move forward with the project at full scale?