In this thread, I will propose several ways to retrieve historical data from Yahoo Finance:
- Web Query (from Java code),
- YQL (from Java code),
- Quantmod (with R).
If you have other or better ideas, do not hesitate to contribute. :)
EDIT: Yahoo …
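Whichever retrieval route you use, the response from Yahoo's historical endpoint is plain CSV, so the parsing step looks the same. Here is a minimal Python sketch; since the endpoint details shift over time, the sample rows below are a hard-coded stand-in rather than a live download:

```python
import csv
import io

# Sample rows in the CSV layout Yahoo's historical-data endpoint returned.
# This is a hard-coded stand-in for illustration, not live data.
sample_csv = """Date,Open,High,Low,Close,Volume,Adj Close
2014-01-03,183.23,183.60,182.63,182.89,81390600,168.31
2014-01-02,183.98,184.07,182.48,182.92,119636900,168.34
"""

def parse_yahoo_csv(text):
    """Parse Yahoo-style historical CSV into a list of dicts with numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": row["Date"],
            "open": float(row["Open"]),
            "high": float(row["High"]),
            "low": float(row["Low"]),
            "close": float(row["Close"]),
            "volume": int(row["Volume"]),
        })
    return rows

bars = parse_yahoo_csv(sample_csv)
print(len(bars), bars[0]["close"])  # 2 182.89
```

From here the rows can go into a pandas DataFrame, a database, or whatever your backtester expects.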
Google Finance (free) has been addressed in this thread, with example code in R and Java:
In this thread, I will propose several ways to retrieve historical data from Google Finance:
- Web Query 1 (from Java code),
- Web Query 2 (from Java code),
- YQL,
- Quantmod (with R).
If you have other or better ideas, do not hesitate to contribute. …
Quandl (free) has been addressed in this thread, with example code in R and Java:
As ratfink pointed out in another thread, Kinetick EOD data provided through NinjaTrader is another free source of data, which can be accessed programmatically from outside through .ntd files.
Broker: Mirus (Broker), Continuum (Data), Dorman (Clearing)
Trading: Futures
Posts: 202 since Mar 2013
Thanks Given: 428
Thanks Received: 202
Nicolas--
I've been looking for daily price data, too--albeit for my use within NT.
I've built a tool for NT that uses .NET to make a WebRequest to Quandl for historical contract data (here is Quandl's wheat landing page: https://www.quandl.com/c/futures/cme-wheat-futures). It parses the response into an NT7-compatible import format (NinjaTrader Version 7) and then writes the cleaned-up data to a properly named text file (e.g., ZW 12-59.Last.txt), which can then be imported into NT7. I built it to take an end date, so you can download all of the historical contract data within a time span (in the case of wheat, 1959-2014) by contract month (F, G, H, J, etc.). It's pretty cool.
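My tool is .NET, but to make the export step concrete, here is a rough Python sketch of the last part of that pipeline: writing daily bars to the semicolon-delimited text layout NT7's importer accepts. The bar values, the filename, and the exact daily format (yyyyMMdd;open;high;low;close;volume) are my assumptions for illustration; verify them against your NT7 version before trusting an import.

```python
# Hypothetical daily bars (date, open, high, low, close, volume).
# All numbers are invented for illustration.
bars = [
    ("20140102", 605.25, 612.00, 604.50, 611.75, 48213),
    ("20140103", 611.75, 615.25, 608.00, 609.50, 51877),
]

def write_nt7_daily(path, bars):
    """Write bars in an assumed NT7 daily import layout:
    yyyyMMdd;open;high;low;close;volume (one bar per line)."""
    with open(path, "w") as f:
        for date, o, h, l, c, v in bars:
            f.write(f"{date};{o};{h};{l};{c};{v}\n")

# Contract-style filename, made up for this example.
write_nt7_daily("ZW 03-14.Last.txt", bars)
```

The real work, of course, is the cleaning between the download and this write step.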
But, and here is the big but(t) with Quandl: it's tough to know how good this data is. You can click on their Continuous Contracts to see a merged chart built (presumably) from the Historical Contract data files, and the data gaps jump right out at you. Here are various Continuous Contracts that show the gaps.
So as you can see, a number of the daily Quandl futures contracts have data gaps that would impact a backtest (I looked at ES, GC, ZW, ZC, ZS, 6A, 6B, 6C, 6E, 6J, 6L, 6M, 6N, 6R, 6S, CL, HG, NG, NQ, PL, SI, YM, TF, ZN, ZF, ZT and SP). The real challenge is that I haven't run across a tool that would let us determine whether the free data you've mentioned here (i.e., in your first post), plus the various tick-sharing threads on nexusfi.com (formerly BMT) (CL, ES, QCollector, etc.), is any good. So to the extent that we can think about and collaborate on some sort of tool/standard that the community could use to vet various data sources, it seems to me that we need to be very careful.
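A first-pass gap check doesn't need anything fancy. Here is a small Python sketch that flags missing weekdays between consecutive observations in a daily series; exchange holidays will show up as false positives, so a real check would consult a holiday calendar. The dates below are an invented example with one missing session.

```python
from datetime import date, timedelta

# Invented daily series with Monday Jan 6 missing.
observed = [date(2014, 1, 2), date(2014, 1, 3), date(2014, 1, 7)]

def missing_weekdays(dates):
    """Return weekdays that fall between consecutive observations
    but are absent from the series (holidays will false-positive)."""
    gaps = []
    for prev, cur in zip(dates, dates[1:]):
        d = prev + timedelta(days=1)
        while d < cur:
            if d.weekday() < 5:  # Monday=0 .. Friday=4
                gaps.append(d)
            d += timedelta(days=1)
    return gaps

print(missing_weekdays(observed))  # [datetime.date(2014, 1, 6)]
```

Run that over each Quandl contract file and you get a quick count of suspect sessions per instrument.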
Here is an idea I've been thinking about with regard to how to address data cleanliness.
Either nexusfi.com (formerly BMT) purchases, or a current nexusfi.com (formerly BMT) member offers up, a subscription to a historical data service (e.g., Bloomberg, NANEX, etc.) that either BigMike or the member offering the subscription could then use to test various data files against (I'm thinking of some sort of grader with a report card per run). The report card would list the discrepancies between the submitted data and the subscription feed, presumably so that the submitted data could be edited to bring it in line with the subscription data. This edited data could then be posted in a new forum thread called "Data" or something, organized by instrument (e.g., CL, ZW, ES, etc.). The idea here is that the community is out collecting data, but nexusfi.com (formerly BMT) has a standard against which collected data is graded, so that Elite Members (and heck, I could see creating a new membership tier beyond Elite for this) know that the data we are using is sound.
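The grader itself could start out very simple. Here is a minimal Python sketch of the "report card" idea: diff a submitted daily series against a reference series and report missing dates and price mismatches. The field names, tolerance, and all numbers are assumptions made up for illustration.

```python
# Invented reference (subscription feed) and submitted (community) closes.
reference = {"2014-01-02": 182.92, "2014-01-03": 182.89, "2014-01-06": 179.68}
submitted = {"2014-01-02": 182.92, "2014-01-03": 182.80}  # one gap, one mismatch

def grade(submitted, reference, tol=0.01):
    """Report-card sketch: list reference dates that are missing from the
    submission, and dates where the closes disagree beyond a tolerance."""
    report = {"missing": [], "mismatch": []}
    for day, ref_close in sorted(reference.items()):
        if day not in submitted:
            report["missing"].append(day)
        elif abs(submitted[day] - ref_close) > tol:
            report["mismatch"].append((day, submitted[day], ref_close))
    return report

report = grade(submitted, reference)
print(report["missing"])   # ['2014-01-06']
print(report["mismatch"])  # [('2014-01-03', 182.8, 182.89)]
```

A real grader would also compare open/high/low and volume, but the shape of the output (a per-run report of discrepancies) would be the same.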
However, in the meantime, given that there is not yet any data quality standard on nexusfi.com (formerly BMT), I think we need to be careful about using the free data that's posted on the site and elsewhere. For instance, I wrote my Quandl Data Grabber indicator to download a ton of Quandl daily data into NT7 format for the express purpose of posting everything I downloaded on nexusfi.com (formerly BMT). But then I looked at the data gaps, and I'm holding off on posting the historical contract data, as I don't want others to be using it until it's in better shape.
In any event, those are my current thoughts on free data. I'd be interested to hear your thoughts when you have a moment.
I think those charts above look scarier than they are. If you are grabbing data in Python and leaving it as a pandas DataFrame, a NaN isn't going to mess up any kind of backtesting. A missing value isn't the price going to zero, even though that's how it looks on those charts.
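To put numbers on that point, here is a small Python sketch (plain stdlib rather than pandas, but pandas skips NaNs the same way by default). A NaN placeholder simply drops out of a computation if you skip it, whereas treating it as zero, which is effectively what those charts do visually, distorts the result. The closes are invented for illustration.

```python
import math

# Invented daily closes with one missing value marked as NaN.
closes = [182.92, 182.89, math.nan, 179.68]

# Skipping the NaN: the missing session just drops out of the average.
valid = [c for c in closes if not math.isnan(c)]
mean_skipping_nan = sum(valid) / len(valid)

# Treating the NaN as a price of zero, the way the gap looks on a chart.
mean_if_zero = sum(0.0 if math.isnan(c) else c for c in closes) / len(closes)

print(round(mean_skipping_nan, 2))  # 181.83
print(round(mean_if_zero, 2))       # 136.37
```

The skipped-NaN mean is a perfectly usable statistic; only the zero-filled one is broken.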
I think getting all hung up on bad ticks is viewing the market too deterministically anyway. No strategy is failing because it uses Quandl or Yahoo instead of Bloomberg. That is pretty clear self-deception, IMO.
You may be right. If nothing else, hopefully our discussion here will prompt others to at least give their data a cursory once-over instead of blindly assuming that it is sound.
I'm currently working on adding rollover date and offset functionality to my Quandl grabber (based on volume, open interest, or both volume and open interest). We can't programmatically define new expiries via NinjaScript, so the indicator will just print the expiries, their rollover (roll-into) dates, and the corresponding offsets, which will then need to be created and entered into NT manually.
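For anyone curious what the volume-based rule looks like in code, here is a Python sketch: roll from the front contract to the next on the first day the next contract's volume exceeds the front's, and record the close-to-close offset on that day. All numbers are invented for illustration, and my real tool is .NET, not Python.

```python
# Invented three-day window around a hypothetical roll.
days  = ["2014-02-24", "2014-02-25", "2014-02-26"]
front = {"volume": [50000, 42000, 30000], "close": [612.25, 610.50, 608.00]}
nxt   = {"volume": [35000, 45000, 52000], "close": [618.75, 617.25, 615.00]}

def find_rollover(days, front, nxt):
    """Return (rollover_date, offset) on the first day the next contract's
    volume exceeds the front contract's; None if it never does."""
    for i, day in enumerate(days):
        if nxt["volume"][i] > front["volume"][i]:
            offset = nxt["close"][i] - front["close"][i]
            return day, round(offset, 2)
    return None

print(find_rollover(days, front, nxt))  # ('2014-02-25', 6.75)
```

An open-interest or combined rule is the same loop with a different comparison; the offset is what you would then enter into NT by hand when building the back-adjusted series.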
I wish I had the time to go down the R rabbit hole (i.e., other languages beyond NT7), but I just need to stay focused on .NET right now.
I guess I view things along these lines: if you had a database of 10,000 poker hands with the money that changed hands on each one, even if you deleted 5,000 of the hands you would still have a great sample of the process at work. If three companies were recording the poker hands, there would be no value in tracking down the reason for a discrepancy on hand #7877 where all three companies had the players holding different cards.
I think most trading backtesting is fool's gold because, as in the above example, you could test what happens when a player gets a certain hand over a certain number of deals and whether they win or lose, but that tells you absolutely nothing about what is going to happen on the next hand. A deck of cards is surely more deterministic than the markets, yet we tend to think the opposite.
All you really need to know for R is how to get data in and then transform it into the input that the function you want to use expects.