Historical Market Data for Futures Trading: Sources, Quality, and Building Research-Ready Datasets
Overview #
Futures data isn't like equity data. There's no permanent ticker that tracks a company from IPO to delisting. Every futures contract expires, every quarter or month a new one takes the front seat, and the price gap between the outgoing and incoming contract can wreck your backtest if you don't handle it right. That gap isn't a bug; it is the difference between two distinct contracts that happen to share a root symbol.
This article covers how to source futures data, how to judge whether it's trustworthy, how to stitch contracts into continuous series without corrupting your signals, and how to build datasets you can actually rely on when real money is at stake.
Why Futures Data Is Different #
Equities have one price series per stock. Futures have dozens of contract months per instrument, each with its own price, volume, and open interest trajectory. The ES (E-mini S&P 500) alone generates a new front-month contract every quarter. CL (Crude Oil) rolls monthly. Agricultural contracts like corn and soybeans have their own seasonal roll calendars.
This creates two problems that don't exist in equities:
Contract identity. "ES" isn't a single instrument. It is a root symbol shared by a series of expiring contracts, such as ESH26 and ESM26.
Roll discontinuities. When the front month expires and you switch to the next contract, there's almost always a price gap. For financial futures like ES, the gap might be 5-15 points. For commodities like CL or NG, the gap can be enormous.
If you're running a mean-reversion strategy and your backtest shows a 50-point move that's actually a roll gap, your results are fiction.
Data Sources and What You're Actually Paying For #
Exchange Data #
The most authoritative source is always the exchange itself: CME Group, ICE, Eurex, or whichever venue lists the contract.
The catch: exchange data is expensive, licensing is restrictive, and raw feeds require significant infrastructure to capture, normalize, and store. CME DataMine offers historical data products directly.
Vendor Data #
Most traders get their historical data from vendors.
The convenience is real, but so are the risks:
- Symbol mapping errors. As [Fat Tails documented on NexusFi] [2], some vendor symbol maps "do not work" for certain contracts, and "the daily futures data is sometimes more false than correct" for specific symbols like BC, CC, CT, EMD, GC.
- Timestamp conventions. Some vendors use exchange timestamps, others use arrival timestamps. Some normalize to UTC, others to Central Time. A timezone mismatch silently corrupts session-boundary analysis.
- Coverage gaps. Vendor history depth varies dramatically. Some carry 20+ years of minute data for major contracts. Others have spotty coverage for older periods or less liquid instruments.
Free Data #
Fine for learning. Not sufficient for serious backtesting. Coverage is limited, granularity is usually daily, corrections are rarely applied, and contract mapping is often incomplete.
Decision Framework #
| Strategy Type | Minimum Data Requirement | Why |
|---|---|---|
| Daily/swing bar-based | Vendor daily or minute bars, back-adjusted continuous | Return accuracy matters more than tick precision |
| Tick-level / order flow | High-fidelity tick/quote data with exchange timestamps | Bid/ask accuracy is the entire signal |
| Spread / calendar | Individual contract months with accurate roll dates | Spread calculation requires actual contract prices |
Data Fields That Matter #
Trade data records actual transactions.
Quote data records the best bid and ask (Level 1) or the full order book depth (Level 2). Quote data matters for execution analysis and order flow strategies. As the [extensive comparison thread on NexusFi] [3] documented, data feeds can differ substantially in bid/ask accuracy during fast markets.
Settlement prices are calculated by the exchange, usually based on a weighted average of trades during a specific settlement window. Settlement is not the same as "last trade" or "close".
Timestamp Semantics #
Timestamps carry more complexity than most traders realize:
- Exchange time vs arrival time: the moment a trade occurred on the matching engine vs the moment your vendor received it. The difference can be milliseconds or seconds.
- Session boundaries: RTH (Regular Trading Hours) vs Globex (overnight). Different platforms define these differently.
- DST transitions: twice a year, session start/end times shift. Your data must handle this or your "9:30 AM open" bar contains wrong data for half the year.
- Trading date vs calendar date: a CME Globex session starting Sunday evening at 5 PM CT belongs to Monday's trading date.
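The trading-date rule in the last bullet can be sketched in a few lines. This is a minimal illustration, assuming timestamps are timezone-aware UTC and that the session opens at 5 PM Central; the function name `trading_date` and the constant are hypothetical, and weekends and exchange holidays are deliberately ignored.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CHICAGO = ZoneInfo("America/Chicago")
SESSION_OPEN_HOUR_CT = 17  # assumption: 5 PM CT evening open, as in the text


def trading_date(ts_utc: datetime) -> date:
    """Map a timezone-aware timestamp to its trading date (sketch only)."""
    local = ts_utc.astimezone(CHICAGO)
    # Anything at or after the evening open belongs to the next day's session.
    if local.hour >= SESSION_OPEN_HOUR_CT:
        return local.date() + timedelta(days=1)
    return local.date()


# Sunday 2026-01-11 at 5:30 PM CT (23:30 UTC) maps to Monday's trading date.
sunday_evening = datetime(2026, 1, 11, 23, 30, tzinfo=timezone.utc)
monday = trading_date(sunday_evening)
print(monday)  # 2026-01-12
```

Converting through a named zone rather than a fixed offset is what keeps the 5 PM boundary correct on both sides of a DST transition.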
Data Quality #
Data quality is the single most impactful factor in backtest credibility. Here are the specific failure modes with diagnostic criteria.
Missing Data and Gaps #
Gaps happen. The question isn't whether you have them, but whether you can find and quantify them.
Diagnostic checklist:
- Compare your bar count against expected bars for the session. ES RTH has 405 one-minute bars per session (6.75 hours * 60). If you have 390, you're missing 15 bars.
- Flag any gap longer than 2x the normal bar interval. A 5-minute gap in 1-minute data is a problem.
- Cross-check daily settlement prices against exchange-published settlements. A mismatch indicates data corruption.
[Schnook's analysis on NexusFi] [4] documented cases where vendor data was incomplete while Bloomberg showed complete data.
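The first two checklist items are easy to automate. The sketch below assumes one-minute bars represented only by their timestamps; `find_gaps` and `missing_bars` are hypothetical helper names, and the sample session is synthetic.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval=timedelta(minutes=1), tolerance=2):
    """Return (previous, next) pairs spaced more than tolerance * interval apart."""
    limit = interval * tolerance
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > limit]


def missing_bars(timestamps, expected):
    """Expected bar count minus the number of distinct bars actually present."""
    return expected - len(set(timestamps))


# Synthetic 405-bar session with 15 consecutive bars removed.
start = datetime(2026, 1, 12, 9, 30)
bars = [start + timedelta(minutes=i) for i in range(405) if not 100 <= i < 115]

missing = missing_bars(bars, expected=405)
gaps = find_gaps(bars)
print(missing)  # 15
print(gaps)     # one gap, from 11:09 to 11:25
```

The expected count has to come from your own session definition; 405 is used here only because it matches the ES RTH figure above.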
Bad Ticks and Outliers #
Bad ticks are price records that don't represent actual market conditions. They show up more often than you'd expect, especially in older data.
Detection rules:
- Price moves exceeding 3x the 20-bar ATR in a single tick
- Zero or duplicate timestamps in sequential records
- Prints outside the bid-ask spread by more than 2 ticks during normal conditions
- Volume spikes >10x the 20-bar average volume without corresponding price movement
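The first rule can be approximated on bar data as follows. This is a sketch under stated assumptions: it works on (high, low, close) bars rather than individual ticks, and it uses a simple average of true range rather than Wilder smoothing. The name `flag_outliers` and the sample data are hypothetical.

```python
def true_range(high, low, prev_close):
    """Standard true range: the largest of the three candidate ranges."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def flag_outliers(bars, window=20, multiple=3.0):
    """bars: list of (high, low, close). Return indices of suspicious bars."""
    tr = [0.0] * len(bars)
    for i in range(1, len(bars)):
        high, low, _ = bars[i]
        tr[i] = true_range(high, low, bars[i - 1][2])

    flagged = []
    for i in range(window + 1, len(bars)):
        atr = sum(tr[i - window:i]) / window  # average TR of the prior `window` bars
        move = abs(bars[i][2] - bars[i - 1][2])
        if move > multiple * atr:
            flagged.append(i)
    return flagged


# Synthetic series: quiet 1-point bars with one 10-point spike at index 25.
bars = [(100.5, 99.5, 100.0)] * 30
bars[25] = (110.0, 99.5, 110.0)
suspects = flag_outliers(bars)
print(suspects)  # [25, 26]
```

Both the spike and the snap-back bar are flagged, which is usually what you want: a flag is a prompt for review, not an automatic deletion.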
Survivorship and Selection Bias #
In futures, survivorship bias shows up as:
- Incomplete contract coverage: vendor only stores "currently active" contracts, so expired back-months are missing
- Filtered symbol universes: free sources cover only major contracts, creating selection bias toward the most liquid instruments
- Specification changes: tick sizes, multipliers, and trading hours change over time. If your data doesn't reflect historical specs, analysis of older periods is unreliable
Contract Roll Correctness #
This is the big one. As [kevinkdog noted on NexusFi] [5], "If you are backtesting, the choice of what type of continuous contract to use is HUGE."
The "same symbol" does not mean the same contract across time. Your platform's "continuous ES" chart made decisions about when to roll, how to handle the gap, and what to display. Those decisions change your backtest results.
Roll Methods #
Four common approaches, each with different tradeoffs:
| Roll Method | When to Roll | Pros | Cons |
|---|---|---|---|
| Calendar-based | Fixed days before expiry (e.g., 8 days) | Simple, deterministic, reproducible | Ignores actual liquidity migration |
| Volume crossover | When back-month volume exceeds front-month | Follows actual liquidity | Noisy |
| Open interest crossover | When back-month OI exceeds front-month | More stable than volume signal | OI reports lag by one day |
For ES, the volume crossover typically happens 7-10 days before expiry. For CL, it's often 3-5 days before. Agricultural contracts vary seasonally. The key: document your roll method and apply it consistently.
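A volume-crossover rule might be sketched like this. The `confirm_days` parameter is an assumption added here to dampen the noise noted in the table; the function name and the volume figures are hypothetical.

```python
def volume_crossover_roll(dates, front_volume, back_volume, confirm_days=2):
    """Return the first date on which back-month volume has exceeded
    front-month volume for `confirm_days` sessions in a row, else None."""
    streak = 0
    for day, front, back in zip(dates, front_volume, back_volume):
        streak = streak + 1 if back > front else 0
        if streak >= confirm_days:
            return day
    return None


# Hypothetical daily volumes (thousands of contracts) around a roll.
dates = ["D1", "D2", "D3", "D4", "D5", "D6"]
front = [900, 800, 500, 600, 300, 200]
back = [100, 300, 550, 400, 700, 900]

confirmed = volume_crossover_roll(dates, front, back)
first_touch = volume_crossover_roll(dates, front, back, confirm_days=1)
print(confirmed)    # D6
print(first_touch)  # D3
```

Whichever setting you pick, record it alongside the dataset so the roll dates can be regenerated exactly.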
Continuous Contracts #
Since every futures contract expires, you need a continuous price series for longer-term analysis. Three approaches, none universally "correct."
Unadjusted Continuous #
Splice contract prices end-to-end. The gap between contracts appears as a discontinuity.
Use when: You need actual traded prices for support/resistance or round numbers.
Fails when: Any strategy computing returns, averages, or oscillators across roll boundaries sees phantom signals from the gap.
Back-Adjusted: Difference Method #
The math: At each roll point, calculate gap = New_Front_Close - Old_Front_Close. Add this gap to every historical bar, so the old contract's roll-day close lines up with the new contract's close.
Worked example (ES March to June roll):
- ESH26 closes at 5,820.00 on roll day
- ESM26 closes at 5,832.50 on the same day
- Gap = 5,832.50 - 5,820.00 = +12.50 points
- Every bar before the roll gets shifted up by 12.50 points
- A bar that was 5,700.00 becomes 5,712.50
After 10 years of quarterly rolls, early bars might be shifted by 200+ cumulative points. The relative movement between bars is preserved, but the absolute price levels are only correct for the most recent contract.
Back-Adjusted: Ratio Method #
The math: At each roll point, calculate ratio = New_Front_Close / Old_Front_Close. Multiply every historical bar by this ratio.
Worked example (same ES roll):
- Ratio = 5,832.50 / 5,820.00 = 1.002148
- A bar at 5,700.00 becomes 5,700.00 * 1.002148 = 5,712.24
- Percentage returns between bars are preserved exactly
As [kevinkdog documented on NexusFi] [6], "Typical backadjusted continuous futures contracts work well with price DIFFERENCES, but will give incorrect results for RATIO calculations." The reverse is also true: a ratio-adjusted series preserves percentage returns but distorts point differences.
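Both adjustments can be expressed as one small function. The sketch assumes the common convention that the adjusted series is anchored to the newest contract, so older bars are moved until the old contract's roll-day close equals the new contract's close. `back_adjust` is a hypothetical name, and the prices are the ones from the worked examples.

```python
def back_adjust(old_prices, old_close, new_close, method="difference"):
    """Adjust pre-roll prices so the old roll-day close matches the new one."""
    if method == "difference":
        gap = new_close - old_close
        return [p + gap for p in old_prices]
    if method == "ratio":
        ratio = new_close / old_close
        return [p * ratio for p in old_prices]
    raise ValueError(f"unknown method: {method}")


history = [5700.00, 5820.00]  # an earlier bar and the ESH26 roll-day close
diff_adj = back_adjust(history, 5820.00, 5832.50, "difference")
ratio_adj = back_adjust(history, 5820.00, 5832.50, "ratio")

print(diff_adj)                          # [5712.5, 5832.5]
print([round(p, 2) for p in ratio_adj])  # [5712.24, 5832.5]
```

With several rolls, the same step is applied from the most recent roll backwards, so the oldest bars accumulate every gap or ratio in between.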
Which Adjustment for Which Signal #
| Signal Type | Use This | Why |
|---|---|---|
| Moving average crossovers, momentum | Difference | Preserves absolute price changes between bars |
| Support/resistance, round numbers | Unadjusted | Preserves actual traded prices |
| Calendar spreads | Individual months | Continuous series are meaningless for spreads |
What NOT to adjust: Volume and open interest. They're properties of specific contracts. A back-adjusted volume series is meaningless.
The Impact Is Real #
Consider a simple 20-day SMA crossover on ES. At an unadjusted roll with a +12.50 point gap, the fast average jumps relative to price history. This creates a false bullish signal that wouldn't exist with proper adjustment. Over 10 years of quarterly rolls, these phantom signals compound into significant backtest distortion. As [kevinkdog's analysis on NexusFi] [7] states, "if you use a continuous unadjusted contract, you will get incorrect backtest results due to rollover gaps."
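The phantom signal is easy to reproduce with synthetic data. In this sketch both contracts are assumed to be perfectly flat, so any "signal" can only come from the roll gap; the series and the price-above-SMA rule are illustrative simplifications.

```python
def sma(values, n):
    """Simple moving average of the last n values."""
    return sum(values[-n:]) / n


old_closes = [5820.00] * 25  # hypothetical: old contract never moves
new_closes = [5832.50] * 5   # hypothetical: new contract never moves

unadjusted = old_closes + new_closes
gap = new_closes[0] - old_closes[-1]
adjusted = [p + gap for p in old_closes] + new_closes

unadjusted_signal = unadjusted[-1] > sma(unadjusted, 20)
adjusted_signal = adjusted[-1] > sma(adjusted, 20)
print(unadjusted_signal)  # True: price sits above its SMA purely because of the roll
print(adjusted_signal)    # False: no market move, no signal
```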
Building Research-Ready Datasets #
Getting from raw vendor data to a trustworthy backtest dataset is a pipeline. Here's the workflow.
Step 1: Define Contract Universe #
Start with metadata: which instruments, which months, tick size, multiplier, trading hours. For ES: quarterly (H, M, U, Z), tick size 0.25, multiplier $50, Globex 5:00 PM - 4:00 PM CT.
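One way to hold this metadata is a small immutable record plus a helper that enumerates contract symbols. The class and function names are hypothetical; the month-code letters are the standard futures codes, and the ES values are the ones stated above.

```python
from dataclasses import dataclass

# Standard futures month codes, January through December.
MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}


@dataclass(frozen=True)
class ContractSpec:
    root: str
    months: str        # listed month codes, e.g. "HMUZ" for quarterly
    tick_size: float
    multiplier: float  # dollars per full point
    session: str       # free-text session description


ES = ContractSpec("ES", "HMUZ", 0.25, 50.0, "17:00-16:00 America/Chicago")


def contract_symbols(spec, first_year, last_year):
    """Enumerate symbols such as ESH26 for every listed month in the range."""
    return [f"{spec.root}{code}{year % 100:02d}"
            for year in range(first_year, last_year + 1)
            for code in spec.months]


symbols = contract_symbols(ES, 2026, 2026)
print(symbols)                        # ['ESH26', 'ESM26', 'ESU26', 'ESZ26']
print(ES.tick_size * ES.multiplier)   # 12.5 dollars per tick
```

Symbol formats differ between vendors, so treat the `ESH26` pattern as one convention among several.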
Step 2: Extract Individual Contracts #
Pull data for each contract month separately.
Step 3: Normalize #
Standardize timestamps to a single timezone (UTC recommended). Standardize symbol naming. Validate price scales are correct across products.
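Timestamp normalization could look like the following, assuming the vendor delivers naive timestamps in Central Time. The helper name `to_utc` is hypothetical, and ambiguous wall-clock times during the autumn DST change are not handled here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(naive_local: datetime, source_tz: str = "America/Chicago") -> datetime:
    """Attach the vendor's timezone to a naive timestamp, then convert to UTC."""
    return naive_local.replace(tzinfo=ZoneInfo(source_tz)).astimezone(timezone.utc)


winter = to_utc(datetime(2026, 1, 12, 8, 30))
summer = to_utc(datetime(2026, 7, 13, 8, 30))
print(winter)  # 2026-01-12 14:30:00+00:00
print(summer)  # 2026-07-13 13:30:00+00:00
```

The same 8:30 AM local bar lands on two different UTC hours depending on the season, which is exactly the shift a fixed-offset conversion would miss.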
Step 4: Clean and Validate #
Run systematic quality checks and log results:
| Check | Threshold | Action if Failed |
|---|---|---|
| Missing bars | >1% of expected session bars | Investigate source, flag affected dates |
| Price outliers | >3x 20-bar ATR single-bar move | Flag for manual review |
| Volume outliers | >10x 20-bar avg without price move | Investigate for data corruption |
| Settlement mismatch | >0.5 tick vs exchange published | Replace with exchange settlement |
| Session boundary | First bar >5 min after session open | Check DST handling and session template |
As [SMCJB noted on NexusFi] [9], "bad data is bad data".
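Two of the checks in the table are sketched below so that results can be logged in a uniform shape. The function names and the result dictionary layout are assumptions for illustration; the thresholds are the ones from the table.

```python
def missing_bar_check(actual, expected, max_missing_pct=1.0):
    """Fail when more than max_missing_pct of expected session bars are absent."""
    missing_pct = 100.0 * (expected - actual) / expected
    return {"check": "missing_bars",
            "value": round(missing_pct, 2),
            "passed": missing_pct <= max_missing_pct}


def settlement_check(vendor_settle, exchange_settle, tick_size, max_ticks=0.5):
    """Fail when vendor and exchange settlements differ by more than max_ticks."""
    diff_ticks = abs(vendor_settle - exchange_settle) / tick_size
    return {"check": "settlement_mismatch",
            "value": diff_ticks,
            "passed": diff_ticks <= max_ticks}


bars_result = missing_bar_check(actual=390, expected=405)
settle_result = settlement_check(5820.25, 5820.00, tick_size=0.25)
print(bars_result)    # value 3.7, passed False
print(settle_result)  # value 1.0, passed False
```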
Step 5: Apply Roll Logic #
Choose your roll method (calendar, volume, OI, or last-trade-date). Apply your chosen continuity method (unadjusted, difference, or ratio). Document everything. Different roll methods produce different results.
Step 6: Build Derived Features #
Resample to target bar size, compute indicators, calculate returns. Keep this reproducible.
Minimum viable output for validation: After running the pipeline, your dataset should come with a manifest showing total bars, expected bars, gap count, outlier count and handling, roll dates and gap sizes, and a continuity check pass/fail.
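A manifest of this kind can be a plain dictionary serialized to JSON. The field names below simply mirror the list above; the function name, the roll date, and the sample values are hypothetical, and the continuity result is passed in because the check itself is not defined here.

```python
import json


def build_manifest(total_bars, expected_bars, gaps, outliers,
                   outlier_handling, rolls, continuity_pass):
    """Collect pipeline results into one JSON-serializable record."""
    return {
        "total_bars": total_bars,
        "expected_bars": expected_bars,
        "gap_count": len(gaps),
        "outlier_count": len(outliers),
        "outlier_handling": outlier_handling,
        "rolls": [{"date": day, "gap": gap} for day, gap in rolls],
        "continuity_check": "pass" if continuity_pass else "fail",
    }


manifest = build_manifest(
    total_bars=390,
    expected_bars=405,
    gaps=[("2026-01-12T11:09", "2026-01-12T11:25")],
    outliers=[],
    outlier_handling="flagged for manual review",
    rolls=[("2026-03-12", 12.50)],  # hypothetical roll date, gap from the example
    continuity_pass=True,
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Writing the manifest next to the dataset file means anyone opening the data later can see its gaps and rolls without rerunning the pipeline.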
Practical Data Management #
Version Everything #
Raw data, cleaned data, and derived datasets get versioned independently. When you change a cleaning rule or update roll logic, you need to know which version produced which backtest.
Audit Trails #
For every dataset: what contracts went in, what cleaning rules applied, what roll methodology was used, when the dataset was built. If you can't answer "how did we get this bar series?" six months later, the dataset is unreliable.
Quick-Start by Strategy Type #
Swing trader: Start with daily bars from a vendor like DTN IQFeed. Use difference-adjusted continuous contracts. Validate roll dates against the exchange calendar. Cross-check monthly settlement prices. This gets you started with minimal infrastructure.
Intraday researcher: Minute bars with verified session boundaries. Confirm DST handling by checking the first bar of each session across March and November transitions. Cross-reference against exchange settlement prices daily.
Tick researcher: High-fidelity tick data with exchange timestamps. Verify bid-ask spread consistency. Build your own bars from ticks rather than trusting vendor-aggregated bars.
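Building bars from ticks is mostly bookkeeping. The sketch below assumes ticks arrive sorted by time as (timestamp, price, size) tuples and labels each bar by the start of its minute; some platforms label by bar end instead, so check which convention your tools expect. The function name and sample ticks are hypothetical.

```python
from datetime import datetime


def ticks_to_minute_bars(ticks):
    """Aggregate time-sorted (timestamp, price, size) ticks into 1-minute OHLCV bars."""
    bars = {}
    for ts, price, size in ticks:
        key = ts.replace(second=0, microsecond=0)  # bar labelled by minute start
        bar = bars.get(key)
        if bar is None:
            bars[key] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars


ticks = [
    (datetime(2026, 1, 12, 14, 30, 1), 5832.50, 2),
    (datetime(2026, 1, 12, 14, 30, 20), 5833.00, 1),
    (datetime(2026, 1, 12, 14, 30, 59), 5832.25, 3),
    (datetime(2026, 1, 12, 14, 31, 5), 5832.75, 1),
]
minute_bars = ticks_to_minute_bars(ticks)
print(minute_bars[datetime(2026, 1, 12, 14, 30)])
```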
Reproducibility #
The gold standard: given raw vendor data and your documented pipeline, any analyst should regenerate the identical dataset. Deterministic scripts, documented parameters, checksums on output files. A strategy that "works" on one version and breaks when regenerated with slightly different cleaning rules didn't really work in the first place.
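Checksums on output files take only the standard library. This sketch hashes a file in chunks with SHA-256; the helper name and the sample file are hypothetical, and the choice of SHA-256 is an assumption rather than a requirement.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "es_continuous.csv"  # hypothetical output file
    path.write_bytes(b"timestamp,close\n2026-01-12T14:30:00Z,5832.50\n")
    first = sha256_of_file(path)
    second = sha256_of_file(path)

print(first == second)  # True: identical bytes give identical digests
```

Store the digest with the dataset version; if a regenerated file hashes differently, something in the inputs or the pipeline changed.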
What Comes Next #
Historical market data preparation is the foundation, not the destination. Once you have a clean, contract-correct, reproducible dataset, the next challenge is execution realism.
That's a different article. This one's job was to make sure the data underneath is worth building on.
Citations
- [1] Back-adjusted, Continuous contracts - best for support and resistance? (2012): “Let me just summarize a few points on continuous and backadjusted contracts. Backadjusted Contract The backadjusted contract correctly shows the relative price movement, but the absolute values shown are only correct for the last contract shown on th...”
- [2] Kinetick - A new Market Data Feed Service for NinjaTrader (2010): “Had a look at various futures contracts and daily data supplied via Kinetick. Here are my first impressions on the futures data - I could only load continuous contracts (##-##), no individual contracts were available - The continuous contracts were n...”
- [3] Analysis and comparison on different data Feeds and Platforms for Bid/Ask Studies (2010): “DNT.IQfeed vs SC futures historical backfill. I compared it with volume breakdown study on Market Delta and Total Ask Bid Vol Diff Bars on SC using DTN.IQfeed and they show the same result (some little negligible point different).”
- [4] Accuracy of SC Denali data, and a data reconciliation request (2021): “Just to follow up on this, SC Support never provided an adequate response to my questions and the above noted issue remains unresolved (inaccurate volume and questionable price data for Dec. 4 and 5, 2018).”
- [5] long term analysis of futures contracts (2022): “If you are backtesting, the choice of what type of continuous contract to use is HUGE. 2 examples: If you use unadjusted continuous contracts, gaps during rollovers will lead to inaccurate, meaningless results.”
- [6] Backadjust futures contracts for spread trading backtesting (2023): “Typical backadjusted continuous futures contracts work well with price DIFFERENCES, but will give incorrect results for RATIO calculations (incorrect in that the ratio for a date late year will change at the next rollover.”
- [7] Why Back Adjustments on Prior Contracts? (2021): “To add to what josh says, I would NEVER rely on the actual historical (more than say 6 months ago) prices produced by continuous back adjusted contracts.”
- [8] Problems with CL back-adjusted data (2013): “Correct. The backadjusted contract should show prices as high as 190 for CL in 2008. This is due to the Cushing contango, which made it much more expensive for long-only funds to roll WTI futures compared to Brent futures.”
- [9] Learning statistical analysis: Step by Step (2018): “I'm not sure there is a gold standard of data but there are many data issues that can catch you unaware. For example Survivor-ship bias Stock Splits Stock Dividends How roll adjusted contracts are calculated.”
