NexusFi: Find Your Edge


Data Quality and Integrity in Futures Trading: Detecting Bad Ticks, Gaps, and Errors Before They Wreck Your Edge


Overview #

Every futures trader trusts their data. Price bars render on the screen, indicators update, signals fire.

Data quality isn't a "nice to have" data engineering concern. It's a first-order trading risk control. Get it wrong and your backtest produces imaginary edge. Get it really wrong and your live system trades on stale quotes or phantom prices while you watch.

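This article covers the real-world environment of data corruption in futures markets: what bad data looks like, how professionals detect it, and how to handle it in backtesting and live trading.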

Key Concepts #

Bad Tick: A trade or quote that's clearly inconsistent with surrounding market structure.

Data Gap: Missing messages, missing time intervals, or incomplete sessions. Can come from dropped packets in multicast feeds, feed disruptions, or vendor stitching issues in historical data.

Erroneous Print: A real exchange-disseminated trade that's economically suspect.

Feed Error: Pipeline problems between the exchange and your screen.

Quality Mask: A metadata layer attached to each data point marking it as validated, suspect, or invalid. Professionals don't delete bad data; they label it.

Stale Book: A feed that's technically connected but not updating. Your platform shows prices, but they're from 30 seconds ago. More dangerous than a disconnected feed because it looks normal.

Layered data validation pipeline showing six stages from raw capture to canonical dataset
Bad tick detection in ES futures showing outlier price spike flagged by volatility bounds

What Bad Data Actually Looks Like #

The NexusFi community has documented these problems extensively. As @Big Mike shared in his historical ES tick data thread, even carefully curated datasets can contain corruptions:

"Unfortunately, it seems my data for ES during date range of 2-17-2011 to 2-20-2011 is corrupted in such a way I can't export it. Have never seen this..."

Historical ES Tick Data from 2003 and up (@puppeye)

That's Big Mike, describing corruption in his own ES dataset.

The Five Categories of Data Corruption #

1. Price anomalies. A trade prints at $100.50 when the best bid is $100.00 and best offer is $100.10. Or a tick jumps 200 points and snaps back in the next millisecond. These happen more than you'd think, especially during low-liquidity sessions. As @josh noted in the Spoo-nalysis thread discussing tick data issues:

"I accidentally re-downloaded my tick data, removing the 'bad ticks' that occur..."

Spoo-nalysis ES e-mini futures S&P 500 (@tigertrader)

One download produces different data than another. That's vendor normalization in action.

2. Sequence breaks. Most professional feeds carry sequence numbers. When seq[t] != seq[t-1]+1, you've got missing data. Your chart doesn't show a gap, but the messages are missing all the same.

3. Stale data. The feed is connected. The platform shows green. But the last update was 45 seconds ago while the market moved 10 ticks. Your limit order sits at a price the market blew through. Your stop-loss thinks it's safe when it's already breached. Stale data is more dangerous than missing data because missing data you can detect.

4. Vendor normalization errors. Wrong multipliers. Incorrect tick sizes. Misapplied symbol mapping after contract codes change. @hughesfleming raised this exact problem:

"IQFeed is unfiltered but it also does not do a great job of correcting bad data. I also have a second feed that is much better at cleaning historical data but is problematic in other areas."

inconsistent results from different data sources (@mitty)

Two vendors, same market, different data. One filters aggressively and may remove real prints. The other passes through everything including obvious garbage. Neither is "correct."

5. Exchange corrections. CME and ICE regularly issue trade corrections, canceled trades, and busted prints. If your system only processes "add trade" messages and ignores correction messages, your Last Traded Price will drift from reality. Your OHLC bars will contain trades that the exchange officially removed. Your indicators will compute on data the exchange itself says is wrong.
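
To make the correction problem concrete, here is a minimal sketch of a bar builder that honors cancel messages. The `TradeMsg` type and its `"add"`/`"cancel"` kinds are hypothetical stand-ins for illustration, not the message layout of any real exchange or vendor feed.

```python
from dataclasses import dataclass


@dataclass
class TradeMsg:
    kind: str          # "add" or "cancel" (hypothetical message kinds)
    trade_id: int
    price: float = 0.0
    size: int = 0


def build_bar(messages):
    """Return (open, high, low, close) from trades that were not canceled."""
    trades = {}  # trade_id -> price; dicts preserve insertion order
    for msg in messages:
        if msg.kind == "add":
            trades[msg.trade_id] = msg.price
        elif msg.kind == "cancel":
            # Remove the busted trade so it cannot affect the bar.
            trades.pop(msg.trade_id, None)
    if not trades:
        return None
    prices = list(trades.values())
    return prices[0], max(prices), min(prices), prices[-1]
```

A system that skipped the `"cancel"` branch would keep the busted print in the bar's high or low, which is exactly the drift described above.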

Why Futures Data Is Uniquely Vulnerable #

Futures markets create specific data integrity challenges that equity or forex traders don't face to the same degree:

Leverage amplifies everything. A bad tick in Apple stock might distort a signal by 0.1%. The same error in ES futures, where each index point is worth $50 per contract and positions are carried on margin, can trigger a margin call, a stop-loss cascade, or an algo that fires into a phantom market. The stakes per data point are higher.

Microstructure dependency. Strategies based on order flow, time and sales, or book imbalance are directly consuming raw tick data. One corrupt tick doesn't just add noise; it flows straight into the signal.

Contract roll complexity. As @Fat Tails explained in his complete analysis of continuous contracts:

"Data will gap up, if the market is contango..."

continuous contract in NT7 /merge policy / rollover (@christo64)

Continuous contract construction is itself a form of data transformation that can introduce errors: a wrong roll date, a wrong roll rule, or a failure to account for volume/open interest transitions. As @kevinkdog warned:

"If you use unadjusted continuous contracts, gaps during rollovers will lead to inaccurate, meaningless results. Many people think they can profit from rollover gaps, and that is not true."

long term analysis of futures contracts (@camtrades)
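
The roll gap is easier to see with numbers. The sketch below uses difference (additive) back-adjustment, which is only one of several ways to build a continuous series; the function name and the assumption that both contracts have a close on the roll date are illustrative choices, not any platform's actual merge policy.

```python
def back_adjust(front_closes, back_closes):
    """Splice two contracts using difference back-adjustment.

    Assumes the last element of front_closes and the first element of
    back_closes are closes on the same roll date.
    """
    roll_gap = back_closes[0] - front_closes[-1]
    # Shift the older contract's history by the roll gap so the splice
    # point carries no artificial price jump.
    adjusted_front = [p + roll_gap for p in front_closes[:-1]]
    return adjusted_front + back_closes


front = [4990.0, 5000.0]   # expiring contract, last value is the roll date
back = [5010.0, 5015.0]    # next contract, first value is the roll date

unadjusted = front[:-1] + back       # [4990.0, 5010.0, 5015.0]
adjusted = back_adjust(front, back)  # [5000.0, 5010.0, 5015.0]
```

In the unadjusted series the first bar-to-bar move is 20 points, of which 10 is the roll gap and not a tradable move. The adjusted series shows the 10 points the front contract actually moved.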

Session structure. Futures trade across multiple sessions with different characteristics. Globex overnight is not the same as RTH. Auction opens and closes produce legitimate price jumps that look like bad ticks to naive filters. A price move that's anomalous at 2 AM is perfectly normal at 8:30 AM on NFP release.

How Professionals Detect Corrupt Data #

Professional shops don't rely on a single filter. They implement layered validation.

Layer 1: Structural Integrity #

Track the transport layer before you even look at prices:

  • Sequence number monitoring. Every message from CME's MDP feed carries a sequence number. Professional systems check seq[t] == seq[t-1] + 1 on every update. A gap means dropped packets. A duplicate means reprocessed data. Either one corrupts downstream calculations.
  • Heartbeat detection. If no update arrives within the expected cadence (varies by instrument and session), the data is stale. Professional systems track time-since-last-update per instrument and trigger alerts when it exceeds thresholds.
  • Timestamp sanity. Compare exchange timestamps against arrival timestamps. Clock drift between exchange and your system creates ordering errors. Precision Time Protocol (PTP) hardware timestamps are the professional standard.
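
The sequence and heartbeat checks above can be sketched in a few lines. The function names, status strings, and the idea of passing timestamps as plain seconds are assumptions made for illustration; a real feed handler would work from the vendor's own message fields.

```python
def classify_sequence(prev_seq, seq):
    """Classify a sequence number relative to the highest one seen so far."""
    if seq == prev_seq + 1:
        return "ok"
    if seq <= prev_seq:
        return "duplicate_or_replay"
    return "gap"


def scan_sequences(seqs):
    """Return (sequence_number, status) for every non-consecutive message."""
    issues = []
    highest = None
    for seq in seqs:
        if highest is not None:
            status = classify_sequence(highest, seq)
            if status != "ok":
                issues.append((seq, status))
        if highest is None or seq > highest:
            highest = seq
    return issues


def is_stale(last_update_ts, now_ts, max_age_s):
    """True when no update has arrived within max_age_s seconds."""
    return now_ts - last_update_ts > max_age_s
```

For example, `scan_sequences([1, 2, 3, 5, 5, 6])` reports a gap at the first 5 (message 4 never arrived) and a duplicate at the second 5.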

Layer 2: Market Plausibility #

Hard rules based on contract specification facts:

  • Tick-size compliance. ES trades in 0.25 increments. A price of 5432.30 is impossible and means something is wrong in the pipeline.
  • Spread validity. Ask must be greater than or equal to bid. A persistently inverted book (ask < bid) that lasts more than a few microseconds is corrupt data.
  • Bounded price jumps. If a tick's price change exceeds N× the recent Average True Range without a corresponding volume spike, it gets flagged. The threshold is instrument-specific and session-specific.
  • BBO consistency. Trade prints should occur at or near the prevailing bid/ask. A trade printing far outside the BBO without matching volume or event justification is suspect.
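
As a rough sketch, the first three plausibility rules reduce to a few comparisons. The function names, the `volume_spike` flag, and the floating-point tolerance are assumptions for illustration; the ATR value and the multiple N would come from your own instrument-specific calibration.

```python
def on_tick_grid(price, tick_size, tol=1e-9):
    """True when price is a whole multiple of the contract's tick size."""
    ticks = price / tick_size
    return abs(ticks - round(ticks)) < tol


def book_is_valid(bid, ask):
    """True when the book is not inverted (ask at or above bid)."""
    return ask >= bid


def jump_is_suspect(prev_price, price, atr, n_atr, volume_spike=False):
    """Flag a move larger than n_atr times ATR that has no volume spike behind it."""
    return (not volume_spike) and abs(price - prev_price) > n_atr * atr
```

With ES's 0.25 tick, `on_tick_grid(5432.25, 0.25)` passes and `on_tick_grid(5432.30, 0.25)` fails, matching the example in the list above.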

Layer 3: Regime Awareness #

The hardest part of tick filtering is knowing when an anomaly is real:

  • Session transitions. Open, close, auction periods, and halts produce legitimate price jumps that look like bad ticks to a dumb filter.
  • Event windows. CPI, NFP, FOMC, WASDE, inventory reports, and OPEC headlines can all produce large price moves that are real.
  • Limit-up/limit-down. Exchange-imposed price bands change the valid range. Your filter needs to know the current day's limits.
  • Roll and expiry periods. Nearby expiry contracts exhibit delivery-driven behavior that looks like a spike but reflects real economic activity.
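
One simple way to make a jump filter regime-aware is to loosen its threshold around known events. The sketch below assumes you maintain your own list of event times; the five-minute window and the 3× multiplier are placeholder values for illustration, not recommendations.

```python
from datetime import datetime, timedelta


def jump_threshold(base_n_atr, ts, event_times,
                   window=timedelta(minutes=5), event_multiplier=3.0):
    """Return the ATR multiple to use at time ts.

    Inside a window around any scheduled event the threshold is widened,
    so legitimate event-driven moves are less likely to be flagged.
    """
    for event_ts in event_times:
        if abs(ts - event_ts) <= window:
            return base_n_atr * event_multiplier
    return base_n_atr


nfp = datetime(2024, 1, 5, 8, 30)
quiet = jump_threshold(10.0, datetime(2024, 1, 5, 2, 0), [nfp])    # 10.0
release = jump_threshold(10.0, datetime(2024, 1, 5, 8, 31), [nfp])  # 30.0
```

The same pattern extends to session opens, closes, and limit bands by adding more conditions that adjust the threshold.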

Layer 4: Cross-Source Validation #

When a tick looks suspicious, professionals compare against:

  • Secondary feed. A/B feed arbitration runs two independent data sources (e.g., Rithmic and CTS). If one deviates from the other, the system can identify which feed is corrupt.
  • Related instruments. If ES prints something crazy, check NQ, RTY, and SPX cash. If they're all calm, the ES print is probably bad.
  • Nearby contract months. Compare the front month against the next nearby. A 50-tick divergence that appears only in one contract is suspect.
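
A minimal sketch of the secondary-feed comparison, assuming both feeds have already been aligned to common timestamps and reduced to one price per timestamp. The dictionary layout and the `max_ticks` tolerance are illustrative assumptions.

```python
def compare_feeds(primary, secondary, tick_size, max_ticks):
    """Return (timestamp, difference_in_ticks) where two feeds disagree.

    primary and secondary map timestamp -> price. Only timestamps present
    in both feeds are compared.
    """
    divergences = []
    for ts in sorted(primary.keys() & secondary.keys()):
        diff_ticks = abs(primary[ts] - secondary[ts]) / tick_size
        if diff_ticks > max_ticks:
            divergences.append((ts, diff_ticks))
    return divergences


feed_a = {1: 5000.00, 2: 5050.00, 3: 5000.50}
feed_b = {1: 5000.00, 2: 5000.25, 3: 5000.50}
suspects = compare_feeds(feed_a, feed_b, tick_size=0.25, max_ticks=4)
```

Here `suspects` contains only timestamp 2, where the feeds differ by 199 ticks. A divergence tells you one feed is wrong, not which one, so a third reference (a related instrument or the next contract month) is still needed to break the tie.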

As @aventeren found when checking ES data quality:

"I was noticing some large losses when backtesting... upon further inspection found some issues with the data."

Data Quality Check (ES) (@aventeren)

Cross-checking data against multiple sources is how you find these issues before they find your PnL.

The IQFeed Bad Tick Problem #

IQFeed is one of the most popular data providers in the NexusFi community, and their unfiltered data approach is a perfect illustration of the data quality trade-off. @phaser raised the issue directly:

"I've been trying out IQFeed data streaming via Python and Amibroker. As IQFeed gives unfiltered data..."

IQFeed: Suggestions to filter bad ticks (@phaser)

And @traderdavidt followed up with practical experience:

"I've also had this issue previously using IQFeed service as it is as you say, unfiltered and not cleaned."

IQFeed: Suggestions to filter bad ticks (@phaser)

IQFeed's philosophy is to pass through raw exchange data without filtering. That means you get the "truth" as the exchange saw it, garbage included.

There's no free lunch here. Filtered feeds can hide real market events. Unfiltered feeds require you to build your own quality layer. The professional approach: use unfiltered data but implement your own validation pipeline on top. You get the raw truth and the ability to decide what's real.

Filtered vs unfiltered data feed comparison for futures traders

Handling Bad Data: Label, Don't Delete #

Here's the single most important principle professional shops follow: don't silently delete suspect data. Label it.

The quarantine approach works like this:

  1. Flag the tick with a quality code and reason (price jump, sequence gap, stale, etc.)
  2. Hold in quarantine
  3. Confirm against a secondary source
  4. If unconfirmed, reject
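
The quarantine steps above can be sketched as a small labeling function. The `Quality` enum, the `resolve` name, and the one-tick tolerance are hypothetical choices for illustration.

```python
from enum import Enum


class Quality(Enum):
    VALID = "valid"
    SUSPECT = "suspect"
    INVALID = "invalid"


def resolve(tick_price, secondary_price, tick_size, tolerance_ticks=1):
    """Decide the label for a quarantined tick.

    secondary_price is the price from an independent source at the same
    time, or None if no confirmation is available yet.
    """
    if secondary_price is None:
        return Quality.SUSPECT  # nothing to confirm against; stay in quarantine
    if abs(tick_price - secondary_price) <= tolerance_ticks * tick_size:
        return Quality.VALID
    return Quality.INVALID
```

Note that the tick itself is never removed. Only its label changes, so the raw record stays available for later review.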

Why not just delete? Because "delete on sight" hides information. That "bad tick" might be the first signal of a real market event. That "gap" might be a feed problem on your end, not the exchange's. And when you need to reconstruct what happened during a trading incident, you need the full record.

For backtesting, this translates to quality masks: a metadata layer that marks each bar or tick as usable, suspect, or invalid. Your backtest engine computes results only on usable data, but you can see exactly how much data was excluded and test how sensitive your results are to those exclusions.
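
A minimal sketch of how a quality mask might be applied in a backtest, assuming one label per trade result. The label strings and the function name are illustrative, not a specific engine's API.

```python
def masked_pnl(pnls, mask):
    """Sum PnL over entries labeled "valid" and report the excluded share.

    pnls and mask are parallel lists; mask entries are "valid",
    "suspect", or "invalid".
    """
    usable = [p for p, label in zip(pnls, mask) if label == "valid"]
    excluded_fraction = 1 - len(usable) / len(pnls)
    return sum(usable), excluded_fraction


pnls = [10.0, -5.0, 400.0, 7.5]
mask = ["valid", "valid", "invalid", "valid"]

clean_total, excluded = masked_pnl(pnls, mask)  # (12.5, 0.25)
raw_total = sum(pnls)                           # 412.5
```

Comparing `clean_total` with `raw_total` is the sensitivity test: in this toy example almost all of the apparent profit comes from a single entry labeled invalid.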

Quality mask showing data classified as valid, suspect, or invalid for backtesting

Gap Detection and Handling #

In Live Trading #

When a gap is detected (missing sequence numbers, stale book, feed disconnection):

  • Freeze dependent strategies. Don't trade on incomplete data.
  • Fall back to secondary feed if available.
  • Widen risk controls or reduce size. If the book is unreliable, aggressive execution is dangerous.
  • Log everything. When the gap resolves, you need to understand what you missed.

In Backtesting #

  • Never interpolate tick data to fill gaps. Interpolation invents microstructure that never existed. Your order-flow strategy will "see" liquidity, aggression patterns, and absorption events that are pure fiction.
  • Mark gap periods as invalid. Exclude them from performance calculations or mark them as low-confidence.
  • Reconstruct from alternative sources when possible. If one vendor has a gap, another vendor might have the data.
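
Marking gaps instead of filling them can be as simple as recording the intervals where updates stopped. The sketch below assumes tick timestamps in seconds and a `max_gap_s` threshold you would tune per instrument and session.

```python
def find_gaps(timestamps, max_gap_s):
    """Return (start, end) pairs where consecutive ticks are too far apart.

    Nothing is interpolated: the intervals are only reported so they can
    be marked invalid or excluded from performance calculations.
    """
    gaps = []
    for prev_ts, cur_ts in zip(timestamps, timestamps[1:]):
        if cur_ts - prev_ts > max_gap_s:
            gaps.append((prev_ts, cur_ts))
    return gaps


ticks = [0.0, 0.5, 1.0, 31.0, 31.2]
gap_periods = find_gaps(ticks, max_gap_s=5.0)  # [(1.0, 31.0)]
```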

As @eddi0505 discovered when testing data from a provider that marketed data quality:

"The data provided is useless and the quality... I want to warn all NT trading participants."

Backtestdata.com quality of data (@eddi0505)

Not all data providers deliver what they promise. Verify independently.

Data gap detection showing missing sequence numbers recovered from secondary feed

Backtest-Specific Integrity Failures #

The five biggest ways bad data corrupts backtests:

1. Silent bad ticks in history. A single erroneous print can blow out a volatility estimate, create a fake breakout signal, and corrupt the PnL distribution of your strategy. You won't see it unless you explicitly look. The fix: run outlier detection on your historical data before backtesting. Flag and review the top 0.1% of price moves.
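
One way to pull the largest moves for review, sketched with the standard library only. The function name and the choice to rank by absolute tick-to-tick change are assumptions; a production version would likely normalize by session volatility first.

```python
def largest_moves(prices, fraction=0.001):
    """Return the largest absolute tick-to-tick moves as (move, index) pairs.

    index is the position of the later price in each pair. At least one
    move is always returned so short series still get reviewed.
    """
    moves = [
        (abs(later - earlier), i + 1)
        for i, (earlier, later) in enumerate(zip(prices, prices[1:]))
    ]
    keep = max(1, int(len(moves) * fraction))
    return sorted(moves, reverse=True)[:keep]


# 501 prices with a single 50-point jump at index 250.
series = [5000.0] * 250 + [5050.0] * 251
to_review = largest_moves(series)  # [(50.0, 250)]
```

The output is a review list, not a delete list: each flagged move still needs to be checked against session context and a second source.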

2. Bad roll construction. Wrong roll date. Wrong roll rule. Failure to account for the volume/OI transition from front month to next. As @FuturesTrader71 explained:

"Non-adjusted data is always provided for backtesting purposes. For example, for our automated trading systems, we test on unadjusted tick-level bid/ask data."

Back-adjusted, Continuous contracts - best for support and resistance? (@Big Mike)

The choice between adjusted and unadjusted continuous contracts is itself a data integrity decision. Each produces different backtest results, and using the wrong one for your strategy type will give you misleading performance numbers.

3. Vendor normalization errors. Wrong multipliers, incorrect tick sizes, misapplied symbol mapping. These are subtle and dangerous because the data looks normal until you calculate PnL and get impossible numbers.

4. Hidden interpolation. Some vendors silently fill gaps by interpolating between known prices. This creates false liquidity impressions and invents microstructure events that never happened. If your strategy trades on order flow or time and sales, interpolated data will produce fantasy results.

5. Survivorship bias in contracts. Only using contracts with complete data and ignoring expired or delisted contracts creates unrealistic results. This is especially relevant in commodity futures where contract specifications occasionally change.

Building a Practical Data Validation Pipeline #

For traders building their own systems, here's a practical architecture:

Stage 1: Raw Capture #

Store the raw feed with original timestamps and sequence numbers. Don't transform anything. This is your audit trail and your ability to rebuild everything when you discover an issue.

Stage 2: Structural Validation #

  • Check sequence continuity
  • Detect duplicates
  • Verify timestamp ordering
  • Flag missing intervals

Stage 3: Market Plausibility #

  • Tick-size compliance
  • Spread validity (no inverted books)
  • Bounded price jumps (instrument-specific thresholds)
  • Trade-vs-BBO consistency

Stage 4: Contextual Validation #

  • Session-aware thresholds
  • Event-window detection
  • Cross-instrument comparison
  • Liquidity-context assessment

Stage 5: Quality Labeling #

  • Apply quality codes to every data point
  • Route suspect data to quarantine
  • Confirm or reject after secondary validation
  • Build quality mask for backtesting

Stage 6: Canonical Dataset #

  • Cleaned, versioned, auditable data for production use
  • Separate storage of raw, validated, and bar-aggregated data

Operational Metrics Worth Tracking #

Professional shops monitor these per instrument, per session, per venue:

| Metric | What It Catches | Alert Threshold |
|---|---|---|
| Message loss rate | Dropped packets, network issues | >0.01% per session |
| Duplicate message rate | Reprocessing errors | >0.001% |
| Outlier tick count | Bad ticks, erroneous prints | >5 per instrument/hour |
| Feed latency (p99) | Infrastructure degradation | >50ms for colocated, >500ms for retail |
| Stale quote duration | Frozen feeds | >2 seconds during RTH |
| Vendor discrepancy rate | Normalization differences | Any persistent divergence |

Data quality monitoring dashboard with six operational metrics and alert system
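
A small sketch of how the numeric thresholds in the table could be checked in code. The metric key names are hypothetical, rates are expressed as fractions (0.01% becomes 0.0001), and the latency and vendor-discrepancy rows are left out because they depend on your setup.

```python
# Alert thresholds taken from the table above; key names are illustrative.
THRESHOLDS = {
    "message_loss_rate": 0.0001,      # >0.01% per session
    "duplicate_rate": 0.00001,        # >0.001%
    "outlier_ticks_per_hour": 5,      # >5 per instrument/hour
    "stale_quote_seconds": 2.0,       # >2 seconds during RTH
}


def breached(metrics):
    """Return the sorted names of metrics that exceed their alert threshold."""
    return sorted(
        name for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    )


session = {
    "message_loss_rate": 0.0005,
    "duplicate_rate": 0.0,
    "outlier_ticks_per_hour": 3,
    "stale_quote_seconds": 4.5,
}
alerts = breached(session)  # ["message_loss_rate", "stale_quote_seconds"]
```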

Exchange-Specific Considerations #

CME (Globex) #

Generally strong but not bulletproof. High message volume during macro events (FOMC, NFP) can produce packet loss if your infrastructure can't keep up. Sequence handling on MDP feeds is critical. Contract roll and session calendars are the most common integrity failure points. Settlement data may differ from intraday trade data.

ICE #

Thinner markets in energy and softs mean outlier-looking prints may be legitimate. Off-hours liquidity can be sparse. Correction/canceled trade message processing is critical.

Other Venues #

Eurex, LME, and regional venues have different auction mechanisms, session structures, and correction semantics. The biggest variable across all exchanges is vendor normalization.

The Kill Switch: When Data Quality Degrades in Live Trading #

Here's the control that matters most: if data quality is uncertain, reduce risk or stop trading.

Implement these safeguards:

  • Pre-trade validation. Before every order: verify market data freshness, confirm bid/ask is valid and recent, reject order logic if the feed is stale or anomalous.
  • Circuit breaker. If >X% of incoming ticks are flagged erroneous within a rolling 1-minute window, halt automated trading and alert the operator. The threshold depends on the strategy.
  • Stale-book detection. If top-of-book hasn't updated beyond your strategy-specific threshold, freeze signals. Don't let your algo trade on prices from 30 seconds ago.
  • Feed redundancy. Run primary and secondary feeds from independent sources. Monitor both for sequence gaps and latency. Automatic failover when one feed degrades.
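
The circuit breaker described above can be sketched with a rolling window of recent ticks. The class name, the default thresholds, and the `min_ticks` guard (which avoids halting on a tiny sample) are assumptions for illustration; the right values depend on the strategy.

```python
from collections import deque


class CircuitBreaker:
    """Halt when too many ticks in a rolling time window are flagged bad."""

    def __init__(self, window_s=60.0, max_bad_fraction=0.05, min_ticks=20):
        self.window_s = window_s
        self.max_bad_fraction = max_bad_fraction
        self.min_ticks = min_ticks
        self.events = deque()  # (timestamp, is_bad) pairs inside the window
        self.halted = False

    def on_tick(self, ts, is_bad):
        """Record one tick and return True once trading should be halted."""
        self.events.append((ts, is_bad))
        # Drop ticks that have aged out of the rolling window.
        while self.events and ts - self.events[0][0] > self.window_s:
            self.events.popleft()
        total = len(self.events)
        bad = sum(1 for _, flagged in self.events if flagged)
        if total >= self.min_ticks and bad / total > self.max_bad_fraction:
            self.halted = True  # stays halted until an operator resets it
        return self.halted


breaker = CircuitBreaker(window_s=60.0, max_bad_fraction=0.10, min_ticks=10)
# Twenty ticks one second apart; ticks 15 onward are flagged as bad.
states = [breaker.on_tick(float(i), i >= 15) for i in range(20)]
```

In this run the breaker trips on tick 16, when 2 of the 17 ticks in the window are bad. The halt is deliberately sticky: resuming is left to the operator and does not happen automatically when the ratio recovers.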

As @Big Mike noted when discussing data reliability:

"You should talk to Eric @ Nanex. It's the same engine that IQFeed (and thus Kinetick) are using, but it is put together in a far more robust package for developers."

Data reliability and backtesting NT and IB (@spiv1)

The engine matters, but so does how you use it. Same underlying data, different quality outcomes depending on your validation layer.

Kill switch decision flow for live trading data quality validation

Bottom Line #

Data quality in futures isn't a one-time cleanup job. It's an ongoing operational discipline.

The professional approach:

  1. Store raw data separately from validated data. You need the ability to rebuild when (not if) you discover issues.
  2. Implement layered validation. No single filter catches everything. Use structural checks, market plausibility rules, regime awareness, and cross-source confirmation together.
  3. Label, don't delete. Quality masks preserve the ability to audit, investigate, and test sensitivity to your cleaning assumptions.
  4. Know your vendor's trade-offs. Filtered feeds hide real events. Unfiltered feeds include garbage. Choose the trade-off that matches your strategy's sensitivity, and build your own validation on top.
  5. Build kill switches for live trading. When data quality degrades, the correct response is to reduce risk or stop trading.

The most dangerous failure in data integrity is silent corruption: data that looks plausible enough to pass simple checks but still biases your research or execution. The only defense is a systematic, layered, always-on validation pipeline.


Citations

  1. @Big Mike, Historical ES Tick Data from 2003 and up (2011)
    “Unfortunately, it seems my data for ES during date range of 2-17-2011 to 2-20-2011 is corrupted in such a way I can't export it. Have never seen this before. So I am going to have to post the file in two parts with three missing days of data, sorry.”
  2. @josh, Spoo-nalysis ES e-mini futures S&P 500 (2014)
    “Latest in the cumulative tick saga, and I'll leave it at this -- I accidentally re-downloaded my tick data, removing the "bad ticks" that occur during July 3, thanksgiving, and christmas eve every year, and notice the result: https://nexusfi.”
  3. @hughesfleming, inconsistent results from different data sources (2018)
    “This is a common problem. It also depends on how well your data source filters bad ticks. Multicharts does not deal with this on its own. IQFeed is unfiltered but it also does not do a great job of correcting bad data.”
  4. @Fat Tails, continuous contract in NT7 /merge policy / rollover (2011)
    “First of all there are several ways of creating continuous or merged futures. This document gives an overview of the options. https://nexusfi.com/free_downloads/educational_manuals_ebooks_videos/628-download.”
  5. @kevinkdog, long term analysis of futures contracts (2022)
    “If you are backtesting, the choice of what type of continuous contract to use is HUGE. 2 examples: If you use unadjusted continuous contracts, gaps during rollovers will lead to inaccurate, meaningless results.”
  6. @aventeren, Data Quality Check (ES) (2014)
    “Howdy-- So I've cobbled together ES data using this thread and the QCollector thread, but upon backtesting it I was noticing some large losses when there shouldn't have been (I was using a fixed stop).”
  7. @phaser, IQFeed: Suggestions to filter bad ticks (2019)
    “I've been trying out IQFeed data streaming via Python and Amibroker. As IQFeed gives unfiltered data (From IQFeed's website: "...IQFeed provides a TRUE, tick-by-tick datafeed. IQFeed feed is completely unfiltered, allowing you to see EVERY TRADE...”
  8. @traderdavidt, IQFeed: Suggestions to filter bad ticks (2020)
    “I've also had this issue previously using IQFeed service as it is as you say, unfiltered and not cleaned.”
  9. @eddi0505, Backtestdata.com quality of data (2016)
    “Dear Fellows, As there is almost no serious feedback on the historic tick data sold by several third party providers on the forum I want to share my experience with one of those: Backtestdata.”
  10. @FuturesTrader71, Back-adjusted, Continuous contracts - best for support and resistance? (2012)
    “Hi Mike, If you are using DTN IQFeed, to get the continuous, back-adjusted data for roll-over, you would have to use @ES#C. IRT/MarketDelta assume automatically that you are using the continuous, so you don't need to enter more than just @ES#.”
  11. @Big Mike, Data reliability and backtesting NT and IB (2013)
    “You should talk to Eric @ Nanex. Welcome to Nanex.net. It's the same engine that IQFeed (and thus Kinetick) are using, but it is put together in a far more robust package for developers. You get a data dump of the entire market on demand.”
  12. @Big Mike, Historical ES Tick Data from 2003 and up (2012)
    “OK, so I have fixed this bug (all the details here): https://nexusfi.com/elite-circle/21664-using-mysql-storing-tick-data-7.html#post281576 And now here is the re-posted data.”



© 2026 NexusFi®, s.a., All Rights Reserved.
Av Ricardo J. Alfaro, Century Tower, Panama City, Panama, Ph: +507 833-9432 (Panama and Intl), +1 888-312-3001 (USA and Canada)
All information is for educational use only and is not investment advice. There is a substantial risk of loss in trading commodity futures, stocks, options and foreign exchange products. Past performance is not indicative of future results.