Databento for Futures Traders: API-First Market Data, Historical Tick Data, and the End of the Bloomberg Lock-In
If you've tried to get institutional-quality tick data for ES, NQ, or crude oil before Databento existed, you know the drill. You either paid $24,000 a year for Bloomberg, spent weeks negotiating enterprise contracts with CQG or Refinitiv, settled for the 180-day IQFeed history window, or cobbled together data from wherever you could find it and hoped it was clean. None of those options were good. Most retail systematic traders just backtested on garbage data and wondered why their live results diverged.
Databento changed that equation in April 2023 when it opened public access to institutional-grade CME futures data at transparent, usage-based pricing. The pitch is simple: sign up in five minutes, get $125 in free data credits, pull tick data or full order book history via a three-line Python snippet, and pay only for what you download. No sales calls, no annual contracts, no minimum commitments.
That's the marketing version. This article is the trading community version — what Databento actually provides, how it compares to the alternatives you already know, where it genuinely wins, where it doesn't, and exactly how a systematic futures trader uses it in practice.
Overview #
Databento is an API-native market data platform that gave retail systematic traders access to institutional-grade CME futures tick data at transparent, usage-based pricing. Before Databento opened public access in April 2023, getting accurate ES or NQ tick data required Bloomberg ($24k/year), enterprise CQG contracts, or accepting the limitations of IQFeed's 180-day history window.
This article covers what Databento actually provides, how the data schema stack works from OHLCV bars to full MBO order-by-order data, how pricing works (and where the break-even sits between pay-per-GB and flat subscriptions), why the DBN binary format matters for backtest-to-live parity, how to use the Python API, what 15 years of crisis event history enables, what MBO order book reconstruction actually makes possible, and how to build a complete workflow from signup to live strategy deployment.
Prerequisites: Basic familiarity with futures contracts and Python. Familiarity with tick data concepts helpful but not required.
What Databento Actually Is #
Databento is an API-first market data platform founded in 2019 by Christina Qi and Luca Lin. Qi previously co-founded Domeyard LP, a high-frequency trading firm that traded up to $7.1 billion per day. The technical team came from Two Sigma, Tower Research, Citadel, Flow Traders, Virtu, Bloomberg, and Facebook infrastructure engineering. They raised $41.8 million in total funding.
The company exists because its founders directly experienced the problem they're solving. Building Domeyard required institutional-grade tick data for HFT research. Getting it meant enterprise contracts, multi-week procurement processes, opaque pricing, and vendor lock-in. Databento was built to eliminate those barriers for everyone.
The core products are:
Historical Data API: Pull years of data via REST API in Python, C++, or Rust. Download tick-by-tick trades, full order book snapshots, OHLCV bars, or anything in between. Billed per gigabyte of data consumed, or flat-rate subscription for heavy users.
Live Streaming API: Subscribe to real-time CME data with microsecond latency from co-located servers in Aurora I (CME's own data center). The live API uses the same DBN binary format and the same Python library as historical data, so the record-handling code from your backtest carries over to production largely unchanged.
Reference Data: Security master, instrument definitions (point-in-time, no look-ahead bias), corporate actions.
The thing that distinguishes Databento from most vendors isn't just pricing — it's the architecture. Most market data providers built their APIs around display terminals and retrofitted programmatic access. Databento built API-first, which means the developer experience is genuinely good instead of tolerated.
The Data Schema Stack: From OHLCV to Full Order Book #
Databento supports 15+ data schemas organized by granularity. This matters because most data vendors give you a choice between "cheap and limited" or "enterprise and complete." Databento puts the whole stack on a single pay-per-GB meter.
OHLCV (ohlcv-1s, ohlcv-1m, ohlcv-1h, ohlcv-1d): Open/high/low/close/volume at your chosen resolution. Derived from trades data. The cheapest and smallest format — good for daily strategy research, not sufficient for intraday microstructure work. ES front-month OHLCV for a full year is basically free.
Trades (trades): Every individual trade, tick by tick. Price, size, timestamp, aggressor side (buy or sell). This is the time-and-sales tape. About 2 GB per month for ES, depending on activity. The right schema for most systematic strategy backtesting where execution detail matters but order book reconstruction doesn't.
TBBO (tbbo): Trade Best Bid/Offer, meaning every trade event paired with the best bid and ask immediately before execution. Think of it as trades data with the BBO context attached. Single-day pulls are cheap even at finer granularity. Elite member @Hulk made the point in a thread on historical BTC options data, where he recommended the MBO schema: "For what you want, your best choice is Databento IMO. Go to databento.com and purchase the MBO schema for the specific dates you want. It will cost you < $1 for one day".
MBP-1 (mbp-1): Every event that updates the best bid/offer. Includes trades and book depth changes at the top level. More granular than trades, less than full depth.
MBP-10 (mbp-10): Top 10 price levels on both sides of the book, updated on every change. What most people mean when they say "Level 2 market depth." About 25 GB per month for ES.
MBO (mbo): Market By Order — the full L3 data. Every individual order lifecycle: add, cancel, modify, fill. Keyed by unique order ID so you can reconstruct the complete order lifecycle. About 50 GB per month for ES front-month alone. The most expensive schema and the most powerful.
The MBO data thread started by @artemiso in 2019, "Historical market depth and MBO data: Assess your latency, data and execution quality", is worth reading in full if you're serious about microstructure. It's the clearest explanation of what MBO granularity actually enables that I've seen in the trading community.
The key point about Databento's schema architecture: all schemas for a given dataset are derived from the same underlying source feed. Databento captures at the most granular level available (MBO for CME) and derives everything else from it. This means your OHLCV bars and your MBP-10 depth are perfectly consistent with each other, because they came from the same source data.
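To make the "derived from the same source" idea concrete, here is a minimal sketch of rolling trade ticks up into one-minute OHLCV bars. It is illustrative only: the `Trade` tuple and the `to_ohlcv_1m` helper are hypothetical names, not part of the Databento library, and the sample ticks are made up.

```python
from collections import OrderedDict
from typing import NamedTuple

NANOS_PER_MINUTE = 60 * 1_000_000_000


class Trade(NamedTuple):
    ts_event: int  # nanoseconds since the UNIX epoch
    price: float
    size: int


def to_ohlcv_1m(trades):
    """Aggregate time-ordered trades into one-minute OHLCV bars.

    Returns an OrderedDict keyed by each bar's start time in nanoseconds.
    """
    bars = OrderedDict()
    for t in trades:
        # Floor the timestamp to the start of its minute.
        bar_start = t.ts_event - (t.ts_event % NANOS_PER_MINUTE)
        bar = bars.get(bar_start)
        if bar is None:
            bars[bar_start] = {
                "open": t.price,
                "high": t.price,
                "low": t.price,
                "close": t.price,
                "volume": t.size,
            }
        else:
            bar["high"] = max(bar["high"], t.price)
            bar["low"] = min(bar["low"], t.price)
            bar["close"] = t.price
            bar["volume"] += t.size
    return bars


# Made-up sample ticks: three in the first minute, one in the next.
sample_trades = [
    Trade(0, 5000.00, 2),
    Trade(10 * 10**9, 5000.50, 1),
    Trade(59 * 10**9, 4999.75, 3),
    Trade(61 * 10**9, 5000.25, 4),
]
bars = to_ohlcv_1m(sample_trades)
```

Because every bar is computed from the same tick stream, a bar's high can never exceed the highest trade in that minute, which is the consistency property the paragraph above describes.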
The Datasets: What's Covered #
CME Globex (GLBX.MDP3) is the flagship dataset. It covers everything traded on CME Group: ES, NQ, RTY, YM (equity index futures), CL, NG, HO, RB (energy), GC, SI, HG (metals), ZN, ZB, SR3 (rates), 6E, 6B, 6J, 6C (currencies), ZC, ZS, ZW, LE, HE (agriculture) — 650,000+ symbols total including options. History goes back to 2010, with full MBO granularity available from May 2017 when CME introduced MDP 3.0. Before that date, the highest granularity is MBP-10.
ICE Futures US covers soft commodities: cotton (CT), coffee (KC), cocoa (CC), sugar (SB).
Eurex covers European equity index and rates futures: FDAX (DAX), FESX (Euro Stoxx 50), FGBL (Bund), FGBM (Bobl).
EEX covers European power and gas: electricity futures, natural gas, emissions allowances (EUA CO₂).
ICE Europe Commodities: Brent crude (BRN), gas oil (G), UK natural gas (NBP).
Cboe CFE: VIX futures (VX), mini VIX. Added April 2026.
US Equities: Nasdaq TotalView, OPRA (equity options), NYSE, BATS, IEX, and aggregated across 15+ equity venues.
For a CME-focused futures trader, GLBX.MDP3 is what you need. The coverage is deep and the history goes back to 2010.
Pricing: The Usage-Based Math #
The pricing model is usage-based for historical data (pay per GB consumed) and flat-rate subscription for live streaming.
Historical data (pay-as-you-go): Starting from roughly $0.50-$2/GB depending on schema and dataset. The exact rate varies — use the cost estimation API before large downloads. Historical data billing is charged once per download, and your downloaded file is accessible for 30 days without re-billing.
$125 in free data credits on signup (they expire after 6 months, one grant per team). What that actually buys:
- Full CME venue, MBO schema: approximately 1 day of data
- Full CME venue, Trades schema: approximately 2 months of data
- ES only, MBO schema: approximately 14 months of data
- ES only, Trades schema: approximately 16 months of data
For a retail systematic trader doing ES strategy research on tick data, the free credits alone cover months of backtesting. ES is a thin slice of the full CME firehose.
Standard subscription (~$179/month): Expands historical data coverage for MBP-1, TBBO, and Trades schemas from 1 month to 15+ years. Live data access included (requires exchange license fee on top). Two devices, personal use license.
Break-even math: At a blended rate of roughly $2/GB, the $179/month subscription breaks even at about 90 GB per month ($179 / $2 per GB ≈ 89.5 GB). A retail trader doing focused ES research might consume 5-50 GB monthly, in which case pay-as-you-go is usually the cheaper option.
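The break-even arithmetic can be sketched in a few lines. The $179/month and $2/GB figures are the approximate numbers quoted in this article, not live prices, and the function names are hypothetical.

```python
def breakeven_gb(subscription_usd_per_month: float, payg_usd_per_gb: float) -> float:
    """Monthly volume at which pay-as-you-go cost equals the subscription fee."""
    return subscription_usd_per_month / payg_usd_per_gb


def cheaper_option(gb_per_month: float,
                   subscription_usd_per_month: float = 179.0,
                   payg_usd_per_gb: float = 2.0) -> str:
    """Return which pricing model costs less for a given monthly volume."""
    payg_cost = gb_per_month * payg_usd_per_gb
    if payg_cost < subscription_usd_per_month:
        return "pay-as-you-go"
    return "subscription"


threshold = breakeven_gb(179.0, 2.0)
light_user = cheaper_option(50)    # 50 GB * $2 = $100, below $179
heavy_user = cheaper_option(100)   # 100 GB * $2 = $200, above $179
```

The real per-GB rate varies by schema and dataset, so it is worth re-running this with the rate implied by an actual cost estimate for your own pulls.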
Exchange data fees for live data (passed through at cost, no markup): CME non-professional: $32.65/month. CME professional non-display use: $740/month. These are exchange-mandated fees that every provider charges — Databento just shows you the exact number.
A practical point: you can also download the data through a web interface and get flat CSV files without writing a single line of code if you just need a quick data pull.
DBN Format: Why One Binary Format Matters #
Databento Binary Encoding (DBN) is the native data format. It's self-describing (you can introspect a file without external schema), zero-copy (data structure is identical in memory, on wire, and on disk), and used identically for both historical and live streaming.
The practical consequence is significant: the same Python code that handles historical records also handles live records, with one main change. You swap db.Historical() for db.Live() and subscribe to a stream instead of requesting a date range.
import databento as db

# Backtest: fetch 3 months of ES tick trades
client = db.Historical(key="YOUR_API_KEY")
df = client.timeseries.get_range(
    dataset="GLBX.MDP3",
    schema="trades",
    symbols="ES.v.0",  # continuous contract, highest volume (usually the front month)
    stype_in="continuous",
    start="2024-01-01",
    end="2024-03-31",
).to_df()

# Go live: same parameters, different client
live_client = db.Live(key="YOUR_API_KEY")
live_client.subscribe(
    dataset="GLBX.MDP3",
    schema="trades",
    symbols="ES.v.0",
    stype_in="continuous",
)
# Records arrive once you iterate over live_client or register a callback and start it
The dual-codebase problem — where your backtesting data infrastructure is completely different from your live trading data infrastructure — has been the source of enormous amounts of subtle bugs in systematic trading systems. DBN's design solves it structurally.
For data at rest, Databento provides dbn-cli, a command-line tool that converts DBN files to CSV or JSON. Pandas and numpy support via .to_df() and .to_ndarray() methods. No specialized database or infrastructure required for most research workflows.
The Python Library: 3 Lines to a DataFrame #
Install: pip install databento. Requires Python 3.10+. The library pulls in pandas, numpy, pyarrow, aiohttp, and zstandard as dependencies.
Core historical workflow:
import databento as db

client = db.Historical("YOUR_API_KEY")

# Check cost before downloading
cost = client.metadata.get_cost(
    dataset="GLBX.MDP3",
    symbols="ES.v.0",
    schema="trades",
    stype_in="continuous",
    start="2024-01-01",
    end="2024-06-30",
)
print(f"Estimated cost: ${cost:.2f}")

# Download and convert to DataFrame
data = client.timeseries.get_range(
    dataset="GLBX.MDP3",
    symbols="ES.v.0",
    schema="trades",
    stype_in="continuous",
    start="2024-01-01",
    end="2024-06-30",
)
df = data.to_df()
# df fields include: ts_recv, symbol, price, size, side, action, flags
Before pulling large datasets, always run metadata.get_cost() first. At roughly 50 GB per month, an MBO pull for a full year of ES front-month runs to hundreds of gigabytes and is going to be expensive, so know the number before you commit. The $125 in signup credits goes faster than expected on MBO data.
Batch download for large jobs:
# Submit a batch job -- better for large date ranges
job = client.batch.submit_job(
    dataset="GLBX.MDP3",
    symbols=["ES.v.0", "NQ.v.0"],
    schema="ohlcv-1m",
    stype_in="continuous",
    start="2020-01-01",
    end="2024-12-31",
    encoding="dbn",
)
Multi-product basket:
# Get correlated index futures together -- single API call
data = client.timeseries.get_range(
    dataset="GLBX.MDP3",
    symbols=["ES.v.0", "NQ.v.0", "RTY.v.0", "YM.v.0"],
    schema="ohlcv-1m",
    stype_in="continuous",
    start="2023-01-01",
    end="2024-12-31",
)
df = data.to_df()
# All four index futures in one DataFrame
Continuous contract symbology:
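The stype_in="continuous" parameter maps symbols like ES.v.0 to the actual underlying contract at each date. The middle letter is the roll rule (v ranks contracts by volume, c by calendar expiration, n by open interest) and the number is the rank: ES.v.0 is the highest-volume contract, which is normally the front month, ES.v.1 is the next, and so on. This handles the rollover logic automatically.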
Important: Databento provides unadjusted prices for continuous contracts. The gap that occurs when the front month rolls appears as-is. If your research requires back-adjusted data (Panama Canal method, proportional adjustment, or any other), you compute it yourself. Databento gives you raw exchange prices, which is better for certain research — you know exactly what the market saw and you control the adjustment logic.
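If you do need a back-adjusted series, the additive (Panama-style) adjustment is short to write yourself. The sketch below is a simplified illustration, not a Databento feature: `back_adjust` is a hypothetical helper, the contract labels and prices are made up, and it assumes the entire price change at the roll tick is roll gap, which ignores any genuine market move between the last old-contract price and the first new-contract price.

```python
def back_adjust(prices, contracts):
    """Additively back-adjust an unadjusted continuous price series.

    prices:    list of raw prices in time order
    contracts: list of contract labels, one per price (same length)

    Walks backwards from the most recent price. Each time the contract
    changes, the gap at the roll is added to a running offset that is
    applied to all earlier prices, so the latest contract stays unadjusted.
    """
    if len(prices) != len(contracts):
        raise ValueError("prices and contracts must have the same length")
    adjusted = list(prices)
    offset = 0.0
    for i in range(len(prices) - 2, -1, -1):
        if contracts[i] != contracts[i + 1]:
            # Simplifying assumption: the whole jump at the roll is roll gap.
            offset += prices[i + 1] - prices[i]
        adjusted[i] = prices[i] + offset
    return adjusted


# Made-up example: a 55-point gap when the series rolls from ESH4 to ESM4.
raw = [4990.0, 4995.0, 5050.0, 5055.0]
labels = ["ESH4", "ESH4", "ESM4", "ESM4"]
adjusted = back_adjust(raw, labels)
```

A more careful version would measure the gap from simultaneous prices of the outgoing and incoming contracts on the roll date, which you can get by requesting both raw symbols for that day.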
Historical Coverage: 15 Years Through Crisis Events #
The argument for deep historical data in systematic trading isn't academic; it's about whether your strategy has actually been stress-tested. A backtest over the 2019-2024 bull market tells you almost nothing about robustness. You need to see how a strategy behaves in the 2020 COVID crash (-35% in 23 trading days), the 2022 rate shock (-27% over the full year), the 2011 debt ceiling crisis (-18% drawdown), and ideally the 2010 Flash Crash (ES dropped 60 handles in 35 minutes), although that event, on May 6, 2010, falls just before the June 2010 start of the CME history described here.
Databento provides:
- CME Globex tick data from June 2010 forward
- MBO (L3) data from May 2017 forward (when CME introduced MDP 3.0)
- Nanosecond timestamps from November 2015 forward
- Millisecond timestamps before November 2015
For the 2010-2017 period, the highest granularity available is MBP-10 (top 10 levels). For most systematic trading research, trades data going back to 2010 is sufficient. Full order book reconstruction requires the 2017-present MBO data.
This depth matters for another reason: roll cycles. ES changes front-month every three months. A one-year backtest covers only four roll cycles. A 10-year backtest covers 40 — enough to actually understand whether your execution around roll dates has statistical validity.
[IMAGE: fb2ad0f9-0265-4be1-ad63-838af3a6c929 | MBO Order Lifecycle showing add/modify/cancel/fill events with unique order IDs]
MBO Order Book Reconstruction: What's Actually Possible #
This is where Databento separates from every retail data provider. The MBO schema gives you every individual order event: order added, order cancelled, order modified, order filled. Each event is keyed by order ID so you can reconstruct the complete order lifecycle.
Why this matters for trading research:
Queue position modeling: In a FIFO market like CME, the fill probability of your limit order depends entirely on where you are in the queue at each price level. If you have the full MBO history, you can determine exactly where a simulated limit order would have been placed in the queue and model fill probability realistically. This is impossible with MBP data, which only shows aggregate size at each level, not individual order position.
Cancellation flow analysis: Large traders cancel orders strategically. Seeing whether cancels are coming from the front or back of the queue — something only MBO allows — can reveal information about large trader intent that's invisible in aggregated depth data.
Execution quality benchmarking: As @artemiso described in the MBO data thread, MBO data with co-location timestamps lets you "find your own orders in the data and figure out your execution latency" and "benchmark the integrity of your data feeds." That capability now lives in a self-service product.
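The queue position idea can be illustrated with a toy single-price-level model. This is a sketch under stated assumptions, not a production fill simulator: the event tuples are made up, `size_ahead_history` is a hypothetical helper, the action codes "A", "C", and "F" are used here simply as labels for add, full cancel, and fill, and modifies (which can cost an order its priority) are left out.

```python
from collections import OrderedDict


def size_ahead_history(events, sim_order_id="SIM"):
    """Track resting size ahead of one simulated order at a single price level.

    events: iterable of (action, order_id, size) in exchange order, where
    action is "A" (add), "C" (full cancel) or "F" (fill of `size` contracts).
    Returns the size ahead of `sim_order_id` after each event during which
    the simulated order is resting in the queue.
    """
    queue = OrderedDict()  # order_id -> remaining size; insertion order = FIFO priority
    history = []
    for action, order_id, size in events:
        if action == "A":
            queue[order_id] = size
        elif action == "C":
            queue.pop(order_id, None)
        elif action == "F":
            remaining = queue.get(order_id, 0) - size
            if remaining > 0:
                queue[order_id] = remaining  # updating a key keeps its position
            else:
                queue.pop(order_id, None)
        else:
            raise ValueError(f"unsupported action: {action!r}")

        if sim_order_id in queue:
            ahead = 0
            for oid, resting in queue.items():
                if oid == sim_order_id:
                    break
                ahead += resting
            history.append(ahead)
    return history


# Made-up event stream: our simulated order joins behind 15 contracts.
events = [
    ("A", 1, 10),
    ("A", 2, 5),
    ("A", "SIM", 1),
    ("A", 3, 7),   # joins behind us, so it does not change size ahead
    ("F", 1, 4),   # partial fill at the front of the queue
    ("C", 2, 5),   # cancel ahead of us
    ("F", 1, 6),   # front order fully filled
]
history = size_ahead_history(events)
```

With aggregated MBP data the cancel of order 2 and the add of order 3 would both show up only as a change in total size at the level, so you could not tell which of them moved your order forward.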
Databento publishes daily MBO snapshots at 00:00:00 UTC that contain the full order book state sorted in FIFO priority order. This is the starting point for intraday reconstruction — you load the snapshot, then apply the day's incremental MBO events.
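The snapshot-plus-increments pattern can be sketched as follows. The tuple layouts and helper names are hypothetical, the data is made up, and the event handling is deliberately simplified: cancels are treated as full cancels and a modify simply overwrites the order in place without any change of priority.

```python
from collections import defaultdict


def apply_mbo_events(snapshot, events):
    """Apply incremental order events to a starting order book snapshot.

    snapshot: dict of order_id -> (side, price, size), side "B" (bid) or "A" (ask)
    events:   iterable of (action, order_id, side, price, size), where action is
              "A" (add), "M" (modify) or "C" (cancel)
    Returns a new dict of live orders.
    """
    orders = dict(snapshot)
    for action, order_id, side, price, size in events:
        if action in ("A", "M"):
            orders[order_id] = (side, price, size)
        elif action == "C":
            orders.pop(order_id, None)
        else:
            raise ValueError(f"unsupported action: {action!r}")
    return orders


def aggregate_depth(orders):
    """Collapse individual orders into total size per price level, per side."""
    bids, asks = defaultdict(int), defaultdict(int)
    for side, price, size in orders.values():
        book_side = bids if side == "B" else asks
        book_side[price] += size
    return dict(bids), dict(asks)


# Made-up snapshot and intraday events.
snapshot = {
    1: ("B", 5000.00, 10),
    2: ("A", 5000.25, 8),
}
events = [
    ("A", 3, "B", 5000.00, 5),   # new bid joins the level
    ("A", 4, "A", 5000.50, 2),   # new ask one tick higher
    ("C", 2, "A", 5000.25, 8),   # best ask is cancelled
    ("M", 1, "B", 5000.00, 6),   # order 1 reduced to 6
]
orders = apply_mbo_events(snapshot, events)
bids, asks = aggregate_depth(orders)
best_bid, best_ask = max(bids), min(asks)
```

The `aggregate_depth` step is effectively how a depth view can be derived from order-level data, while the `orders` dict keeps the per-order detail that depth views discard.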
For anyone building execution quality monitoring, order flow strategies, or statistical microstructure models, this is the data layer that makes serious work possible without a Bloomberg contract.
API-Native vs Legacy: What Actually Changed #
The honest comparison against alternatives you already know:
vs DTN IQFeed: IQFeed has 180 days of tick history, no order book depth (no L2 or MBO), and integrates natively with NinjaTrader, Sierra Chart, MultiCharts. If you're a discretionary trader who uses charting platforms, IQFeed is plug-and-play. If you're building systematic strategies and need years of tick data for research, IQFeed's history limit is the constraint. Databento wins on data depth; IQFeed wins on platform integration for discretionary traders.
vs Rithmic: Rithmic's primary strength is execution latency and live order routing. It has MBO support for live data but minimal historical data. It's not a historical data research platform — it's a routing/feed infrastructure for active traders. Use Rithmic for live execution at your broker, use Databento for research data.
vs CQG: CQG is a full institutional platform — analytics, execution, data, co-located algorithmic trading engine. It has 50+ years of agricultural history and serves hedge funds and prop firms. It's also expensive, requires sales negotiation, and is designed for organizations with institutional IT budgets. Databento gives you comparable futures data at a fraction of the complexity and cost for independent developers and small teams.
vs Bloomberg: Bloomberg is $24,000+ per seat per year and is a full-service institutional intelligence platform — not a data API. The Bloomberg BLPAPI doesn't provide bulk historical tick data export for programmatic backtesting — it's fundamentally a display terminal. Databento is what you use when you need the data, not the terminal.
vs Refinitiv/LSEG: Strong in FX, fixed income, and fundamental data. Refinitiv's tick history product (DataScope) exists but isn't designed for MBO-level granularity. Databento wins narrowly for CME futures tick data depth and developer experience.
vs Quandl/Nasdaq Data Link: Excellent EOD data, alternative datasets, Commitment of Traders, fundamental. Good for daily horizon research. No intraday tick data or order book depth. Different use case entirely.
The matrix is simple: if you need tick data or order book depth for CME futures backtesting, and you want it in a form that's programmable without a six-figure budget, Databento is the answer.
Exchange Coverage: The Unified API Advantage #
The less obvious advantage of Databento's architecture is cross-venue coverage. If you're running a spread strategy between CME ES and Eurex FDAX, or between CME energy and ICE Brent crude, or if you're doing macro research that requires US rates AND European rates AND commodity data — you previously needed multiple separate vendor relationships with different authentication systems, different data schemas, different parsing code, and different support channels.
Databento covers 45+ venues with a single API key, single Python library, and single normalized DBN schema:
- CME Globex: 650,000+ symbols, the entire CME Group product set
- ICE Futures US: Soft commodities (cotton, coffee, cocoa, sugar, orange juice)
- ICE Europe Commodities: Energy benchmarks (Brent crude, gas oil, UK natural gas)
- EEX: European power, natural gas, CO₂ emissions allowances
- Eurex: European equity index futures (DAX, Euro Stoxx 50), rates (Bund, Bobl, Schatz)
- Cboe CFE: VIX futures (added April 2026)
Cross-asset research that required 3-5 separate vendor contracts is now a single API key.
[IMAGE: 7627883b-71ba-41aa-9b38-9555441a982a | Futures data vendor cost comparison: Bloomberg vs Refinitiv vs IQFeed vs Databento with MBO and history availability]
Monthly Cost Model: Who This Is For #
The pricing structure works out differently at different consumption levels:
Weekend researcher / individual quant: If you're pulling 5-50GB of historical tick data per month for ES-focused strategy research, pay-as-you-go costs you $10-$100 per month. The $125 in signup credits gets you started. Total annual cost might be $100-500 depending on how much you download.
Retail systematic trader with active pipeline: 50-100GB/month. Pay-as-you-go runs $100-$200. This is the break-even zone — at 90GB/month, the Standard subscription ($179/month) becomes competitive. The subscription also unlocks 15+ years of Trades/MBP-1 history.
Semi-pro / small fund: 100GB+ per month. Subscription is clearly more cost-effective. Plus plan with annual contract is better suited here.
Institutional / HFT research: Large MBO data consumption, dedicated connectivity needs, commercial redistribution rights. The Unlimited plan or enterprise arrangement.
Most NexusFi members doing systematic trading fall in the first two categories. The economics are favorable compared to what was previously available.
[IMAGE: 3204416b-bb34-4fb2-b6eb-366fb34529b8 | Two-column comparison: what Databento does vs does not provide]
Real Limitations: What Databento Doesn't Do #
Honest assessment of where Databento doesn't fit:
No charting platform: Databento has no GUI, no charts, no visual interface. It's a data API. You plug it into your own code, a Jupyter notebook, or a framework like NautilusTrader. If you want to point-and-click your way through data visualization, you need a separate tool (TradingView, NinjaTrader, Sierra Chart, etc.).
Not a broker or execution venue: Data only. You still need a separate broker account for actual trading. Databento doesn't route orders.
API programming required: Unlike IQFeed or Rithmic, which integrate with a checkbox into retail platforms, Databento requires you to write code. Python 3.10+ minimum. If you're not comfortable with Python or another supported language, the platform isn't accessible without a developer.
Live data requires exchange license: Real-time streaming isn't plug-and-play on top of subscription. You need to go through Databento's licensing flow. The CME non-professional license is $32.65/month on top of any subscription — legitimate and reasonable, but additional complexity.
MBO history requires Unlimited plan: Full L3 order book history for more than the last 1 month requires the top-tier plan. The Standard plan gives 1 month of MBO depth, which is fine for recent execution quality analysis but not for long-term microstructure research.
Pre-2017 CME data caps at MBP-10: Before CME switched to MDP 3.0 in May 2017, full order-by-order data doesn't exist. For a 2010 crisis-period backtest, you're working with L2 depth, not L3.
No back-adjusted continuous contracts: You get unadjusted roll gaps. Compute your own adjustments, or accept that your backtest uses exchange prices as they actually occurred.
[IMAGE: 36681591-1ddd-419a-9225-1d843b8294a7 | Algorithmic framework integration showing DBN format connecting Historical and Live APIs to NautilusTrader, Tickblaze, Python, and C++]
Integration with NautilusTrader #
NautilusTrader has an official Databento adapter that maps DBN schemas directly to NautilusTrader's native objects:
- mbo schema → OrderBookDelta events
- mbp-1 schema → QuoteTick events
- trades schema → TradeTick events
- mbp-10 schema → OrderBookDepth10 events
- ohlcv-* schemas → BarType objects
The adapter handles both historical simulation and live trading, using the same DBN format for both. For NautilusTrader users, this is the most direct path from data to backtest to live strategy.
Tickblaze (another systematic trading framework) also has a formal Databento partnership announced August 2025. OpenBB integration exists for CME futures analysis.
[IMAGE: dc86ea49-8879-42f0-93d3-7cda0921c7ba | Six-step workflow from Databento signup through signal research to live strategy deployment]
The Workflow: From Signup to Live Strategy #
Here's the practical path from zero to data in your backtest:
1. Sign up: databento.com, get $125 credits. No credit card required initially.
2. Check cost before pulling: Always run metadata.get_cost() before large downloads. You don't want surprises on MBO pulls.
3. Start with trades: Most backtesting workflows work perfectly on the Trades schema. It's cheap, fast to download, and sufficient for execution-focused research. Pull a year or more of ES tick trades (the $125 in credits covers roughly 16 months) and validate your signal research framework before upgrading to MBP or MBO.
4. Event-driven replay: For accurate backtest fills, use data.replay(callback) instead of loading everything into a DataFrame. Replay processes events in exchange order, which matters for tick-by-tick strategies.
class MyStrategy:
    def on_trade(self, record):
        price = record.price / 1e9  # Databento stores prices as int64 × 1e-9
        size = record.size
        side = record.side  # 'A' = ask/sell, 'B' = bid/buy
        ts = record.ts_event  # nanoseconds since epoch
        # Your strategy logic here

# `data` is the object returned by timeseries.get_range() with schema="trades"
strategy = MyStrategy()
data.replay(strategy.on_trade)
5. Upgrade to MBP-1 for execution modeling: Once your signal research is validated on trades data, upgrade to MBP-1 or TBBO to model slippage against the BBO at execution time.
6. MBO only when you need queue position: Full order book reconstruction is expensive in both data cost and processing. Most systematic strategies don't need it. Reserve MBO analysis for execution quality research, market making research, or strategies that explicitly model queue dynamics.
7. Live deployment: When you're ready, swap db.Historical() for db.Live(), add the exchange license ($32.65/month for CME non-pro), and reuse the record-handling code from your backtest on the live stream.
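For step 5, a minimal sketch of measuring slippage against the BBO might look like this. The record tuples are made up rather than real TBBO output, and `to_price` and `slippage_vs_mid` are hypothetical helpers. It reuses two conventions from the replay example above: prices are fixed-point integers scaled by 1e-9, and side 'B' marks a buy while 'A' marks a sell.

```python
FIXED_PRICE_SCALE = 1_000_000_000  # prices are stored as integers scaled by 1e-9


def to_price(raw_price: int) -> float:
    """Convert a fixed-point integer price to a float."""
    return raw_price / FIXED_PRICE_SCALE


def slippage_vs_mid(side: str, fill_px: float, bid_px: float, ask_px: float) -> float:
    """Cost of a fill relative to the pre-trade midpoint, in price points.

    Positive means the fill was worse than mid for the aggressor.
    """
    mid = (bid_px + ask_px) / 2
    if side == "B":
        return fill_px - mid   # buyer paid above mid
    if side == "A":
        return mid - fill_px   # seller received below mid
    raise ValueError(f"unsupported side: {side!r}")


# Made-up records: (side, trade price, best bid, best ask), all fixed-point.
records = [
    ("B", 5_000_250_000_000, 5_000_000_000_000, 5_000_250_000_000),  # buy at the ask
    ("A", 5_000_000_000_000, 5_000_000_000_000, 5_000_250_000_000),  # sell at the bid
]
costs = [
    slippage_vs_mid(side, to_price(px), to_price(bid), to_price(ask))
    for side, px, bid, ask in records
]
average_cost = sum(costs) / len(costs)
```

On ES, where the tick is 0.25 points, crossing a one-tick spread costs 0.125 points against mid, which is the kind of baseline you would compare your strategy's assumed fills against.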
The Broader Context #
The data access gap in futures trading has historically been one of the strongest moats for institutional trading firms. They had access to high-quality tick data, full order book history, and co-location infrastructure. Retail systematic traders were working with whatever IQFeed's 180-day window provided, or paying for Kinetick data, or using broker-supplied data of questionable integrity.
Databento didn't create this market — it formalized and commoditized what research teams at HFT firms already had. The founders understood this because they came from HFT and built those systems themselves. What's genuinely new is the economics: usage-based pricing, self-service licensing, and a developer-first API that makes institutional-grade data accessible to individual quants building their own systems.
Whether you're stress-testing a mean-reversion strategy on 10 years of ES trades, modeling queue position for an execution algorithm, doing cross-asset spread research between CME and Eurex, or just trying to get accurate tick data for the 2020 crash, Databento is the most direct path to that data that exists today.
The $125 in free credits means there's basically no cost to verify whether the data quality and API fit your workflow. For anyone doing serious systematic futures research, it's worth the 15 minutes to find out.
Citations
- am looking for historical data option BTC (2023)
- Historical Tick Data. (2024)
- Historical market depth and MBO data: Assess your latency, data and execution quality (2019)
- Databento Historical API Reference
- Databento Pricing
- Machine Learning Journal (2025)
- High Frequency Trading Adventures with Rithmic's R API (2026)
- GEX / VEX Use and Calculation (2024)
