Overfitting and Curve-Fitting in Futures Strategy Development: Detecting, Preventing, and Building Systems That Survive Live Markets

Version 3 · June 11, 2026 · Automation · 11 citations


Subtitle: How to Detect, Prevent, and Build Automated Systems That Actually Survive Live Markets

Overview #

Here's the most dangerous moment in algo trading: you've spent three weeks building a strategy on ES or NQ. You've run it through backtests. The equity curve is gorgeous — steady climb, low drawdown, Sharpe north of 2.0. You're convinced you've found something real. You go live.

Then it falls apart. Immediately, systematically, and completely.

This is overfitting, the silent killer of automated trading systems. The strategy failed because you built it too well for the wrong thing. Instead of capturing a real, repeatable edge in the market, you captured the noise embedded in your specific historical dataset. The backtest was spectacular because the strategy was molded to fit the data, not because the data revealed a durable edge.

Overfitting is endemic in retail algo development. The tools that make backtesting easy — optimization engines, parameter scanning, automated search — are also the tools that make overfitting nearly inevitable if you don't know what you're doing. Almost every spectacular backtest from a new developer is overfit. The math practically guarantees it.

This article breaks down exactly how overfitting happens, how to detect it before you risk capital, and how to build systems robust enough to survive the transition from backtest to live trading. By the end, you'll have a concrete checklist you can run against any strategy before you put money on it.

What Is Overfitting? #

Overfitting happens when your strategy learns noise instead of signal.

Every price series contains two components: real, repeatable market structure (signal) and random, unrepeatable variation (noise). When you optimize a strategy against historical data, you want it to learn the signal. But optimization engines are indifferent: they'll happily fit to either one, and they'll fit better to noise because noise is always there, always accommodating, always willing to be explained.

The mechanical result: in-sample (IS) performance is inflated. The optimizer found parameter combinations that happened to work on that specific dataset, at that specific time, under those specific conditions. It looks like edge. It isn't.

Then you test out-of-sample (OOS) or go live. The noise doesn't repeat — it can't, by definition. The strategy that produced 2.5% IS with Sharpe 1.8 drops to -0.2% OOS with Sharpe near zero.

@kevinkdog

“The more you curvefit the backtest, the more you are fitting the model to noise. Since noise will always be different going forward, the algo is essentially doomed if it is tightly fit to that noisy historical data.”

^[1]

The insidious part: a well-fit strategy and a curve-fit strategy look identical in-sample. You need specific detection methods to tell them apart — and that's what most developers skip.

The Four Sources of Overfitting #

Understanding where overfitting comes from tells you where to look for it.

1. Sample Size Too Small #

Every statistical estimate has a margin of error that depends on sample size. When you optimize strategy parameters on too little data, the estimates are noisy: the "best" parameter values are likely just the ones that happened to be lucky in a small sample, not the ones that reflect true market structure.

The critical nuance: effective sample size matters, not raw trade count. Intraday ES strategies often have 200-300 apparent trades in a backtest, but those trades cluster around news events, specific session hours, and volatility regimes. 300 trades on a single ES futures contract over three months might represent only 50-80 truly independent information events.

Daily bar strategies have the opposite problem: to get 5,000 bars of data on a futures contract, you need 20 years of history. You're either data-poor (too few independent trades) or you're using stale structure.
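One crude way to move from nominal to effective trade count is to collapse trades that fire close together into a single cluster. The sketch below is an illustration only: the function name `count_trade_clusters` and the 30-minute gap are assumptions chosen for the example, not a rule from this article, and the cluster count is a rough proxy for independent events rather than a formal statistic.

```python
from datetime import datetime, timedelta


def count_trade_clusters(entry_times, max_gap=timedelta(minutes=30)):
    """Count groups of trades whose entries are separated by more than max_gap.

    A trade joins the current cluster when it starts within max_gap of the
    previous trade's entry; otherwise it opens a new cluster.
    """
    times = sorted(entry_times)
    if not times:
        return 0
    clusters = 1
    for prev, cur in zip(times, times[1:]):
        if cur - prev > max_gap:
            clusters += 1
    return clusters


# Hypothetical entries: three trades bunched after an 08:30 release,
# one isolated afternoon trade, and two trades near the next day's open.
entries = [
    datetime(2024, 3, 12, 8, 31),
    datetime(2024, 3, 12, 8, 40),
    datetime(2024, 3, 12, 8, 52),
    datetime(2024, 3, 12, 14, 0),
    datetime(2024, 3, 13, 9, 35),
    datetime(2024, 3, 13, 9, 40),
]
print(len(entries), count_trade_clusters(entries))  # 6 3
```

Six nominal trades collapse to three clusters here. A wider or narrower gap changes the answer, so it is worth checking how sensitive the count is to that choice.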

2. Market Regime Change #

Markets are not stationary. The statistical relationships between indicators and returns shift over time as volatility, trend persistence, liquidity, and policy environment change. A strategy optimized for 2021's quiet, low-volatility, Fed-backstopped environment doesn't work in 2022's rate-hiking, high-volatility environment — not because it's theoretically wrong, but because the parameters were calibrated to a regime that ended.

Regime sensitivity is especially acute for mean reversion strategies on ES and NQ. A system built to fade extreme moves works brilliantly when overnight gaps and intraday ranges follow historical distributions. When realized volatility jumps, every parameter that was calibrated to "extreme" is now pointing at normal.

3. Too Many Parameters #

Each additional parameter gives your optimization engine one more degree of freedom to fit noise. A strategy with 3 parameters has 3 dimensions to work with. One with 15 parameters has 15 dimensions. In 15 dimensions, there's almost always a combination that fits any historical dataset well — regardless of whether there's any real edge.

4. Parameter Interdependence #

Most strategies use multiple parameters that are not independent of each other; they interact. When you optimize stop size and profit target together, the optimizer can find combinations that work for a specific volatility window without either parameter individually being "right."

@Fat Tails

“most of the systems I have seen use functions f(1), f(2), f(3) all derived from price and then try to optimize parameters for those functions. This is like a dog that tries to catch its tail, it simply cannot work.”

^[2]

The solution is to introduce truly independent information sources — volume, time of day, intermarket correlations, market breadth — before worrying about parameter count.

Overfitting Math: Degrees of Freedom #

There's a simple rule most developers skip, and it explains a huge fraction of blown strategies.

Minimum 10-20 trades per parameter.

Raw trade count matters less than effective trade count. If your strategy has 8 parameters and generates 120 trades in IS, you're at 15 trades per parameter, which is borderline acceptable. If you have 8 parameters and 50 trades, you're at about 6 trades per parameter. That strategy is almost certainly overfit.

@Big Mike

“My first guess would be that you have almost certainly overfit (Curve fit) to the historical data.”

^[3]

For an ES intraday strategy averaging 40 trades per month with a 3-month IS window, you have roughly 120 trades. If you're optimizing 10 parameters, that's 12 trades per parameter — borderline. The solution is to either extend the IS window or reduce the parameter count.

These are nominal trades, not effective trades. If half your trades happen in the 30 minutes after CPI releases, the effective independence is lower. Adjust down. Robust strategies should aim for 30+ trades per parameter with genuinely independent inputs.
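The arithmetic in this section is simple enough to script. The following is a minimal sketch using the numbers quoted above; the helper names `trades_per_parameter` and `months_of_data_needed` are made up for the illustration, and the counts are nominal trades, so the effective figures will be lower.

```python
import math


def trades_per_parameter(n_trades, n_params):
    """Nominal trades available per optimized parameter."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return n_trades / n_params


def months_of_data_needed(n_params, trades_per_month, min_trades_per_param=20):
    """Whole months of in-sample data needed to reach the chosen threshold."""
    required_trades = n_params * min_trades_per_param
    return math.ceil(required_trades / trades_per_month)


print(trades_per_parameter(120, 8))       # 15.0
print(trades_per_parameter(50, 8))        # 6.25
print(trades_per_parameter(120, 10))      # 12.0
print(months_of_data_needed(10, 40, 20))  # 5
print(months_of_data_needed(10, 40, 30))  # 8
```

At 40 trades per month, ten parameters need five months of in-sample data to reach 20 trades per parameter and eight months to reach 30.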

Table showing minimum trades-per-parameter thresholds for ES futures strategy validation: 3 params needs 60-90 trades, 10 params needs 200-300, 15+ params is guaranteed overfit zone — Degrees of freedom math: a strategy with 10 parameters trading ES intraday at 40 trades per month needs at least 5-7 months of in-sample data at minimum -- and 12 months for robust results. Most retail backtests violate this threshold by design.

Overfitting illustration: in-sample vs out-of-sample equity curve divergence in ES futures backtesting — Classic overfitting pattern: the in-sample equity curve climbs steadily while the out-of-sample period collapses immediately, revealing the strategy learned noise rather than repeatable market structure.

Detection Methods: The Three Tests #

Before risking capital on any strategy, run all three of these tests. If the strategy fails any one of them, treat it as overfit.

Test 1: Timeframe Robustness #

Optimize your strategy on 1-minute ES bars. Then, without changing a single parameter, run it on 2-minute and 5-minute bars. A robust strategy should show similar directional performance across timeframes. The Sharpe won't be identical, but it should be in the same ballpark.

If your ES 1-minute strategy has a 1.8 Sharpe in IS but the 2-minute backtest shows Sharpe 0.3 with the same parameters, the "edge" was a 1-minute artifact — probably microstructure noise that doesn't exist at coarser resolutions.

Test 2: Correlated Instrument Test #

If your strategy is capturing genuine market structure in ES, it should work in NQ — not identically, but meaningfully. Both are equity index futures driven by similar macro flows.

@Big Mike

“Switch to a highly correlated instrument. For example if trading ES then switch to YM or NQ and re-run the test. In both cases, your final results should be highly correlated with the originals. If they aren't then likely curve fitted to specific data.”

^[3]

Test 3: Out-of-Sample Testing (One Time Only) #

Reserve 20-30% of your data before development starts. Don't touch it. Build and optimize entirely on the in-sample data. Then, once you've finalized parameters, test on the reserved data. Once.

@Big Mike's explanation is the clearest version of this rule: "Once you have tested your strategy on out of sample data you cannot make changes to your strategy and re-test it on that data. It is no longer out of sample, and any changes you make to it are now curve fitted." ^[4]

Every time you look at OOS results and then adjust your strategy, you contaminate the OOS. After the first look, it's in-sample. For statistical meaning, you need at least 100 trades in the OOS period.
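The "once" rule is easy to break by accident, so it can help to enforce it in code. This is a sketch under stated assumptions: `reserve_holdout` and `OneShotHoldout` are hypothetical names, the 25% default simply sits inside the 20-30% range above, and `bars` stands for any chronologically ordered list of price records.

```python
class OneShotHoldout:
    """Wraps reserved out-of-sample data so it can be read exactly once."""

    def __init__(self, data):
        self._data = list(data)
        self._revealed = False

    def reveal(self):
        if self._revealed:
            raise RuntimeError("holdout already inspected; it is now in-sample")
        self._revealed = True
        return self._data


def reserve_holdout(bars, oos_fraction=0.25):
    """Split chronologically ordered bars into (in_sample, holdout).

    The most recent oos_fraction of the data is reserved; nothing is shuffled.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be between 0 and 1")
    n_oos = round(len(bars) * oos_fraction)
    cut = len(bars) - n_oos
    return list(bars[:cut]), OneShotHoldout(bars[cut:])


bars = list(range(100))            # stand-in for 100 chronological bars
in_sample, holdout = reserve_holdout(bars)
print(len(in_sample))              # 75
print(len(holdout.reveal()))       # 25
# A second holdout.reveal() raises RuntimeError.
```

The guard does not stop a determined developer, but it turns a silent second look into a loud error.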

@Carl123

“take the oldest 20% of the price data, optimize the parameters. Use only the performance on the 20% in sample data to determine the values and use these on the whole dataset. Then take the most recent 20%... then the middle 20%... If all three equities still look great, I might have something.”

^[5]

The OOS test reveals truth: both strategies showed the same IS results, but the robust strategy degraded ~20-30% out-of-sample (expected) while the overfit strategy collapsed immediately -- classic noise-fitting signature.

Walk-Forward Optimization: The Real Test #

Walk-forward optimization (WFO) is the closest thing algorithmic trading has to a rigorous scientific test. Instead of a single IS/OOS split, you roll the evaluation window across time, repeatedly optimizing on recent data and testing on the next unseen period.

Walk-forward optimization diagram showing rolling IS/OOS windows for ES futures strategy validation — Walk-forward optimization with 3:1 IS:OOS ratio: each OOS period is tested on data the strategy has never seen, and the aggregate of 10-20 OOS periods provides a statistically meaningful assessment of real-world robustness.

Tip

WFO Key Rules: Use a rolling (unanchored) window so old regime data doesn't dominate. Standard ratio is 3:1 IS:OOS. You need 10-20 OOS periods minimum for statistical significance. Because rolling IS windows overlap, the history required is one IS window plus the OOS periods laid end to end: with monthly OOS periods and a 3-month IS window, that means 13-23 months of data at minimum. @kevinkdog is explicit: "That one period of out of sample might not be significant -- that's why true walkforward testing has 10-20+ out of sample periods." ^[6]

Kevin Davey details these WFO protocols extensively in his book Building Winning Algorithmic Trading Systems (Wiley, 2014) ^[11], drawing on his experience finishing first or second in the World Cup Championship of Futures Trading three consecutive years with triple-digit returns. The book is a practical guide that bridges the gap between academic walk-forward theory and live futures execution.

@kbellare, who tested over 100 strategies with WFO, notes: "the 3:1 (3 in-sample, 1 out-of-sample period) ratio is well-established." ^[7] He also flags that the absolute period matters: a daily strategy might use 3-month IS / 1-month OOS, but a weekly strategy may need much longer windows.
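The rolling window layout can be sketched as plain index arithmetic. This is an illustration rather than any platform's implementation: `rolling_walk_forward` is a hypothetical helper, periods are abstract units (months, for example), and each run steps forward by one OOS period, so consecutive in-sample windows overlap and N OOS periods need one IS window plus N OOS periods of data.

```python
def rolling_walk_forward(n_periods, is_len=3, oos_len=1):
    """Return (is_start, is_end, oos_start, oos_end) tuples, ends exclusive.

    Each run optimizes on is_len periods, tests on the next oos_len periods,
    then the whole window slides forward by oos_len.
    """
    windows = []
    start = 0
    while start + is_len + oos_len <= n_periods:
        is_end = start + is_len
        windows.append((start, is_end, is_end, is_end + oos_len))
        start += oos_len
    return windows


runs = rolling_walk_forward(13)          # 13 months, 3:1 ratio
print(len(runs))                         # 10
print(runs[0])                           # (0, 3, 3, 4)
print(runs[-1])                          # (9, 12, 12, 13)
print(len(rolling_walk_forward(23)))     # 20
```

No OOS slice ever overlaps the in-sample slice of its own run, which is the property that makes the aggregate OOS record meaningful.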

Look at the distribution of OOS period results, not just the aggregate. A strategy positive in 12 of 15 OOS periods has a very different confidence profile than one positive in 6 periods but with 3 massive wins. You want consistent performance with roughly similar metrics from period to period.
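Two simple numbers capture the distinction drawn above: the fraction of OOS periods that were profitable, and how much of the gross profit came from the few best periods. The sketch below is illustrative; `oos_consistency` and the sample PnL figures are invented for the example.

```python
def oos_consistency(period_pnl, top_k=3):
    """Summarize how evenly OOS profit is spread across periods."""
    wins = sorted((p for p in period_pnl if p > 0), reverse=True)
    gross_profit = sum(wins)
    return {
        "positive_fraction": len(wins) / len(period_pnl),
        # Share of all winning PnL contributed by the top_k best periods.
        "top_k_share": sum(wins[:top_k]) / gross_profit if gross_profit > 0 else 0.0,
    }


steady = [1.0] * 12 + [-0.5] * 3              # positive in 12 of 15 periods
lumpy = [10.0] * 3 + [0.5] * 3 + [-1.0] * 9   # positive in 6, carried by 3 wins
print(oos_consistency(steady))
print(oos_consistency(lumpy))
```

The steady series is positive in 80% of periods with a quarter of its profit in the top three, while the lumpy series is positive in 40% of periods with about 95% of its profit in the top three.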

@WoodyFox demonstrates an interesting technique — using the "Rate of Change" of parameters rather than the best parameter value across optimization periods. Testing on NQ, walking forward with ROC outperformed POC by over 20%. ^[8]

Walk-forward optimization diagram with rolling IS and OOS windows showing 3:1 ratio across three sequential run periods — Walk-forward structure with 3:1 IS:OOS ratio and rolling windows: each run optimizes on the most recent 12 months and tests on the subsequent 4-month OOS period. Minimum 10-20 sequential OOS periods required for statistical significance.

Monte Carlo Significance Testing #

Even after WFO, you have one more question to answer: could these results have occurred by random chance?

The Monte Carlo permutation test answers this. You take your strategy's daily PnL series and generate thousands of randomized versions of it under the assumption of no edge, creating a distribution of "null hypothesis" results: what performance would look like if there were no real edge and the results were pure random luck. Then you compare your actual results to this distribution.

If your actual Sharpe ratio lands in the top 5% of the null distribution, you have statistical evidence of edge. If it falls in the middle, the results are consistent with random chance — regardless of how good the absolute numbers look.

Critical technical point: don't permute individual trades. Resample daily PnL blocks instead, because individual trade shuffling destroys the time dependence that makes markets what they are. Block bootstrapping preserves autocorrelation and volatility clustering for a more honest null distribution. Note that merely reordering a fixed set of PnL values leaves their mean and standard deviation, and therefore the Sharpe ratio, unchanged, so the null has to come from resampling blocks with replacement after removing the average PnL, or from shuffling the strategy's positions against market returns. The mathematical foundation for quantifying overfitting probability was formalized by Bailey, Borwein, López de Prado, and Zhu in The Probability of Backtest Overfitting (Journal of Computational Finance, 2017) ^[12]. Their combinatorially symmetric cross-validation (CSCV) framework demonstrated that the probability of overfitting increases with the number of strategy configurations tested, even when proper out-of-sample methods are employed.

For intraday ES/NQ strategies, use daily blocks. Generate 10,000 permutations. If your strategy's Sharpe doesn't clear a 5% p-value threshold, be skeptical regardless of the absolute numbers.
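One way to build such a null distribution is a moving-block bootstrap on mean-centered daily PnL. The sketch below is an assumption-laden illustration, not the only valid test: centering the series imposes "no edge", blocks of consecutive days are drawn with replacement to preserve short-range dependence, and the function names, the 5-day block length, and the per-day (not annualized) Sharpe are all choices made for the example.

```python
import math
import random


def sharpe(pnl):
    """Per-period Sharpe ratio (mean / sample stdev), not annualized."""
    n = len(pnl)
    mean = sum(pnl) / n
    var = sum((x - mean) ** 2 for x in pnl) / (n - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0


def block_bootstrap_pvalue(daily_pnl, block_len=5, n_resamples=10_000, seed=0):
    """One-sided p-value for 'Sharpe > 0' under a zero-mean null."""
    n = len(daily_pnl)
    if n < 2 * block_len:
        raise ValueError("need at least two blocks of data")
    rng = random.Random(seed)
    mean = sum(daily_pnl) / n
    centered = [x - mean for x in daily_pnl]  # strip the edge: this is the null
    actual = sharpe(daily_pnl)
    last_start = n - block_len
    n_blocks = math.ceil(n / block_len)
    hits = 0
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            start = rng.randint(0, last_start)  # inclusive on both ends
            sample.extend(centered[start:start + block_len])
        if sharpe(sample[:n]) >= actual:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)


# Synthetic daily PnL with a deliberately large drift, for demonstration only.
demo_rng = random.Random(42)
demo_pnl = [demo_rng.gauss(1.0, 1.0) for _ in range(250)]
print(block_bootstrap_pvalue(demo_pnl, n_resamples=1_000))
```

A small p-value says the observed Sharpe would be rare if the true mean PnL were zero. It says nothing about how many other strategies were tried before this one, which is the selection problem the CSCV framework addresses.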

Monte Carlo significance test chart showing null distribution of 10000 random permutations with actual strategy Sharpe in top 3% indicating statistical significance — Monte Carlo block bootstrap: the strategy's Sharpe (1.42) lands in the top 3% of 10,000 permuted trade series -- statistical evidence of real edge. A strategy whose Sharpe falls near the middle of this distribution is consistent with random chance, regardless of how good the number looks in isolation.

The Simplicity Principle #

The most counterintuitive lesson in algorithmic strategy development: simpler strategies survive longer.

A simple strategy with 3-4 parameters might have a Sharpe of 0.8 in IS. A complex strategy with 15 optimized parameters might have a Sharpe of 1.8. Most traders will deploy the complex strategy. It usually fails.

Every parameter you add gives your model one more way to fit noise. The complex model's higher IS performance is actually a warning sign — the model found noise patterns that existed in the training data but won't persist.

@kevinkdog articulates this clearly: "The algo model should be 'fit' as little as possible, and should be as simple as possible. Ideally, this means the algo may tease out the 'signal' part instead of the noise... An example: a simple algo (Algo A) that goes long on a 50 bar high. A complicated version (Algo B) optimized to get a 46 bar high/low entry, stoploss 1.65ATR, profit target 2.33ATR. Algo A is more likely to perform well in future than Algo B." ^[1]

The practical test: remove one parameter from your strategy. If IS performance drops slightly but the strategy remains viable, the parameter probably wasn't contributing real edge. If removing it causes OOS performance to improve or stabilize, you've confirmed it was a noise-fitter. Apply this iteratively until removing any parameter would materially hurt IS performance in a way that doesn't recover OOS.

Two parameter sensitivity charts: left shows smooth performance degradation around optimal for robust strategy, right shows cliff edges immediately adjacent to optimal for overfit strategy — Parameter sensitivity test: vary each parameter +/-20% from optimal without re-optimizing. A robust strategy degrades smoothly (left) -- a curve-fitted strategy collapses at the slightest deviation from the exact optimized value (right).
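The sensitivity sweep in the caption above can be sketched as a loop around whatever backtest function you already have. Everything here is hypothetical: `backtest` is assumed to be a callable that takes a parameter dict and returns a score such as Sharpe, and the two toy functions below only mimic a smooth surface and a cliff.

```python
def sensitivity_profile(backtest, params, name, rel_steps=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Re-run the backtest with one parameter perturbed and the rest fixed."""
    profile = {}
    for step in rel_steps:
        trial = dict(params)
        trial[name] = params[name] * (1 + step)
        profile[step] = backtest(trial)
    return profile


def worst_relative_drop(profile):
    """Largest fractional loss of score versus the unperturbed run."""
    base = profile[0.0]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return max((base - score) / base for score in profile.values())


# Toy stand-ins for real backtests (assumptions for illustration only).
def smooth_backtest(p):
    return 1.5 - 0.0005 * (p["lookback"] - 50) ** 2


def cliff_backtest(p):
    return 1.8 if p["lookback"] == 46 else 0.2


smooth = sensitivity_profile(smooth_backtest, {"lookback": 50}, "lookback")
cliff = sensitivity_profile(cliff_backtest, {"lookback": 46}, "lookback")
print(round(worst_relative_drop(smooth), 3))  # 0.033
print(round(worst_relative_drop(cliff), 3))   # 0.889
```

What counts as an acceptable drop is a judgment call; the point is to look at the shape of the neighborhood, not only the peak.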

Objective Function Traps #

What you optimize for matters as much as how you optimize.

The most common mistake is optimizing for total net profit or raw Sharpe ratio. These metrics are manipulable by the optimizer in ways that look great in IS but fail in reality.

@kbellare

“choosing 'Highest/Lowest' metrics (e.g. Highest Profit Factor, Highest Select Net Profit, Highest MAR, etc) set you up for failure — by definition, they pick the outliers in-sample period which invariably fail in out-of-sample periods.”

^[7]

Better objective functions:

Objective	The Problem	Better Alternative
Max total profit	Selects for lucky big winners	Profit factor > 1.5 with minimum trade count
Max Sharpe	Non-normal returns, manipulable by reducing variance via tail cuts	Robust Sharpe computed on median/percentile returns
Min drawdown	Incentivizes subtle tail manipulation	Max drawdown constrained to X%, optimize for return within constraint
Max profit factor	Ignores trade count — 1 trade with a huge win hits this	Minimum 50+ trades AND profit factor > 1.5

The principle: use objective functions with constraints. Require minimum trade counts, maximum drawdown limits, and consistency across time periods. Optimizing within multiple constraints forces the optimizer to find strategies that work broadly.
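A constrained objective can be expressed as a scoring function that rejects any candidate violating a hard limit and ranks the rest by return. This is a sketch only: the dictionary keys, the function name `constrained_score`, and the 15% drawdown cap are assumptions for the example (the table above leaves the drawdown limit as X%), while the 50-trade and 1.5 profit-factor floors come from the table.

```python
def constrained_score(stats, min_trades=50, min_profit_factor=1.5, max_drawdown_pct=15.0):
    """Return net return if every constraint holds, else -inf (rejected)."""
    if stats["trades"] < min_trades:
        return float("-inf")
    if stats["profit_factor"] <= min_profit_factor:
        return float("-inf")
    if stats["max_drawdown_pct"] > max_drawdown_pct:
        return float("-inf")
    return stats["net_return_pct"]


# Hypothetical optimizer candidates.
candidates = {
    "one_lucky_trade": {"trades": 1, "profit_factor": 40.0,
                        "max_drawdown_pct": 0.0, "net_return_pct": 35.0},
    "deep_drawdown": {"trades": 200, "profit_factor": 1.6,
                      "max_drawdown_pct": 28.0, "net_return_pct": 30.0},
    "steady": {"trades": 140, "profit_factor": 1.7,
               "max_drawdown_pct": 9.0, "net_return_pct": 18.0},
}
best = max(candidates, key=lambda name: constrained_score(candidates[name]))
print(best)  # steady
```

An unconstrained "highest net profit" search would have picked the single lucky trade; the constraints remove it before ranking starts.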

Table comparing optimization objective functions, their overfitting risks, and better constrained alternatives for futures strategy development — Optimization objective traps: max total profit, max Sharpe, min drawdown, and max profit factor are all manipulable by the optimizer in ways that inflate IS results while failing OOS. Constrained multi-objective optimization forces the optimizer to find broadly robust strategies.

Regime Sensitivity #

A strategy can be well-validated against overfitting and still fail — because it was built for a specific market regime that no longer exists. Regime sensitivity is distinct from overfitting. An overfit strategy learned noise. A regime-sensitive strategy learned real signal — but signal that's conditional on a market state that has changed.

The test: split your data by volatility regime and evaluate strategy performance separately. Use a simple split: days when the VIX is above/below its median value, or days when the ES true range is in the top/bottom quartile.

What kills traders is having an implicit regime filter. The optimizer found parameters that work in a specific vol regime, but there's no explicit filter in the code. The strategy runs regardless of conditions. In the regime it was trained on, it performs. In every other regime, it loses money.

For ES/NQ trading, the major regimes to test:

High vs. low realized volatility (use 20-day realized vol, split at median)
Trending vs. ranging (use ADX > 25 as trending, < 20 as ranging)
FOMC/CPI event days vs. normal days (the market behaves structurally differently)
First hour vs. afternoon session (intraday behavior differs markedly by time window)

If performance degrades materially in any regime split, either add explicit regime filtering or understand that the strategy will fail in that regime and size accordingly.
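The volatility split from the list above can be sketched with a trailing 20-day realized volatility and a median cutoff. Treat this as a diagnostic illustration under assumptions: the function names are hypothetical, the data is synthetic, volatility for each day uses only the prior 20 returns, and the median is taken over the full sample, which is fine for an after-the-fact audit but would be look-ahead if used as a live filter.

```python
import statistics


def sharpe(pnl):
    """Per-period Sharpe (mean / sample stdev); 0.0 if undefined."""
    if len(pnl) < 2:
        return 0.0
    sd = statistics.stdev(pnl)
    return statistics.mean(pnl) / sd if sd > 0 else 0.0


def trailing_vol(returns, window=20):
    """Stdev of the previous `window` returns; None until enough history."""
    vols = [None] * len(returns)
    for i in range(window, len(returns)):
        vols[i] = statistics.stdev(returns[i - window:i])
    return vols


def sharpe_by_vol_regime(strategy_pnl, market_returns, window=20):
    """Split days at the median trailing vol and score each bucket."""
    vols = trailing_vol(market_returns, window)
    cutoff = statistics.median(v for v in vols if v is not None)
    low = [p for p, v in zip(strategy_pnl, vols) if v is not None and v <= cutoff]
    high = [p for p, v in zip(strategy_pnl, vols) if v is not None and v > cutoff]
    return {"low_vol": sharpe(low), "high_vol": sharpe(high)}


# Synthetic example: a quiet first half, a volatile second half, and a
# strategy that makes money only while the market is quiet.
market = [0.005 * (-1) ** i for i in range(150)] + [0.02 * (-1) ** i for i in range(150)]
pnl = [(1.0 if i < 150 else -1.0) + 0.1 * ((i % 5) - 2) for i in range(300)]
result = sharpe_by_vol_regime(pnl, market)
print(result)
```

In this constructed case the low-volatility bucket scores positive and the high-volatility bucket scores negative, the same signature as the implicit regime conditioning described above.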

Two equity curves showing ES mean reversion strategy performance in low volatility regime (Sharpe 1.6, rising) vs high volatility regime (Sharpe negative 0.4, declining) — Regime sensitivity: the same ES mean reversion strategy delivers Sharpe 1.6 in low-volatility periods and Sharpe -0.4 in high-volatility periods -- implicit regime conditioning with no explicit filter in the rules. Fix: add an explicit VIX or ATR regime filter and count it in your degrees-of-freedom calculation.

Regime sensitivity analysis chart showing ES futures strategy performance in high vs low volatility environments — Regime sensitivity analysis for an ES mean reversion strategy: strong positive performance in low-volatility periods collapses to negative expectancy in high-volatility periods, revealing implicit regime conditioning.

When These Methods Fail #

None of these techniques guarantee a robust strategy. They reduce the probability of overfitting; they don't eliminate it.

When Anti-Overfitting Methods Fail: Detection and Recovery — Even proper validation methods can fail. Understanding when walk-forward, Monte Carlo, and out-of-sample testing give false confidence -- and what to do about it.

Walk-forward optimization can itself be overfit. If you run WFO across dozens of strategy variations and pick the one with the best aggregate WFO result, you've moved the overfitting up one level.

@Trembling Hand

“you have also just curve fitted your results to that moment in time. It was almost inevitably to find that result. But its a false positive.”

^[9] The meta-level overfitting problem is real — selection bias operates at every stage of the research pipeline.

Monte Carlo tests assume your historical sample represents the true return distribution. Fat tails, regime changes, and structural breaks mean the past doesn't fully characterize the future.

The correlated instrument test misses strategy-specific risks. ES and NQ correlate closely in normal conditions but diverge during stress. And every regime filter you add is a parameter that must be included in the trades-per-parameter calculation.

There is no method that definitively proves a strategy has edge. You can reduce the probability of being wrong, but not to zero. The appropriate response is position sizing — size small enough that if the strategy fails, it's a learning experience rather than a catastrophe.

Practical Application: Pre-Deployment Checklist #

Pre-deployment validation pipeline showing 8 gates between backtest and live capital — The 8-gate validation pipeline: data sufficiency, protocol hygiene, stability tests, cost realism, objective discipline, regime audit, statistical significance, and simplicity check. All gates must pass before deploying with real capital.

Before any strategy goes live with real capital, run through this checklist. A strategy that passes all eight checks is still not guaranteed to work — but it's been through the level of scrutiny that serious systematic traders apply.

A) Data Sufficiency

[ ] Calculated effective sample size (not just nominal trade count)
[ ] Minimum 10-20 trades per parameter (effective, not nominal)
[ ] IS window long enough to span at least one full market cycle

B) Protocol Hygiene

[ ] OOS data set aside before any optimization started
[ ] OOS inspected exactly once (after final parameter selection)
[ ] Walk-forward run with minimum 10 OOS periods

C) Stability Tests

[ ] Timeframe robustness: similar performance at ±1-2 bar sizes
[ ] Correlated instrument: ES strategy tested on NQ with minimal scaling
[ ] Parameter sensitivity: ±20% of each parameter causes smooth degradation

D) Cost Realism

[ ] Consistent spread/slippage model across IS and OOS
[ ] Round-turn cost includes commissions + exchange fees

E) Objective Discipline

[ ] Optimization did not solely target max profit or max Sharpe
[ ] Optimization constrained by maximum drawdown and minimum trade count
[ ] Results not dominated by one or two standout trade periods

F) Regime Audit

[ ] Performance split by volatility regime (high/low realized vol)
[ ] Performance split by trend/range regime (ADX or similar)
[ ] Any regime-conditional behavior explicitly built into strategy rules

G) Statistical Significance

[ ] Block bootstrap Monte Carlo run (minimum 10,000 permutations)
[ ] Sharpe and total return land in top 10% of null distribution
[ ] p-value < 0.10 at minimum (< 0.05 strongly preferred)

H) Simplicity Check

[ ] Removed every parameter that didn't survive ablation testing
[ ] Strategy rationale can be explained in one or two sentences
[ ] No parameters were added solely because they improved IS results

Citations #

1. @kevinkdog, "Sustained success with an algo," Elite Algorithmic NinjaTrader Trading, April 2022. https://nexusfi.com/showthread.php?t=58315&p=864023#post864023
2. @Fat Tails, "An experiment on curve fitting," Traders Hideout, May 2010. https://nexusfi.com/showthread.php?t=3950&p=42905#post42905
3. @Big Mike, "Benchmarks for a good automated ES trading system," Emini and Emicro Index, February 2014. https://nexusfi.com/showthread.php?t=30494&p=387495#post387495
4. @Big Mike, "Does backtesting work?", Elite Quantitative GenAI/LLM, July 2011. https://nexusfi.com/showthread.php?t=11896&p=133283#post133283
5. @Carl123, "Backtest Strategy weak points," NinjaTrader, October 2020. https://nexusfi.com/showthread.php?t=55923&p=822657#post822657
6. @kevinkdog, "KJ Trading Systems Kevin Davey - Ask Me Anything (AMA)," Trading Reviews and Vendors, December 2015. https://nexusfi.com/showthread.php?t=26335&p=543481#post543481
7. @kbellare, "Walk Forward Testing & Optimization Experiences and Best Practices," NinjaTrader, December 2013. https://nexusfi.com/showthread.php?t=23495&p=377865#post377865
8. @WoodyFox, "Woody's thoughts and things of interest," Trading Journals, July 2021. https://nexusfi.com/showthread.php?t=57378&p=847149#post847149
9. @Trembling Hand, "How quickly do algos go bad?", Elite Quantitative GenAI/LLM, July 2021. https://nexusfi.com/showthread.php?t=57404&p=847674#post847674
10. @kbellare, "Walk Forward Testing & Optimization Experiences and Best Practices," NinjaTrader, December 2013. https://nexusfi.com/showthread.php?t=23495&p=373229#post373229
11. Kevin J. Davey, Building Winning Algorithmic Trading Systems: A Trader's Journey From Data Mining to Monte Carlo Simulation to Live Trading (Wiley, 2014). https://www.wiley.com/en-us/Building+Winning+Algorithmic+Trading+Systems-p-9781118778883
12. David H. Bailey, Jonathan M. Borwein, Marcos López de Prado, and Qiji Jim Zhu, "The Probability of Backtest Overfitting," Journal of Computational Finance, 2017. https://www.davidhbailey.com/dhbpapers/backtest-prob.pdf


