Backtesting Costs & Walk-Forward Testing

Why backtests without commission and slippage are fiction, and why parameters optimised on historical data don't survive forward testing — with practical fixes for both.

Two assumptions silently destroy backtest credibility: that trades execute at exactly the signal price, and that the parameters found by optimisation will continue to work on unseen data.

The first assumption is broken by commission and slippage. Every real trade incurs execution costs — the bid/ask spread, brokerage commission, and market impact. On a strategy that takes many trades, these costs compound into a substantial drag. A strategy with a profit factor of 1.4 in a zero-cost backtest may have a profit factor of 0.9 after realistic costs — a profitable-looking system that actually loses money.

The second assumption is broken by curve-fitting. When you test 50 parameter combinations and keep the best one, you have essentially found the settings that happened to fit the historical data best. Those settings often fail on new data because they captured noise rather than genuine market structure. Walk-forward testing addresses this by separating the data used to find parameters from the data used to evaluate them.

Commission: the compounding drag

Commission applies to every trade twice: once when you enter and once when you exit. A 0.1% commission per side is 0.2% round-trip. At 100 round-trip trades per year, commissions add up to 20% of the per-trade notional. If each trade deploys the whole account, that is roughly 20% of equity per year consumed by commissions, regardless of the strategy's gross performance.
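
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `annual_commission_drag` is made up for illustration, and it assumes every trade has the same notional size and pays commission on both fills.

```python
def annual_commission_drag(rate_per_side: float, trades_per_year: int) -> float:
    """Yearly commission as a fraction of the notional used per trade.

    Assumes equal-sized trades, each charged on entry and on exit.
    """
    round_trip = 2 * rate_per_side          # entry fill + exit fill
    return round_trip * trades_per_year


drag = annual_commission_drag(rate_per_side=0.001, trades_per_year=100)
print(f"{drag:.1%}")  # 20.0%
```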

The `commission_type` parameter in Pine Script's `strategy()` declaration accepts several formats:

`strategy.commission.percent`: percentage of trade value per fill (common for crypto and for equity brokers that charge a percentage)
`strategy.commission.cash_per_contract`: fixed amount in the account currency per contract (common for futures)
`strategy.commission.cash_per_order`: fixed amount in the account currency per order (flat-fee brokers)

Setting `commission_value = 0.1` with `strategy.commission.percent` means 0.1% of the trade's notional value is deducted on each fill. A $10,000 trade incurs $10 commission per fill, $20 round trip.
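
To compare the three commission types on the same trade, a small helper like the one below may be useful. It is an illustrative sketch, not TradingView's implementation: the function name and the string labels are hypothetical and simply mirror the Pine constants.

```python
def commission_per_fill(kind: str, value: float, price: float, qty: float) -> float:
    """Commission charged on a single fill under each commission style.

    kind mirrors the Pine constants: "percent", "cash_per_contract",
    "cash_per_order". value has the same meaning as commission_value.
    """
    if kind == "percent":
        return price * qty * value / 100.0   # value is a percentage of notional
    if kind == "cash_per_contract":
        return value * qty                   # fixed amount per contract/share
    if kind == "cash_per_order":
        return value                         # flat fee, size does not matter
    raise ValueError(f"unknown commission kind: {kind}")


# A $10,000 trade (100 units at $100) with 0.1% commission per fill
per_fill = commission_per_fill("percent", 0.1, price=100.0, qty=100)
print(round(per_fill, 2), round(2 * per_fill, 2))  # 10.0 20.0
```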

The practical test: run your strategy first with no commission, then with realistic commission. If the profit factor drops significantly (say, from 1.8 to 1.1), the strategy's raw edge is too thin to survive real-world execution costs and needs refinement.
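
The same before-and-after comparison can be sketched outside TradingView on an exported trade list. The trade values below are invented for illustration, and the helper names are hypothetical; the cost is treated as a fixed amount per round trip.

```python
from typing import List


def profit_factor(pnls: List[float]) -> float:
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def net_of_costs(gross_pnls: List[float], round_trip_cost: float) -> List[float]:
    """Subtract a fixed round-trip cost from every trade."""
    return [p - round_trip_cost for p in gross_pnls]


gross = [70.0, 70.0, -50.0, -50.0]   # made-up trades, profit factor 1.4 before costs
for cost in (0.0, 5.0, 10.0):
    print(cost, round(profit_factor(net_of_costs(gross, cost)), 2))
# 0.0 1.4
# 5.0 1.18
# 10.0 1.0
```

Sweeping the cost rather than testing a single value shows how quickly the edge erodes, which is usually more informative than one data point.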

Slippage: fills never happen at the signal price

Slippage is the difference between the price at which your script signals an entry and the price at which the order actually fills. In liquid markets during normal hours, slippage is modest, often just 1–2 ticks. In thin markets, at the market open, or during news events, slippage can be substantial.

The `slippage` parameter in Pine Script specifies the number of ticks of adverse slippage applied to each market or stop order fill (limit orders are not affected):
Long entries fill at `signal_price + (slippage × tick_size)`
Long exits (stops) fill at `stop_price - (slippage × tick_size)`
Short entries fill at `signal_price - (slippage × tick_size)`
Short exits (stops) fill at `stop_price + (slippage × tick_size)`

A realistic starting value is 1–2 ticks for liquid instruments. For less liquid instruments, 3–5 ticks or more may be appropriate.
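
The fill rules reduce to one idea: buys fill higher, sells fill lower. A minimal sketch follows, with a hypothetical function name and an example tick size of 0.25 chosen purely for illustration.

```python
def slipped_fill(price: float, tick_size: float, slippage_ticks: int, is_buy: bool) -> float:
    """Price after adverse slippage: buys fill higher, sells fill lower.

    Long entries and short exits are buys; short entries and long exits are sells.
    """
    offset = slippage_ticks * tick_size
    return price + offset if is_buy else price - offset


# Example: 2 ticks of slippage on an instrument with a 0.25 tick size
print(slipped_fill(5000.00, 0.25, 2, is_buy=True))   # 5000.5
print(slipped_fill(5000.00, 0.25, 2, is_buy=False))  # 4999.5
```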

Don't forget the spread

Pine's `commission_type` and `slippage` settings do not model the bid/ask spread. On futures and most equities, the spread is effectively rolled into slippage on market orders, so 1–2 ticks of slippage covers it reasonably well. On forex and crypto, the spread is often significant and separate: a 2-pip spread on a major FX pair is roughly 0.02% of notional per round trip, on top of commission. For those markets, convert the typical spread to ticks and add it to your `slippage` value so that the backtest sees the full friction a live fill would incur. Because `slippage` is applied to every fill, adding about half the spread per fill reproduces one full spread per round trip; adding the whole spread per fill is the more conservative choice.
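
One way to turn a quoted spread into a `slippage` setting is sketched below. The helper names are hypothetical, and the sketch assumes half the spread is paid on each fill (one full spread per round trip) with the result rounded up to whole ticks. The example prices (a 0.00001 tick size and a quote of 1.10) are illustrative only.

```python
import math


def slippage_ticks_with_spread(spread: float, tick_size: float,
                               base_slippage_ticks: int = 1) -> int:
    """Per-fill slippage in ticks: base slippage plus half the spread, rounded up."""
    spread_ticks = round(spread / tick_size)        # spread expressed in ticks
    return base_slippage_ticks + math.ceil(spread_ticks / 2)


def spread_pct_of_notional(spread: float, price: float) -> float:
    """Spread as a percentage of the quoted price."""
    return spread / price * 100.0


# 2-pip spread (0.0002) on a pair quoted near 1.10 with a 0.00001 tick size
print(slippage_ticks_with_spread(0.0002, 0.00001, base_slippage_ticks=1))  # 11
print(round(spread_pct_of_notional(0.0002, 1.10), 3))                      # 0.018
```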

Walk-forward testing: separating discovery from validation

Walk-forward testing is a methodology for testing whether optimised strategy parameters generalise to unseen data. The process:

1. Divide your history into segments, e.g. 12 months of training data followed by 3 months of test data.
2. Optimise on the training period: find the parameter set (EMA lengths, ATR multipliers, etc.) that performs best on those 12 months.
3. Apply to the test period: run the strategy with those exact parameters on the 3 months that were not part of the optimisation. No further adjustments.
4. Record the test-period result. This is your out-of-sample performance.
5. Slide forward: move the window forward 3 months and repeat.
6. Aggregate: the combined out-of-sample results give a realistic estimate of live performance.

If the out-of-sample results are reasonably close to the in-sample results, the strategy has some robustness. If the out-of-sample results are dramatically worse, the in-sample optimisation was largely curve-fitting.
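
Outside TradingView, the window bookkeeping can be sketched in a few lines. Everything here is illustrative: `walk_forward_windows` and `run_walk_forward` are hypothetical names, periods are plain integer indices (for example months), and `optimise` and `evaluate` stand in for whatever optimiser and scoring you actually use.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def walk_forward_windows(n_periods: int, train_len: int,
                         test_len: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges, sliding forward by test_len each step."""
    start = 0
    while start + train_len + test_len <= n_periods:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len


def run_walk_forward(data: Sequence, train_len: int, test_len: int,
                     optimise: Callable, evaluate: Callable) -> List[tuple]:
    """Optimise on each training slice, then score the untouched test slice."""
    results = []
    for train_idx, test_idx in walk_forward_windows(len(data), train_len, test_len):
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        params = optimise(train)                  # sees training data only
        results.append((params, evaluate(test, params)))
    return results


# 24 months of history, 12 training + 3 test: four out-of-sample windows
for train, test in walk_forward_windows(24, 12, 3):
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

The important property is that `optimise` never receives test data, so the aggregated `evaluate` scores are genuinely out-of-sample.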

In TradingView, you cannot automate walk-forward testing from Pine Script alone, but you can perform it manually:

1. Fix a start and end date using the strategy settings panel
2. Run the optimisation on the training window
3. Note the best parameters
4. Change the date range to the test window
5. Manually input the parameters found in step 3; do not re-optimise
6. Record the result

Key metrics that expose fragility

A strategy that is genuinely robust tends to show these characteristics:

Consistent profit factor across time periods — not dramatically better in one period than another
Similar win rate across sub-periods
Drawdown proportional to returns — not concentrated in one period
Performance degrades gracefully as commission is increased — not collapsing at the first hint of cost

A strategy that fails these checks is not necessarily worthless, but it warrants more investigation before trading it live.
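
A quick way to check the first of these characteristics is to split an exported trade list by sub-period and compute the profit factor for each. The sketch below assumes trades are available as `(period_label, pnl)` pairs; that layout, the function name, and the sample numbers are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def profit_factor_by_period(trades: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Profit factor per sub-period; infinite where a period has no losing trades."""
    profit = defaultdict(float)
    loss = defaultdict(float)
    periods = set()
    for period, pnl in trades:
        periods.add(period)
        if pnl > 0:
            profit[period] += pnl
        elif pnl < 0:
            loss[period] -= pnl            # store losses as positive totals
    return {
        p: (profit[p] / loss[p] if loss[p] > 0 else float("inf"))
        for p in sorted(periods)
    }


trades = [(2021, 100.0), (2021, -50.0), (2022, 80.0), (2022, -40.0), (2022, -40.0)]
print(profit_factor_by_period(trades))  # {2021: 2.0, 2022: 1.0}
```

A large gap between periods, as in this toy example, is the kind of inconsistency worth investigating.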

When parameters drift significantly across windows

A subtle failure mode of walk-forward testing: the in-sample and out-of-sample numbers both look fine in each window, but the optimal parameters themselves are wildly different across windows. Window 1 prefers ATR multiplier 2.5; window 2 prefers 1.8; window 3 prefers 3.2. Each window's out-of-sample test uses only its own optimised parameters, so each window looks acceptable, yet the "best" parameters clearly aren't stable.

That is a sign the strategy is piggy-backing on regime-specific behaviour that the optimiser keeps chasing. The honest response is to either (a) pick a parameter value that represents a sensible compromise across all windows and accept somewhat lower per-window performance in exchange for stability, or (b) make parameter selection regime-aware (use ATR mult 2.5 in high-vol environments, 1.8 in low-vol) and backtest that combined logic end-to-end. What you must not do is quietly present the best in-sample parameter for each window as if that were achievable in live trading. It is not, because you cannot know in advance which window you are in.
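
Parameter drift can be quantified rather than eyeballed. The sketch below uses the coefficient of variation (population standard deviation divided by the mean) as a rough dispersion measure and the median as one candidate compromise value. Both choices are assumptions made for illustration, not a standard prescribed by the text, and the function name is hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, Sequence


def parameter_stability(best_params: Sequence[float]) -> Dict[str, float]:
    """Summarise how much the optimal parameter moves between windows."""
    centre = mean(best_params)
    cv = pstdev(best_params) / abs(centre) if centre != 0 else float("inf")
    return {"median": median(best_params), "cv": cv}


# Best ATR multiplier found in each of three walk-forward windows
stats = parameter_stability([2.5, 1.8, 3.2])
print(stats["median"], round(stats["cv"], 3))  # 2.5 0.229
```

What counts as "too unstable" depends on how sensitive the strategy is to the parameter, so any threshold on the dispersion figure is a judgment call.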

The following Pine Script strategy applies the commission and slippage settings discussed above to a simple EMA crossover and shows a performance summary table on the chart.

//@version=6
strategy("Realistic Backtest — Commission & Slippage",
         overlay=true,
         commission_type  = strategy.commission.percent,
         commission_value = 0.1,    // 0.1% per side = 0.2% round trip
         slippage         = 2,      // 2 ticks of slippage per fill
         default_qty_type = strategy.percent_of_equity,
         default_qty_value = 10)    // position size: 10% of equity per trade (not 10% risk)

fastEMA = ta.ema(close, 9)
slowEMA = ta.ema(close, 21)
atrVal  = ta.atr(14)

longEntry  = ta.crossover(fastEMA, slowEMA)  and barstate.isconfirmed
shortEntry = ta.crossunder(fastEMA, slowEMA) and barstate.isconfirmed

if longEntry
    strategy.entry("Long", strategy.long)
    strategy.exit("Long Exit", "Long",
                  stop=close - atrVal * 2, limit=close + atrVal * 4)

if shortEntry
    strategy.entry("Short", strategy.short)
    strategy.exit("Short Exit", "Short",
                  stop=close + atrVal * 2, limit=close - atrVal * 4)

plot(fastEMA, "Fast EMA", color=color.blue)
plot(slowEMA, "Slow EMA", color=color.orange)
plotshape(longEntry,  style=shape.triangleup,   location=location.belowbar, color=color.green, size=size.small)
plotshape(shortEntry, style=shape.triangledown, location=location.abovebar, color=color.red,   size=size.small)

// ── Performance summary table ─────────────────────────────────────────
var table t = table.new(position.top_right, 2, 6,
                         bgcolor=color.new(color.black, 75), border_width=1)

if barstate.islast
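    // Guard against division by zero when no trades have closed yet
    winRate = strategy.wintrades / math.max(strategy.closedtrades, 1) * 100
    // Profit factor is undefined (na) until there is at least one losing trade
    float pf = strategy.grossloss > 0 ? strategy.grossprofit / strategy.grossloss : na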

    table.cell(t, 0, 0, "Net Profit",    text_color=color.white, text_size=size.small)
    table.cell(t, 1, 0, str.tostring(strategy.netprofit, "#.##"),
               text_color=strategy.netprofit > 0 ? color.green : color.red, text_size=size.small)
    table.cell(t, 0, 1, "Win Rate",      text_color=color.white, text_size=size.small)
    table.cell(t, 1, 1, str.tostring(winRate, "#.#") + "%", text_color=color.white, text_size=size.small)
    table.cell(t, 0, 2, "Profit Factor", text_color=color.white, text_size=size.small)
    table.cell(t, 1, 2, str.tostring(pf, "#.##"),
               text_color=pf >= 1.5 ? color.green : pf >= 1.0 ? color.yellow : color.red, text_size=size.small)
    table.cell(t, 0, 3, "Max Drawdown",  text_color=color.white, text_size=size.small)
    table.cell(t, 1, 3, str.tostring(strategy.max_drawdown, "#.##"), text_color=color.red, text_size=size.small)
    table.cell(t, 0, 4, "Total Trades",  text_color=color.white, text_size=size.small)
    table.cell(t, 1, 4, str.tostring(strategy.closedtrades), text_color=color.white, text_size=size.small)
    table.cell(t, 0, 5, "Commission",    text_color=color.white, text_size=size.small)
    table.cell(t, 1, 5, "0.1%/side",     text_color=color.gray,  text_size=size.small)