Walk-forward vs CPCV

Two answers to the same question: 'how would this strategy have done on data it never saw during fitting?' They disagree often. Both matter.

The problem they both solve

A backtest that fits parameters on the same data it evaluates on is a story, not a forecast. The fix is out-of-sample (OOS) testing: split the data, fit on one slice, evaluate on a different slice. The interesting question is how you split. Walk-forward and CPCV (Combinatorial Purged Cross-Validation) are two different answers, and the gap between them tells you something useful about the strategy.

Walk-forward: temporal, sequential

Walk-forward splits the timeline into N expanding (or rolling) windows. Each window: fit on the past, evaluate on the next chunk forward, then roll the cutoff forward and repeat. Quantis uses 3 windows by default with expanding training sets and equal-size out-of-sample blocks.

# Walk-forward (3 windows, expanding train)
window 1: train [2015-2018]    test [2019]
window 2: train [2015-2019]    test [2020]
window 3: train [2015-2020]    test [2021]

# Final OOS Sharpe = average of 3 OOS Sharpes
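The window scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Quantis's implementation: the helper name `expanding_walk_forward` is hypothetical, the data is sliced by whole calendar years, and each test block is a single year.

```python
from typing import List, Tuple


def expanding_walk_forward(
    years: List[int], n_windows: int = 3
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_years, test_years) pairs with an expanding training set.

    The last `n_windows` years each serve once as a one-year test block;
    every year before the test year is used for training.
    """
    if n_windows >= len(years):
        raise ValueError("need at least one training year before the first test year")
    splits = []
    first_test = len(years) - n_windows
    for i in range(first_test, len(years)):
        splits.append((years[:i], [years[i]]))
    return splits


splits = expanding_walk_forward(list(range(2015, 2022)), n_windows=3)
for train, test in splits:
    print(f"train [{train[0]}-{train[-1]}]  test [{test[0]}]")
```

A rolling variant would drop the oldest training year each time the cutoff moves forward instead of keeping the full history.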

Walk-forward is intuitive: it mimics how you'd actually trade, refitting periodically as new data lands. The downside is a small, recency-heavy sample: only 3 OOS data points, all drawn from the most recent years. If that recent stretch is unusual (it often is; the last 5 years were dominated by the post-COVID liquidity flood), all your OOS performance came from one regime and you don't know how the strategy handles others.

CPCV: combinatorial, regime-mixed

CPCV comes from López de Prado, Advances in Financial Machine Learning (2018). The data is split into N groups (Quantis uses 10), then every way of choosing the test groups is run. With 10 groups partitioned 8 train / 2 test, you get C(10, 2) = 45 distinct train/test splits, each combining a unique mix of historical regimes for training and evaluation. This page calls each split a path; in López de Prado's stricter terminology, the 45 splits recombine into 9 full-length backtest paths.

# CPCV (10 groups, 2-group test sets → 45 paths)
group 1: 2015-Q1    group 6:  2016-Q2
group 2: 2015-Q2    group 7:  2016-Q3
...etc until 10 groups across the full sample

paths = combinations(10, 2) = 45

for each path:
  test on this 2-group slice
  train on the other 8 groups (purged of overlap)
  record OOS Sharpe

# Headline: median OOS Sharpe across 45 paths
# Trust pill: fraction of paths with positive Sharpe
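The enumeration above maps directly onto `itertools.combinations`. The sketch below only builds the group assignments for each of the 45 train/test splits; it assumes groups are identified by the integers 0 to 9 and leaves out fitting, scoring, and any leakage handling.

```python
from collections import Counter
from itertools import combinations

N_GROUPS = 10  # number of contiguous groups the sample is cut into
N_TEST = 2     # groups held out for testing in each split

splits = []
for test_groups in combinations(range(N_GROUPS), N_TEST):
    # Everything not held out for testing is available for training.
    train_groups = tuple(g for g in range(N_GROUPS) if g not in test_groups)
    splits.append((train_groups, test_groups))

# How often each group lands in a test set across all splits.
test_counts = Counter(g for _, test_groups in splits for g in test_groups)

print(len(splits))     # 45
print(test_counts[0])  # 9
```

Every group is tested 9 times, each time alongside a different partner group, so no single stretch of history dominates the OOS results.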

The “purged” bit matters. If your features have a 5-day lookback and your labels have a 5-day forward return, naively splitting can leak information across the train/test boundary. Purging drops the training observations whose windows overlap the test set, on both sides of it; embargoing extends the cushion by also dropping a stretch of training data just after the test window. Quantis applies both.
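A minimal sketch of purging and embargoing around one contiguous test block, under simplifying assumptions: observations are indexed by integer day, observation i carries a label computed over days i to i + `horizon`, and overlap is judged on label windows only. The function name and parameters are hypothetical, and Quantis's actual rules may differ.

```python
from typing import List


def purged_train_indices(
    n_obs: int, test_start: int, test_end: int, horizon: int, embargo: int
) -> List[int]:
    """Training indices left after purging and embargoing around one test block.

    The test block covers indices test_start .. test_end - 1.
    """
    # Last day touched by any test label.
    test_label_end = test_end - 1 + horizon
    keep = []
    for i in range(n_obs):
        in_test = test_start <= i < test_end
        # Purge: training labels that reach into the test block from before it.
        overlaps_before = i < test_start and i + horizon >= test_start
        # Purge: training observations that start while test labels are still open.
        overlaps_after = test_end <= i <= test_label_end
        # Embargo: an extra cushion right after the purged stretch.
        embargoed = test_label_end < i <= test_label_end + embargo
        if not (in_test or overlaps_before or overlaps_after or embargoed):
            keep.append(i)
    return keep


train = purged_train_indices(n_obs=100, test_start=40, test_end=50, horizon=5, embargo=3)
print(train[:3], train[-3:], len(train))
```

With these inputs the training set keeps days 0 to 34 and 58 to 99: days 35 to 39 and 50 to 54 are purged, and days 55 to 57 are embargoed.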

Why we show both

They answer different questions, and the gap between them is informative.

  • Walk-forward Sharpe high, CPCV Sharpe low: the strategy worked in the most recent regime but fails on average across the historical regime mix. Your edge is regime-specific.
  • CPCV Sharpe high, walk-forward Sharpe low: the strategy works across most historical regimes but the most recent one happens to be unfriendly. Could be bad luck or could be a structural change — investigate which regimes specifically failed.
  • Both high, both stable: the strategy generalises. This is what you want to see before deployment.
  • Fraction of CPCV paths positive: this is the most underrated number on the result card. A strategy with median Sharpe 0.8 and fraction-positive 90% is dramatically more robust than one with median 0.8 and fraction-positive 55%. The latter has a wide tail and basically depends on which slice of history you happened to draw.
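To make the last point concrete, the sketch below computes the two headline numbers for two made-up sets of path Sharpes. The values are purely illustrative, not results from any real strategy, and the helper name `summarise_paths` is hypothetical.

```python
from statistics import median
from typing import Dict, List


def summarise_paths(sharpes: List[float]) -> Dict[str, float]:
    """Median OOS Sharpe and the fraction of paths with a positive Sharpe."""
    positive = sum(1 for s in sharpes if s > 0)
    return {
        "median": median(sharpes),
        "fraction_positive": positive / len(sharpes),
    }


# Illustrative values only: same median, very different tails.
robust = [-0.1, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 1.2]
fragile = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.6, -0.4, -0.3, -0.1, 0.8,
           0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8]

robust_summary = summarise_paths(robust)
fragile_summary = summarise_paths(fragile)
print(robust_summary)
print(fragile_summary)
```

Both lists have a median of 0.8, but only the first has 90% of its paths above zero; the second sits at 55%, so its result depends heavily on which slice of history is drawn.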

What to watch in the result card

  • Walk-forward consistency. If only 1 of 3 OOS windows is good, the headline OOS Sharpe is a regime artefact. Three roughly-similar windows is what real edge looks like.
  • CPCV fraction-positive > 70%. Below 60% is a yellow flag — your strategy fails in too many slices of history. Above 80% with a positive median is the green light.
  • Per-regime breakdown. Quantis decomposes CPCV paths by market regime (bull / bear / chop). A strategy that wins the bull paths and loses the bear paths is a long-only beta bet, not edge.
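The per-regime check can be sketched as a simple group-by over path results. This assumes each path already carries a single regime label; how Quantis assigns those labels is not described here, and the Sharpe values below are invented for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def median_sharpe_by_regime(paths: List[Tuple[str, float]]) -> Dict[str, float]:
    """Group (regime, sharpe) pairs by regime and take the median of each group."""
    by_regime = defaultdict(list)
    for regime, sharpe in paths:
        by_regime[regime].append(sharpe)
    return {regime: median(values) for regime, values in by_regime.items()}


# Illustrative values only: wins in bull paths, loses in bear paths.
paths = [
    ("bull", 1.4), ("bull", 1.1), ("bull", 0.9),
    ("bear", -0.6), ("bear", -0.2), ("bear", -0.9),
    ("chop", 0.1), ("chop", -0.1), ("chop", 0.3),
]

breakdown = median_sharpe_by_regime(paths)
print(breakdown)
```

A breakdown like this one, strongly positive in bull paths and negative in bear paths, is the long-only beta pattern described above rather than evidence of edge.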

Common mistakes

  • Quoting a single OOS Sharpe number. The number is a point estimate; the distribution across walk-forward windows or CPCV paths is what tells you whether the point estimate is meaningful.
  • Tuning hyperparameters on walk-forward results. The moment you select between variants based on walk-forward OOS performance, the OOS becomes in-sample for the variant selection. CPCV gives you fresh paths to validate the choice on.

Further reading

  • Overfitting → the broader problem these methods exist to detect.
  • Sharpe vs PSR vs DSR → the metrics that quantify how much of your headline Sharpe is selection inflation.
  • López de Prado (2018). Advances in Financial Machine Learning. Wiley.