Sharpe vs PSR vs DSR

The three numbers that tell you whether a backtest's headline Sharpe is real edge or a story your data is telling you.

The problem with Sharpe alone

Sharpe is a sample statistic. A backtest can show a Sharpe of 1.5 when the true Sharpe is far lower, purely from sampling noise on short windows. Five years of monthly returns is 60 observations. The standard error on an annualized Sharpe of 1.5 at n=60 monthly observations is roughly ±0.47, and that's assuming returns are normal, which they aren't. On weekly data with 60 observations the same Sharpe sits inside a 95% CI that comfortably contains zero.
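The arithmetic behind those intervals can be sketched in a few lines. This is a rough illustration, not Quantis code: it assumes the headline Sharpe is annualized, that returns are i.i.d. normal, and it uses the usual large-sample approximation for the standard error of a Sharpe estimate. The function names are made up for this example.

```python
import math

def sharpe_standard_error(sr_annual: float, n_obs: int, periods_per_year: int) -> float:
    """Annualized standard error of a Sharpe estimate (assumes i.i.d. normal returns)."""
    sr_period = sr_annual / math.sqrt(periods_per_year)        # de-annualize the Sharpe
    se_period = math.sqrt((1.0 + 0.5 * sr_period ** 2) / n_obs)
    return se_period * math.sqrt(periods_per_year)             # re-annualize the error

def sharpe_ci95(sr_annual: float, n_obs: int, periods_per_year: int) -> tuple[float, float]:
    """Two-sided 95% confidence interval around the observed annualized Sharpe."""
    se = sharpe_standard_error(sr_annual, n_obs, periods_per_year)
    return sr_annual - 1.96 * se, sr_annual + 1.96 * se

# Same headline Sharpe and observation count, different bar frequency.
monthly_ci = sharpe_ci95(1.5, n_obs=60, periods_per_year=12)   # five years of data
weekly_ci = sharpe_ci95(1.5, n_obs=60, periods_per_year=52)    # just over one year
print("monthly:", monthly_ci)
print("weekly: ", weekly_ci)
```

Under these assumptions the 60-week interval straddles zero while the 60-month interval does not: the same observation count covers far less calendar time on weekly bars, so the annualized estimate is much noisier.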

Reporting Sharpe alone isn't lying — it's just the wrong unit. You need a number that says “given my sample size and the shape of my returns, how confident should I be that this Sharpe is real?”

PSR (Probabilistic Sharpe Ratio)

PSR comes from Bailey & López de Prado (2012). It answers exactly that question. PSR(SR* = 0) gives you the probability that the true Sharpe is above zero, given your observed Sharpe, sample size, skew, and kurtosis.

The formula corrects the naïve standard error for non-normality: negative skew and fat tails (i.e. real strategies) widen the standard error and pull the PSR down. How much depends on the Sharpe and the sample length. With an annualized Sharpe of 1.5 on 60 monthly bars, near-normal returns give a PSR above 99% against SR* = 0; the same headline Sharpe with skew −1 and kurtosis 8 has a standard error roughly a quarter wider, which costs little here but a great deal on shorter samples, lower Sharpes, or a benchmark above zero.
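A minimal sketch of the PSR calculation, following the Bailey & López de Prado (2012) formula. It assumes the Sharpe values are per-period (not annualized) and that kurtosis is raw kurtosis (3 for a normal distribution). The function name and the example inputs are illustrative, not taken from Quantis.

```python
from math import sqrt
from statistics import NormalDist

def probabilistic_sharpe_ratio(sr: float, n_obs: int, skew: float = 0.0,
                               kurtosis: float = 3.0, sr_benchmark: float = 0.0) -> float:
    """Probability that the true Sharpe exceeds sr_benchmark.

    sr and sr_benchmark are per-period Sharpes; kurtosis is raw (normal = 3).
    """
    # Non-normality adjustment to the variance of the Sharpe estimate.
    adjustment = 1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2
    z = (sr - sr_benchmark) * sqrt(n_obs - 1) / sqrt(adjustment)
    return NormalDist().cdf(z)

# Hypothetical inputs: annualized Sharpe 0.8 observed over 36 monthly returns.
sr_monthly = 0.8 / sqrt(12)
psr_benign = probabilistic_sharpe_ratio(sr_monthly, 36, skew=0.0, kurtosis=3.0)
psr_ugly = probabilistic_sharpe_ratio(sr_monthly, 36, skew=-1.0, kurtosis=8.0)
print(f"near-normal returns:      PSR = {psr_benign:.1%}")
print(f"negative skew, fat tails: PSR = {psr_ugly:.1%}")
```

The second call uses the same Sharpe and sample size as the first; only the shape of the return distribution changes, and the PSR comes out lower.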

Read it as: PSR > 95% means high confidence the Sharpe is real in isolation. PSR is honest about sampling noise but it does not yet account for the elephant in the room — that you tested many candidates and picked the best one.

DSR (Deflated Sharpe Ratio)

DSR comes from Bailey & López de Prado (2014) and is the PSR adjusted for selection bias. The intuition: if you tried N candidate strategies and reported the best one's Sharpe, the expected best-of-N Sharpe is well above zero even when none of the candidates have any real edge.

The formula: DSR = PSR(SR* = E[max of N trial Sharpes]). The benchmark E[max] is the best Sharpe you would expect among N trials that have no true edge. It grows with both N (the number of candidates tested) and the cross-sectional spread of those trial Sharpes. The more variants you tried and the more they varied, the higher the bar your headline Sharpe has to clear.
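The DSR calculation can be sketched as follows, using the closed-form approximation for E[max] from Bailey & López de Prado (2014). This is an illustration under stated assumptions: all Sharpes are per-period, the trials are treated as independent, at least two trials are supplied, and the four trial Sharpes and the sample size below are invented for the example.

```python
from math import e, sqrt
from statistics import NormalDist, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(trial_sharpes: list[float]) -> float:
    """Expected best Sharpe among N zero-edge trials (needs N >= 2)."""
    n = len(trial_sharpes)
    spread = stdev(trial_sharpes)          # cross-sectional spread of the trials
    inv = NormalDist().inv_cdf
    return spread * ((1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / n)
                     + EULER_GAMMA * inv(1.0 - 1.0 / (n * e)))

def psr(sr: float, n_obs: int, skew: float, kurtosis: float, sr_benchmark: float) -> float:
    """Probabilistic Sharpe Ratio against a per-period benchmark Sharpe."""
    adjustment = 1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2
    z = (sr - sr_benchmark) * sqrt(n_obs - 1) / sqrt(adjustment)
    return NormalDist().cdf(z)

def deflated_sharpe_ratio(trial_sharpes: list[float], n_obs: int,
                          skew: float = 0.0, kurtosis: float = 3.0) -> float:
    """PSR of the best trial, benchmarked against E[max] instead of zero."""
    best = max(trial_sharpes)
    return psr(best, n_obs, skew, kurtosis, expected_max_sharpe(trial_sharpes))

# Hypothetical daily Sharpes for 4 variants, each backtested on 2200 daily bars.
trials = [0.04, 0.01, -0.02, 0.03]
psr_value = psr(max(trials), 2200, 0.0, 3.0, sr_benchmark=0.0)
dsr_value = deflated_sharpe_ratio(trials, 2200)
print(f"PSR = {psr_value:.1%}, DSR = {dsr_value:.1%}")
```

The only difference between the two numbers is the benchmark: PSR compares the best variant against zero, DSR compares it against the best Sharpe a zero-edge search of the same size and spread would be expected to produce.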

Read it as: DSR < 50% means the headline Sharpe is mostly selection bias: you would have expected to draw a number that good from a portfolio of nothing-burgers. DSR > 95% means the Sharpe survives even after accounting for the variants you tested.

The PSR-DSR gap

When PSR is high (e.g. 95%) and DSR is low (e.g. 5%), the gap (90 percentage points) measures how much of your confidence in the headline Sharpe comes from selection inflation rather than genuine edge. The Quantis result card shows this gap explicitly. A wide gap is not a verdict that the strategy is broken. It's a verdict that the variant-selection process is doing the heavy lifting, and you need to validate on unseen data before believing the number.

Narrow gaps (e.g. PSR 92%, DSR 88%) mean the variant search added almost nothing — your strategy was good before you searched. Those are the result cards worth deploying.

What to do with each

Isolation backtest, no candidate search: trust PSR. Sharpe alone is too noisy; PSR fixes the noise.
Backtest with candidate search (Quantis runs 4 variants by default): trust DSR. PSR will lie to you because it doesn't know you searched.
DSR essentially 0: the result is likely a random good draw from your candidate set. Treat it as a hypothesis to validate on new data, not a strategy to deploy.

Worked example

Strategy backtested with 4 variants. Headline Sharpe 0.64. PSR 97% — in isolation, the Sharpe is real (we're 97% confident the true Sharpe is above zero). DSR 0% — after adjusting for variant selection, no credible Sharpe survives. The 97-percentage-point gap is the selection inflation: essentially the entire headline number was the byproduct of picking the best of 4.