Sharpe vs PSR vs DSR
The three numbers that tell you whether a backtest's headline Sharpe is real edge or a story your data is telling you.
The problem with Sharpe alone
Sharpe is a sample statistic. A backtest can show a Sharpe of 1.5 when the true value is closer to 0.3, purely from sampling noise on short windows. Five years of monthly returns is only 60 observations. The standard error on an annualized Sharpe of 1.5 at n=60 monthly observations is roughly ±0.47, and that assumes returns are normal, which they aren't. With 60 weekly observations the standard error is roughly ±0.95, so the 95% CI around the same Sharpe comfortably contains zero.
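The standard-error arithmetic can be sketched as follows. This is a minimal illustration, not the Quantis implementation: it assumes the quoted Sharpe of 1.5 is annualized, uses the usual asymptotic variance of the per-period Sharpe, and the function name `sharpe_standard_error` is hypothetical.

```python
import math

def sharpe_standard_error(sr_annual, n_obs, periods_per_year, skew=0.0, kurtosis=3.0):
    """Approximate standard error of an annualized Sharpe ratio.

    Assumption: sr_annual is annualized with the sqrt(periods_per_year) rule.
    skew=0 and kurtosis=3 (raw, not excess) is the normal-returns case.
    """
    sr = sr_annual / math.sqrt(periods_per_year)  # per-period Sharpe
    variance = (1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2) / (n_obs - 1)
    return math.sqrt(variance) * math.sqrt(periods_per_year)  # back to annual units

for label, periods in [("monthly", 12), ("weekly", 52)]:
    se = sharpe_standard_error(1.5, 60, periods)
    low, high = 1.5 - 1.96 * se, 1.5 + 1.96 * se
    print(f"{label}: SE={se:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With 60 weekly observations the interval spans zero, while the monthly interval stays positive but is still wide. Passing a negative `skew` or a `kurtosis` above 3 widens both further.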
Reporting Sharpe alone isn't lying — it's just the wrong unit. You need a number that says “given my sample size and the shape of my returns, how confident should I be that this Sharpe is real?”
PSR (Probabilistic Sharpe Ratio)
PSR comes from Bailey & López de Prado (2012). It answers exactly that question. PSR(SR* = 0) gives you the probability that the true Sharpe is above zero, given your observed Sharpe, sample size, skew, and kurtosis.
The formula corrects the naïve standard error for non-normality — return distributions with negative skew and fat tails (i.e. real strategies) deflate the PSR sharply. A backtest with Sharpe 1.5, 60 monthly bars, mild positive skew, and benign kurtosis might come out with PSR 95%. The same headline Sharpe with negative skew and kurtosis 8 might come out with PSR 60%.
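A minimal sketch of the PSR calculation, using only the Python standard library. The function name and the example inputs are hypothetical (they are not the figures quoted above), and the sketch assumes the Sharpe is expressed per period rather than annualized and that kurtosis is raw kurtosis (3 for a normal distribution).

```python
import math
from statistics import NormalDist

def probabilistic_sharpe_ratio(sr, n_obs, skew=0.0, kurtosis=3.0, sr_benchmark=0.0):
    """Probability that the true Sharpe exceeds sr_benchmark.

    Assumptions: sr and sr_benchmark are per-period Sharpe ratios,
    kurtosis is raw kurtosis (normal = 3.0), and n_obs is at least 2.
    """
    # Non-normality correction: negative skew and fat tails inflate the denominator.
    denom = math.sqrt(1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)

# Hypothetical inputs: same Sharpe and sample size, different return shapes.
psr_normal = probabilistic_sharpe_ratio(0.2, 60)
psr_fat_tails = probabilistic_sharpe_ratio(0.2, 60, skew=-1.5, kurtosis=8.0)
print(f"normal returns: {psr_normal:.3f}")
print(f"negative skew, fat tails: {psr_fat_tails:.3f}")
```

The second figure comes out lower than the first: the observed Sharpe and sample size are identical, and only the shape of the return distribution changed.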
Read it as: PSR > 95% means high confidence the Sharpe is real in isolation. PSR is honest about sampling noise but it does not yet account for the elephant in the room — that you tested many candidates and picked the best one.
DSR (Deflated Sharpe Ratio)
DSR comes from Bailey & López de Prado (2014) and is the PSR adjusted for selection bias. The intuition: if you tried N candidate strategies and reported the best one's Sharpe, the expected best-of-N Sharpe is well above zero even when none of the candidates have any real edge.
The formula: DSR = PSR(SR* = E[max of N trial Sharpes]). The deflated benchmark E[max] grows with both N (the number of candidates tested) and the cross-sectional spread of those trial Sharpes. The more variants you tried and the more they varied, the higher the bar your headline Sharpe has to clear.
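The deflated benchmark can be sketched as below. This uses the extreme-value approximation for E[max] generally associated with the 2014 paper; whether Quantis computes it exactly this way is an assumption, and the function names are hypothetical. Sharpe values are per period, and `sharpe_variance` is the cross-sectional variance of the trial Sharpes.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected best-of-N Sharpe when no trial has real edge."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least 2 trials")
    nd = NormalDist()
    a = nd.inv_cdf(1.0 - 1.0 / n_trials)
    b = nd.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe_ratio(sr, n_obs, n_trials, sharpe_variance, skew=0.0, kurtosis=3.0):
    """PSR evaluated against the expected best-of-N Sharpe instead of zero."""
    sr_star = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr_star) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)

# Hypothetical inputs: the bar rises with more trials and with more spread.
for n_trials, variance in [(4, 0.0001), (4, 0.01), (100, 0.01)]:
    dsr = deflated_sharpe_ratio(0.2, 60, n_trials, variance)
    print(f"N={n_trials}, variance={variance}: DSR={dsr:.3f}")
```

Holding the observed Sharpe fixed, DSR falls as either the trial count or the variance of the trial Sharpes increases, which is the behaviour the formula describes.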
Read it as: DSR < 50% means the headline Sharpe is mostly selection bias: you would have expected to draw a number that good from a portfolio of nothing-burgers. DSR > 95% means the Sharpe survives even after accounting for the variants you tested.
The PSR-DSR gap
When PSR is high (e.g. 95%) and DSR is low (e.g. 5%), the gap (90 percentage points) measures how much of your headline Sharpe is selection inflation versus genuine edge. The Quantis result card shows this gap explicitly. A wide gap is not a verdict that the strategy is broken — it's a verdict that the variant-selection process is doing the heavy lifting, and you need to validate on new unseen data before believing the number.
Narrow gaps (e.g. PSR 92%, DSR 88%) mean the variant search added almost nothing — your strategy was good before you searched. Those are the result cards worth deploying.
What to do with each
- Isolation backtest, no candidate search: trust PSR. Sharpe alone is too noisy; PSR fixes the noise.
- Backtest with candidate search (Quantis runs 4 variants by default): trust DSR. PSR will lie to you because it doesn't know you searched.
- DSR essentially 0: the result is likely a random good draw from your candidate set. Treat it as a hypothesis to validate on new data, not a strategy to deploy.
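The three rules above can be written as a small triage helper. This is an illustrative sketch only: the function name is hypothetical, and the `near_zero` cutoff for "DSR essentially 0" is an assumed value, not one given in this article.

```python
def triage(psr, dsr, n_variants, near_zero=0.05):
    """Return (metric_to_trust, value, note) for a backtest result.

    psr and dsr are probabilities in [0, 1]. near_zero is a hypothetical
    cutoff for "DSR essentially 0"; tune it to your own tolerance.
    """
    if n_variants <= 1:
        # No candidate search, so there is no selection bias to deflate.
        return ("PSR", psr, "no candidate search: PSR handles the sampling noise")
    if dsr <= near_zero:
        return ("DSR", dsr, "likely a lucky draw: validate on new data before deploying")
    return ("DSR", dsr, "candidate search was run: DSR accounts for the selection")

print(triage(psr=0.97, dsr=0.0, n_variants=4))
print(triage(psr=0.92, dsr=0.88, n_variants=4))
print(triage(psr=0.96, dsr=0.96, n_variants=1))
```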
Worked example
Strategy backtested with 4 variants. Headline Sharpe 0.64. PSR 97% — in isolation, the Sharpe is real (we're 97% confident the true Sharpe is above zero). DSR 0% — after adjusting for variant selection, no credible Sharpe survives. The 97-percentage-point gap is the selection inflation: essentially the entire headline number was the byproduct of picking the best of 4.
Treatment: don't deploy. Validate the surviving variant on a new unseen window, meaning fresh data the candidate search never touched. If the Sharpe holds up out-of-sample, that is evidence the edge is real and the variant search happened to pick well, even though DSR could not confirm it from the original sample. If the Sharpe collapses, DSR was right and you nearly deployed noise.
Further reading
- Bailey & López de Prado (2012). The Sharpe ratio efficient frontier. Journal of Risk.
- Bailey & López de Prado (2014). The deflated Sharpe ratio. Journal of Portfolio Management.
- Momentum → the most common strategy where DSR exposes selection-inflated Sharpes.