
Monte Carlo testing, explained with a deck of cards

2026-07-02 · 6 min read

Imagine three years of trading as a deck of 965 cards. Each card is one trade — some say "+40 points," some say "-25 points." Your real history is just one particular order those cards happened to come out in. Shuffle the same cards differently and you get a different story: maybe the bad cards clump together early (a scary losing streak), or spread out evenly (a smooth ride). Same cards, same total — a completely different ride.

Monte Carlo is just: shuffle the deck and deal it again. 10,000 times. Then look at all 10,000 stories and ask — what usually happens? What's the worst that happens? Was my real story lucky, unlucky, or ordinary? (It's named after the Monte Carlo casino, because it's rolling the dice thousands of times instead of trusting one roll.)

Two games with the same deck

Game 1: shuffle. Same 965 cards, a new order each time. The total never changes, because it's the same cards. But the bumps along the way (drawdowns, losing streaks) change completely, because they depend on the order the trades arrive in, not just on which trades you have. This shows how rough the ride could have been.
Game 2: grab-bag. Instead of dealing the whole deck once, you draw a card, write it down, put it back, and repeat 965 times. Some cards get picked twice; some never get picked at all. Each run is a whole alternate-universe three years, drawn from the same pool of trades but not the same mix of them, so unlike Game 1 the total can change too. (Statisticians call this bootstrapping.)
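The two games can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the numbers in this post: the function names and the eight-card toy deck are made up for the example.

```python
import random

def shuffle_run(trades, rng):
    """Game 1: the same trades in a new order (no card is repeated or dropped)."""
    deck = list(trades)
    rng.shuffle(deck)
    return deck

def grab_bag_run(trades, rng):
    """Game 2: draw with replacement, once per trade in the original deck."""
    return rng.choices(trades, k=len(trades))

# Hypothetical toy deck, standing in for the real 965 trades.
trades = [40, -25, 60, -25, 15, -10, 120, -25]
rng = random.Random(42)  # fixed seed so the run is repeatable

shuffled = shuffle_run(trades, rng)
resampled = grab_bag_run(trades, rng)

print(sum(shuffled) == sum(trades))  # True: a shuffle never changes the total
print(sum(resampled), sum(trades))   # the grab-bag total is free to differ
```

The only difference between the two functions is whether a card can come out more than once, and that is exactly what lets Game 2 change the total while Game 1 cannot.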

What's a percentile?

Line up all 10,000 results from worst to best, like kids by height. The kid standing 5% from the short end is the 5th percentile: only about 500 of the 10,000 results were worse. The kid in the middle is the median, the typical result. The kid 95% of the way up is the 95th percentile: only 5% did better.
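In code, the line-up is just a sort followed by picking a position. The sketch below uses the "nearest-rank" convention, which is one of several ways to define a percentile; libraries often interpolate between neighbours instead, so their answers can differ slightly. The `percentile` helper and the stand-in results are illustrative assumptions.

```python
import math

def percentile(results, p):
    """Nearest-rank percentile: the value p% of the way up the sorted line."""
    ordered = sorted(results)
    # Rank is 1-based; multiply before dividing to keep the arithmetic exact.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Stand-in for 10,000 simulation results: the numbers 1 to 10,000.
results = list(range(1, 10_001))

print(percentile(results, 5))   # 500
print(percentile(results, 50))  # 5000
print(percentile(results, 95))  # 9500
```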

Figure: 10,000 grab-bag universes, lined up worst to best. Your real 3-year result sits close to the typical outcome.

"5th-percentile profit factor = 1.93" means: even in an unlucky universe, one that only 5% of the reshuffled versions of the same 965 trades did worse than, the strategy still earned about 1.93 points for every point it lost. That's the evidence that the edge isn't just luck. For drawdowns it flips: "95th-percentile drawdown = -554" means only 5% of universes dug a deeper hole than that. It's the hole you pack supplies for.
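For reference, here is roughly how those two statistics are computed for a single sequence of trades. This is a sketch under two assumptions: profit factor is gross gains divided by gross losses (which matches the "points earned per point lost" reading above), and drawdown is measured in points from the highest running total so far, starting from zero. The toy deck is invented.

```python
def profit_factor(trades):
    """Points won per point lost: gross gains divided by gross losses."""
    gains = sum(t for t in trades if t > 0)
    losses = -sum(t for t in trades if t < 0)
    return gains / losses if losses else float("inf")

def max_drawdown(trades):
    """Deepest drop of the running total below its previous peak (0 or negative)."""
    equity = peak = worst = 0
    for t in trades:
        equity += t
        peak = max(peak, equity)           # highest running total so far
        worst = min(worst, equity - peak)  # how far below that peak we are now
    return worst

# Hypothetical toy deck, not real trade data.
trades = [40, -25, -25, 60, 15, -10, 120, -25]

print(round(profit_factor(trades), 2))  # 2.76
print(max_drawdown(trades))             # -50
```

Run each of these over every simulated universe, sort the results, and the percentiles quoted above fall out.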

Why do some universes make 18k+ when the real one made 15.4k?

That's Game 2's "put it back" rule doing its job. The real deck had a few monster-winner trades (big, one-off gains), and in real life each happened exactly once. But in the grab-bag, because cards go back after each draw, a lucky universe might draw those monster winners two or three times and skip several of the losers entirely. That's how a universe reaches 18,373 points at the 95th percentile. The unlucky universe does the opposite, with extra losers and fewer monsters, and lands at 12,494 at the 5th percentile.

That spread, roughly 12.5k to 18.4k, is the honest answer to "how much of my 15.4k was skill, and how much was the luck of the draw?"

The skill part: every single one of the 10,000 universes finished positive. The luck part: which positive number you land on can swing by about 3k points either way.
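Putting the pieces together, the whole grab-bag experiment fits in a short script. The sketch below runs on a synthetic 965-trade deck made up for illustration, so it will not reproduce the figures in this post; `bootstrap_totals` is a hypothetical helper name. It uses `statistics.quantiles` from the Python standard library, which interpolates between neighbouring results.

```python
import random
import statistics

def bootstrap_totals(trades, n_runs, rng):
    """Total points for each of n_runs grab-bag universes."""
    n = len(trades)
    return [sum(rng.choices(trades, k=n)) for _ in range(n_runs)]

rng = random.Random(7)  # fixed seed so the run is repeatable

# Synthetic deck of 965 trades (an assumption for the demo, not real data).
trades = [rng.choice([50, -25, 20, -15, 70]) for _ in range(965)]

totals = bootstrap_totals(trades, 10_000, rng)

# n=20 splits the results into 5% steps: 19 cut points from the 5th to the 95th.
cuts = statistics.quantiles(totals, n=20)
p5, p50, p95 = cuts[0], cuts[9], cuts[18]

share_positive = sum(t > 0 for t in totals) / len(totals)

print(f"real total:     {sum(trades)}")
print(f"5th / 50th / 95th percentile: {p5:.0f} / {p50:.0f} / {p95:.0f}")
print(f"share of universes that finished positive: {share_positive:.1%}")
```

The share of positive universes speaks to the skill question, and the gap between the 5th and 95th percentiles speaks to the luck question.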

The takeaway

A single backtest — one shuffle of the deck — tells you one story. It can't tell you if that story was a lucky draw, an unlucky one, or business as usual. Running the deck 10,000 times turns "it made 15.4k points" into something far more useful: a range you can actually plan around, and a confidence that the edge survives even the unlucky universes.

The full numbers (drawdown percentiles, losing-streak lengths, and how they hold up across different market years) are on the trading engine case study. For the broader idea of testing a strategy against data it hasn't seen, see the posts "your backtest is lying to you" and "walk-forward validation".

Tags: Monte Carlo · Quant · Validation · Backtesting
