Effect Size & Power
A p-value answers one narrow question: if there were no effect at all, how surprising would data like yours be? It says nothing about how big the effect is, or whether your study was even capable of finding it. Those two questions, answered by effect size and power, are what separate a meaningful result from a misleading one.
Effect size: how big, not just whether
With a huge sample, a trivially small difference can be "statistically significant." With a tiny sample, a huge difference can miss significance. Significance is tangled up with sample size — so we need a measure of the effect that isn't. That's effect size. For a difference between two means, the standard one is Cohen's d: the gap between the means measured in standard deviations.
d = (mean₁ − mean₂) / pooled standard deviation
Rough conventions: d ≈ 0.2 is small, 0.5 medium, 0.8 large. It's the same currency as a z-score — a standardized distance — so it's comparable across studies and scales.
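To make the formula concrete, here is a minimal Python sketch. It assumes two independent groups and uses the pooled standard deviation; the sample numbers are made up for illustration. The helper `approx_p_value` is a hypothetical name. It uses a normal approximation that treats the SD as known, which is crude at very small n, and exists only to show that the same kind of d can be significant or not depending on sample size.

```python
import math
import statistics
from statistics import NormalDist


def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 (sample) denominator
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


def approx_p_value(d, n_per_group):
    """Two-sided p-value for an observed d with two equal groups,
    using a normal (z) approximation that treats the SD as known."""
    z = abs(d) * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# Made-up example data
treatment = [5.1, 6.0, 5.8, 6.4, 5.5]
control = [4.9, 5.2, 5.0, 5.6, 4.8]
print(round(cohens_d(treatment, control), 2))

# Significance tracks sample size; effect size does not.
print(approx_p_value(0.05, 10_000))  # tiny effect, huge sample: significant
print(approx_p_value(0.8, 5))        # large effect, tiny sample: not significant
```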
🎮 Effect Size & Power Explorer
Two groups, one real difference. The curves show the effect size (their overlap); the readout shows your power — the chance a study with this n actually detects the effect at α = .05.
Power: could you even detect it?
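Power is the probability that your study correctly rejects the null hypothesis when a real effect exists: your chance of not missing it. The usual convention is to aim for at least 80%. Play with the sliders and two levers emerge (a third, the significance level α, is held at .05 here; loosening it raises power at the cost of more false positives):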
Bigger effect → more power. Slide d up: the curves pull apart, overlap shrinks, and a real difference becomes easy to catch. Tiny effects are genuinely hard to detect and need lots of data.
Bigger sample → more power. Slide n up and power climbs even with d fixed. More data sharpens the estimate, so smaller effects become detectable.
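Both levers show up directly in a power formula. This sketch approximates power for a two-sided, two-sample comparison with equal group sizes, assuming a normal approximation rather than the exact t distribution (so it slightly overstates power at small n). `approx_power` is a hypothetical name, not a library function.

```python
import math
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with equal groups.

    Normal approximation: under the alternative, the test statistic is
    roughly Normal with mean d * sqrt(n / 2) and SD 1.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)
    # Probability the statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Power rises with d (down the rows) and with n (across the columns)
for d in (0.2, 0.5, 0.8):
    row = [f"{approx_power(d, n):.2f}" for n in (20, 50, 100)]
    print(f"d = {d}: power at n = 20, 50, 100 per group -> {row}")
```

With d = 0 the function returns α itself, which is a useful sanity check: with no real effect, the only "detections" are false positives.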
Why underpowered studies are dangerous
If power is only 40%, you'll miss a real effect more often than you find it, and a "non-significant" result tells you almost nothing. Worse, the few significant results that do squeak through tend to overestimate the effect. This is why researchers run a power analysis before collecting data: pick the effect size you care about, the significance level (usually α = .05), and the power you want (say 80%), then solve for the sample size you need.
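A power analysis of this kind can be sketched in a few lines. Under a normal approximation for two equal groups, the required size per group is n = 2(z₁₋α/₂ + z_power)² / d². The second function is a small simulation, with made-up settings and a known SD of 1, of the overestimation problem: it runs many studies and averages the estimated effect over only the significant ones. Both function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Sample size per group for a two-sided, two-sample comparison
    (normal approximation): n = 2 * (z_{1-alpha/2} + z_power)^2 / d^2."""
    z = NormalDist()
    total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * total ** 2 / d ** 2)


def mean_significant_estimate(true_d, n, sims=2000, alpha=0.05, seed=1):
    """Simulate many studies with n per group (SD = 1, so a mean difference
    equals d) and average the estimated effect over the significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    winners = []
    for _ in range(sims):
        a = [rng.gauss(true_d, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        diff = statistics.mean(a) - statistics.mean(b)
        z_stat = diff / math.sqrt(2 / n)  # z-test, SD known to be 1
        if z_stat > z_crit:  # significant, in the true (positive) direction
            winners.append(diff)
    return statistics.mean(winners) if winners else float("nan")


print(n_per_group(0.5))
print(round(mean_significant_estimate(0.3, 20), 2))   # low power
print(round(mean_significant_estimate(0.3, 200), 2))  # higher power
```

For d = 0.5 the formula gives 63 per group; an exact t-test calculation gives a slightly larger figure (about 64), so dedicated power software is preferable for real planning. In the simulation, the low-powered studies (true d = 0.3, n = 20 per group) that reach significance report an average effect well above 0.3, while the better-powered version lands close to the truth.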
Why it matters: reporting an effect size alongside the p-value, and planning for adequate power, is the difference between research that replicates and research that doesn't. Significance is the start of the story, never the whole of it.