Section 1.1

What Is Statistics?

Statistics is the science of learning from data when you can't see everything. You almost never get to measure an entire population, so you take a manageable sample, summarize it honestly, and use it to make a careful, uncertainty-aware guess about the whole. That's the entire enterprise — and this whole course is about doing it well.

Two jobs: describe, then infer

  • Descriptive statistics summarizes the data you actually have — averages, spreads, charts. It makes no claims beyond the data in front of you.
  • Inferential statistics takes the leap: from the sample in hand to a statement about the population you didn't measure. This is where uncertainty — and all the clever machinery of the rest of the course — comes in.
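
To see the two jobs side by side, here is a minimal Python sketch. The eight exam scores are made-up illustration data, and the standard error at the end is only a preview of the inferential machinery developed later, not a full method.

```python
import statistics

# Hypothetical sample: exam scores for 8 students (made-up illustration data).
sample_scores = [62, 71, 75, 78, 80, 84, 88, 94]

# Descriptive: summarize only the data in hand.
sample_mean = statistics.mean(sample_scores)
sample_median = statistics.median(sample_scores)
sample_sd = statistics.stdev(sample_scores)  # sample standard deviation (divides by n - 1)

print(f"mean={sample_mean}, median={sample_median}, sd={sample_sd:.2f}")

# Inferential (preview): how far might this sample mean sit from the
# population mean? The standard error is a rough yardstick for that gap,
# assuming the 8 scores are a random sample from the population.
standard_error = sample_sd / len(sample_scores) ** 0.5
print(f"standard error of the mean = {standard_error:.2f}")
```

The first three numbers describe these eight students and nothing else. The standard error is the first step toward saying something about students you did not measure.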

The vocabulary that trips everyone up

Two pairs of words. Get these straight now and everything later is easier:

  • A population is everyone/everything you care about; a sample is the subset you actually measure.
  • A parameter is a true (usually unknown) number about the population — like its mean μ. A statistic is the matching number computed from your sample — like the sample mean x̄. We use the statistic to estimate the parameter.
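
To make the two pairs concrete, here is a minimal sketch with a simulated population of 2,000 exam scores. The score distribution (roughly normal around 70 with spread 12, clipped to 0..100), the sample size of 50, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 2,000 exam scores (assumed shape and values).
population = [min(100.0, max(0.0, rng.gauss(70, 12))) for _ in range(2000)]

# Parameter: a number about the whole population. In real life it is unknown;
# here we can compute it only because we simulated everyone.
mu = statistics.mean(population)

# Sample: the subset we actually measure (simple random sample, no repeats).
sample = rng.sample(population, 50)

# Statistic: the matching number computed from the sample.
x_bar = statistics.mean(sample)

print(f"parameter mu      = {mu:.2f}")
print(f"statistic x_bar   = {x_bar:.2f}")
print(f"estimation error  = {x_bar - mu:+.2f}")
```

Change the seed and x_bar moves while mu stays put. That is the difference between a statistic and a parameter.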

🎮 Population → Sample → Estimate

The faint cloud is a whole population of 2,000 exam scores; its true mean μ (teal) is the parameter. Take a sample, and your sample mean x̄ (orange) is the estimate. Bigger samples tend to land closer to μ.

Legend: parameter μ (truth), statistic x̄ (estimate), estimation error.

Notice the honest tension: your estimate is almost never exactly right, but the larger your random sample, the closer it tends to land. Quantifying "how close, how confident" is precisely what sampling distributions, the Central Limit Theorem, and confidence intervals are for.
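
You can check the "bigger samples land closer" claim by simulation. This sketch is one possible way to do it: the uniform population of 2,000 scores, the sample sizes, and the helper name typical_error are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 2,000 scores, uniform on 0..100 (assumed).
population = [rng.uniform(0, 100) for _ in range(2000)]
mu = statistics.fmean(population)

def typical_error(n, repeats=500):
    """Average absolute gap between x_bar and mu over many samples of size n."""
    gaps = []
    for _ in range(repeats):
        x_bar = statistics.fmean(rng.sample(population, n))
        gaps.append(abs(x_bar - mu))
    return statistics.fmean(gaps)

errors = {n: typical_error(n) for n in (10, 40, 160, 640)}
for n, err in errors.items():
    print(f"n={n:4d}  typical |x_bar - mu| = {err:.2f}")
```

Each step quadruples the sample size, and the typical error should shrink by roughly half or more each time. No single sample is guaranteed to be close, but the pattern across many samples is dependable.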

Why not just measure everyone?

Usually you can't. The "population" might be all potential customers, every possible repetition of an experiment, or all humans with a condition: too big, too expensive, or not yet in existence. Sampling is not a compromise to be embarrassed about; done right, a few hundred randomly chosen observations can pin down a population mean or proportion remarkably well.
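
For a rough sense of what "a few hundred" buys you, here is a back-of-the-envelope sketch using the standard large-sample approximation for a proportion. It assumes a simple random sample from a population much larger than the sample and uses the worst case p = 0.5. The function name worst_case_margin is made up for this example, and the formula itself is material from later in the course.

```python
import math

def worst_case_margin(n, z=1.96):
    """Approximate 95% margin of error for a sample proportion, taking p = 0.5."""
    return z * math.sqrt(0.25 / n)

for n in (100, 400, 1600):
    points = 100 * worst_case_margin(n)
    print(f"n={n:5d}  margin of about +/- {points:.1f} percentage points")
```

With n = 400 the margin comes out near 5 percentage points, and the population size does not appear in the formula at all. Under these assumptions it is the sample size, not the fraction of the population sampled, that drives precision.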

Why it matters: every technique ahead is a variation on this one move — sample, summarize, generalize. Keeping "parameter vs statistic" and "population vs sample" straight is the foundation the whole course is built on.