computer-science

When does learning from data work (math starting from basic probability)

Hacker News · May 23, 2026, 11:58 PM

Key takeaways

This post answers a single question: when does learning from data actually work?
You train a model on samples, it performs well on those samples, and you hope it performs well on new data.
Nothing is assumed beyond basic probability (expectations, independence, indicator random variables).

This post answers a single question: when does learning from data actually work?

You train a model on samples, it performs well on those samples, and you hope it performs well on new data. When is that hope justified? The answer turns out to be a clean equivalence: a hypothesis class is learnable if and only if it has finite VC dimension. This is the Fundamental Theorem of Statistical Learning.

We'll build every piece from scratch: Markov's inequality, Hoeffding's lemma, concentration inequalities, the union bound, VC dimension, the Sauer–Shelah lemma, symmetrization, and the generalization bounds that tie it all together. Nothing is assumed beyond basic probability (expectations, independence, indicator random variables). We prove both directions of the equivalence: the upper bound (finite VC dimension implies learnability) and the lower bound (via information theory: total variation, KL divergence, Pinsker's inequality, and Le Cam's method).

Article preview — originally published by Hacker News. Full story at the source.

Read full story on Hacker News → More top stories

Aggregated and edited by the Scoop newsroom. We surface news from Hacker News alongside other reporting so you can compare coverage in one place. Editorial policy · Corrections · About Scoop

When does learning from data work (math starting from basic probability)

Key takeaways

More in computer-science