Regression

Fitting predictors to outcomes

Regression is a statistical method for modelling the relationship between a dependent variable (the outcome you want to predict) and one or more independent variables (the predictors). The output is a function — typically with parameters fit to historical data — that lets you estimate the outcome from new predictor values.

The simplest form is linear regression: y = β₀ + β₁x + ε, where ε is the error term. Ordinary least squares finds the β coefficients that minimise the sum of squared residuals (the gaps between observed and predicted values). For a dataset of (height, weight) pairs, linear regression produces the best-fit line through the points, which lets you estimate weight from any new height.
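
As a concrete illustration, here is a minimal Python sketch of that height/weight fit using numpy's least-squares polynomial fit. The data values are invented for the example.

    import numpy as np

    # Toy (height cm, weight kg) pairs -- values invented for illustration.
    heights = np.array([155.0, 160, 165, 170, 175, 180, 185])
    weights = np.array([52.0, 58, 62, 68, 74, 79, 85])

    # Fit y = β₀ + β₁x by minimising the sum of squared residuals.
    # np.polyfit returns coefficients highest-degree first: [β₁, β₀].
    b1, b0 = np.polyfit(heights, weights, deg=1)

    # Estimate weight for a new height.
    print(f"weight ≈ {b0:.1f} + {b1:.2f} * height")
    print(f"predicted weight at 172 cm: {b0 + b1 * 172:.1f} kg")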

Standard varieties:

  • Multiple linear regression: several predictors. y = β₀ + β₁x₁ + β₂x₂ + ... + ε.
  • Polynomial regression: the predictors include powers of x. y = β₀ + β₁x + β₂x² + .... Fits curved relationships.
  • Logistic regression: the outcome is binary (0/1). The model outputs a probability via the logistic function.
  • Ridge / lasso / elastic-net: linear regression with a penalty for large coefficients. Used when there are many predictors and you want to avoid overfitting. (A code sketch of these varieties follows the list.)
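
The entry doesn't name a library, but as an illustration each variety can be sketched in a few lines with scikit-learn. The data here is synthetic and the penalty strengths (alpha) are arbitrary.

    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))  # two predictors, x₁ and x₂
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

    # Multiple linear regression: one linear fit over several predictors.
    multi = LinearRegression().fit(X, y)

    # Polynomial regression: expand x into [x, x²], then fit linearly.
    x = X[:, [0]]
    x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
    poly = LinearRegression().fit(x_poly, y)

    # Logistic regression: binary outcome, model outputs a probability.
    y_binary = (y > y.mean()).astype(int)
    logit = LogisticRegression().fit(X, y_binary)
    print(logit.predict_proba(X[:1]))  # [P(class 0), P(class 1)]

    # Ridge / lasso: penalise large coefficients; alpha sets the strength.
    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)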

The key sanity checks for any regression: how well it fits the training data (R², residual plots), how well it generalises to new data (cross-validation, a holdout test set), and whether the residuals look random or show patterns the model missed.
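
A sketch of those checks in Python with scikit-learn, on synthetic data; in practice you would plot the residuals rather than just summarise them.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_train, y_train)

    # Fit to training data: score() returns R² for regressors.
    print("train R²:", model.score(X_train, y_train))

    # Generalisation: holdout test set and 5-fold cross-validation.
    print("test R²: ", model.score(X_test, y_test))
    print("CV R²:   ", cross_val_score(model, X, y, cv=5).mean())

    # Residuals: should look like patternless noise centred on zero.
    residuals = y_test - model.predict(X_test)
    print("residual mean:", residuals.mean())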

Regression is the workhorse of empirical science. Correlation tells you how strongly two variables move together; regression gives you the equation for predicting one from the other.
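
For simple linear regression the two are directly linked: the fitted slope β₁ equals r · (s_y / s_x), the correlation scaled by the ratio of standard deviations. A quick numerical check, with made-up data:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=500)
    y = 3.0 * x + rng.normal(size=500)

    r = np.corrcoef(x, y)[0, 1]             # correlation: strength of co-movement
    slope, intercept = np.polyfit(x, y, 1)  # regression: the conversion equation

    # For simple linear regression, slope = r · (s_y / s_x).
    print(slope, r * y.std() / x.std())     # the two values agree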

Published May 16, 2026