Linear regression by least squares: the slope equals the sum of (x sub i minus x bar) times (y sub i minus y bar) divided by the sum of (x sub i minus x bar) squared, and the intercept equals y bar minus the slope times x bar.
Best-fit line: y = 1.96x + 0.14
r² = 0.9976 (excellent fit)
Show Your Work
Slope: m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
Intercept: b = ȳ − m·x̄
n = 5, x̄ = 3, ȳ = 6.02
Σ(xᵢ − x̄)² = 10
Σ(xᵢ − x̄)(yᵢ − ȳ) = 19.6
m = 19.6 / 10 = 1.96
b = 6.02 − (1.96)(3) = 0.14
SS_res = Σ(yᵢ − ŷᵢ)² = 0.092
SS_tot = Σ(yᵢ − ȳ)² = 38.508
r² = 1 − 0.092 / 38.508 = 0.9976
Final answer: y = 1.96x + 0.14
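The steps above can be reproduced with a minimal Python sketch. The function name fit_line and the variable names are illustrative choices, not part of the original calculator; only the standard library is used.

```python
# Five (x, y) pairs from the worked example.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for a simple OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    # Residual and total sums of squares for r^2.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot

m, b, r2 = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}, r^2 = {r2:.4f}")
# prints: y = 1.96x + 0.14, r^2 = 0.9976
```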
Per-Point Residuals
Each row compares the observed y to the line's prediction ŷ = m·x + b. Positive residuals sit above the line, negative below. OLS picks the slope and intercept so that Σ(residual)² is as small as possible.
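| # | x | y | ŷ (predicted) | Residual (y − ŷ) |
|---|---|-----|------|-------|
| 1 | 1 | 2.1 | 2.10 | 0.00 |
| 2 | 2 | 3.9 | 4.06 | −0.16 |
| 3 | 3 | 6.2 | 6.02 | +0.18 |
| 4 | 4 | 8.1 | 7.98 | +0.12 |
| 5 | 5 | 9.8 | 9.94 | −0.14 |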
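As a rough check, the residual column can be recomputed in a few lines of Python. This sketch hard-codes the slope and intercept from the Show Your Work section rather than refitting; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
m, b = 1.96, 0.14  # slope and intercept taken from the worked fit

# Residual = observed y minus the line's prediction at the same x.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

for i, (x, y, r) in enumerate(zip(xs, ys, residuals), start=1):
    print(f"{i}  x={x}  y={y}  y_hat={m * x + b:.2f}  residual={r:.2f}")

# For an OLS fit with an intercept the residuals sum to zero
# (up to floating-point rounding); their squares sum to SS_res.
residual_sum = sum(residuals)
ss_res = sum(r * r for r in residuals)
print(f"SS_res = {ss_res:.3f}")
```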
Least-Squares Slope
The slope of the line that minimizes the sum of squared vertical residuals. The numerator is proportional to the covariance of x and y and the denominator to the variance of x; the scaling factor is the same in both and cancels, so m = cov(x, y) / var(x). The slope tells you how many units y is expected to change for each one-unit increase in x.
m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
Least-Squares Intercept
Once you have the slope, the intercept is whatever value makes the line pass through the centroid (x̄, ȳ) of the data. The fitted line always passes through the mean point — that's a built-in property of ordinary least squares.
b = ȳ − m·x̄
How It Works
Linear regression fits a straight line y = m·x + b to a set of (x, y) data points using ordinary least squares (OLS). OLS picks the slope and intercept that minimize the sum of squared vertical distances (residuals) between each observed yᵢ and the line's prediction ŷᵢ = m·xᵢ + b. The closed-form solution computes the means x̄ and ȳ, then the slope m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and intercept b = ȳ − m·x̄. The fit quality is summarized by r² (coefficient of determination), which equals 1 − SS_res / SS_tot: the fraction of variance in y that the line explains. r² = 1 means a perfect fit (every point on the line); r² = 0 means the line explains no more than just using ȳ as a constant prediction. Linear regression is among the most widely used statistical tools in science and business: it underlies calibration curves, sales forecasts, dose-response relationships, A/B test baselines, and the linear assumption that more elaborate models build on.
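For readers who want to see where the closed-form solution comes from, here is a short derivation sketch. The symbol S for the sum of squared residuals is introduced here for convenience; everything else uses the notation above.

```latex
\begin{aligned}
S(m, b) &= \sum_{i=1}^{n} \bigl(y_i - m x_i - b\bigr)^2 \\[4pt]
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(y_i - m x_i - b\bigr) = 0
  \;\Longrightarrow\; b = \bar{y} - m\,\bar{x} \\[4pt]
\frac{\partial S}{\partial m} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - m x_i - b\bigr) = 0 \\[4pt]
\text{substituting } b:\quad
0 &= \sum_{i=1}^{n} x_i \bigl[(y_i - \bar{y}) - m (x_i - \bar{x})\bigr]
   = \sum_{i=1}^{n} (x_i - \bar{x}) \bigl[(y_i - \bar{y}) - m (x_i - \bar{x})\bigr] \\[4pt]
\Longrightarrow\quad
m &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

Replacing xᵢ by (xᵢ − x̄) in the fourth line is allowed because the bracketed terms sum to zero, so subtracting x̄ times that sum changes nothing.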
Example Problem
An experiment measures these five (x, y) pairs: (1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8). Fit a least-squares line and report the slope, intercept, and r².
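Compute the means: x̄ = (1+2+3+4+5) / 5 = 3 and ȳ = (2.1+3.9+6.2+8.1+9.8) / 5 = 6.02.
Sum the squared x deviations: Σ(xᵢ − x̄)² = 4 + 1 + 0 + 1 + 4 = 10.
Sum the products of deviations: Σ(xᵢ − x̄)(yᵢ − ȳ) = 7.84 + 2.12 + 0 + 2.08 + 7.56 = 19.6.
Slope: m = 19.6 / 10 = 1.96. Intercept: b = 6.02 − 1.96·3 = 0.14.
Fit quality: SS_res = 0.092 and SS_tot = 38.508, so r² = 1 − 0.092 / 38.508 ≈ 0.9976.
Final line: y = 1.96·x + 0.14, with r² ≈ 0.9976. The line explains about 99.76% of the variance in y, indicating an excellent linear fit.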
Linear regression assumes the underlying relationship really is approximately linear. If your residuals show a clear pattern (curving up at the ends, fanning out, etc.) the linear model is mis-specified — even a high r² can be misleading. Always plot residuals before trusting the fit.
Key Concepts
Four ideas anchor every linear regression. First, least squares: OLS minimizes Σ(yᵢ − ŷᵢ)² — the sum of squared vertical residuals. It does not minimize horizontal or perpendicular distances, and it's especially sensitive to outliers because squaring amplifies large deviations. Second, correlation versus regression: Pearson's r measures the strength and direction of a linear association between x and y, while regression produces an equation that predicts y from x. The two are related: r² (the regression's goodness of fit) equals Pearson's r squared. Third, r² interpretation: r² is the fraction of variance in y that the model explains. An r² of 0.95 means 95% of y's variability is accounted for by x; the remaining 5% comes from noise, measurement error, or omitted variables. A high r² does not imply causation — x and y may both be driven by a third variable. Fourth, residual analysis: a good linear fit should produce residuals that look like random noise, with no pattern when plotted against x or ŷ. Curvature in the residuals signals that a non-linear model fits better; widening spread (heteroscedasticity) violates OLS's equal-variance assumption.
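The link between correlation and regression can be checked numerically. The sketch below, using the five-point example data and illustrative variable names, computes Pearson's r directly and compares its square with 1 − SS_res / SS_tot.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Pearson's correlation coefficient.
pearson_r = s_xy / math.sqrt(s_xx * s_yy)

# Regression goodness of fit, computed from the residuals.
m = s_xy / s_xx
b = y_bar - m * x_bar
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / s_yy  # s_yy is SS_tot

print(f"r = {pearson_r:.4f}, r*r = {pearson_r ** 2:.4f}, r^2 = {r_squared:.4f}")
```

The two quantities agree up to floating-point rounding for a simple linear regression with an intercept.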
Applications
Calibration curves — analytical chemistry and instrument calibration use linear regression to convert raw signal (absorbance, voltage, peak area) into concentration via a fitted line through standard reference samples.
Sales and demand forecasting — regression of historical sales on time, price, or advertising spend produces trend lines used for budget planning and inventory.
Dose-response and biostatistics — linear models of drug dose versus measured effect underpin clinical trial baseline analyses (more sophisticated non-linear models extend from there).
A/B testing baselines — regression-adjusted estimators of treatment effects use linear models to remove variance explained by pre-treatment covariates, tightening confidence intervals.
Engineering — stress-strain curves in elastic regions, beam deflection, and many empirical equations in heat transfer and fluid mechanics start as linear fits to experimental data.
Sports analytics — fitting batting averages against on-base percentage, or expected goals against shot characteristics, builds the linear baselines that more elaborate models extend.
Real estate — hedonic pricing models regress sale price on square footage, bedrooms, and location features to estimate per-unit prices and detect over- or under-valued listings.
Common Mistakes
Using r² to validate a non-linear relationship — a U-shaped or exponential curve can still produce a moderate r² with a fitted line, but the line is meaningless. Always plot the data (and the residuals) before trusting the fit.
Extrapolating outside the data range — the line is only known to fit within the x values you observed. Predictions at x values far beyond the range can be wildly wrong if the true relationship curves.
Ignoring outliers — OLS squares the residuals, so a single extreme point can pull the slope dramatically. Inspect residuals; consider robust regression or removing measurement errors after investigation.
Confusing correlation with causation — high r² between x and y does not mean x causes y. A confounding variable (or pure coincidence) can produce strong linear association without any causal link.
Forgetting the regression-toward-the-mean effect — whenever |r| < 1, the predicted y for an extreme x lies fewer standard deviations from ȳ than x lies from x̄. Cases selected for an extreme measurement therefore tend to look less extreme on follow-up even when nothing has changed, which is easy to mistake for a real effect.
Mixing x and y axes — regressing y on x and regressing x on y give different lines: the two slopes multiply to r², so the lines coincide only when r² = 1. Decide in advance which variable is the predictor and which is the response.
Treating r² as a complete summary — two datasets with the same r² can look completely different (Anscombe's quartet). Always pair the r² with a scatter plot and a residual plot.
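The asymmetry between regressing y on x and x on y is easy to see numerically. This is a small sketch on the five-point example data; the helper name slope is an illustrative choice.

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def slope(pred, resp):
    """OLS slope for regressing resp on pred."""
    p_bar = sum(pred) / len(pred)
    r_bar = sum(resp) / len(resp)
    s_pp = sum((p - p_bar) ** 2 for p in pred)
    s_pr = sum((p - p_bar) * (r - r_bar) for p, r in zip(pred, resp))
    return s_pr / s_pp

m_yx = slope(xs, ys)  # y regressed on x: change in y per unit x
m_xy = slope(ys, xs)  # x regressed on y: change in x per unit y

# Drawn on the same (x, y) axes, the x-on-y line has slope 1 / m_xy.
print(f"y on x: {m_yx:.4f}")
print(f"x on y, drawn on the same axes: {1 / m_xy:.4f}")
print(f"product of the two slopes (equals r^2): {m_yx * m_xy:.4f}")
```

The two lines are close here because r² is near 1; with noisier data they diverge noticeably.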
Frequently Asked Questions
What is linear regression?
Linear regression is a statistical method for fitting a straight line to a set of (x, y) data points. The fitted line y = m·x + b lets you predict y for a given x and quantifies how strongly x and y move together. Ordinary least squares (OLS) is the standard fitting method: it picks the slope and intercept that minimize the sum of squared vertical distances between each observed point and the line.
How do you calculate slope in linear regression?
Compute the means x̄ and ȳ of your x and y values. Then slope m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)². The numerator is the sum of products of x and y deviations from their means (n − 1 times the sample covariance); the denominator is the sum of squared x deviations (n − 1 times the sample variance of x), so the scaling cancels. Once you have the slope, the intercept is b = ȳ − m·x̄.
What is r-squared?
r² (the coefficient of determination) is the fraction of variance in y that your regression line explains. It equals 1 − SS_res / SS_tot, where SS_res = Σ(yᵢ − ŷᵢ)² is the residual sum of squares and SS_tot = Σ(yᵢ − ȳ)² is the total sum of squares. r² ranges from 0 (the line is no better than predicting ȳ for every x) to 1 (every point lies exactly on the line). For simple linear regression, r² is also the square of Pearson's correlation coefficient r.
What is the formula for linear regression?
The model is y = m·x + b. The least-squares estimates are m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b = ȳ − m·x̄, where x̄ and ȳ are the sample means. Goodness of fit is summarized by r² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)², the fraction of y's variance explained by the line.
What's the difference between regression and correlation?
Correlation (Pearson's r) measures the strength and direction of a linear association between two variables on a scale from −1 to +1. Regression produces an equation that predicts one variable from the other and quantifies the rate of change (slope). They're related: for simple linear regression, r² equals Pearson's r squared. Correlation is symmetric (corr(x, y) = corr(y, x)), but regression is not: regressing y on x and regressing x on y give different lines unless r² = 1.
How do you fit a line to data?
The standard method is ordinary least squares: find the slope m and intercept b that minimize Σ(yᵢ − (m·xᵢ + b))². There's a closed-form solution: compute x̄ and ȳ, then m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b = ȳ − m·x̄. Plot the data first to make sure a linear model is sensible — OLS will produce a line for any data, but the line is only meaningful if the underlying relationship is approximately linear.
When should you not use linear regression?
Skip a linear fit when the data clearly curves (use polynomial or nonlinear regression instead), when residuals fan out as x increases (heteroscedasticity violates OLS assumptions — consider weighted least squares), when one or two extreme outliers dominate (use robust regression like Theil–Sen or Huber loss), or when the response is binary or count-valued (use logistic or Poisson regression). A high r² alone does not validate a linear model — always inspect the scatter plot and residual plot.
Can r-squared be negative?
Not for ordinary least squares applied to the data it was fit on — by construction OLS gives the minimum SS_res, which is at worst equal to SS_tot (yielding r² = 0). r² can be negative on a hold-out set if you apply a previously fit model to new data and it performs worse than just predicting the mean, but for in-sample OLS the answer is r² ≥ 0.
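To illustrate the hold-out case, the sketch below fits the five-point example and then scores the same line on new points. The hold-out values are invented for illustration (a trend that has flattened out), and the function names are not from the original.

```python
def fit_line(xs, ys):
    """Return OLS (slope, intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    return m, y_bar - m * x_bar

def r_squared(xs, ys, m, b):
    """1 - SS_res / SS_tot for a given line on the given data."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

train_x = [1, 2, 3, 4, 5]
train_y = [2.1, 3.9, 6.2, 8.1, 9.8]
m, b = fit_line(train_x, train_y)

# Hypothetical hold-out points where the trend no longer continues.
new_x = [6, 7, 8]
new_y = [10.0, 10.2, 9.9]

r2_train = r_squared(train_x, train_y, m, b)
r2_new = r_squared(new_x, new_y, m, b)
print(f"in-sample r^2 = {r2_train:.4f}")
print(f"hold-out r^2  = {r2_new:.1f}")  # negative: worse than predicting the mean
```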
Reference: Ordinary least squares linear regression is a standard introductory statistics technique covered in the usual textbooks (e.g., Devore, Probability and Statistics for Engineering and the Sciences; Wasserman, All of Statistics; Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning). The slope, intercept, and r² formulas given here are the closed-form OLS solutions and are consistent with NIST/SEMATECH and SciPy/statsmodels reference implementations.
Linear Regression Formula
Ordinary least squares fits the line y = m·x + b that minimizes the sum of squared vertical residuals. The closed-form solution gives you the slope and intercept in two short calculations:
m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
b = ȳ − m·x̄
r² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²
Where:
m — the slope of the best-fit line, in y-units per x-unit.
b — the y-intercept, where the line crosses the y-axis (x = 0).
x̄, ȳ — the arithmetic means of the x and y values.
ŷᵢ — the fitted value at xᵢ, equal to m·xᵢ + b.
r² — the coefficient of determination: the fraction of variance in y that the line explains, from 0 (no linear relationship) to 1 (perfect fit).
The fitted line always passes through the centroid (x̄, ȳ) — that comes directly from the intercept formula b = ȳ − m·x̄. The residuals (vertical distances from each point to the line) sum to zero by construction.
Worked Examples
Lab Calibration Curve
How do you fit a calibration line through five standard samples?
A spectrophotometer measures absorbance versus known concentrations of a dye standard: (1, 0.21), (2, 0.39), (3, 0.62), (4, 0.81), (5, 0.98). Fit a calibration line and report the slope, intercept, and r².
x̄ = 3, ȳ = 0.602
Σ(xᵢ − x̄)² = 10, Σ(xᵢ − x̄)(yᵢ − ȳ) = 1.96
m = 1.96 / 10 = 0.196, b = 0.602 − 0.196·3 = 0.014
r² = 1 − 0.00092 / 0.38508 ≈ 0.9976
Use the fitted line to convert future unknown samples to concentration: concentration = (absorbance − 0.014) / 0.196. A small non-zero intercept often reflects imperfect blank correction and is acceptable when |b| ≪ typical absorbance.
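A minimal sketch of applying the calibration line to an unknown sample follows. The function name, the range check, and the 0.50 absorbance reading are assumptions made for illustration; the slope and intercept are the fitted values above.

```python
M, B = 0.196, 0.014        # calibration slope and intercept from the fit
A_MIN, A_MAX = 0.21, 0.98  # absorbance range covered by the standards

def concentration(absorbance):
    """Invert absorbance = M * c + B to estimate concentration c."""
    if not A_MIN <= absorbance <= A_MAX:
        # Outside the calibrated range the line is an extrapolation.
        raise ValueError("absorbance outside the calibrated range")
    return (absorbance - B) / M

# 0.50 is a hypothetical reading for an unknown sample.
print(f"{concentration(0.50):.2f}")  # prints: 2.48
```

Refusing readings outside the range of the standards is one conservative design choice; a lab might instead flag them and dilute or concentrate the sample.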
Sales Trend
How do you project quarterly revenue from six quarters of data?
Quarterly revenue ($M) over the last six quarters: Q1=2.4, Q2=2.7, Q3=3.1, Q4=3.5, Q5=3.8, Q6=4.2. What's the underlying trend and the projection for Q7?
x = quarter number, y = revenue. x̄ = 3.5, ȳ ≈ 3.283
Σ(xᵢ − x̄)² = 17.5, Σ(xᵢ − x̄)(yᵢ − ȳ) ≈ 6.35
m ≈ 6.35 / 17.5 ≈ 0.363, b ≈ 3.283 − 0.363·3.5 ≈ 2.013
r² ≈ 0.998: revenue is growing almost perfectly linearly over these six quarters
Projection for Q7: ŷ = 0.363·7 + 2.013 ≈ 4.55, or about $4.55M
Linear regression assumes the trend continues — be cautious about projecting more than a few periods ahead. If revenue growth is actually accelerating or decelerating, a non-linear model fits better.
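The trend and the one-quarter-ahead projection can be computed directly. This is a sketch on the six quarters of data from the example; variable names are illustrative.

```python
quarters = [1, 2, 3, 4, 5, 6]
revenue = [2.4, 2.7, 3.1, 3.5, 3.8, 4.2]  # $M

n = len(quarters)
x_bar = sum(quarters) / n
y_bar = sum(revenue) / n
s_xx = sum((x - x_bar) ** 2 for x in quarters)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(quarters, revenue))

m = s_xy / s_xx        # revenue growth per quarter, $M
b = y_bar - m * x_bar  # fitted revenue at quarter 0

# Extrapolate one period past the data.
q7 = m * 7 + b
print(f"trend: {m:.3f} $M per quarter, Q7 projection: {q7:.2f} $M")
# prints: trend: 0.363 $M per quarter, Q7 projection: 4.55 $M
```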
Outlier Sensitivity
What happens to the fit when one wild outlier joins clean data?
Take the canonical five-point dataset (1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8) and add a single outlier at (10, 100). How does the fit change?
Without the outlier: m ≈ 1.96, b ≈ 0.14, r² ≈ 0.998
With the outlier added: x̄ jumps from 3 to ≈ 4.17, ȳ from 6.02 to ≈ 21.7
New slope m ≈ 11.17, about 5.7 times larger than before
New intercept b ≈ −24.86, with the sign flipped
r² drops from ≈ 0.998 to ≈ 0.86, still fairly high because the outlier dominates the variance
The outlier inflates the slope from 1.96 to ≈ 11.17 and lowers r².
OLS squares the residuals, so a point ten units off the line contributes 100× as much to the sum of squared residuals as a point one unit off. A point far from x̄, like x = 10 here, also has high leverage on the slope. Always inspect residuals and consider robust regression (e.g., Theil–Sen, Huber loss) when extreme observations are present.
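To see the effect numerically, the sketch below fits OLS with and without the outlier and compares it with a bare-bones Theil–Sen estimate (median of all pairwise slopes). The function names are illustrative, and this Theil–Sen version is a minimal teaching sketch rather than a production implementation.

```python
from itertools import combinations
from statistics import median

clean = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8)]
with_outlier = clean + [(10, 100)]

def ols(points):
    """Return OLS (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    m = s_xy / s_xx
    return m, y_bar - m * x_bar

def theil_sen(points):
    """Median of pairwise slopes, then median of the implied intercepts."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    m = median(slopes)
    b = median([y - m * x for x, y in points])
    return m, b

m_clean, b_clean = ols(clean)
m_out, b_out = ols(with_outlier)
m_ts, b_ts = theil_sen(with_outlier)

print(f"OLS, clean data:     m = {m_clean:.2f}, b = {b_clean:.2f}")
print(f"OLS, with outlier:   m = {m_out:.2f}, b = {b_out:.2f}")
print(f"Theil-Sen, outlier:  m = {m_ts:.2f}, b = {b_ts:.2f}")
```

Because only 5 of the 15 pairwise slopes involve the outlier, the median slope stays close to the clean-data value while the OLS slope is pulled far away.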