# Linear Regression

- The best-fit line minimises the sum of the squared distances between the data points and the proposed line; this is achieved by taking the derivatives with respect to $a$ and $b$ and setting both to zero.

#### Solution

For a function $y(x) = a + b x$, define: $s_{xx} = \sum_{i=0}^{N-1} (x_i - \bar x)^2$, $s_{yy} = \sum_{i=0}^{N-1} (y_i - \bar y)^2$, $s_{xy} = \sum_{i=0}^{N-1} (x_i - \bar x)(y_i - \bar y)$, then

$$ b = \frac{s_{xy}}{s_{xx}}$$
$$ a = \bar y - b \bar x $$
Correlation coefficient:
$$ r = \frac{s_{xy}}{\sqrt{s_{xx}s_{yy}}}$$

- $r$ is 0 if there is no linear trend, and $\pm 1$ for a perfect linear fit.
- This discussion assumes there are no known uncertainties on the $x$ and $y$ values. Methods exist that take such uncertainties into account; this is particularly important when some values are known with less error than others.
- The solution above requires that the slope is finite, i.e. that $s_{xx}$ is not zero.
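
The formulas above translate directly into code. The following is a minimal sketch (the function name `linear_fit` is just an illustrative choice) that computes $a$, $b$, and $r$ from the definitions of $s_{xx}$, $s_{yy}$, and $s_{xy}$, and guards against the $s_{xx} = 0$ case noted above:

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares fit of y ~ a + b*x; returns (a, b, r).

    Illustrative helper implementing the closed-form solution above.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    # Sums of squared deviations and cross-deviations
    s_xx = np.sum((x - xbar) ** 2)
    s_yy = np.sum((y - ybar) ** 2)
    s_xy = np.sum((x - xbar) * (y - ybar))
    if s_xx == 0:
        # All x values identical: the slope would be infinite
        raise ValueError("s_xx is zero; slope is not defined")
    b = s_xy / s_xx
    a = ybar - b * xbar
    r = s_xy / np.sqrt(s_xx * s_yy)
    return a, b, r

# Data lying exactly on y = 1 + 2x recovers the line with r = 1
a, b, r = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```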