
# Linear Regression

• Minimisation of the sum of the squared vertical distances between the data points and the proposed line $y = a + b x$ is achieved by taking the partial derivatives of that sum with respect to $a$ and $b$ and setting both to zero.
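
Writing the sum of squared residuals as $S(a,b)$, a worked sketch of this step (it uses the sums $s_{xx}$ and $s_{xy}$ defined in the solution) is:

$$S(a,b) = \sum_{i=0}^{N-1} (y_i - a - b x_i)^2$$
$$\frac{\partial S}{\partial a} = -2\sum_{i=0}^{N-1} (y_i - a - b x_i) = 0 \quad\Rightarrow\quad a = \bar y - b \bar x$$
$$\frac{\partial S}{\partial b} = -2\sum_{i=0}^{N-1} x_i (y_i - a - b x_i) = 0 \quad\Rightarrow\quad \sum_{i=0}^{N-1} x_i (y_i - \bar y) = b \sum_{i=0}^{N-1} x_i (x_i - \bar x) \quad \text{(substituting } a\text{)}$$

Since $\sum_i \bar x (y_i - \bar y) = 0$ and $\sum_i \bar x (x_i - \bar x) = 0$, the left-hand sum equals $s_{xy}$ and the right-hand sum equals $s_{xx}$, so $b = s_{xy}/s_{xx}$.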

#### Solution

For the line $y(x) = a + b x$, define $s_{xx} = \sum_{i=0}^{N-1} (x_i - \bar x)^2$, $s_{yy} = \sum_{i=0}^{N-1} (y_i - \bar y)^2$ and $s_{xy} = \sum_{i=0}^{N-1} (x_i - \bar x)(y_i - \bar y)$, where $\bar x$ and $\bar y$ are the means of the $x_i$ and $y_i$. Then
$$b = \frac{s_{xy}}{s_{xx}}$$ $$a = \bar y - b \bar x$$ Correlation coefficient: $$r = \frac{s_{xy}}{\sqrt{s_{xx}s_{yy}}}$$

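• $r$ lies between $-1$ and $1$: it is 0 when there is no linear trend ($s_{xy} = 0$) and $\pm 1$ for a perfect linear fit, with the sign of $r$ matching the sign of the slope $b$.
• This discussion assumes that no variances are known for the $x$ and $y$ values, so every point is weighted equally. Weighted solutions exist that take known variances into account; this is particularly important when some values are known with less error than others.
• The solution above requires $s_{xx} \neq 0$, i.e. the $x_i$ must not all be equal; otherwise the line would be vertical, with infinite slope.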
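
A minimal Python sketch of the formulas above, using plain lists rather than any numerical library (the function name `linear_fit` is hypothetical). It raises an error when $s_{xx} = 0$, as required by the last note, and returns `nan` for $r$ when all $y$ values are equal, since $r$ is then undefined:

```python
import math

def linear_fit(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r).

    Hypothetical helper built directly on the s_xx, s_yy, s_xy sums.
    Raises ValueError for mismatched/too-short input or when s_xx == 0.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        # all x values equal: the best-fit line would be vertical
        raise ValueError("s_xx is zero: all x values are equal")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # r is undefined when s_yy == 0 (all y equal); report nan in that case
    r = s_xy / math.sqrt(s_xx * s_yy) if s_yy > 0 else float("nan")
    return a, b, r

print(linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # → (1.0, 2.0, 1.0)
```

Points lying exactly on $y = 1 + 2x$ recover $a = 1$, $b = 2$ and $r = 1$; a line with negative slope gives $r = -1$.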

# Curve fitting with polynomials

Problem: find a polynomial $f(x)$ that passes through the $N$ points $(x_0,y_0), (x_1,y_1), (x_2,y_2), \ldots, (x_{N-1}, y_{N-1})$, where the $x_i$ are distinct.
General solution (the Lagrange interpolating polynomial, of degree at most $N-1$): $$f(x) = \sum_{i=0}^{N-1} y_i \prod_{j=0, j\neq i}^{N-1} \frac{x-x_j}{x_i-x_j}$$
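
A direct Python sketch of this sum of products, evaluating $f$ at a single value of $x$ (the name `lagrange_eval` is hypothetical). It costs $O(N^2)$ operations per evaluation and checks that the $x_i$ are distinct, since otherwise a denominator $x_i - x_j$ would be zero:

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate the interpolating polynomial through `points` at `x`.

    points: sequence of (x_i, y_i) pairs with distinct x_i.
    Implements f(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j).
    """
    xs = [p[0] for p in points]
    if len(set(xs)) != len(xs):
        raise ValueError("x values must be distinct")
    total = 0
    for i, (x_i, y_i) in enumerate(points):
        term = y_i
        for j, x_j in enumerate(xs):
            if j != i:
                term = term * (x - x_j) / (x_i - x_j)
        total += term
    return total

# Four points on y = x**2 + 1; Fraction keeps the arithmetic exact
pts = [(0, Fraction(1)), (1, Fraction(2)), (2, Fraction(5)), (3, Fraction(10))]
print(lagrange_eval(pts, 4))  # → 17
```

With plain floats the same function works, but the result is then subject to rounding error.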

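Example with four points, numbered from 1:

| $x$ | $y$ |
|-----|-----|
| $x_1$ | $y_1$ |
| $x_2$ | $y_2$ |
| $x_3$ | $y_3$ |
| $x_4$ | $y_4$ |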

The following is a general method of constructing a function that passes through all of the points $(x_i, y_i)$:
$$
\begin{aligned}
f(x) ={} & \frac{y_1 (x-x_2)(x-x_3)(x-x_4)}{(x_1-x_2)(x_1-x_3)(x_1-x_4)}
+ \frac{y_2 (x-x_1)(x-x_3)(x-x_4)}{(x_2-x_1)(x_2-x_3)(x_2-x_4)} \\
&+ \frac{y_3 (x-x_1)(x-x_2)(x-x_4)}{(x_3-x_1)(x_3-x_2)(x_3-x_4)}
+ \frac{y_4 (x-x_1)(x-x_2)(x-x_3)}{(x_4-x_1)(x_4-x_2)(x_4-x_3)}
\end{aligned}
$$

With more points, further terms of the same form are added. At $x = x_1$ every term vanishes except the first, which equals $y_1$; at $x = x_2$ every term vanishes except the second, which equals $y_2$; and so on.
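
The cancellation argument can be checked numerically. This sketch (the helper `basis_term` and the sample abscissae are hypothetical) evaluates each product term without its $y_i$ factor at every data point, using exact `Fraction` arithmetic. Terms are numbered from 0 in the code; row $k$ lists all four terms at $x = x_k$ and is 1 only in position $k$:

```python
from fractions import Fraction

def basis_term(xs, i, x):
    """prod_{j != i} (x - x_j) / (x_i - x_j): the i-th term without its y_i.

    Assumes integer x values so that Fraction(num, den) is exact.
    """
    value = Fraction(1)
    for j, x_j in enumerate(xs):
        if j != i:
            value *= Fraction(x - x_j, xs[i] - x_j)
    return value

xs = [1, 2, 4, 7]  # hypothetical distinct x values x_1..x_4
for x_k in xs:
    # one row per data point; int() is safe because every value is exactly 0 or 1
    print([int(basis_term(xs, i, x_k)) for i in range(len(xs))])
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
```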