\(\newenvironment{mat}{\left[\begin{array}}{\end{array}\right]}
\newcommand{\colvec}[1]{\left[\begin{matrix}#1 \end{matrix}\right]}
\newcommand{\rowvec}[1]{[\begin{matrix} #1 \end{matrix}]}
\newcommand{\definiteintegral}[4]{\int_{#1}^{#2}\,#3\,d#4}
\newcommand{\indefiniteintegral}[2]{\int#1\,d#2}
\def\u{\underline}
\def\summ{\sum\limits}
\newcommand{\lt}{ < }
\newcommand{\gt}{ > }
\newcommand{\amp}{ & }
\)
Suppose we are given \(n\) data pairs \(\{(x_i,y_i)\}_{i=1,\ldots,n}\) that we believe can be
well modeled by a straight line.
Let's use multivariate calculus to derive the parameters \(a\)
and \(b\) for the best fit line, using as our criterion the line that
minimizes the sum of squared errors:
\begin{equation*}
S(a,b)=\summ_{i=1}^n \left( y_i-(a+b x_i)\right)^2
\end{equation*}
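To make this concrete, here is a minimal sketch of how \(S(a,b)\) could be evaluated for a candidate line; the data values below are made up purely for illustration:

    # Minimal sketch: evaluate the sum of squared errors S(a, b)
    # for the candidate line y = a + b*x, on made-up data.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]

    def S(a, b):
        # sum over i of (y_i - (a + b*x_i))^2
        return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    print(S(0.0, 2.0))   # one candidate line
    print(S(0.1, 1.95))  # another; a smaller value of S means a better fit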
Notice that the variables in this expression are \(a\) and \(b\):
all the subscripted stuff is data. The only two things we don't know are
the slope \(b\) and the intercept \(a\) of the line. Each choice of
those parameters gives a different value to \(S(a,b)\), and we seek
the choice that gives us the smallest value of \(S(a,b)\).
Now, in analogy to univariate calculus, we want to differentiate
by \(a\) and by \(b\), and set those derivatives to 0, in order to
find an extremum (a minimum, in this case).
So as not to get bogged down in new notation, I'll just use the old
"prime" notation, as long as you all know which variable the derivative
refers to. So, if we differentiate \(S\) with respect to \(a\),
calling that \(S_a\), we might write
\begin{equation*}S_a(a,b)=
\left\{
\begin{array}{c}
\left[\summ_{i=1}^n \left( y_i-(a+b x_i)\right)^2\right]' \cr
\summ_{i=1}^n \left[\left( y_i-(a+b x_i)\right)^2\right]' \cr
\summ_{i=1}^n 2 \left( y_i-(a+b x_i)\right)\cdot (-1) \cr
-2 \summ_{i=1}^n \left( y_i-(a+b x_i)\right)
\end{array}
\right.
\end{equation*}
So, to summarize:
- the derivative of a sum is the sum of the derivatives;
- use the chain rule on each term (a single term is worked out below); and then
- apply the factoring property of summation.
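For example, if we write \(u_i = y_i-(a+b x_i)\) (a substitution introduced only for this illustration), then differentiating a single term with respect to \(a\) gives \(u_i'=-1\), and the chain rule says
\begin{equation*}
\left[u_i^2\right]' = 2u_i\cdot u_i' = 2\left( y_i-(a+b x_i)\right)\cdot(-1)
\end{equation*}
which is exactly the step from the second to the third line above.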
In order to minimize this expression, we set it to zero:
\begin{equation*}0=
\left\{
\begin{array}{c}
-2 \summ_{i=1}^n \left( y_i-(a+b x_i)\right) \cr
\summ_{i=1}^n \left( y_i-(a+b x_i)\right) \cr
\summ_{i=1}^n y_i-\summ_{i=1}^n (a+b x_i) \cr
\summ_{i=1}^n y_i-\summ_{i=1}^n a -\summ_{i=1}^n b x_i \cr
\summ_{i=1}^n y_i-n a -\summ_{i=1}^n b x_i \cr
\summ_{i=1}^n y_i-n a - b\summ_{i=1}^n x_i \cr
\frac{1}{n}\summ_{i=1}^n y_i- a - b\frac{1}{n}\summ_{i=1}^n x_i \cr
\overline{y} - a - b\overline{x}
\end{array}
\right.
\end{equation*}
where \(\overline{y} - a - b\overline{x}=0\), or
\(\overline{y} = a + b\overline{x}\), tells us that the "center of
mass" of the points, \((\overline{x},\overline{y})\), falls on the
regression line of best fit. That's an interesting thing!
But it's not enough for us to figure out \(a\) and \(b\): we need a
second point, or a second equation. For that, we differentiate with
respect to \(b\):
\begin{equation*}S_b(a,b)=
\left\{
\begin{array}{c}
\left[\summ_{i=1}^n \left( y_i-(a+b x_i)\right)^2\right]' \cr
\summ_{i=1}^n \left[\left( y_i-(a+b x_i)\right)^2\right]' \cr
\summ_{i=1}^n 2 \left( y_i-(a+b x_i)\right)\cdot (-x_i) \cr
-2 \summ_{i=1}^n \left( y_i-(a+b x_i)\right)x_i
\end{array}
\right.
\end{equation*}
Once again we set this to zero, and so
\begin{equation*}0=
\left\{
\begin{array}{c}
-2 \summ_{i=1}^n \left( y_i-(a+b x_i)\right)x_i\cr
\summ_{i=1}^n \left( y_i-(a+b x_i)\right)x_i\cr
\summ_{i=1}^n y_ix_i- \summ_{i=1}^n (a+b x_i)x_i\cr
\summ_{i=1}^n y_ix_i-\summ_{i=1}^n ax_i - \summ_{i=1}^n b x_i^2\cr
\summ_{i=1}^n y_ix_i- a \summ_{i=1}^n x_i - b \summ_{i=1}^n x_i^2
\end{array}
\right.
\end{equation*}
So this provides a second equation:
\begin{equation*}
\summ_{i=1}^n y_ix_i = a \summ_{i=1}^n x_i + b \summ_{i=1}^n x_i^2
\end{equation*}
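Together with the equation from \(S_a\), this gives a pair of linear equations (the so-called normal equations) in the two unknowns \(a\) and \(b\):
\begin{equation*}
\begin{array}{rcl}
n a + b\summ_{i=1}^n x_i \amp = \amp \summ_{i=1}^n y_i \cr
a \summ_{i=1}^n x_i + b \summ_{i=1}^n x_i^2 \amp = \amp \summ_{i=1}^n x_iy_i
\end{array}
\end{equation*}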
From the first equation we can write
\begin{equation*}
a=\overline{y} - b\overline{x}
\end{equation*}
Plugging that into the second equation gives
\begin{equation*}
\summ_{i=1}^n y_ix_i = (\overline{y} - b\overline{x}) \summ_{i=1}^n x_i + b \summ_{i=1}^n x_i^2
\end{equation*}
and solving for \(b\) yields
\begin{equation*}
b=
\frac{\summ_{i=1}^n y_ix_i - \overline{y}\summ_{i=1}^n x_i}
{\summ_{i=1}^n x_i^2 - \overline{x}\summ_{i=1}^n x_i}
\end{equation*}
which, you might notice (using \(\summ_{i=1}^n x_i = n\overline{x}\)), is the same as
\begin{equation*}
b=
\frac{\summ_{i=1}^n x_iy_i - n\overline{x}\overline{y}}
{\summ_{i=1}^n x_i^2 - n\overline{x}^2}
\end{equation*}
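As a sanity check, here is a minimal sketch (assuming NumPy is available) that computes \(b\) and \(a\) from these formulas on made-up data, compares them with numpy.polyfit, and confirms that the "center of mass" \((\overline{x},\overline{y})\) falls on the fitted line:

    import numpy as np

    # Hypothetical data, purely for illustration.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    n = len(x)
    xbar, ybar = x.mean(), y.mean()

    # Slope and intercept from the formulas derived above.
    b = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x ** 2) - n * xbar ** 2)
    a = ybar - b * xbar

    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    slope, intercept = np.polyfit(x, y, 1)
    print(a, b)
    print(intercept, slope)

    # The "center of mass" (xbar, ybar) falls on the fitted line.
    print(np.isclose(ybar, a + b * xbar))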