
Section 9 Multivariate Calculus

Suppose we are given \(n\) data pairs \(\{(x_i,y_i)\}_{i=1,\ldots,n}\) which we believe can be well modeled by a straight-line graph.

Let's use multivariate calculus to derive the parameters \(a\) and \(b\) for the best fit line, using as our criterion the line that minimizes the sum of squared errors: \begin{equation*} S(a,b)=\sum_{i=1}^n \left( y_i-(a+b x_i)\right)^2 \end{equation*} Notice that the variables in this expression are \(a\) and \(b\): all the subscripted stuff is data. The only two things we don't know are the slope \(b\) and the intercept \(a\) of the line. Each choice of those parameters gives a different value to \(S(a,b)\), and we seek the choice that gives us the smallest value of \(S(a,b)\).
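To make this concrete, here's a minimal Python sketch (the data values are invented purely for illustration, not taken from any particular dataset) that evaluates \(S(a,b)\) for a couple of candidate lines; each choice of parameters gives its own sum of squared errors, and we're hunting for the choice that makes it smallest.

```python
import numpy as np

# Hypothetical data, made up only to illustrate the notation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def S(a, b):
    """Sum of squared errors for the candidate line y = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

# Two candidate lines: each choice of (a, b) gives a different S(a, b)
print(S(0.0, 2.0))   # a slope near the trend of the data: small error
print(S(5.0, 0.0))   # a flat line: much larger error
```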

Now, in analogy to univariate calculus, we want to differentiate with respect to \(a\) and to \(b\), and set those derivatives to 0, in order to find an extremum (a minimum, in this case).

So as not to get bogged down in new notation, I'll just use the old "prime" notation, as long as you all know which variable the derivative refers to. So, if we differentiate \(S\) with respect to \(a\), calling that \(S_a\), we might write

\begin{equation*}\begin{aligned} S_a(a,b) &= \left[\sum_{i=1}^n \left( y_i-(a+b x_i)\right)^2\right]' \\ &= \sum_{i=1}^n \left[\left( y_i-(a+b x_i)\right)^2\right]' \\ &= \sum_{i=1}^n 2 \left( y_i-(a+b x_i)\right)\cdot (-1) \\ &= -2 \sum_{i=1}^n \left( y_i-(a+b x_i)\right) \end{aligned}\end{equation*}

So, to summarize, we used
  • the fact that the derivative of a sum is the sum of the derivatives;
  • the chain rule on each term; and then
  • the factoring property of summation.
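If you'd like a sanity check on that chain-rule step, here is a small sketch using sympy (an assumption on my part, not part of the derivation above) that differentiates one generic term of the sum with respect to \(a\):

```python
import sympy as sp

# Symbols for the parameters and for one generic data pair (x_i, y_i)
a, b, xi, yi = sp.symbols('a b x_i y_i')

# One term of S(a, b): (y_i - (a + b*x_i))**2
term = (yi - (a + b * xi)) ** 2

# Partial derivative with respect to a; sympy applies the chain rule for us
dterm_da = sp.diff(term, a)
print(sp.simplify(dterm_da))                              # equals -2*(y_i - (a + b*x_i)), expanded
print(sp.simplify(dterm_da + 2 * (yi - (a + b * xi))))    # 0, confirming the step above
```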

In order to minimize \(S\), we set this derivative to zero:

\begin{equation*}\begin{aligned} 0 &= -2 \sum_{i=1}^n \left( y_i-(a+b x_i)\right) \\ 0 &= \sum_{i=1}^n \left( y_i-(a+b x_i)\right) \\ 0 &= \sum_{i=1}^n y_i-\sum_{i=1}^n (a+b x_i) \\ 0 &= \sum_{i=1}^n y_i-\sum_{i=1}^n a -\sum_{i=1}^n b x_i \\ 0 &= \sum_{i=1}^n y_i-n a -\sum_{i=1}^n b x_i \\ 0 &= \sum_{i=1}^n y_i-n a - b\sum_{i=1}^n x_i \\ 0 &= \frac{1}{n}\sum_{i=1}^n y_i- a - b\frac{1}{n}\sum_{i=1}^n x_i \\ 0 &= \overline{y} - a - b\overline{x} \end{aligned}\end{equation*}

The last line, \(\overline{y} - a - b\overline{x}=0\), or \(\overline{y} = a + b\overline{x}\), tells us that the "center of mass" of the points, \((\overline{x},\overline{y})\), falls on the regression line of best fit. That's an interesting thing!
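Here is a quick numerical illustration of that fact, a sketch that assumes numpy and uses its built-in least-squares fit np.polyfit as a stand-in for the formulas we're about to derive; the data are invented:

```python
import numpy as np

# Hypothetical data, invented only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.polyfit(x, y, 1) returns the least-squares slope and intercept
b, a = np.polyfit(x, y, 1)

# The fitted line passes through the center of mass (xbar, ybar)
print(a + b * x.mean())   # value of the fitted line at xbar
print(y.mean())           # ybar: the two numbers agree up to rounding
```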

But it's not enough for us to figure out \(a\) and \(b\): just as a line needs a second point, we need a second equation. For that, we differentiate with respect to \(b\):

\begin{equation*}\begin{aligned} S_b(a,b) &= \left[\sum_{i=1}^n \left( y_i-(a+b x_i)\right)^2\right]' \\ &= \sum_{i=1}^n \left[\left( y_i-(a+b x_i)\right)^2\right]' \\ &= \sum_{i=1}^n 2 \left( y_i-(a+b x_i)\right)\cdot (-x_i) \\ &= -2 \sum_{i=1}^n \left( y_i-(a+b x_i)\right)x_i \end{aligned}\end{equation*}

Once again we set this to zero, and so

\begin{equation*}\begin{aligned} 0 &= -2 \sum_{i=1}^n \left( y_i-(a+b x_i)\right)x_i \\ 0 &= \sum_{i=1}^n \left( y_i-(a+b x_i)\right)x_i \\ 0 &= \sum_{i=1}^n y_ix_i- \sum_{i=1}^n (a+b x_i)x_i \\ 0 &= \sum_{i=1}^n y_ix_i-\sum_{i=1}^n ax_i - \sum_{i=1}^n b x_i^2 \\ 0 &= \sum_{i=1}^n y_ix_i- a \sum_{i=1}^n x_i - b \sum_{i=1}^n x_i^2 \end{aligned}\end{equation*}

So this provides a second equation: \begin{equation*} \sum_{i=1}^n y_ix_i = a \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 \end{equation*}
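Together with the first equation, rewritten as \(\sum_{i=1}^n y_i = na + b\sum_{i=1}^n x_i\), this gives a \(2\times 2\) linear system in \(a\) and \(b\) (often called the normal equations). Here's a sketch, again with made-up data, that solves that system directly:

```python
import numpy as np

# Hypothetical data, invented only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# The two equations derived above, as a 2x2 linear system in (a, b):
#   n*a         + (sum x_i)   * b = sum y_i
#   (sum x_i)*a + (sum x_i^2) * b = sum x_i*y_i
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

a, b = np.linalg.solve(A, rhs)
print(a, b)   # intercept and slope of the best-fit line
```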

From the first equation we can write

\begin{equation*} a=\overline{y} - b\overline{x} \end{equation*}

Plugging that into the second equation gives

\begin{equation*} \sum_{i=1}^n y_ix_i = (\overline{y} - b\overline{x}) \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 \end{equation*}

and solving for \(b\),

\begin{equation*} b= \frac{\sum_{i=1}^n y_ix_i - \overline{y}\sum_{i=1}^n x_i} {\sum_{i=1}^n x_i^2 - \overline{x}\sum_{i=1}^n x_i} \end{equation*}

which, you might notice (since \(\sum_{i=1}^n x_i = n\overline{x}\)), is the same as

\begin{equation*} b= \frac{\sum_{i=1}^n x_iy_i - n\overline{x}\,\overline{y}} {\sum_{i=1}^n x_i^2 - n\overline{x}^2} \end{equation*}
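Putting the pieces together, here's a short sketch (still with invented data) that computes \(b\) from this last formula, recovers \(a\) from \(a=\overline{y}-b\overline{x}\), and cross-checks against numpy's built-in least-squares fit:

```python
import numpy as np

# Hypothetical data, invented only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

xbar, ybar = x.mean(), y.mean()

# Slope from the formula derived above; intercept from a = ybar - b*xbar
b = ((x * y).sum() - n * xbar * ybar) / ((x ** 2).sum() - n * xbar ** 2)
a = ybar - b * xbar

# Cross-check against numpy's degree-1 polynomial fit
b_np, a_np = np.polyfit(x, y, 1)
print(a, b)
print(a_np, b_np)   # should agree up to floating-point rounding
```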