Chapter 4
Differential Calculus and Its Uses
4.5 The Chain Rule
4.5.3 Differentiating a Function-of-a-Function: The Chain Rule
Now that we have a formula for differentiating the square root function, we need to extend it to a formula for the derivative of `sqrt(u)`, where `u` is itself a function of `x`. Actually, the square root has little to do with this step: this is a problem that we will encounter over and over with many different functions. If we write `y=sqrt(u)`, then our problem takes this form: `y` is a function of `u`, and `u` is a function of `x`. Therefore, `y` is also a function of `x`. How do we find `dy/dx` when we know `dy/du` and `du/dx`?
Let's relate this question to something we have done before. In Chapter 2 we saw that, for any constant `k`,
`d/dx e^(kx) = k e^(kx).`
If we set `u=kx` and `y=e^u`, then this calculation takes the form
`dy/dx = e^u * k = dy/du * du/dx.`
We show next that the formula
`dy/dx = dy/du * du/dx`
holds for any function, not just the exponential function, and for any dependence of `u` on `x`. In words, this says that the rate of change of `y` as a function of `x` is the rate of change of `y` as a function of `u` times the rate of change of `u` as a function of `x`.
Suppose we fix a number `x` at which we want to know `dy/dx`, and we compute an approximating difference quotient for a small increment `Delta x`. We write simply `u` for the value of `u` at `x` and `u+Delta u` for the value at `x+Delta x`. That is, `Delta u` is the corresponding increment in the intermediate variable. Similarly, we write `y` for the value of the outer variable at `x` and `y+Delta y` for the value at `x+Delta x`, so `Delta y` is the corresponding increment in the outer variable. Then `dy/dx` is approximated by `Delta y/Delta x`, and simple algebra tells us that
`Delta y/Delta x = (Delta y/Delta u) * (Delta u/Delta x).`
We may not know yet what `du` is — or why it appears to cancel in the derivative formula — but `Delta u` is an ordinary numerical quantity, so, whenever it is not zero, it is subject to the algebraic cancellation law.
The two factors `Delta y/Delta u` and `Delta u/Delta x` in the last equation approximate, respectively, the rate of change of `y` with respect to `u` and the rate of change of `u` with respect to `x`. Furthermore, the approximations to instantaneous rates of change all get better as the increment in `x` shrinks to zero, so when we take limiting values of all three quotients, we find
`dy/dx = dy/du * du/dx,`
as predicted.
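The limiting argument above can be watched numerically. The following is a minimal sketch, not part of the original development: it reuses the earlier choice `y=e^u`, `u=kx`, and the value `k = 3`, the point `x = 0.5`, and the helper name `difference_quotients` are assumptions made only for illustration.

```python
import math

def difference_quotients(f, g, x, dx):
    """Return the three difference quotients for y = f(u), u = g(x).

    Assumes the increment in u is nonzero, as the cancellation argument requires.
    """
    u = g(x)
    du = g(x + dx) - u        # increment in the intermediate variable
    dy = f(u + du) - f(u)     # increment in the outer variable
    return dy / dx, dy / du, du / dx

k = 3.0                       # illustrative constant (assumption)
x = 0.5                       # illustrative point (assumption)
exact = k * math.exp(k * x)   # derivative of e^(kx), as recalled from Chapter 2

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    dy_dx, dy_du, du_dx = difference_quotients(math.exp, lambda t: k * t, x, dx)
    # The first two printed columns agree up to rounding; both approach `exact`.
    print(dx, dy_dx, dy_du * du_dx, exact)
```

For every increment the quotient `Delta y/Delta x` matches the product of the other two quotients up to rounding, and both columns move toward the exact derivative as `Delta x` shrinks.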
Simple as it is, this equation is perhaps the most important formula of differential calculus, because so many other formulas and calculations depend on it. Important results have names, and the name of this one is the Chain Rule. It is called that because it tells us how to differentiate chains of functions, i.e., how to find `dy/dx` when `y` is a function of `u` and `u` is a function of `x`.
The Chain Rule
If `y` is a function of `u` and `u` is a function of `x`, then
`dy/dx = dy/du * du/dx.`
In functional notation, if `u=g(x)` and `y=f(u)=f(g(x))`, then
`d/dx f(g(x)) = f'(g(x)) g'(x).`
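The functional form of the rule translates almost word for word into code: evaluate `f'` at `g(x)` and multiply by `g'(x)`. The sketch below is illustrative only. The functions `f(u) = u^3` and `g(x) = x^2 + 1` and the helper names `chain_rule` and `central_difference` are hypothetical choices, and the central difference is just one convenient way to check the result independently.

```python
def chain_rule(f_prime, g, g_prime, x):
    """Derivative of f(g(x)) at x: evaluate f' at g(x), then multiply by g'(x)."""
    return f_prime(g(x)) * g_prime(x)

def central_difference(h, x, dx=1e-6):
    """Symmetric difference quotient, used here only as an independent check."""
    return (h(x + dx) - h(x - dx)) / (2 * dx)

# Hypothetical example functions (assumptions, not from the text)
f = lambda u: u**3
f_prime = lambda u: 3 * u**2
g = lambda x: x**2 + 1
g_prime = lambda x: 2 * x

x = 2.0
by_rule = chain_rule(f_prime, g, g_prime, x)          # 3 * 5**2 * 4 = 300
by_quotient = central_difference(lambda t: f(g(t)), x)
print(by_rule, by_quotient)
```

The two printed values should agree to several decimal places, which is what the Chain Rule predicts.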
Calculate the derivative of `sqrt(p^2+x^2)`, where `p` is a constant.
Solution As we did with the Product Rule, we solve the problem in both functional and variable notation, again to make the point that you don't have to do it both ways. The two calculations proceed slightly differently, though, in terms of what you have to think about and when.
Our function `sqrt(p^2+x^2)` is a composite of the square root function, say, `f(u)=sqrt(u)`, and a polynomial function, say, `g(x)=p^2+x^2`. The derivatives of these functions are, respectively, `f'(u)=1/(2 sqrt(u))` and `g'(x)=2x`. The Chain Rule tells us to evaluate `f'` at `g(x)` and multiply the result by `g'(x)`:
`d/dx sqrt(p^2+x^2) = f'(g(x)) g'(x) = 1/(2 sqrt(p^2+x^2)) * 2x = x/sqrt(p^2+x^2).`
If we set `u=p^2+x^2` and `y=sqrt(u)=sqrt(p^2+x^2)`, then `du/dx=2x` and `dy/du=1/(2 sqrt(u))`, so
`dy/dx = dy/du * du/dx = 1/(2 sqrt(u)) * 2x = x/sqrt(u) = x/sqrt(p^2+x^2).`

There is not a great deal of difference between these two ways to solve the problem except in terms of when you have to think about the fact that `f'` must be evaluated at `u=g(x)`. The Chain Rule appears to be simpler in variable notation, but the simpler notation disguises the fact that the answer has to be in terms of the sole independent variable, `x`. Thus, the calculation is not finished until you replace the “intermediate” variable `u` by its equivalent as a function of `x`.
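As a sanity check on the example, we can compare the Chain Rule answer with a difference quotient. This is only a sketch under assumed values: `p = 3` and `x = 4` are chosen for convenience, and the function names are hypothetical. Note that the code mirrors the variable-notation calculation, including the final step of expressing `u` in terms of `x`.

```python
import math

def derivative_by_chain_rule(x, p):
    """dy/dx for y = sqrt(u), u = p**2 + x**2, computed as (dy/du) * (du/dx)."""
    u = p**2 + x**2                  # intermediate variable, expressed in terms of x
    dy_du = 1 / (2 * math.sqrt(u))
    du_dx = 2 * x
    return dy_du * du_dx             # algebraically equal to x / sqrt(p**2 + x**2)

def derivative_by_difference(x, p, dx=1e-6):
    """Symmetric difference quotient of sqrt(p**2 + x**2) with respect to x."""
    y = lambda t: math.sqrt(p**2 + t**2)
    return (y(x + dx) - y(x - dx)) / (2 * dx)

p, x = 3.0, 4.0                      # illustrative values (assumption)
print(derivative_by_chain_rule(x, p))    # x / sqrt(p^2 + x^2) = 4/5
print(derivative_by_difference(x, p))
```

Both values should be close to `4/5`, the value of `x/sqrt(p^2+x^2)` at the chosen point.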