# The Chain Rule for Functions of Multiple Variables

Learning Objectives

• State the chain rules for one or two independent variables.
• Use tree diagrams as an aid to understanding the chain rule for several independent and intermediate variables.
• Perform implicit differentiation of a function of two or more variables.

In single-variable calculus, we found that one of the most useful differentiation rules is the chain rule, which allows us to find the derivative of the composition of two functions. The same thing is true for multivariable calculus, but this time we have to deal with more than one form of the chain rule. In this section, we study extensions of the chain rule and learn how to take derivatives of compositions of functions of more than one variable.

## Chain Rules for One or Two Independent Variables

Recall that the chain rule for the derivative of a composite of two functions can be written in the form

\[\dfrac{d}{dx}(f(g(x)))=f′(g(x))g′(x).\]

In this equation, both (displaystyle f(x)) and (displaystyle g(x)) are functions of one variable. Now suppose that (displaystyle f) is a function of two variables and (displaystyle g) is a function of one variable. Or perhaps they are both functions of two variables, or even more. How would we calculate the derivative in these cases? The following theorem gives us the answer for the case of one independent variable.

Chain Rule for One Independent Variable

Suppose that (displaystyle x=g(t)) and (displaystyle y=h(t)) are differentiable functions of (displaystyle t) and (displaystyle z=f(x,y)) is a differentiable function of (displaystyle x) and (displaystyle y). Then (displaystyle z=f(x(t),y(t))) is a differentiable function of (displaystyle t) and

\[\dfrac{dz}{dt}=\dfrac{∂z}{∂x}⋅\dfrac{dx}{dt}+\dfrac{∂z}{∂y}⋅\dfrac{dy}{dt}, \label{chain1}\]

where the ordinary derivatives are evaluated at (displaystyle t) and the partial derivatives are evaluated at (displaystyle (x,y)).

Proof

The proof of this theorem uses the definition of differentiability of a function of two variables. Suppose that \(\displaystyle f\) is differentiable at the point \(\displaystyle P(x_0,y_0),\) where \(\displaystyle x_0=g(t_0)\) and \(\displaystyle y_0=h(t_0)\) for a fixed value of \(\displaystyle t_0\). We wish to prove that \(\displaystyle z=f(x(t),y(t))\) is differentiable at \(\displaystyle t=t_0\) and that Equation \ref{chain1} holds at that point as well.

Since (displaystyle f) is differentiable at (displaystyle P), we know that

\[z(t)=f(x,y)=f(x_0,y_0)+f_x(x_0,y_0)(x−x_0)+f_y(x_0,y_0)(y−y_0)+E(x,y), \nonumber\]

where

\[\lim_{(x,y)→(x_0,y_0)}\dfrac{E(x,y)}{\sqrt{(x−x_0)^2+(y−y_0)^2}}=0. \nonumber\]

We then subtract \(\displaystyle z_0=f(x_0,y_0)\) from both sides of this equation:

\[\begin{align*} z(t)−z(t_0) &=f(x(t),y(t))−f(x(t_0),y(t_0)) \\[4pt] &=f_x(x_0,y_0)(x(t)−x(t_0))+f_y(x_0,y_0)(y(t)−y(t_0))+E(x(t),y(t)). \end{align*}\]

Next, we divide both sides by \(\displaystyle t−t_0\):

\[\dfrac{z(t)−z(t_0)}{t−t_0}=f_x(x_0,y_0)\left(\dfrac{x(t)−x(t_0)}{t−t_0}\right)+f_y(x_0,y_0)\left(\dfrac{y(t)−y(t_0)}{t−t_0}\right)+\dfrac{E(x(t),y(t))}{t−t_0}. \nonumber\]

Then we take the limit as \(\displaystyle t\) approaches \(\displaystyle t_0\):

\[\begin{align*} \lim_{t→t_0}\dfrac{z(t)−z(t_0)}{t−t_0} &= f_x(x_0,y_0)\lim_{t→t_0} \left(\dfrac{x(t)−x(t_0)}{t−t_0}\right) \\[4pt] &\quad +f_y(x_0,y_0)\lim_{t→t_0}\left(\dfrac{y(t)−y(t_0)}{t−t_0}\right) \\[4pt] &\quad +\lim_{t→t_0}\dfrac{E(x(t),y(t))}{t−t_0}. \end{align*}\]

The left-hand side of this equation is equal to \(\displaystyle dz/dt\), which leads to

\[\dfrac{dz}{dt}=f_x(x_0,y_0)\dfrac{dx}{dt}+f_y(x_0,y_0)\dfrac{dy}{dt}+\lim_{t→t_0}\dfrac{E(x(t),y(t))}{t−t_0}. \nonumber\]

The last term can be rewritten as

\[\begin{align*} \lim_{t→t_0}\dfrac{E(x(t),y(t))}{t−t_0} &=\lim_{t→t_0}\left(\dfrac{E(x,y)}{\sqrt{(x−x_0)^2+(y−y_0)^2}}⋅\dfrac{\sqrt{(x−x_0)^2+(y−y_0)^2}}{t−t_0}\right) \\[4pt] &=\lim_{t→t_0}\left(\dfrac{E(x,y)}{\sqrt{(x−x_0)^2+(y−y_0)^2}}\right)\lim_{t→t_0}\left(\dfrac{\sqrt{(x−x_0)^2+(y−y_0)^2}}{t−t_0}\right). \end{align*}\]

As \(\displaystyle t\) approaches \(\displaystyle t_0\), the point \(\displaystyle (x(t),y(t))\) approaches \(\displaystyle (x(t_0),y(t_0)),\) so we can rewrite the last product as

\[\lim_{(x,y)→(x_0,y_0)}\left(\dfrac{E(x,y)}{\sqrt{(x−x_0)^2+(y−y_0)^2}}\right)\lim_{(x,y)→(x_0,y_0)}\left(\dfrac{\sqrt{(x−x_0)^2+(y−y_0)^2}}{t−t_0}\right). \nonumber\]

Since the first limit is equal to zero, we need only show that the second limit is finite:

\[\begin{align*} \lim_{(x,y)→(x_0,y_0)} \dfrac{\sqrt{(x−x_0)^2+(y−y_0)^2}}{t−t_0} &=\lim_{(x,y)→(x_0,y_0)} \sqrt{\dfrac{(x−x_0)^2+(y−y_0)^2}{(t−t_0)^2}} \\[4pt] &=\lim_{(x,y)→(x_0,y_0)}\sqrt{\left(\dfrac{x−x_0}{t−t_0}\right)^2+\left(\dfrac{y−y_0}{t−t_0}\right)^2} \\[4pt] &=\sqrt{\left[\lim_{(x,y)→(x_0,y_0)} \left(\dfrac{x−x_0}{t−t_0}\right)\right]^2+\left[\lim_{(x,y)→(x_0,y_0)} \left(\dfrac{y−y_0}{t−t_0}\right)\right]^2}. \end{align*}\]

Since (displaystyle x(t)) and (displaystyle y(t)) are both differentiable functions of (displaystyle t), both limits inside the last radical exist. Therefore, this value is finite. This proves the chain rule at (displaystyle t=t_0); the rest of the theorem follows from the assumption that all functions are differentiable over their entire domains.

Closer examination of Equation \ref{chain1} reveals an interesting pattern. The first term in the equation is \(\displaystyle \dfrac{∂f}{∂x}⋅\dfrac{dx}{dt}\) and the second term is \(\displaystyle \dfrac{∂f}{∂y}⋅\dfrac{dy}{dt}\). Recall that when multiplying fractions, cancelation can be used. If we treat these derivatives as fractions, then each product “simplifies” to something resembling \(\displaystyle ∂f/dt\). The variables \(\displaystyle x\) and \(\displaystyle y\) that disappear in this simplification are often called intermediate variables: they are independent variables for the function \(\displaystyle f\), but are dependent variables for the variable \(\displaystyle t\). Two terms appear on the right-hand side of the formula, and \(\displaystyle f\) is a function of two variables. This pattern works with functions of more than two variables as well, as we see later in this section.

Example (PageIndex{1}): Using the Chain Rule

Calculate (displaystyle dz/dt) for each of the following functions:

a. \(\displaystyle z=f(x,y)=4x^2+3y^2,\; x=x(t)=\sin t,\; y=y(t)=\cos t\)
b. \(\displaystyle z=f(x,y)=\sqrt{x^2−y^2},\; x=x(t)=e^{2t},\; y=y(t)=e^{−t}\)

Solution

a. To use the chain rule, we need four quantities—(displaystyle ∂z/∂x,∂z/∂y,dx/dt), and (displaystyle dy/dt):

• (displaystyle dfrac{∂z}{∂x}=8x)
• (displaystyle dfrac{dx}{dt}=cos t)
• (displaystyle dfrac{∂z}{∂y}=6y)
• (displaystyle dfrac{dy}{dt}=−sin t)

Now, we substitute each of these into Equation \ref{chain1}:

\[\dfrac{dz}{dt}=\dfrac{∂z}{∂x}⋅\dfrac{dx}{dt}+\dfrac{∂z}{∂y}⋅\dfrac{dy}{dt}=(8x)(\cos t)+(6y)(−\sin t)=8x\cos t−6y\sin t. \nonumber\]

This answer has three variables in it. To reduce it to one variable, use the fact that \(\displaystyle x(t)=\sin t\) and \(\displaystyle y(t)=\cos t.\) We obtain

\[\dfrac{dz}{dt}=8x\cos t−6y\sin t=8(\sin t)\cos t−6(\cos t)\sin t=2\sin t\cos t. \nonumber\]

This derivative can also be calculated by first substituting \(\displaystyle x(t)\) and \(\displaystyle y(t)\) into \(\displaystyle f(x,y),\) then differentiating with respect to \(\displaystyle t\):

\[z=f(x,y)=f(x(t),y(t))=4(x(t))^2+3(y(t))^2=4\sin^2 t+3\cos^2 t. \nonumber\]

Then

\[\dfrac{dz}{dt}=2(4\sin t)(\cos t)+2(3\cos t)(−\sin t)=8\sin t\cos t−6\sin t\cos t=2\sin t\cos t, \nonumber\]

which is the same solution. However, it may not always be this easy to differentiate in this form.

b. To use the chain rule, we again need four quantities: \(\displaystyle ∂z/∂x,\; ∂z/∂y,\; dx/dt,\) and \(\displaystyle dy/dt:\)

• \(\displaystyle \dfrac{∂z}{∂x}=\dfrac{x}{\sqrt{x^2−y^2}}\)
• \(\displaystyle \dfrac{dx}{dt}=2e^{2t}\)
• \(\displaystyle \dfrac{∂z}{∂y}=\dfrac{−y}{\sqrt{x^2−y^2}}\)
• \(\displaystyle \dfrac{dy}{dt}=−e^{−t}.\)

We substitute each of these into Equation \ref{chain1}:

\[\begin{align*} \dfrac{dz}{dt} &=\dfrac{∂z}{∂x}⋅\dfrac{dx}{dt}+\dfrac{∂z}{∂y}⋅\dfrac{dy}{dt} \\[4pt] &=\left(\dfrac{x}{\sqrt{x^2−y^2}}\right)(2e^{2t})+\left(\dfrac{−y}{\sqrt{x^2−y^2}}\right)(−e^{−t}) \\[4pt] &=\dfrac{2xe^{2t}+ye^{−t}}{\sqrt{x^2−y^2}}. \end{align*}\]

To reduce this to one variable, we use the fact that \(\displaystyle x(t)=e^{2t}\) and \(\displaystyle y(t)=e^{−t}\). Therefore,

\[\begin{align*} \dfrac{dz}{dt} &=\dfrac{2xe^{2t}+ye^{−t}}{\sqrt{x^2−y^2}} \\[4pt] &=\dfrac{2(e^{2t})e^{2t}+(e^{−t})e^{−t}}{\sqrt{e^{4t}−e^{−2t}}} \\[4pt] &=\dfrac{2e^{4t}+e^{−2t}}{\sqrt{e^{4t}−e^{−2t}}}. \end{align*}\]

To eliminate negative exponents, we multiply the top by \(\displaystyle e^{2t}\) and the bottom by \(\displaystyle \sqrt{e^{4t}}\) (the two factors are equal, so the value is unchanged):

\[\begin{align*} \dfrac{dz}{dt} &=\dfrac{2e^{4t}+e^{−2t}}{\sqrt{e^{4t}−e^{−2t}}}⋅\dfrac{e^{2t}}{\sqrt{e^{4t}}} \\[4pt] &=\dfrac{2e^{6t}+1}{\sqrt{e^{8t}−e^{2t}}} \\[4pt] &=\dfrac{2e^{6t}+1}{\sqrt{e^{2t}(e^{6t}−1)}} \\[4pt] &=\dfrac{2e^{6t}+1}{e^t\sqrt{e^{6t}−1}}. \end{align*}\]

Again, this derivative can also be calculated by first substituting \(\displaystyle x(t)\) and \(\displaystyle y(t)\) into \(\displaystyle f(x,y),\) then differentiating with respect to \(\displaystyle t\):

\[\begin{align*} z &=f(x,y) \\[4pt] &=f(x(t),y(t)) \\[4pt] &=\sqrt{(x(t))^2−(y(t))^2} \\[4pt] &=\sqrt{e^{4t}−e^{−2t}} \\[4pt] &=(e^{4t}−e^{−2t})^{1/2}. \end{align*}\]

Then

\[\begin{align*} \dfrac{dz}{dt} &= \dfrac{1}{2} (e^{4t}−e^{−2t})^{−1/2} \left(4e^{4t}+2e^{−2t}\right) \\[4pt] &=\dfrac{2e^{4t}+e^{−2t}}{\sqrt{e^{4t}−e^{−2t}}}. \end{align*}\]

This is the same solution.
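The two results of this example can be sanity-checked numerically. The following is a minimal sketch, not part of the original text: the helper `central_diff` and the sample point `t0` are illustrative choices, and part b is only defined for \(t>0\), where \(e^{4t}>e^{−2t}\).

```python
import math

def central_diff(func, t, h=1e-6):
    """Approximate d(func)/dt at t with a symmetric difference quotient."""
    return (func(t + h) - func(t - h)) / (2 * h)

# Part a: z = 4x^2 + 3y^2 with x = sin t, y = cos t
def z_a(t):
    x, y = math.sin(t), math.cos(t)
    return 4 * x**2 + 3 * y**2

def dz_a(t):
    # chain-rule result: 2 sin t cos t
    return 2 * math.sin(t) * math.cos(t)

# Part b: z = sqrt(x^2 - y^2) with x = e^{2t}, y = e^{-t} (requires t > 0)
def z_b(t):
    x, y = math.exp(2 * t), math.exp(-t)
    return math.sqrt(x**2 - y**2)

def dz_b(t):
    # chain-rule result: (2e^{6t} + 1) / (e^t sqrt(e^{6t} - 1))
    return (2 * math.exp(6 * t) + 1) / (math.exp(t) * math.sqrt(math.exp(6 * t) - 1))

t0 = 0.7  # arbitrary sample point with t0 > 0
print(central_diff(z_a, t0), dz_a(t0))
print(central_diff(z_b, t0), dz_b(t0))
```

Each printed pair should agree to several decimal places; a finite-difference check like this catches sign slips, but it is not a proof.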

Exercise (PageIndex{1})

Calculate \(\displaystyle dz/dt\) given the following functions. Express the final answer in terms of \(\displaystyle t\).

\[\begin{align*} z &=f(x,y)=x^2−3xy+2y^2 \\[4pt] x &=x(t)=3\sin 2t,\quad y=y(t)=4\cos 2t \end{align*}\]

Hint

Calculate \(\displaystyle ∂z/∂x,\; ∂z/∂y,\; dx/dt,\) and \(\displaystyle dy/dt\), then use Equation \ref{chain1}.

Answer

\(\displaystyle \dfrac{dz}{dt}=\dfrac{∂f}{∂x}\dfrac{dx}{dt}+\dfrac{∂f}{∂y}\dfrac{dy}{dt}\)

\(\displaystyle =(2x−3y)(6\cos 2t)+(−3x+4y)(−8\sin 2t)\)

\(\displaystyle =−92\sin 2t \cos 2t−72(\cos^2 2t−\sin^2 2t)\)

\(\displaystyle =−46\sin 4t−72\cos 4t.\)

It is often useful to create a visual representation of Equation \ref{chain1} for the chain rule. This is called a tree diagram for the chain rule for functions of one variable, and it provides a way to remember the formula (Figure \(\PageIndex{1}\)). This diagram can be expanded for functions of more than one variable, as we shall see very shortly.

In this diagram, the leftmost corner corresponds to \(\displaystyle z=f(x,y)\). Since \(\displaystyle f\) has two independent variables, there are two lines coming from this corner. The upper branch corresponds to the variable \(\displaystyle x\) and the lower branch corresponds to the variable \(\displaystyle y\). Since each of these variables is then dependent on one variable \(\displaystyle t\), one branch then comes from \(\displaystyle x\) and one branch comes from \(\displaystyle y\). Last, each of the branches on the far right has a label that represents the path traveled to reach that branch. The top branch is reached by following the \(\displaystyle x\) branch, then the \(\displaystyle t\) branch; therefore, it is labeled \(\displaystyle (∂z/∂x)×(dx/dt).\) The bottom branch is similar: first the \(\displaystyle y\) branch, then the \(\displaystyle t\) branch. This branch is labeled \(\displaystyle (∂z/∂y)×(dy/dt)\). To get the formula for \(\displaystyle dz/dt,\) add all the terms that appear on the rightmost side of the diagram. This gives us Equation \ref{chain1}.

In the next theorem, \(\displaystyle z=f(x,y)\) is a function of \(\displaystyle x\) and \(\displaystyle y\), and both \(\displaystyle x=g(u,v)\) and \(\displaystyle y=h(u,v)\) are functions of the independent variables \(\displaystyle u\) and \(\displaystyle v\).

Chain Rule for Two Independent Variables

Suppose (displaystyle x=g(u,v)) and (displaystyle y=h(u,v)) are differentiable functions of (displaystyle u) and (displaystyle v), and (displaystyle z=f(x,y)) is a differentiable function of (displaystyle x) and (displaystyle y). Then, (displaystyle z=f(g(u,v),h(u,v))) is a differentiable function of (displaystyle u) and (displaystyle v), and

\[\dfrac{∂z}{∂u}=\dfrac{∂z}{∂x}\dfrac{∂x}{∂u}+\dfrac{∂z}{∂y}\dfrac{∂y}{∂u} \label{chain2a}\]

and

\[\dfrac{∂z}{∂v}=\dfrac{∂z}{∂x}\dfrac{∂x}{∂v}+\dfrac{∂z}{∂y}\dfrac{∂y}{∂v}. \label{chain2b}\]

We can draw a tree diagram for each of these formulas as well as follows.

To derive the formula for (displaystyle ∂z/∂u), start from the left side of the diagram, then follow only the branches that end with (displaystyle u) and add the terms that appear at the end of those branches. For the formula for (displaystyle ∂z/∂v), follow only the branches that end with (displaystyle v) and add the terms that appear at the end of those branches.

There is an important difference between these two chain rule theorems. In the chain rule for one independent variable, the left-hand side of the formula for the derivative is not a partial derivative, but in the chain rule for two independent variables it is. The reason is that, in the first theorem, \(\displaystyle z\) is ultimately a function of \(\displaystyle t\) alone, whereas in the second, \(\displaystyle z\) is a function of both \(\displaystyle u\) and \(\displaystyle v\).

Example (PageIndex{2}): Using the Chain Rule for Two Variables

Calculate (displaystyle ∂z/∂u) and (displaystyle ∂z/∂v) using the following functions:

\[z=f(x,y)=3x^2−2xy+y^2,\; x=x(u,v)=3u+2v,\; y=y(u,v)=4u−v. \nonumber\]

Solution

To implement the chain rule for two variables, we need six partial derivatives: \(\displaystyle ∂z/∂x,\; ∂z/∂y,\; ∂x/∂u,\; ∂x/∂v,\; ∂y/∂u,\) and \(\displaystyle ∂y/∂v\):

\[\begin{align*} \dfrac{∂z}{∂x} &=6x−2y & \dfrac{∂z}{∂y} &=−2x+2y \\[4pt] \dfrac{∂x}{∂u} &=3 & \dfrac{∂x}{∂v} &=2 \\[4pt] \dfrac{∂y}{∂u} &=4 & \dfrac{∂y}{∂v} &=−1. \end{align*}\]

To find \(\displaystyle ∂z/∂u,\) we use Equation \ref{chain2a}:

\[\begin{align*} \dfrac{∂z}{∂u} &=\dfrac{∂z}{∂x}⋅\dfrac{∂x}{∂u}+\dfrac{∂z}{∂y}⋅\dfrac{∂y}{∂u} \\[4pt] &=3(6x−2y)+4(−2x+2y) \\[4pt] &=10x+2y. \end{align*}\]

Next, we substitute \(\displaystyle x(u,v)=3u+2v\) and \(\displaystyle y(u,v)=4u−v:\)

\[\begin{align*} \dfrac{∂z}{∂u} &=10x+2y \\[4pt] &=10(3u+2v)+2(4u−v) \\[4pt] &=38u+18v. \end{align*}\]

To find \(\displaystyle ∂z/∂v,\) we use Equation \ref{chain2b}:

\[\begin{align*} \dfrac{∂z}{∂v} &=\dfrac{∂z}{∂x}\dfrac{∂x}{∂v}+\dfrac{∂z}{∂y}\dfrac{∂y}{∂v} \\[4pt] &=2(6x−2y)+(−1)(−2x+2y) \\[4pt] &=14x−6y. \end{align*}\]

Then we substitute \(\displaystyle x(u,v)=3u+2v\) and \(\displaystyle y(u,v)=4u−v:\)

\[\begin{align*} \dfrac{∂z}{∂v} &=14x−6y \\[4pt] &=14(3u+2v)−6(4u−v) \\[4pt] &=18u+34v. \end{align*}\]
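As a quick cross-check of the results \(∂z/∂u=38u+18v\) and \(∂z/∂v=18u+34v\), the sketch below composes the functions first and then differentiates numerically. The helper `partial` and the sample point `(u0, v0)` are hypothetical names chosen for this illustration.

```python
def z_of_uv(u, v):
    """z = 3x^2 - 2xy + y^2 composed with x = 3u + 2v, y = 4u - v."""
    x = 3 * u + 2 * v
    y = 4 * u - v
    return 3 * x**2 - 2 * x * y + y**2

def partial(func, point, index, h=1e-5):
    """Central-difference partial derivative of func in the given coordinate."""
    hi = list(point)
    lo = list(point)
    hi[index] += h
    lo[index] -= h
    return (func(*hi) - func(*lo)) / (2 * h)

u0, v0 = 1.5, -0.5  # arbitrary sample point
dz_du = partial(z_of_uv, (u0, v0), 0)
dz_dv = partial(z_of_uv, (u0, v0), 1)

print(dz_du, 38 * u0 + 18 * v0)  # numeric value next to the chain-rule formula
print(dz_dv, 18 * u0 + 34 * v0)
```

Because \(z\) is quadratic in \(u\) and \(v\), the central difference is exact up to rounding, so the two columns should match very closely.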

Exercise (PageIndex{2})

Calculate (displaystyle ∂z/∂u) and (displaystyle ∂z/∂v) given the following functions:

\[z=f(x,y)=\dfrac{2x−y}{x+3y},\; x(u,v)=e^{2u}\cos 3v,\; y(u,v)=e^{2u}\sin 3v. \nonumber\]

Hint

Calculate \(\displaystyle ∂z/∂x,\; ∂z/∂y,\; ∂x/∂u,\; ∂x/∂v,\; ∂y/∂u,\) and \(\displaystyle ∂y/∂v\), then use Equation \ref{chain2a} and Equation \ref{chain2b}.

Answer

\(\displaystyle \dfrac{∂z}{∂u}=0,\quad \dfrac{∂z}{∂v}=\dfrac{−21}{(3\sin 3v+\cos 3v)^2}\)

## The Generalized Chain Rule

Now that we’ve seen how to extend the original chain rule to functions of two variables, it is natural to ask: Can we extend the rule to more than two variables? The answer is yes, as the generalized chain rule states.

Generalized Chain Rule

Let \(\displaystyle w=f(x_1,x_2,…,x_m)\) be a differentiable function of \(\displaystyle m\) independent variables, and for each \(\displaystyle i∈\{1,…,m\},\) let \(\displaystyle x_i=x_i(t_1,t_2,…,t_n)\) be a differentiable function of \(\displaystyle n\) independent variables. Then

\[\dfrac{∂w}{∂t_j}=\dfrac{∂w}{∂x_1}\dfrac{∂x_1}{∂t_j}+\dfrac{∂w}{∂x_2}\dfrac{∂x_2}{∂t_j}+⋯+\dfrac{∂w}{∂x_m}\dfrac{∂x_m}{∂t_j}\]

for any \(\displaystyle j∈\{1,2,…,n\}.\)

In the next example we calculate the derivative of a function of three independent variables in which each of the three variables is dependent on two other variables.

Example (PageIndex{3}): Using the Generalized Chain Rule

Calculate (displaystyle ∂w/∂u) and (displaystyle ∂w/∂v) using the following functions:

\[\begin{align*} w &=f(x,y,z)=3x^2−2xy+4z^2 \\[4pt] x &=x(u,v)=e^u\sin v \\[4pt] y &=y(u,v)=e^u\cos v \\[4pt] z &=z(u,v)=e^u. \end{align*}\]

Solution

The formulas for (displaystyle ∂w/∂u) and (displaystyle ∂w/∂v) are

\[\begin{align*} \dfrac{∂w}{∂u} &=\dfrac{∂w}{∂x}⋅\dfrac{∂x}{∂u}+\dfrac{∂w}{∂y}⋅\dfrac{∂y}{∂u}+\dfrac{∂w}{∂z}⋅\dfrac{∂z}{∂u} \\[4pt] \dfrac{∂w}{∂v} &=\dfrac{∂w}{∂x}⋅\dfrac{∂x}{∂v}+\dfrac{∂w}{∂y}⋅\dfrac{∂y}{∂v}+\dfrac{∂w}{∂z}⋅\dfrac{∂z}{∂v}. \end{align*}\]

Therefore, there are nine different partial derivatives that need to be calculated and substituted. We need to calculate each of them:

\[\begin{align*} \dfrac{∂w}{∂x} &=6x−2y & \dfrac{∂w}{∂y} &=−2x & \dfrac{∂w}{∂z} &=8z \\[4pt] \dfrac{∂x}{∂u} &=e^u\sin v & \dfrac{∂y}{∂u} &=e^u\cos v & \dfrac{∂z}{∂u} &=e^u \\[4pt] \dfrac{∂x}{∂v} &=e^u\cos v & \dfrac{∂y}{∂v} &=−e^u\sin v & \dfrac{∂z}{∂v} &=0. \end{align*}\]

Now, we substitute each of them into the first formula to calculate (displaystyle ∂w/∂u):

\[\begin{align*} \dfrac{∂w}{∂u} &=\dfrac{∂w}{∂x}⋅\dfrac{∂x}{∂u}+\dfrac{∂w}{∂y}⋅\dfrac{∂y}{∂u}+\dfrac{∂w}{∂z}⋅\dfrac{∂z}{∂u} \\[4pt] &=(6x−2y)e^u\sin v−2xe^u\cos v+8ze^u, \end{align*}\]

then substitute \(\displaystyle x(u,v)=e^u \sin v,\; y(u,v)=e^u\cos v,\) and \(\displaystyle z(u,v)=e^u\) into this equation:

\[\begin{align*} \dfrac{∂w}{∂u} &=(6x−2y)e^u\sin v−2xe^u\cos v+8ze^u \\[4pt] &=(6e^u\sin v−2e^u\cos v)e^u\sin v−2(e^u\sin v)e^u\cos v+8e^{2u} \\[4pt] &=6e^{2u}\sin^2 v−4e^{2u}\sin v\cos v+8e^{2u} \\[4pt] &=2e^{2u}(3\sin^2 v−2\sin v\cos v+4). \end{align*}\]

Next, we calculate (displaystyle ∂w/∂v):

\[\begin{align*} \dfrac{∂w}{∂v} &=\dfrac{∂w}{∂x}⋅\dfrac{∂x}{∂v}+\dfrac{∂w}{∂y}⋅\dfrac{∂y}{∂v}+\dfrac{∂w}{∂z}⋅\dfrac{∂z}{∂v} \\[4pt] &=(6x−2y)e^u\cos v−2x(−e^u\sin v)+8z(0), \end{align*}\]

then we substitute \(\displaystyle x(u,v)=e^u\sin v,\; y(u,v)=e^u\cos v,\) and \(\displaystyle z(u,v)=e^u\) into this equation:

\[\begin{align*} \dfrac{∂w}{∂v} &=(6x−2y)e^u\cos v−2x(−e^u\sin v) \\[4pt] &=(6e^u \sin v−2e^u\cos v)e^u\cos v+2(e^u\sin v)(e^u\sin v) \\[4pt] &=2e^{2u}\sin^2 v+6e^{2u}\sin v\cos v−2e^{2u}\cos^2 v \\[4pt] &=2e^{2u}(\sin^2 v+3\sin v\cos v−\cos^2 v). \end{align*}\]
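The generalized chain rule is a sum over the intermediate variables, which makes it easy to sketch in code. The example below is illustrative only: the function name `dw_chain` is an assumption, and the closed forms it is compared against are \(2e^{2u}(3\sin^2 v−2\sin v\cos v+4)\) and \(2e^{2u}(\sin^2 v+3\sin v\cos v−\cos^2 v)\), the factored versions of the expanded sums in this example.

```python
import math

def dw_chain(u, v):
    """Return (dw/du, dw/dv) by summing (dw/dx_i)(dx_i/dt_j) over x, y, z."""
    x = math.exp(u) * math.sin(v)
    y = math.exp(u) * math.cos(v)
    z = math.exp(u)
    grad_w = [6 * x - 2 * y, -2 * x, 8 * z]  # dw/dx, dw/dy, dw/dz
    d_du = [x, y, z]                         # dx/du, dy/du, dz/du
    d_dv = [y, -x, 0.0]                      # dx/dv, dy/dv, dz/dv
    dw_du = sum(g * d for g, d in zip(grad_w, d_du))
    dw_dv = sum(g * d for g, d in zip(grad_w, d_dv))
    return dw_du, dw_dv

u0, v0 = 0.3, 1.1  # arbitrary sample point
dw_du, dw_dv = dw_chain(u0, v0)
s, c = math.sin(v0), math.cos(v0)
closed_du = 2 * math.exp(2 * u0) * (3 * s**2 - 2 * s * c + 4)
closed_dv = 2 * math.exp(2 * u0) * (s**2 + 3 * s * c - c**2)

print(dw_du, closed_du)
print(dw_dv, closed_dv)
```

The lists `d_du` and `d_dv` reuse `x`, `y`, and `z` because, for these particular functions, \(∂x/∂u=x\), \(∂y/∂u=y\), \(∂z/∂u=z\), \(∂x/∂v=y\), and \(∂y/∂v=−x\).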

Exercise (PageIndex{3})

Calculate (displaystyle ∂w/∂u) and (displaystyle ∂w/∂v) given the following functions:

\[\begin{align*} w &=f(x,y,z)=\dfrac{x+2y−4z}{2x−y+3z} \\[4pt] x &=x(u,v)=e^{2u}\cos 3v \\[4pt] y &=y(u,v)=e^{2u}\sin 3v \\[4pt] z &=z(u,v)=e^{2u}. \end{align*}\]

Hint

Calculate nine partial derivatives, then use the same formulas from Example \(\PageIndex{3}\).

Answer

\(\displaystyle \dfrac{∂w}{∂u}=0\)

\(\displaystyle \dfrac{∂w}{∂v}=\dfrac{15−33\sin 3v+6\cos 3v}{(3+2\cos 3v−\sin 3v)^2}\)

Example (PageIndex{4}): Drawing a Tree Diagram

Create a tree diagram for the case when

\[w=f(x,y,z),\; x=x(t,u,v),\; y=y(t,u,v),\; z=z(t,u,v) \nonumber\]

and write out the formulas for the three partial derivatives of (displaystyle w).

Solution

Starting from the left, the function (displaystyle f) has three independent variables: (displaystyle x,y), and (displaystyle z). Therefore, three branches must be emanating from the first node. Each of these three branches also has three branches, for each of the variables (displaystyle t,u,) and (displaystyle v).

The three formulas are

\[\begin{align*} \dfrac{∂w}{∂t} &=\dfrac{∂w}{∂x}\dfrac{∂x}{∂t}+\dfrac{∂w}{∂y}\dfrac{∂y}{∂t}+\dfrac{∂w}{∂z}\dfrac{∂z}{∂t} \\[4pt] \dfrac{∂w}{∂u} &=\dfrac{∂w}{∂x}\dfrac{∂x}{∂u}+\dfrac{∂w}{∂y}\dfrac{∂y}{∂u}+\dfrac{∂w}{∂z}\dfrac{∂z}{∂u} \\[4pt] \dfrac{∂w}{∂v} &=\dfrac{∂w}{∂x}\dfrac{∂x}{∂v}+\dfrac{∂w}{∂y}\dfrac{∂y}{∂v}+\dfrac{∂w}{∂z}\dfrac{∂z}{∂v}. \end{align*}\]
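Reading formulas off a tree diagram is mechanical: for each independent variable, follow every branch through each intermediate variable and add the products. A small sketch of that bookkeeping follows; the function name `chain_rule_terms` and its string output format are assumptions made for illustration.

```python
def chain_rule_terms(outer, intermediates, independents):
    """Build one chain-rule formula (as text) per independent variable.

    Each formula sums one product per path outer -> intermediate -> independent,
    which is exactly what the tree diagram encodes.
    """
    formulas = {}
    for t in independents:
        terms = [f"(d{outer}/d{x})(d{x}/d{t})" for x in intermediates]
        formulas[t] = f"d{outer}/d{t} = " + " + ".join(terms)
    return formulas

# Tree for w = f(x, y, z) with x, y, z each depending on t, u, v
formulas = chain_rule_terms("w", ["x", "y", "z"], ["t", "u", "v"])
for line in formulas.values():
    print(line)
```

Here every `d` stands for a partial derivative. The number of terms in each formula equals the number of intermediate variables, and the number of formulas equals the number of independent variables.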

Exercise (PageIndex{4})

Create a tree diagram for the case when

\[w=f(x,y),\; x=x(t,u,v),\; y=y(t,u,v) \nonumber\]

and write out the formulas for the three partial derivatives of (displaystyle w.)

Hint

Determine the number of branches that emanate from each node in the tree.

\[\begin{align*} \dfrac{∂w}{∂t} &=\dfrac{∂w}{∂x}\dfrac{∂x}{∂t}+\dfrac{∂w}{∂y}\dfrac{∂y}{∂t} \\[4pt] \dfrac{∂w}{∂u} &=\dfrac{∂w}{∂x}\dfrac{∂x}{∂u}+\dfrac{∂w}{∂y}\dfrac{∂y}{∂u} \\[4pt] \dfrac{∂w}{∂v} &=\dfrac{∂w}{∂x}\dfrac{∂x}{∂v}+\dfrac{∂w}{∂y}\dfrac{∂y}{∂v} \end{align*}\]

## Implicit Differentiation

Recall that implicit differentiation provides a method for finding \(\displaystyle dy/dx\) when \(\displaystyle y\) is defined implicitly as a function of \(\displaystyle x\). The method involves differentiating both sides of the equation defining the function with respect to \(\displaystyle x\), then solving for \(\displaystyle dy/dx.\) Partial derivatives provide an alternative to this method.

Consider the ellipse defined by the equation (displaystyle x^2+3y^2+4y−4=0) as follows.

This equation implicitly defines (displaystyle y) as a function of (displaystyle x). As such, we can find the derivative (displaystyle dy/dx) using the method of implicit differentiation:

\[\begin{align*} \dfrac{d}{dx}(x^2+3y^2+4y−4) &=\dfrac{d}{dx}(0) \\[4pt] 2x+6y\dfrac{dy}{dx}+4\dfrac{dy}{dx} &=0 \\[4pt] (6y+4)\dfrac{dy}{dx} &=−2x \\[4pt] \dfrac{dy}{dx} &=−\dfrac{x}{3y+2} \end{align*}\]

We can also define a function (displaystyle z=f(x,y)) by using the left-hand side of the equation defining the ellipse. Then (displaystyle f(x,y)=x^2+3y^2+4y−4.) The ellipse (displaystyle x^2+3y^2+4y−4=0) can then be described by the equation (displaystyle f(x,y)=0). Using this function and the following theorem gives us an alternative approach to calculating (displaystyle dy/dx.)

Theorem: Implicit Differentiation of a Function of Two or More Variables

Suppose the function \(\displaystyle z=f(x,y)\) defines \(\displaystyle y\) implicitly as a function \(\displaystyle y=g(x)\) of \(\displaystyle x\) via the equation \(\displaystyle f(x,y)=0.\) Then

\[\dfrac{dy}{dx}=−\dfrac{∂f/∂x}{∂f/∂y} \label{implicitdiff1}\]

provided \(\displaystyle f_y(x,y)≠0.\)

If the equation \(\displaystyle f(x,y,z)=0\) defines \(\displaystyle z\) implicitly as a differentiable function of \(\displaystyle x\) and \(\displaystyle y\), then

\[\dfrac{∂z}{∂x}=−\dfrac{∂f/∂x}{∂f/∂z} \;\text{and}\; \dfrac{∂z}{∂y}=−\dfrac{∂f/∂y}{∂f/∂z} \label{implicitdiff2}\]

as long as \(\displaystyle f_z(x,y,z)≠0.\)

Equation \ref{implicitdiff1} is a direct consequence of Equation \ref{chain2a}. In particular, if we assume that \(\displaystyle y\) is defined implicitly as a function of \(\displaystyle x\) via the equation \(\displaystyle f(x,y)=0\), we can apply the chain rule to find \(\displaystyle dy/dx:\)

\[\begin{align*} \dfrac{d}{dx}f(x,y) &=\dfrac{d}{dx}(0) \\[4pt] \dfrac{∂f}{∂x}⋅\dfrac{dx}{dx}+\dfrac{∂f}{∂y}⋅\dfrac{dy}{dx} &=0 \\[4pt] \dfrac{∂f}{∂x}+\dfrac{∂f}{∂y}⋅\dfrac{dy}{dx} &=0. \end{align*}\]

Solving this equation for \(\displaystyle dy/dx\) gives Equation \ref{implicitdiff1}. Equation \ref{implicitdiff2} can be derived in a similar fashion.

Let’s now return to the problem that we started before the previous theorem. Using the theorem and the function \(\displaystyle f(x,y)=x^2+3y^2+4y−4,\) we obtain

\[\begin{align*} \dfrac{∂f}{∂x} &=2x \\[4pt] \dfrac{∂f}{∂y} &=6y+4. \end{align*}\]

Then Equation \ref{implicitdiff1} gives

\[\dfrac{dy}{dx}=−\dfrac{∂f/∂x}{∂f/∂y}=−\dfrac{2x}{6y+4}=−\dfrac{x}{3y+2},\]

which is the same result obtained by the earlier use of implicit differentiation.
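The formula \(dy/dx=−f_x/f_y\) is easy to try out numerically. The sketch below is an illustration, not the text's method: `implicit_slope` is a hypothetical helper that estimates both partial derivatives with central differences, and \((2,0)\) is a point chosen because it lies on the ellipse.

```python
def f(x, y):
    """Left-hand side of the ellipse equation x^2 + 3y^2 + 4y - 4 = 0."""
    return x**2 + 3 * y**2 + 4 * y - 4

def implicit_slope(func, x, y, h=1e-6):
    """Estimate dy/dx = -f_x / f_y at (x, y) using central differences."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    if fy == 0:
        raise ZeroDivisionError("f_y vanishes; the formula does not apply here")
    return -fx / fy

x0, y0 = 2.0, 0.0          # f(2, 0) = 4 + 0 + 0 - 4 = 0, so the point is on the curve
slope = implicit_slope(f, x0, y0)
print(slope, -x0 / (3 * y0 + 2))  # numeric estimate next to -x/(3y+2)
```

The explicit check for `fy == 0` mirrors the theorem's requirement that \(f_y(x,y)≠0\).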

Example (displaystyle PageIndex{5}): Implicit Differentiation by Partial Derivatives

a. Calculate \(\displaystyle dy/dx\) if \(\displaystyle y\) is defined implicitly as a function of \(\displaystyle x\) via the equation \(\displaystyle 3x^2−2xy+y^2+4x−6y−11=0\). What is the equation of the tangent line to the graph of this curve at point \(\displaystyle (2,1)\)?
b. Calculate \(\displaystyle ∂z/∂x\) and \(\displaystyle ∂z/∂y,\) given \(\displaystyle x^2e^y−yze^x=0.\)

Solution

a. Set \(\displaystyle f(x,y)=3x^2−2xy+y^2+4x−6y−11=0,\) then calculate \(\displaystyle f_x\) and \(\displaystyle f_y\): \(\displaystyle f_x=6x−2y+4\) and \(\displaystyle f_y=−2x+2y−6.\)

The derivative is given by

\[\dfrac{dy}{dx}=−\dfrac{∂f/∂x}{∂f/∂y}=−\dfrac{6x−2y+4}{−2x+2y−6}=\dfrac{3x−y+2}{x−y+3}. \nonumber\]

The slope of the tangent line at point \(\displaystyle (2,1)\) is given by

\[\left.\dfrac{dy}{dx}\right|_{(x,y)=(2,1)}=\dfrac{3(2)−1+2}{2−1+3}=\dfrac{7}{4}. \nonumber\]

To find the equation of the tangent line, we use the point-slope form (Figure \(\PageIndex{5}\)):

\[\begin{align*} y−y_0 &=m(x−x_0) \\[4pt] y−1 &=\dfrac{7}{4}(x−2) \\[4pt] y &=\dfrac{7}{4}x−\dfrac{7}{2}+1 \\[4pt] y &=\dfrac{7}{4}x−\dfrac{5}{2}. \end{align*}\]

b. We have \(\displaystyle f(x,y,z)=x^2e^y−yze^x.\) Therefore,

\[\begin{align*} \dfrac{∂f}{∂x} &=2xe^y−yze^x \\[4pt] \dfrac{∂f}{∂y} &=x^2e^y−ze^x \\[4pt] \dfrac{∂f}{∂z} &=−ye^x. \end{align*}\]

Using Equation \ref{implicitdiff2},

\[\begin{align*} \dfrac{∂z}{∂x} &=−\dfrac{∂f/∂x}{∂f/∂z}=−\dfrac{2xe^y−yze^x}{−ye^x}=\dfrac{2xe^y−yze^x}{ye^x} \\[4pt] \dfrac{∂z}{∂y} &=−\dfrac{∂f/∂y}{∂f/∂z}=−\dfrac{x^2e^y−ze^x}{−ye^x}=\dfrac{x^2e^y−ze^x}{ye^x}. \end{align*}\]
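In part b the equation happens to be solvable for \(z\), namely \(z=x^2e^{y−x}/y\) for \(y≠0\), which allows an independent check of both formulas. This is a sketch under that observation; the function names and the sample point are illustrative assumptions.

```python
import math

def z_explicit(x, y):
    """Solve x^2 e^y - y z e^x = 0 for z (valid when y != 0)."""
    return x**2 * math.exp(y - x) / y

def dz_dx_formula(x, y):
    z = z_explicit(x, y)
    return (2 * x * math.exp(y) - y * z * math.exp(x)) / (y * math.exp(x))

def dz_dy_formula(x, y):
    z = z_explicit(x, y)
    return (x**2 * math.exp(y) - z * math.exp(x)) / (y * math.exp(x))

def central_partial(func, x, y, wrt, h=1e-6):
    """Central-difference partial derivative with respect to 'x' or 'y'."""
    if wrt == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 1.2, 0.8  # arbitrary sample point with y0 != 0
print(central_partial(z_explicit, x0, y0, "x"), dz_dx_formula(x0, y0))
print(central_partial(z_explicit, x0, y0, "y"), dz_dy_formula(x0, y0))
```

In most implicit problems no explicit solution exists, which is exactly when the partial-derivative formulas are most useful.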

Exercise (PageIndex{5})

Find \(\displaystyle dy/dx\) if \(\displaystyle y\) is defined implicitly as a function of \(\displaystyle x\) by the equation \(\displaystyle x^2+xy−y^2+7x−3y−26=0\). What is the equation of the tangent line to the graph of this curve at point \(\displaystyle (3,−2)\)?

Hint

Calculate \(\displaystyle ∂f/∂x\) and \(\displaystyle ∂f/∂y\), then use Equation \ref{implicitdiff1}.

Solution

\[\dfrac{dy}{dx} = \left. \dfrac{2x+y+7}{2y−x+3} \right|_{(3,−2)} = \dfrac{2(3)+(−2)+7}{2(−2)−(3)+3} = −\dfrac{11}{4} \nonumber\]

Equation of the tangent line: \(\displaystyle y=−\dfrac{11}{4}x+\dfrac{25}{4}\)

## Key Concepts

• The chain rule for functions of more than one variable involves the partial derivatives with respect to all the independent variables.
• Tree diagrams are useful for deriving formulas for the chain rule for functions of more than one variable, where each independent variable also depends on other variables.

## Key Equations

• Chain rule, one independent variable

\(\displaystyle \dfrac{dz}{dt}=\dfrac{∂z}{∂x}⋅\dfrac{dx}{dt}+\dfrac{∂z}{∂y}⋅\dfrac{dy}{dt}\)

• Chain rule, two independent variables

\(\displaystyle \dfrac{∂z}{∂u}=\dfrac{∂z}{∂x}⋅\dfrac{∂x}{∂u}+\dfrac{∂z}{∂y}⋅\dfrac{∂y}{∂u}\)

\(\displaystyle \dfrac{∂z}{∂v}=\dfrac{∂z}{∂x}⋅\dfrac{∂x}{∂v}+\dfrac{∂z}{∂y}⋅\dfrac{∂y}{∂v}\)

• Generalized chain rule

\(\displaystyle \dfrac{∂w}{∂t_j}=\dfrac{∂w}{∂x_1}\dfrac{∂x_1}{∂t_j}+\dfrac{∂w}{∂x_2}\dfrac{∂x_2}{∂t_j}+⋯+\dfrac{∂w}{∂x_m}\dfrac{∂x_m}{∂t_j}\)

## Glossary

generalized chain rule
the chain rule extended to functions of more than one independent variable, in which each independent variable may depend on one or more other variables
intermediate variable
given a composition of functions (e.g., (displaystyle f(x(t),y(t)))), the intermediate variables are the variables that are independent in the outer function but dependent on other variables as well; in the function (displaystyle f(x(t),y(t)),) the variables (displaystyle x) and (displaystyle y) are examples of intermediate variables
tree diagram
illustrates and derives formulas for the generalized chain rule, in which each independent variable is accounted for

## Complex chain rule for complex valued functions

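Let $f=f(z)$ and $g=g(w)$ be two complex valued functions which are differentiable in the real sense, and let $h(z)=g(f(z))$. Prove the complex chain rule for all partial derivatives: $\frac{\partial h}{\partial z} = \frac{\partial g}{\partial w}\frac{\partial f}{\partial z} + \frac{\partial g}{\partial \bar{w}}\frac{\partial \bar{f}}{\partial z}$ and $\frac{\partial h}{\partial \bar{z}} = \frac{\partial g}{\partial w}\frac{\partial f}{\partial \bar{z}} + \frac{\partial g}{\partial \bar{w}}\frac{\partial \bar{f}}{\partial \bar{z}}$. Are we supposed to arrive at this through Cauchy-Riemann?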

There are several versions of the chain rule for functions of more than one variable, each of them giving a rule for differentiating a composite function.

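Theorem. (Chain Rule Involving One Independent Variable) Let $f(x,y)$ be a differentiable function of $x$ and $y$, and let $x=x(t)$ and $y=y(t)$ be differentiable functions of $t.$ Then $z=f(x,y)$ is a differentiable function of $t$ and \begin{equation} \frac{dz}{dt}=\frac{\partial z}{\partial x}\frac{dx}{dt}+\frac{\partial z}{\partial y}\frac{dy}{dt}. \end{equation}

Proof. Because $z=f(x,y)$ is differentiable, we can write the increment $\Delta z$ in the following form: \begin{equation} \Delta z=\frac{\partial z}{\partial x}\Delta x+\frac{\partial z}{\partial y}\Delta y+\epsilon_1\Delta x+\epsilon_2\Delta y \end{equation} where $\epsilon_1\to 0$ and $\epsilon_2\to 0$ as both $\Delta x\to 0$ and $\Delta y\to 0.$ Dividing by $\Delta t\neq 0,$ we obtain \begin{equation} \frac{\Delta z}{\Delta t}=\frac{\partial z}{\partial x}\frac{\Delta x}{\Delta t}+\frac{\partial z}{\partial y}\frac{\Delta y}{\Delta t}+\epsilon_1\frac{\Delta x}{\Delta t}+\epsilon_2\frac{\Delta y}{\Delta t}. \end{equation} Because $x$ and $y$ are functions of $t$, we can write their increments as \begin{equation} \Delta x=x(t+\Delta t)-x(t) \qquad \text{and} \qquad \Delta y=y(t+\Delta t)-y(t). \end{equation} We know that $x$ and $y$ vary continuously with $t$, because $x$ and $y$ are differentiable, and it follows that $\Delta x\to 0$ and $\Delta y\to 0$ as $\Delta t\to 0$ so that $\epsilon_1\to 0$ and $\epsilon_2\to 0$ as $\Delta t\to 0.$ Therefore, we have \begin{align} \frac{dz}{dt} & =\lim_{\Delta t\to 0}\frac{\Delta z}{\Delta t} \\ & =\lim_{\Delta t\to 0}\left(\frac{\partial z}{\partial x}\frac{\Delta x}{\Delta t}+\frac{\partial z}{\partial y}\frac{\Delta y}{\Delta t}+\epsilon_1\frac{\Delta x}{\Delta t}+\epsilon_2\frac{\Delta y}{\Delta t}\right) \\ & =\frac{\partial z}{\partial x}\frac{dx}{dt}+\frac{\partial z}{\partial y}\frac{dy}{dt}+(0)\frac{dx}{dt}+(0)\frac{dy}{dt} \end{align}
as desired.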

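Example. If $z=x^2y+3x y^4,$ where $x=e^t$ and $y=\sin t$, find $\frac{dz}{dt}.$

Solution. The chain rule gives \begin{align} \frac{dz}{dt} &=\frac{\partial z}{\partial x}\frac{dx}{dt}+\frac{\partial z}{\partial y}\frac{dy}{dt} \\ & =\left(2e^t\sin t+3\sin^4 t\right)e^t +\left(e^{2t}+12e^t\sin^3 t\right) \cos t \end{align} as desired.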
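A numerical spot check of this result, written as a minimal sketch: the names `z_of_t` and `dz_dt_chain` and the sample point `t0` are assumptions introduced for illustration.

```python
import math

def z_of_t(t):
    """z = x^2 y + 3 x y^4 with x = e^t and y = sin t."""
    x, y = math.exp(t), math.sin(t)
    return x**2 * y + 3 * x * y**4

def dz_dt_chain(t):
    """Chain-rule expression: (2xy + 3y^4) dx/dt + (x^2 + 12xy^3) dy/dt."""
    e, s, c = math.exp(t), math.sin(t), math.cos(t)
    return (2 * e * s + 3 * s**4) * e + (e**2 + 12 * e * s**3) * c

t0, h = 0.5, 1e-6  # arbitrary sample point and step size
numeric = (z_of_t(t0 + h) - z_of_t(t0 - h)) / (2 * h)
print(numeric, dz_dt_chain(t0))
```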

Example. Two objects are traveling in elliptical paths given by the following parametric equations \begin{equation} x_1(t)=2 \cos t, \quad y_1(t)=3 \sin t, \quad x_2(t)=4 \sin 2 t, \quad y_2(t)=3 \cos 2t. \end{equation} At what rate is the distance between the two objects changing when $t=\pi ?$
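No solution accompanies this example in the text, so the following is only one possible way to set it up. Writing the distance as $s=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$, the chain rule with four intermediate variables gives $\frac{ds}{dt}=\frac{(x_2-x_1)(x_2'-x_1')+(y_2-y_1)(y_2'-y_1')}{s}$. The function names below are hypothetical.

```python
import math

def positions(t):
    """Positions (x1, y1) and (x2, y2) of the two objects at time t."""
    return (2 * math.cos(t), 3 * math.sin(t)), (4 * math.sin(2 * t), 3 * math.cos(2 * t))

def velocities(t):
    """Time derivatives of the four coordinate functions."""
    return (-2 * math.sin(t), 3 * math.cos(t)), (8 * math.cos(2 * t), -6 * math.sin(2 * t))

def distance_rate(t):
    """ds/dt via the chain rule applied to s = sqrt((x2-x1)^2 + (y2-y1)^2)."""
    (x1, y1), (x2, y2) = positions(t)
    (dx1, dy1), (dx2, dy2) = velocities(t)
    s = math.hypot(x2 - x1, y2 - y1)
    return ((x2 - x1) * (dx2 - dx1) + (y2 - y1) * (dy2 - dy1)) / s

rate = distance_rate(math.pi)
print(rate, 25 / math.sqrt(13))
```

At $t=\pi$ the objects sit at $(-2,0)$ and $(0,3)$, so $s=\sqrt{13}$ and the numerator is $2\cdot 8+3\cdot 3=25$, giving $25/\sqrt{13}$ under this setup.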

## 13.5 The Multivariable Chain Rule

In this section we extend the Chain Rule to functions of more than one variable.

###### Theorem 13.5.1 Multivariable Chain Rule, Part I

Let $z=f(x,y)$, $x=g(t)$ and $y=h(t)$, where $f$, $g$ and $h$ are differentiable functions. Then $z=f(x,y)=f(g(t),h(t))$ is a function of $t$, and

$$\frac{dz}{dt}=\frac{df}{dt}=f_x(x,y)\frac{dx}{dt}+f_y(x,y)\frac{dy}{dt}=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}.$$

Proof. By definition,

$$\frac{df}{dt}(x,y)=\lim_{h\to 0}\frac{f(x(t+h),y(t+h))-f(x,y)}{h}.$$

Let $\Delta f=f(x(t+h),y(t+h))-f(x,y)$, $dx=x(t+h)-x(t)$, and $dy=y(t+h)-y(t)$.

Because $f$ is differentiable, Definition 13.4.2 gives us functions $E_1$ and $E_2$ so that

$$E_1\,dx+E_2\,dy=\Delta f-f_x(x,y)\,dx-f_y(x,y)\,dy,\quad \lim_{(dx,dy)\to 0}E_1=0,\quad\text{and}\quad \lim_{(dx,dy)\to 0}E_2=0.$$

Therefore,

$$\begin{aligned}\frac{df}{dt}(x,y)&=\lim_{h\to 0}\frac{f_x(x,y)\,dx+f_y(x,y)\,dy+E_1\,dx+E_2\,dy}{h}\\ &=f_x(x,y)\lim_{h\to 0}\frac{dx}{h}+f_y(x,y)\lim_{h\to 0}\frac{dy}{h}+\lim_{h\to 0}E_1\lim_{h\to 0}\frac{dx}{h}+\lim_{h\to 0}E_2\lim_{h\to 0}\frac{dy}{h}\\ &=f_x(x,y)\,x'(t)+f_y(x,y)\,y'(t)+0\,x'(t)+0\,y'(t).\qquad ∎\end{aligned}$$

It is good to understand what the situation of $z=f(x,y)$, $x=g(t)$ and $y=h(t)$ describes. We know that $z=f(x,y)$ describes a surface; we also recognize that $x=g(t)$ and $y=h(t)$ are parametric equations for a curve in the $x$-$y$ plane. Combining these together, we are describing a curve that lies on the surface described by $f$. The parametric equations for this curve are $x=g(t)$, $y=h(t)$ and $z=f(g(t),h(t))$.

Figure 13.5.1: Understanding the application of the Multivariable Chain Rule.

Consider Figure 13.5.1 in which a surface is drawn, along with a dashed curve in the $x$-$y$ plane. Restricting $f$ to just the points on this circle gives the curve shown on the surface. The derivative $\frac{df}{dt}$ gives the instantaneous rate of change of $f$ with respect to $t$. If we consider an object traveling along this path, $\frac{df}{dt}$ gives the rate at which the object rises/falls.

We now practice applying the Multivariable Chain Rule.

###### Example 13.5.1 Using the Multivariable Chain Rule

Let $z=x^2y+x$, where $x=\sin t$ and $y=e^{5t}$. Find $\frac{dz}{dt}$ using the Chain Rule.

Solution Following Theorem 13.5.1, we find

$$f_x(x,y)=2xy+1,\qquad f_y(x,y)=x^2,\qquad \frac{dx}{dt}=\cos t,\qquad \frac{dy}{dt}=5e^{5t}.$$

Applying the theorem, we have

$$\frac{dz}{dt}=(2xy+1)\cos t+5x^2e^{5t}.$$

This may look odd, as it seems that $\frac{dz}{dt}$ is a function of $x$, $y$ and $t$. Since $x$ and $y$ are functions of $t$, $\frac{dz}{dt}$ is really just a function of $t$, and we can replace $x$ with $\sin t$ and $y$ with $e^{5t}$:

$$\frac{dz}{dt}=(2xy+1)\cos t+5x^2e^{5t}=(2\sin(t)\,e^{5t}+1)\cos t+5e^{5t}\sin^2 t.$$

The previous example can make us wonder: if we substituted for $x$ and $y$ at the end to show that $\frac{dz}{dt}$ is really just a function of $t$, why not substitute before differentiating, showing clearly that $z$ is a function of $t$?

That is, $z=x^2y+x=(\sin t)^2e^{5t}+\sin t$. Applying the Chain and Product Rules, we have

$$\frac{dz}{dt}=2\sin t\cos t\,e^{5t}+5\sin^2 t\,e^{5t}+\cos t,$$

which matches the result from the example.

This may now make one wonder “What’s the point? If we could already find the derivative, why learn another way of finding it?” In some cases, applying this rule makes deriving simpler, but this is hardly the power of the Chain Rule. Rather, in the case where $z=f(x,y)$, $x=g(t)$ and $y=h(t)$, the Chain Rule is extremely powerful when we do not know what $f$, $g$ and/or $h$ are. It may be hard to believe, but often in “the real world” we know rate-of-change information (i.e., information about derivatives) without explicitly knowing the underlying functions. The Chain Rule allows us to combine several rates of change to find another rate of change. The Chain Rule also has theoretic use, giving us insight into the behavior of certain constructions (as we’ll see in the next section).

We demonstrate this in the next example.

###### Example 13.5.2 Applying the Multivariable Chain Rule

An object travels along a path on a surface. The exact path and surface are not known, but at time $t = t_0$ it is known that:

$$\frac{\partial z}{\partial x} = 5, \qquad \frac{\partial z}{\partial y} = -2, \qquad \frac{dx}{dt} = 3 \qquad\text{and}\qquad \frac{dy}{dt} = 7.$$

Find $\frac{dz}{dt}$ at time $t_0$.

Solution The Multivariable Chain Rule states that

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = 5(3) + (-2)(7) = 1.$$

By knowing certain rate-of-change information about the surface and about the path of the particle in the $xy$-plane, we can determine how quickly the object is rising/falling.

We next apply the Chain Rule to solve a max/min problem.

###### Example 13.5.3 Applying the Multivariable Chain Rule

Consider the surface $z = x^2 + y^2 - xy$, a paraboloid, on which a particle moves with $x$ and $y$ coordinates given by $x = \cos t$ and $y = \sin t$. Find $\frac{dz}{dt}$ when $t = 0$, and find where the particle reaches its maximum/minimum $z$-values.

Solution It is straightforward to compute

$$f_x(x,y) = 2x - y, \qquad f_y(x,y) = 2y - x, \qquad \frac{dx}{dt} = -\sin t, \qquad \frac{dy}{dt} = \cos t.$$

Combining these according to the Chain Rule gives:

$$\frac{dz}{dt} = -(2x - y)\sin t + (2y - x)\cos t.$$

When $t = 0$, $x = 1$ and $y = 0$. Thus $\frac{dz}{dt} = -(2)(0) + (-1)(1) = -1$. When $t = 0$, the particle is moving down, as shown in Figure 13.5.2.

To find where the $z$-value is maximized/minimized on the particle’s path, we set $\frac{dz}{dt} = 0$ and solve for $t$:

$$\begin{aligned}
\frac{dz}{dt} = 0 &= -(2x - y)\sin t + (2y - x)\cos t \\
0 &= -(2\cos t - \sin t)\sin t + (2\sin t - \cos t)\cos t \\
0 &= \sin^2 t - \cos^2 t \\
\cos^2 t &= \sin^2 t \\
t &= n\frac{\pi}{4} \quad (\text{for odd } n)
\end{aligned}$$

We can use the First Derivative Test to find that on $[0, 2\pi]$, $z$ reaches its absolute minimum at $t = \pi/4$ and $5\pi/4$; it reaches its absolute maximum at $t = 3\pi/4$ and $7\pi/4$, as shown in Figure 13.5.2.
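As a quick numerical sanity check of this example, the sketch below evaluates $\frac{dz}{dt}$ and $z$ at the four critical values of $t$ in $[0, 2\pi]$. It is only an illustration; the function names are chosen here and do not come from the text.

```python
import math

def z(t):
    """Height of the particle: z = x^2 + y^2 - x*y with x = cos t, y = sin t."""
    x, y = math.cos(t), math.sin(t)
    return x ** 2 + y ** 2 - x * y

def dz_dt(t):
    """Chain Rule derivative: -(2x - y) sin t + (2y - x) cos t."""
    x, y = math.cos(t), math.sin(t)
    return -(2 * x - y) * math.sin(t) + (2 * y - x) * math.cos(t)

# Odd multiples of pi/4 inside [0, 2*pi]
critical_points = [n * math.pi / 4 for n in (1, 3, 5, 7)]

print("dz/dt at t = 0:", dz_dt(0.0))
for t in critical_points:
    print(f"t = {t:.4f}  dz/dt = {dz_dt(t):+.1e}  z = {z(t):.3f}")
```

The derivative vanishes (up to rounding) at each listed point, and the heights alternate between the smaller value at $\pi/4$, $5\pi/4$ and the larger value at $3\pi/4$, $7\pi/4$.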

We can extend the Chain Rule to include the situation where $z$ is a function of more than one variable, and each of these variables is also a function of more than one variable. The basic case of this is where $z = f(x,y)$, and $x$ and $y$ are functions of two variables, say $s$ and $t$.

###### Theorem 13.5.2 Multivariable Chain Rule, Part II

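Let $z = f(x,y)$, $x = g(s,t)$ and $y = h(s,t)$, where $f$, $g$ and $h$ are differentiable functions. Then $z$ is a function of $s$ and $t$, and

$$\frac{\partial z}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, \quad\text{and}$$

$$\frac{\partial z}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}.$$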

Let $z = f(x_1, x_2, \ldots, x_m)$ be a differentiable function of $m$ variables, where each of the $x_i$ is a differentiable function of the variables $t_1, t_2, \ldots, t_n$. Then $z$ is a function of the $t_i$, and

$$\frac{\partial z}{\partial t_i} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t_i} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial f}{\partial x_m}\frac{\partial x_m}{\partial t_i}.$$

The proof of Part II follows quickly from Part I, because $\frac{\partial}{\partial t_i}$ means that we hold the other variables constant and we are back to the one variable case already proved. A helpful way to remember the derivatives is to examine the following chart.

The chain rule seems to have first been used by Gottfried Wilhelm Leibniz. He used it to calculate the derivative of $\sqrt{a + bz + cz^2}$ as the composite of the square root function and the function $a + bz + cz^2$. He first mentioned it in a 1676 memoir (with a sign error in the calculation). The common notation for the chain rule is due to Leibniz. [3] Guillaume de l'Hôpital used the chain rule implicitly in his Analyse des infiniment petits. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery.

The simplest form of the chain rule is for real-valued functions of one real variable. It states that if g is a function that is differentiable at a point c (i.e. the derivative g′(c) exists) and f is a function that is differentiable at g(c), then the composite function f ∘ g is differentiable at c, and the derivative is [4]

( f ∘ g ) ′ ( c ) = f ′ ( g ( c ) ) ⋅ g ′ ( c ) .

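The rule is sometimes abbreviated as

$$(f \circ g)' = (f' \circ g) \cdot g'.$$

If y = f(u) and u = g(x), then this abbreviated form is written in Leibniz notation as:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}.$$

The points where the derivatives are evaluated may also be stated explicitly:

$$\left.\frac{dy}{dx}\right|_{x=c} = \left.\frac{dy}{du}\right|_{u=g(c)} \cdot \left.\frac{du}{dx}\right|_{x=c}.$$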

### Absence of formulas

It may be possible to apply the chain rule even when there are no formulas for the functions which are being differentiated. This can happen when the derivatives are measured directly. Suppose that a car is driving up a tall mountain. The car's speedometer measures its speed directly. If the grade is known, then the rate of ascent can be calculated using trigonometry. Suppose that the car is ascending at 2.5 km/h . Standard models for the Earth's atmosphere imply that the temperature drops about 6.5 °C per kilometer ascended (called the lapse rate). To find the temperature drop per hour, we can apply the chain rule. Let the function g(t) be the altitude of the car at time t , and let the function f(h) be the temperature h kilometers above sea level. f and g are not known exactly: For example, the altitude where the car starts is not known and the temperature on the mountain is not known. However, their derivatives are known: f′ is −6.5 °C/km , and g′ is 2.5 km/h . The chain rule states that the derivative of the composite function is the product of the derivative of f and the derivative of g . This is −6.5 °C/km ⋅ 2.5 km/h = −16.25 °C/h .

One of the reasons why this computation is possible is because f′ is a constant function. A more accurate description of how the temperature near the car varies over time would require an accurate model of how the temperature varies at different altitudes. This model may not have a constant derivative. To compute the temperature change in such a model, it would be necessary to know g and not just g′ , because without knowing g it is not possible to know where to evaluate f′ .
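The arithmetic in the car example, and the caveat about a non-constant derivative, can be sketched in a few lines. The first computation uses only the numbers quoted above. The second part is purely hypothetical: the altitude-dependent lapse rate and the starting altitude are made-up values, included only to show why g itself is needed once f′ is no longer constant.

```python
def composite_rate(df_dh, dg_dt):
    """Chain rule for measured rates: (f o g)'(t) = f'(g(t)) * g'(t)."""
    return df_dh * dg_dt

lapse_rate = -6.5    # f' in degrees C per km (from the text)
ascent_rate = 2.5    # g' in km per hour (from the text)
print(composite_rate(lapse_rate, ascent_rate))  # -16.25 degrees C per hour

# Hypothetical non-constant model: the lapse rate now depends on altitude,
# so the altitude g(t) is needed to know where to evaluate f'.
def varying_lapse_rate(altitude_km):
    return -6.5 + 0.5 * altitude_km   # made-up numbers, for illustration only

def altitude(t_hours, start_km=1.0):  # start_km is an assumed starting altitude
    return start_km + ascent_rate * t_hours

print(composite_rate(varying_lapse_rate(altitude(2.0)), ascent_rate))
```

With a constant f′ only the two rates matter; in the hypothetical second model the answer changes with the time at which it is evaluated, because the car's altitude changes.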

### Composites of more than two functions

The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of f, g, and h (in that order) is the composite of f with g ∘ h. The chain rule states that to compute the derivative of f ∘ g ∘ h, it is sufficient to compute the derivative of f and the derivative of g ∘ h. The derivative of f can be calculated directly, and the derivative of g ∘ h can be calculated by applying the chain rule again.

For concreteness, consider the function

This can be decomposed as the composite of three functions:

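The chain rule states that the derivative of their composite at the point x = a is:

$$(f \circ g \circ h)'(a) = f'((g \circ h)(a)) \cdot (g \circ h)'(a) = f'((g \circ h)(a)) \cdot g'(h(a)) \cdot h'(a).$$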

In Leibniz notation, this is:

The derivative function is therefore:

Another way of computing this derivative is to view the composite function f ∘ g ∘ h as the composite of f ∘ g and h. Applying the chain rule in this manner would yield:

( f ∘ g ∘ h ) ′ ( a ) = ( f ∘ g ) ′ ( h ( a ) ) ⋅ h ′ ( a ) = f ′ ( g ( h ( a ) ) ) ⋅ g ′ ( h ( a ) ) ⋅ h ′ ( a ) .

This is the same as what was computed above. This should be expected because (f ∘ g) ∘ h = f ∘ (g ∘ h).
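A small numerical sketch can make the three-function formula concrete. The particular functions below (an exponential, a sine, and a square) are an illustrative assumption chosen for this sketch; any differentiable f, g, and h would serve equally well.

```python
import math

# Illustrative choice of three functions and their derivatives (an assumption)
f, df = math.exp, math.exp
g, dg = math.sin, math.cos

def h(x):
    return x * x

def dh(x):
    return 2 * x

def composite(x):
    """(f o g o h)(x)."""
    return f(g(h(x)))

def derivative_by_chain_rule(a):
    """f'(g(h(a))) * g'(h(a)) * h'(a)."""
    return df(g(h(a))) * dg(h(a)) * dh(a)

def central_difference(func, x, step=1e-6):
    """Symmetric finite-difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

a = 0.8  # arbitrary sample point
print(derivative_by_chain_rule(a), central_difference(composite, a))
```

Both groupings, f with g ∘ h or f ∘ g with h, lead to the same product of three factors, so a single function suffices for the check.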

Sometimes, it is necessary to differentiate an arbitrarily long composition of the form $f_1 \circ f_2 \circ \cdots \circ f_{n-1} \circ f_n$. In this case, define

or, in the Lagrange notation,

### Quotient rule

The chain rule can be used to derive some well-known differentiation rules. For example, the quotient rule is a consequence of the chain rule and the product rule. To see this, write the function f(x)/g(x) as the product f(x) · 1/g(x). First apply the product rule:

$$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{d}{dx}\left(f(x) \cdot \frac{1}{g(x)}\right) = f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \frac{d}{dx}\left(\frac{1}{g(x)}\right).$$

To compute the derivative of 1/g(x), notice that it is the composite of g with the reciprocal function, that is, the function that sends x to 1/x. The derivative of the reciprocal function is $-1/x^2$. By applying the chain rule, the last expression becomes:

$$f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \left(-\frac{1}{g(x)^2} \cdot g'(x)\right) = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2},$$

which is the usual formula for the quotient rule.

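### Derivatives of inverse functions

Suppose that y = g(x) has an inverse function. Call its inverse function f so that we have x = f(y). There is a formula for the derivative of f in terms of the derivative of g. To see this, note that f and g satisfy the formula

$$f(g(x)) = x.$$

Because the functions f(g(x)) and x are equal, their derivatives must be equal. The derivative of x is the constant function with value 1, and the derivative of f(g(x)) is given by the chain rule, so

$$f'(g(x))\,g'(x) = 1.$$

To express f′ as a function of an independent variable y, we substitute f(y) for x wherever it appears. Then we can solve for f′:

$$f'(g(f(y)))\,g'(f(y)) = 1 \quad\Longrightarrow\quad f'(y)\,g'(f(y)) = 1 \quad\Longrightarrow\quad f'(y) = \frac{1}{g'(f(y))}.$$

For example, consider the function $g(x) = e^x$. It has an inverse $f(y) = \ln y$. Because $g'(x) = e^x$, the above formula says that

$$\frac{d}{dy}\ln y = \frac{1}{e^{\ln y}} = \frac{1}{y}.$$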

This formula is true whenever g is differentiable and its inverse f is also differentiable. This formula can fail when one of these conditions is not true. For example, consider $g(x) = x^3$. Its inverse is $f(y) = y^{1/3}$, which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of f at zero, then we must evaluate 1/g′(f(0)). Since f(0) = 0 and g′(0) = 0, we must evaluate 1/0, which is undefined. Therefore, the formula fails in this case. This is not surprising because f is not differentiable at zero.
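The inverse-function formula f′(y) = 1/g′(f(y)) and its failure for the cube function can be illustrated with a short sketch. The helper name `inverse_derivative` is hypothetical and introduced only for this illustration.

```python
import math

def inverse_derivative(dg, f, y):
    """Derivative of the inverse f at y, computed as 1 / g'(f(y))."""
    return 1.0 / dg(f(y))

# g(x) = e^x with inverse f(y) = ln y: the formula should give 1/y.
y = 2.0
print(inverse_derivative(math.exp, math.log, y), 1.0 / y)

# g(x) = x^3 with inverse f(y) = y^(1/3): at y = 0 the formula needs 1/0.
def cube_derivative(x):
    return 3 * x ** 2

def cube_root(y):
    return y ** (1.0 / 3.0)   # adequate for y >= 0, which is all that is used here

try:
    inverse_derivative(cube_derivative, cube_root, 0.0)
except ZeroDivisionError:
    print("formula fails at y = 0: division by zero")
```

Away from zero the cube example behaves normally; only the point where g′ vanishes causes trouble, matching the discussion above.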

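Faà di Bruno's formula generalizes the chain rule to higher derivatives. Assuming that y = f(u) and u = g(x), then the first few derivatives are:

$$\begin{aligned}
\frac{dy}{dx} &= \frac{dy}{du}\frac{du}{dx}, \\
\frac{d^2y}{dx^2} &= \frac{d^2y}{du^2}\left(\frac{du}{dx}\right)^2 + \frac{dy}{du}\frac{d^2u}{dx^2}, \\
\frac{d^3y}{dx^3} &= \frac{d^3y}{du^3}\left(\frac{du}{dx}\right)^3 + 3\,\frac{d^2y}{du^2}\frac{du}{dx}\frac{d^2u}{dx^2} + \frac{dy}{du}\frac{d^3u}{dx^3}.
\end{aligned}$$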

### First proof

One proof of the chain rule begins with the definition of the derivative:

$$(f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.$$

Introduce a function Q defined by

$$Q(y) = \begin{cases} \dfrac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\[2ex] f'(g(a)), & y = g(a). \end{cases}$$

We will show that the difference quotient for f ∘ g is always equal to:

$$Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}.$$

Whenever g(x) is not equal to g(a), this is clear because the factors of g(x) − g(a) cancel. When g(x) equals g(a), then the difference quotient for f ∘ g is zero because f(g(x)) equals f(g(a)), and the above product is zero because it equals f′(g(a)) times zero. So the above product is always equal to the difference quotient, and to show that the derivative of f ∘ g at a exists and to determine its value, we need only show that the limit as x goes to a of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are Q(g(x)) and (g(x) − g(a)) / (x − a). The latter is the difference quotient for g at a, and because g is differentiable at a by assumption, its limit as x tends to a exists and equals g′(a).

As for Q(g(x)), notice that Q is defined wherever f is. Furthermore, f is differentiable at g(a) by assumption, so Q is continuous at g(a), by definition of the derivative. The function g is continuous at a because it is differentiable at a, and therefore Q ∘ g is continuous at a. So its limit as x goes to a exists and equals Q(g(a)), which is f′(g(a)).

This shows that the limits of both factors exist and that they equal f′(g(a)) and g′(a), respectively. Therefore, the derivative of f ∘ g at a exists and equals f′(g(a)) g′(a). [5]

### Second proof

Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A function g is differentiable at a if there exists a real number g′(a) and a function ε(h) that tends to zero as h tends to zero, and furthermore

g ( a + h ) − g ( a ) = g ′ ( a ) h + ε ( h ) h .

Here the left-hand side represents the true difference between the value of g at a and at a + h , whereas the right-hand side represents the approximation determined by the derivative plus an error term.

In the situation of the chain rule, such a function ε exists because g is assumed to be differentiable at a. Again by assumption, a similar function also exists for f at g(a). Calling this function η, we have

f ( g ( a ) + k ) − f ( g ( a ) ) = f ′ ( g ( a ) ) k + η ( k ) k .

The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set η(0) = 0 , then η is continuous at 0.

Proving the theorem requires studying the difference f(g(a + h)) − f(g(a)) as h tends to zero. The first step is to substitute for g(a + h) using the definition of differentiability of g at a:

f ( g ( a + h ) ) − f ( g ( a ) ) = f ( g ( a ) + g ′ ( a ) h + ε ( h ) h ) − f ( g ( a ) ) .

The next step is to use the definition of differentiability of f at g(a). This requires a term of the form f(g(a) + k) for some k. In the above equation, the correct k varies with h. Set $k_h = g'(a)h + \varepsilon(h)h$ and the right hand side becomes $f(g(a) + k_h) - f(g(a))$. Applying the definition of the derivative gives:

$$f(g(a) + k_h) - f(g(a)) = f'(g(a))\,k_h + \eta(k_h)\,k_h.$$

To study the behavior of this expression as h tends to zero, expand $k_h$. After regrouping the terms, the right-hand side becomes:

$$f'(g(a))\,g'(a)\,h + \left[f'(g(a))\,\varepsilon(h) + \eta(k_h)\,g'(a) + \eta(k_h)\,\varepsilon(h)\right]h.$$

Because ε(h) and $\eta(k_h)$ tend to zero as h tends to zero, the first two bracketed terms tend to zero as h tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends to zero. Because the above expression is equal to the difference f(g(a + h)) − f(g(a)), by the definition of the derivative f ∘ g is differentiable at a and its derivative is f′(g(a)) g′(a).

The role of Q in the first proof is played by η in this proof. They are related by the equation:

Q ( y ) = f ′ ( g ( a ) ) + η ( y − g ( a ) ) .

The need to define Q at g(a) is analogous to the need to define η at zero.

### Third proof

Constantin Carathéodory's alternative definition of the differentiability of a function can be used to give an elegant proof of the chain rule. [6]

Under this definition, a function f is differentiable at a point a if and only if there is a function q, continuous at a and such that f(x) − f(a) = q(x)(x − a). There is at most one such function, and if f is differentiable at a then f′(a) = q(a).

Given the assumptions of the chain rule and the fact that differentiable functions and compositions of continuous functions are continuous, we have that there exist functions q, continuous at g(a), and r, continuous at a, and such that

f ( g ( x ) ) − f ( g ( a ) ) = q ( g ( x ) ) ( g ( x ) − g ( a ) )

and

g ( x ) − g ( a ) = r ( x ) ( x − a ) .

Therefore,

f ( g ( x ) ) − f ( g ( a ) ) = q ( g ( x ) ) r ( x ) ( x − a ) ,

but the function given by h(x) = q(g(x))r(x) is continuous at a, and we get, for this a,

( f ( g ( a ) ) ) ′ = q ( g ( a ) ) r ( a ) = f ′ ( g ( a ) ) g ′ ( a ) .

A similar approach works for continuously differentiable (vector-)functions of many variables. This method of factoring also allows a unified approach to stronger forms of differentiability, when the derivative is required to be Lipschitz continuous, Hölder continuous, etc. Differentiation itself can be viewed as the polynomial remainder theorem (the little Bézout theorem, or factor theorem), generalized to an appropriate class of functions. [ citation needed ]

### Proof via infinitesimals

and applying the standard part we obtain

The generalization of the chain rule to multi-variable functions is rather technical. However, it is simpler to write in the case of functions of the form

$$f(g_1(x), \ldots, g_k(x)).$$

As this case occurs often in the study of functions of a single variable, it is worth describing it separately.

### Case of f(g_1(x), …, g_k(x))

For writing the chain rule for a function of the form

$f(g_1(x), \ldots, g_k(x))$,

one needs the partial derivatives of f with respect to its k arguments. The usual notations for partial derivatives involve names for the arguments of the function. As these arguments are not named in the above formula, it is simpler and clearer to denote by

the derivative of f with respect to its i th argument, and by

the value of this derivative at z .

With this notation, the chain rule is

#### Example: arithmetic operations

If the function f is addition, that is, if

The case of exponentiation

is slightly more complicated, as

$$\frac{d}{dx}\left(g(x)^{h(x)}\right) = h(x)\,g(x)^{h(x)-1}\,\frac{d}{dx}g(x) + g(x)^{h(x)}\ln g(x)\,\frac{d}{dx}h(x).$$

### General rule

The simplest way for writing the chain rule in the general case is to use the total derivative, which is a linear transformation that captures all directional derivatives in a single formula. Consider differentiable functions $f : \mathbb{R}^m \to \mathbb{R}^k$ and $g : \mathbb{R}^n \to \mathbb{R}^m$, and a point a in $\mathbb{R}^n$. Let $D_a g$ denote the total derivative of g at a and $D_{g(a)} f$ denote the total derivative of f at g(a). These two derivatives are linear transformations $\mathbb{R}^n \to \mathbb{R}^m$ and $\mathbb{R}^m \to \mathbb{R}^k$, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative of f ∘ g at a:

$$D_a(f \circ g) = D_{g(a)} f \circ D_a g.$$

The higher-dimensional chain rule can be proved using a technique similar to the second proof given above. [7]

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called a Jacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says:

$$J_{f \circ g}(a) = J_f(g(a))\,J_g(a).$$

That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If k, m, and n are 1, so that $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$, then the Jacobian matrices of f and g are 1 × 1. Specifically, they are:

$$J_g(a) = \begin{pmatrix} g'(a) \end{pmatrix}, \qquad J_f(g(a)) = \begin{pmatrix} f'(g(a)) \end{pmatrix}.$$

The Jacobian of f ∘ g is the product of these 1 × 1 matrices, so it is f′(g(a))⋅g′(a), as expected from the one-dimensional chain rule. In the language of linear transformations, $D_a(g)$ is the function which scales a vector by a factor of g′(a) and $D_{g(a)}(f)$ is the function which scales a vector by a factor of f′(g(a)). The chain rule says that the composite of these two linear transformations is the linear transformation $D_a(f \circ g)$, and therefore it is the function that scales a vector by f′(g(a))⋅g′(a).

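Another way of writing the chain rule is used when f and g are expressed in terms of their components as $y = f(u) = (f_1(u), \ldots, f_k(u))$ and $u = g(x) = (g_1(x), \ldots, g_m(x))$. In this case, the above rule for Jacobian matrices is usually written as:

$$\frac{\partial(y_1, \ldots, y_k)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)}\,\frac{\partial(u_1, \ldots, u_m)}{\partial(x_1, \ldots, x_n)}.$$

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the ith coordinate direction is found by multiplying the Jacobian matrix by the ith basis vector. By doing this to the formula above, we find:

$$\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)}\,\frac{\partial(u_1, \ldots, u_m)}{\partial x_i}.$$

Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:

$$\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \sum_{\ell=1}^{m} \frac{\partial(y_1, \ldots, y_k)}{\partial u_\ell}\,\frac{\partial u_\ell}{\partial x_i}.$$

More conceptually, this rule expresses the fact that a change in the $x_i$ direction may change all of $g_1$ through $g_m$, and any of these changes may affect f.

In the special case where k = 1, so that f is a real-valued function, then this formula simplifies even further:

$$\frac{\partial y}{\partial x_i} = \sum_{\ell=1}^{m} \frac{\partial y}{\partial u_\ell}\,\frac{\partial u_\ell}{\partial x_i}.$$

This can be rewritten as a dot product. Recalling that $u = (g_1, \ldots, g_m)$, the partial derivative $\partial u / \partial x_i$ is also a vector, and the chain rule says that:

$$\frac{\partial y}{\partial x_i} = \nabla y \cdot \frac{\partial u}{\partial x_i}.$$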

#### Example

Given $u(x, y) = x^2 + 2y$ where $x(r, t) = r\sin(t)$ and $y(r, t) = \sin^2(t)$, determine the value of $\partial u / \partial r$ and $\partial u / \partial t$ using the chain rule.
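One way to work this example is sketched below. It assumes the reading $u = x^2 + 2y$, $x = r\sin t$, $y = \sin^2 t$ for the statement above, and applies the partial-derivative form of the chain rule once for each of $r$ and $t$.

```latex
\begin{aligned}
\frac{\partial u}{\partial r}
  &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial r}
   + \frac{\partial u}{\partial y}\frac{\partial y}{\partial r}
   = (2x)(\sin t) + (2)(0)
   = 2r\sin^2 t, \\[1ex]
\frac{\partial u}{\partial t}
  &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial t}
   + \frac{\partial u}{\partial y}\frac{\partial y}{\partial t}
   = (2x)(r\cos t) + (2)(2\sin t\cos t) \\
  &= 2r^2\sin t\cos t + 4\sin t\cos t
   = (r^2 + 2)\sin 2t.
\end{aligned}
```

As a cross-check, substituting first gives $u = (r^2 + 2)\sin^2 t$, whose partial derivatives with respect to $r$ and $t$ agree with the two results above.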

#### Higher derivatives of multivariable functions

Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If y = f(u) is a function of u = g(x) as above, then the second derivative of f ∘ g is:

All extensions of calculus have a chain rule. In most of these, the formula remains the same, though the meaning of that formula may be vastly different.

One generalization is to manifolds. In this situation, the chain rule represents the fact that the derivative of f ∘ g is the composite of the derivative of f and the derivative of g. This theorem is an immediate consequence of the higher dimensional chain rule given above, and it has exactly the same formula.

The chain rule is also valid for Fréchet derivatives in Banach spaces. The same formula holds as before. [8] This case and the previous one admit a simultaneous generalization to Banach manifolds.

In differential algebra, the derivative is interpreted as a morphism of modules of Kähler differentials. A ring homomorphism of commutative rings $f : R \to S$ determines a morphism of Kähler differentials $Df : \Omega_R \to \Omega_S$ which sends an element dr to d(f(r)), the exterior differential of f(r). The formula D(f ∘ g) = Df ∘ Dg holds in this context as well.

The common feature of these examples is that they are expressions of the idea that the derivative is part of a functor. A functor is an operation on spaces and functions between them. It associates to each space a new space and to each function between two spaces a new function between the corresponding new spaces. In each of the above cases, the functor sends each space to its tangent bundle and it sends each function to its derivative. For example, in the manifold case, the derivative sends a $C^r$-manifold to a $C^{r-1}$-manifold (its tangent bundle) and a $C^r$-function to its total derivative. There is one requirement for this to be a functor, namely that the derivative of a composite must be the composite of the derivatives. This is exactly the formula D(f ∘ g) = Df ∘ Dg.

There are also chain rules in stochastic calculus. One of these, Itō's lemma, expresses the composite of an Itō process (or more generally a semimartingale) $dX_t$ with a twice-differentiable function f. In Itō's lemma, the derivative of the composite function depends not only on $dX_t$ and the derivative of f but also on the second derivative of f. The dependence on the second derivative is a consequence of the non-zero quadratic variation of the stochastic process, which broadly speaking means that the process can move up and down in a very rough way. This variant of the chain rule is not an example of a functor because the two functions being composed are of different types.

Rather than discuss a Stack Exchange article, perhaps a more down-to-earth tutorial would be better:

or this video from Khan Academy:

or this one from MathIsPower4u.com:

It's not a stack exchange article, it's a specific question I have about how the chain rule changes if I have a function composed of different variables, where each variable is composed of its own set of variables.

My question is, how does the chain rule change when say

f = f(x,y) and x = x(u,v) and y = y(u,b)?

The variables x and y are both functions of the variable u, but x is also a function of v while y is a function of b.

How about if say f = f(x,y) and x = x(u,v) while y = y(a,b)?

Now f depends on x and y but x and y depend on a completely different set of variables.

Summary:: I need help understanding the chain rule.

I want to understand how the chain rule works, and what exactly the person who answered my question was saying. I haven't taken analysis and I know very little linear algebra, so it went over my head.

To answer your specific question. If you have ##f = f(x, y)## then you are defining ##f## as a function of two variables. There is some implicit rule for taking an ordered pair ##(x, y)## and outputting a number ##f(x, y)##.

If additionally you define ##x = x(u, v)## and ##y = y(a, b)##, you have two additional functions of two variables. We now have three different functions of two variables. This allows us to define a function of four variables (let's call it ##g##) where: ##g(u, v, a, b) = f(x(u, v), y(a, b))##.

The function ##g##, like any function of four variables, has four partial derivatives:
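$$\frac{\partial g}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u}, \quad \frac{\partial g}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v}, \quad \frac{\partial g}{\partial a} = \frac{\partial f}{\partial y}\frac{\partial y}{\partial a}, \quad \frac{\partial g}{\partial b} = \frac{\partial f}{\partial y}\frac{\partial y}{\partial b}$$
The next thing you should do is to test this out with an example. E.g.
$$f(x, y) = \cos(x)\sin(y), \quad x(u, v) = 2u + 3v, \quad y(a, b) = 2a^2 + b^3$$ which gives
$$g(u, v, a, b) = \cos(2u + 3v)\sin(2a^2 + b^3)$$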
You can partially differentiate ##g## directly and then check the partial derivatives match the above formulas.
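Here is a rough numerical version of that check, as a sketch rather than a proof. It takes the example to be f(x, y) = cos(x) sin(y), x = 2u + 3v, y = 2a^2 + b^3, computes the four partial derivatives of g from the chain-rule products, and compares them with central finite differences. The helper names and the sample point are arbitrary choices made for this sketch.

```python
import math

def x_of(u, v):
    return 2 * u + 3 * v

def y_of(a, b):
    return 2 * a ** 2 + b ** 3

def f(x, y):
    return math.cos(x) * math.sin(y)

def g(u, v, a, b):
    """g(u, v, a, b) = f(x(u, v), y(a, b))."""
    return f(x_of(u, v), y_of(a, b))

def chain_rule_partials(u, v, a, b):
    """(dg/du, dg/dv, dg/da, dg/db) from the chain-rule products."""
    x, y = x_of(u, v), y_of(a, b)
    f_x = -math.sin(x) * math.sin(y)   # partial of f with respect to x
    f_y = math.cos(x) * math.cos(y)    # partial of f with respect to y
    return (f_x * 2, f_x * 3, f_y * 4 * a, f_y * 3 * b ** 2)

def numeric_partials(u, v, a, b, step=1e-6):
    """Central finite differences of g in each of its four arguments."""
    point = [u, v, a, b]
    result = []
    for i in range(4):
        upper, lower = list(point), list(point)
        upper[i] += step
        lower[i] -= step
        result.append((g(*upper) - g(*lower)) / (2 * step))
    return tuple(result)

sample = (0.3, -0.2, 0.5, 0.7)  # arbitrary test point
print(chain_rule_partials(*sample))
print(numeric_partials(*sample))
```

Note that each partial derivative has only one term, because u and v reach f only through x, while a and b reach it only through y.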

## Homogeneous Functions

A function \(f(x_1,\cdots,x_n)\) is called homogeneous of degree \(k\) if for any \(t>0\) and for all \((x_1,\cdots,x_n)\) in the domain of \(f\):
\[f(tx_1,\cdots,tx_n)=t^k f(x_1,\cdots,x_n)\]
In other words, a function is homogeneous if, when we multiply its arguments by a factor, its values are multiplied by some power of this factor. Here are some examples of homogeneous functions:

• The function \(f(x,y)=x^2-5xy+y^2\) is homogeneous of degree 2 because:
\[f(tx,ty)=(tx)^2-5(tx)(ty)+(ty)^2=t^2(x^2-5xy+y^2)=t^2 f(x,y).\]
• The function \(f(x,y,z)=x^5y z^3\) is homogeneous of degree 9 because: \[f(tx,ty,tz)=(tx)^5 (ty) (tz)^3=(t^5\, t\, t^3) (x^5 y z^3)=t^9 f(x,y,z).\]
• (Recall \(a^b a^c=a^{b+c}\))
• (Recall \(\ln(ab)=\ln a+\ln b\) and \(\ln(a^b)=b\ln a\))

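Theorem 3. Euler’s theorem: If \(f(x,y)\) is a homogeneous function of degree \(k\) then:
\[x\frac{\partial f}{\partial x}+y\frac{\partial f}{\partial y}=k f.\]
In general, if \(f(x_1,\cdots,x_n)\) is homogeneous of degree \(k\) then:
\[x_1\frac{\partial f}{\partial x_1}+\cdots+x_n\frac{\partial f}{\partial x_n}=k f.\]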

Proof: We prove it for its simplest form. The proof for its general form is similar.

We know: \[f(tx,ty)=t^k f(x,y)\] We differentiate this equation with respect to \(t\). If we set \(u=tx\) and \(v=ty\) and use the chain rule, we have:
\begin{equation}\frac{\partial f}{\partial u}\underbrace{\frac{\partial u}{\partial t}}_{=x}+\frac{\partial f}{\partial v}\underbrace{\frac{\partial v}{\partial t}}_{=y}=k t^{k-1} f(x,y). \tag{*}\end{equation}
Because this is true for all \(t>0\), it is true when \(t=1\). At \(t=1\) we have \(u=x\) and \(v=y\), so plugging \(t=1\) into the above equation gives \(x\frac{\partial f}{\partial x}+y\frac{\partial f}{\partial y}=k f\).
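Euler's theorem can be spot-checked numerically. The sketch below does so for the degree 2 function \(f(x,y)=x^2-5xy+y^2\) from the examples above; the function names are illustrative, and the partial derivatives are written out by hand.

```python
def f(x, y):
    """Homogeneous function of degree 2 from the examples."""
    return x ** 2 - 5 * x * y + y ** 2

def f_x(x, y):
    return 2 * x - 5 * y      # partial derivative with respect to x

def f_y(x, y):
    return -5 * x + 2 * y     # partial derivative with respect to y

def euler_left_side(x, y):
    """x * f_x + y * f_y, which Euler's theorem says equals k * f."""
    return x * f_x(x, y) + y * f_y(x, y)

k = 2
for point in [(1.0, 2.0), (-3.0, 0.5), (2.5, 2.5)]:
    print(point, euler_left_side(*point), k * f(*point))

# Homogeneity itself: f(t*x, t*y) should equal t^k * f(x, y)
t, x, y = 3.0, 1.0, 2.0
print(f(t * x, t * y), t ** k * f(x, y))
```

Each row prints the same number twice, once from the left side of Euler's identity and once from \(k f\).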

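According to the chain rule:

\[\frac{\partial u}{\partial x}=\underbrace{\frac{du}{dz}}_{=\phi'(z)}\frac{\partial z}{\partial x}=\phi'(z)\frac{\partial z}{\partial x},\]
\[\frac{\partial u}{\partial y}=\underbrace{\frac{du}{dz}}_{=\phi'(z)}\frac{\partial z}{\partial y}=\phi'(z)\frac{\partial z}{\partial y}.\]
Therefore
\[\begin{aligned} x\frac{\partial u}{\partial x}+y\frac{\partial u}{\partial y}&=x\phi'(z)\frac{\partial z}{\partial x}+y\phi'(z)\frac{\partial z}{\partial y}=\phi'(z)\underbrace{\left[x\frac{\partial z}{\partial x}+y\frac{\partial z}{\partial y}\right]}_{=kz} \\ &=kz\phi'(z).\end{aligned}\] [Because \(z\) is homogeneous of order \(k\), according to Theorem 3 \(x\frac{\partial z}{\partial x}+y\frac{\partial z}{\partial y}=kz\).]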

We note that if we put (z=dfrac) , (z) is homogeneous of degree 2, and (u=phi(z)=arcsin z) . Therefore, using the result of the previous example, we have:
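\[\begin{aligned} x\frac{\partial u}{\partial x}+y\frac{\partial u}{\partial y}&=k z\phi'(z) \\ &=2 z \frac{1}{\sqrt{1-z^2}} \\ &=2 \sin u\frac{1}{\sqrt{1-\sin^2 u}} \\
&=2\frac{\sin u}{|\cos u|} \\
&=2\frac{\sin u}{\cos u} \\
&=2 \tan u
\end{aligned}\] Because \(u=\arcsin z\), we have \(-\frac{\pi}{2}\leq u\leq \frac{\pi}{2}\). In this interval \(\cos u\geq 0\). That is, \(|\cos u|=\cos u\).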

[1] This relationship is valid only when \(x>0\) and \(y>0\), but for our purpose it is enough.

## The Chain Rule Type 2 for Two Variable Functions, Two Parameters

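 Theorem 1 (The Chain Rule Type 2 for Two Variable Functions): Let $z = f(x, y)$ be a two variable real-valued differentiable function with continuous first partial derivatives and let $x = x(s, t)$ and $y = y(s, t)$. Then $\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial s}$, and, $\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial t}$.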

Let's look at some examples

### Example 1

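Let $z = -2e^y \sin x$, $x = s^3t^2$ and $y = t\cos s$. Find the partial derivatives $\frac{\partial z}{\partial s}$ and $\frac{\partial z}{\partial t}$.

We first compute the following partial derivatives: $\frac{\partial z}{\partial x} = -2e^y \cos x$, $\frac{\partial z}{\partial y} = -2e^y \sin x$, $\frac{\partial x}{\partial s} = 3s^2t^2$, $\frac{\partial x}{\partial t} = 2s^3t$, $\frac{\partial y}{\partial s} = -t \sin s$, and $\frac{\partial y}{\partial t} = \cos s$. Applying the formula in Theorem 1, we get that:

$$\frac{\partial z}{\partial s} = (-2e^y \cos x)(3s^2t^2) + (-2e^y \sin x)(-t \sin s) = -6s^2t^2 e^y \cos x + 2t e^y \sin x \sin s,$$

$$\frac{\partial z}{\partial t} = (-2e^y \cos x)(2s^3t) + (-2e^y \sin x)(\cos s) = -4s^3t\, e^y \cos x - 2 e^y \sin x \cos s.$$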

## Math Insight

The following are examples of using the multivariable chain rule. For examples involving the one-variable chain rule, see simple examples of using the chain rule or the chain rule from the Calculus Refresher.

#### Example 1

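Let $\mathbf{g}: \mathbb{R} \rightarrow \mathbb{R}^2$ and $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ be defined by \begin{align*} \mathbf{g}(t) &= (t^3, t^4)\\ f(x,y) &= x^2y. \end{align*} (You can think of this as the mountain climbing example where $f(x,y)$ is the height of the mountain at point $(x,y)$ and the path $\mathbf{g}(t)$ gives your position at time $t$.) Let $h(t)$ be the composition of $f$ with $\mathbf{g}$ (which would give your height at time $t$): \begin{align*} h(t) = (f \circ \mathbf{g})(t) = f(\mathbf{g}(t)). \end{align*} Calculate the derivative $\displaystyle h'(t) = \frac{dh}{dt}(t)$ (i.e., the change in height) via the chain rule.

Solution A: We'll use the formula using matrices of partial derivatives: \begin{align*} Dh(t) = Df(\mathbf{g}(t))\, D\mathbf{g}(t). \end{align*}

We calculate the matrices of partial derivatives of $f$ and $\mathbf{g}$. \begin{align*} Df(x,y) &= \left[ \frac{\partial f}{\partial x}(x,y) \quad \frac{\partial f}{\partial y}(x,y) \right] = \left[ \begin{array}{cc} 2xy & x^2 \end{array} \right]\\ D\mathbf{g}(t) &= \left[ \begin{array}{c} g_1'(t)\\ g_2'(t) \end{array} \right] = \left[ \begin{array}{c} 3t^2\\ 4t^3 \end{array} \right] \end{align*} We need to evaluate $Df$ at the point $\mathbf{g}(t)$: \begin{align*} Df(\mathbf{g}(t)) = Df(t^3, t^4) = \left[ \begin{array}{cc} 2(t^3)(t^4) & (t^3)^2 \end{array} \right] = \left[ \begin{array}{cc} 2t^7 & t^6 \end{array} \right] \end{align*} The derivative of $h$ is \begin{align*} h'(t) = \frac{dh}{dt}(t) = Dh(t) &= Df(\mathbf{g}(t))\, D\mathbf{g}(t)\\ &= \left[ \begin{array}{cc} 2t^7 & t^6 \end{array} \right] \left[ \begin{array}{c} 3t^2\\ 4t^3 \end{array} \right]\\ &= (2t^7)(3t^2) + (t^6)(4t^3) = 6t^9 + 4t^9\\ &= 10t^9 \end{align*}

Solution B: We'll start immediately with the formula in component form: \begin{align*} \frac{dh}{dt}(t) = \frac{\partial f}{\partial x}(\mathbf{g}(t))\, \frac{dg_1}{dt}(t) + \frac{\partial f}{\partial y}(\mathbf{g}(t))\, \frac{dg_2}{dt}(t). \end{align*} We calculate \begin{align*} \frac{\partial f}{\partial x}(x,y) &= 2xy\\ \frac{\partial f}{\partial y}(x,y) &= x^2\\ \frac{\partial f}{\partial x}(\mathbf{g}(t)) &= \frac{\partial f}{\partial x}(t^3,t^4) = 2(t^3)(t^4) = 2t^7\\ \frac{\partial f}{\partial y}(\mathbf{g}(t)) &= \frac{\partial f}{\partial y}(t^3,t^4) = (t^3)^2 = t^6\\ \frac{dg_1}{dt}(t) &= 3t^2\\ \frac{dg_2}{dt}(t) &= 4t^3. \end{align*} Therefore, \begin{align*} \frac{dh}{dt}(t) = (2t^7)(3t^2) + (t^6)(4t^3) = 6t^9 + 4t^9 = 10t^9. \end{align*}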

#### Example 1'

Verify the chain rule for Example 1 by calculating an expression for $h(t)$ and then differentiating it to obtain $\displaystyle \frac{dh}{dt}(t)$.

Solution: $h(t) = f(\mathbf{g}(t)) = f(t^3,t^4) = (t^3)^2(t^4) = t^{10}$. \begin{align*} h'(t) = \frac{dh}{dt}(t) = 10t^9, \end{align*} which matches the solution to Example 1, verifying that the chain rule got the correct answer.

For this simple example, doing it without the chain rule was a lot easier. However, that is not always the case. And, in the next example, the only way to obtain the answer is to use the chain rule.

#### Example 2

We continue the mountain climbing example of Example 1. But now, let's say we don't know the terrain ahead of time. This means we do not yet know the height $f(x,y)$ at the position $(x,y)$. We do, however, know our path through the mountain as before; it is given by $\mathbf{g}(t) = (t^3, t^4).$

Calculate the change in height that you'll experience along the path, i.e., calculate the derivative of $h(t) = f(\mathbf{g}(t))$. In this case, since we don't know $f$, the answer will be given in terms of the function $f(x,y)$.

Solution: We'll just copy solution A, above. This time, though, we must leave the matrix of partial derivatives of $f$ as \begin{align*} Df(x,y) = \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x}(x,y) & \displaystyle \frac{\partial f}{\partial y}(x,y) \end{array} \right] \end{align*} since we don't know what $f(x,y)$ is. We can substitute in the values along the path $\mathbf{g}(t)$: \begin{align*} Df(\mathbf{g}(t)) = Df(t^3,t^4) = \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x}(t^3,t^4) & \displaystyle \frac{\partial f}{\partial y}(t^3,t^4) \end{array} \right]. \end{align*} Since $D\mathbf{g}(t)$ is the same as in solution A, above, we calculate the derivative of $h$ as \begin{align*} h'(t) = Dh(t) &= Df(\mathbf{g}(t))\, D\mathbf{g}(t)\\ &= \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x}(t^3,t^4) & \displaystyle \frac{\partial f}{\partial y}(t^3,t^4) \end{array} \right] \left[ \begin{array}{c} 3t^2\\ 4t^3 \end{array} \right]\\ &= 3t^2\frac{\partial f}{\partial x}(t^3,t^4) + 4t^3\frac{\partial f}{\partial y}(t^3,t^4). \end{align*} We leave the answer in this form. Of course, as soon as we know what $f(x,y)$ is, we can simply compute its partial derivatives and plug the result into this formula.

#### Example 3

We continue using the same function $f(x,y) = x^2y$ to describe the height of the mountain at position $(x,y)$. We embellish the above examples by letting $\mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $\mathbf{g}(s,t) = (t-s^2, ts^2)$. (We could think of having many paths through the mountain that depend on a skill level $s$. Then, $(x,y)=\mathbf{g}(s,t)$ could be the position of a person at time $t$ with skill level $s$.)

Compute $\frac{\partial}{\partial s} (f \circ \mathbf{g})(s,t)$ and $\frac{\partial}{\partial t} (f \circ \mathbf{g})(s,t)$, i.e., the partial derivatives with respect to $s$ and $t$ of the height of a person in the mountains whose position is given by $\mathbf{g}(s,t)$.

Solution: Let $h(s,t) = (f \circ \mathbf{g})(s,t) = f(\mathbf{g}(s,t))$. We need to calculate $\displaystyle \frac{\partial h}{\partial s}(s,t)$ and $\displaystyle \frac{\partial h}{\partial t}(s,t)$. The chain rule says that \begin{align*} Dh(s,t) = D(f\circ \mathbf{g})(s,t) = Df(\mathbf{g}(s,t))\, D\mathbf{g}(s,t). \end{align*} Since \begin{align*} Dh(s,t) = \left[ \frac{\partial h}{\partial s}(s,t) \quad \frac{\partial h}{\partial t}(s,t) \right], \end{align*} the answers we want are just the two components of $Dh(s,t)$. We just need to calculate the matrices $Df(\mathbf{g}(s,t))$ and $D\mathbf{g}(s,t)$, then multiply them together.

To make it easier in case you have to do such a problem again, we'll perform the matrix multiplication before writing in the specific values for $f(x,y)$ and $vc(s,t)$. Then, we'll end up with the chain rule written in component form, which may be easier to use.

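The function $f(x,y)$ hasn't changed, so its matrix of partial derivatives is

\begin{align*}
Df(x,y) = \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x}(x,y) & \displaystyle \frac{\partial f}{\partial y}(x,y) \end{array} \right].
\end{align*}

For the chain rule, we need this evaluated at $(x,y)=\mathbf{g}(s,t)$:

\begin{align*}
Df(\mathbf{g}(s,t)) = \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x}(\mathbf{g}(s,t)) & \displaystyle \frac{\partial f}{\partial y}(\mathbf{g}(s,t)) \end{array} \right].
\end{align*}

Since $\mathbf{g}: \mathbf{R}^2 \to \mathbf{R}^2$, its matrix of partial derivatives is a $2 \times 2$ matrix. If we denote its components as $\mathbf{g}(s,t) = (g_1(s,t), g_2(s,t))$, its matrix of partial derivatives is

\begin{align*}
D\mathbf{g}(s,t) = \left[ \begin{array}{cc} \displaystyle \frac{\partial g_1}{\partial s}(s,t) & \displaystyle \frac{\partial g_1}{\partial t}(s,t) \\ \displaystyle \frac{\partial g_2}{\partial s}(s,t) & \displaystyle \frac{\partial g_2}{\partial t}(s,t) \end{array} \right].
\end{align*}

The chain rule $Dh(s,t) = Df(\mathbf{g}(s,t)) \, D\mathbf{g}(s,t)$ becomes

\begin{align*}
\left[ \begin{array}{cc} \displaystyle \frac{\partial h}{\partial s}(s,t) & \displaystyle \frac{\partial h}{\partial t}(s,t) \end{array} \right]
= \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x}\bigl(\mathbf{g}(s,t)\bigr) & \displaystyle \frac{\partial f}{\partial y}\bigl(\mathbf{g}(s,t)\bigr) \end{array} \right]
\left[ \begin{array}{cc} \displaystyle \frac{\partial g_1}{\partial s}(s,t) & \displaystyle \frac{\partial g_1}{\partial t}(s,t) \\ \displaystyle \frac{\partial g_2}{\partial s}(s,t) & \displaystyle \frac{\partial g_2}{\partial t}(s,t) \end{array} \right].
\end{align*}

We can compute the matrix product on the right-hand side; the result is a $1 \times 2$ matrix (i.e., the same size as $Dh(s,t)$). We obtain one equation by matching the first component of $Dh(s,t)$ with the first component of this multiplied-out matrix. We obtain a second equation by matching the second component of $Dh(s,t)$ with the second component of this multiplied-out matrix. The resulting two equations are

\begin{align*}
\frac{\partial h}{\partial s}(s,t) &= \frac{\partial f}{\partial x}(\mathbf{g}(s,t)) \frac{\partial g_1}{\partial s}(s,t) + \frac{\partial f}{\partial y}(\mathbf{g}(s,t)) \frac{\partial g_2}{\partial s}(s,t) \\
\frac{\partial h}{\partial t}(s,t) &= \frac{\partial f}{\partial x}(\mathbf{g}(s,t)) \frac{\partial g_1}{\partial t}(s,t) + \frac{\partial f}{\partial y}(\mathbf{g}(s,t)) \frac{\partial g_2}{\partial t}(s,t).
\end{align*}

This is the chain rule written out in component form for $h : \mathbf{R}^2 \to \mathbf{R}$, $f : \mathbf{R}^2 \to \mathbf{R}$, and $\mathbf{g} : \mathbf{R}^2 \to \mathbf{R}^2$. It matches the component-form equation on the special case page.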
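The component form is nothing more than a $1 \times 2$ row times a $2 \times 2$ matrix. A minimal sketch of that product in plain Python is shown below; the function name `chain_rule_2x2` and the sample numbers are hypothetical and chosen only to illustrate the bookkeeping.

```python
def chain_rule_2x2(Df_at_g, Dg):
    """Multiply the 1x2 row Df(g(s,t)) by the 2x2 matrix Dg(s,t).

    Df_at_g: [f_x, f_y], both evaluated at the point g(s,t).
    Dg:      [[dg1/ds, dg1/dt],
              [dg2/ds, dg2/dt]], evaluated at (s,t).
    Returns [dh/ds, dh/dt].
    """
    fx, fy = Df_at_g
    (g1s, g1t), (g2s, g2t) = Dg
    dh_ds = fx * g1s + fy * g2s  # first component of the product
    dh_dt = fx * g1t + fy * g2t  # second component of the product
    return [dh_ds, dh_dt]

if __name__ == "__main__":
    # Made-up numerical values, used only to show the pattern.
    print(chain_rule_2x2([1, 2], [[3, 4], [5, 6]]))  # [13, 16]
```

Each output entry pairs the row $Df$ with one column of $D\mathbf{g}$: the first column holds the $s$-derivatives and the second holds the $t$-derivatives.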

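Now, we compute the answer to our specific problem by substituting in for $f(x,y) = x^2y$ and $\mathbf{g}(s,t) = (t-s^2, ts^2)$.

\begin{align*}
\frac{\partial f}{\partial x}(x,y) &= 2xy & \frac{\partial f}{\partial x}(\mathbf{g}(s,t)) &= 2ts^2(t-s^2) \\
\frac{\partial f}{\partial y}(x,y) &= x^2 & \frac{\partial f}{\partial y}(\mathbf{g}(s,t)) &= (t-s^2)^2 \\
\frac{\partial g_1}{\partial s}(s,t) &= -2s & \frac{\partial g_2}{\partial s}(s,t) &= 2st \\
\frac{\partial g_1}{\partial t}(s,t) &= 1 & \frac{\partial g_2}{\partial t}(s,t) &= s^2
\end{align*}

Finally, we get our answers.

\begin{align*}
\frac{\partial}{\partial s}(f \circ \mathbf{g})(s,t) = \frac{\partial h}{\partial s}(s,t) &= \frac{\partial f}{\partial x}(\mathbf{g}(s,t)) \frac{\partial g_1}{\partial s}(s,t) + \frac{\partial f}{\partial y}(\mathbf{g}(s,t)) \frac{\partial g_2}{\partial s}(s,t) \\
&= 2ts^2(t-s^2)(-2s) + (t-s^2)^2(2st) \\
&= -4ts^3(t-s^2) + 2st(t^2-2ts^2+s^4) \\
&= -4s^3t^2 + 4s^5t + 2st^3 - 4s^3t^2 + 2s^5t \\
&= -8s^3t^2 + 6s^5t + 2st^3
\end{align*}

\begin{align*}
\frac{\partial}{\partial t}(f \circ \mathbf{g})(s,t) = \frac{\partial h}{\partial t}(s,t) &= \frac{\partial f}{\partial x}(\mathbf{g}(s,t)) \frac{\partial g_1}{\partial t}(s,t) + \frac{\partial f}{\partial y}(\mathbf{g}(s,t)) \frac{\partial g_2}{\partial t}(s,t) \\
&= 2ts^2(t-s^2)(1) + (t-s^2)^2(s^2) \\
&= 2s^2t^2 - 2s^4t + s^2t^2 - 2s^4t + s^6 \\
&= 3s^2t^2 - 4s^4t + s^6
\end{align*}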
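One way to double-check the two final polynomials is to compare them with finite-difference estimates taken directly from $h(s,t) = f(\mathbf{g}(s,t))$. The sketch below does this in plain Python; the helper `central_diff`, the step size, and the sample point are illustrative assumptions.

```python
def h(s, t):
    # h(s,t) = f(g(s,t)) with f(x,y) = x^2 y and g(s,t) = (t - s^2, t s^2).
    x, y = t - s**2, t * s**2
    return x**2 * y

def dh_ds(s, t):
    # Closed form obtained from the chain rule.
    return -8 * s**3 * t**2 + 6 * s**5 * t + 2 * s * t**3

def dh_dt(s, t):
    # Closed form obtained from the chain rule.
    return 3 * s**2 * t**2 - 4 * s**4 * t + s**6

def central_diff(func, x, eps=1e-6):
    # Central finite difference of a one-variable function at x.
    return (func(x + eps) - func(x - eps)) / (2 * eps)

if __name__ == "__main__":
    s, t = 1.2, 0.7  # arbitrary sample point
    print(dh_ds(s, t), central_diff(lambda u: h(u, t), s))
    print(dh_dt(s, t), central_diff(lambda u: h(s, u), t))
```

The two numbers on each printed line should agree to several decimal places; a mismatch would point to an algebra slip in the expansion.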