Matrices as derivatives

If you don't know about matrices (at least how to multiply them), read http://www.mathsisfun.com/algebra/matrix-multiplying.html first. (This is very elementary.)

Now, if you have m functions of n variables each, then you can put their partial derivatives into an m-by-n matrix; for example, if you have 2 functions of 3 variables each, say u = f(x, y, z) and v = g(x, y, z), then the partial derivatives fit into a 2-by-3 matrix

⎡∂u/∂x ∂u/∂y ∂u/∂z⎤
⎣∂v/∂x ∂v/∂y ∂v/∂z⎦;
we may call this matrix d(u, v)/d(x, y, z). You can also think of this as the result of evaluating at (x, y, z) the matrix of functions
⎡D₁f D₂f D₃f⎤
⎣D₁g D₂g D₃g⎦,
which may be called D(f, g). Then we have d(u, v)/d(x, y, z) = D(f, g)(x, y, z).
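
As a concrete illustration, here is a minimal sketch in Python. The functions f and g below are hypothetical examples chosen only for this sketch (they do not come from the notes or the textbook), and the partial derivatives were worked out by hand.

```python
# Hypothetical example (not from the notes):
#   u = f(x, y, z) = x*y + z
#   v = g(x, y, z) = x*z**2

def D_fg(x, y, z):
    """Return D(f, g)(x, y, z), a 2-by-3 matrix stored as a list of rows."""
    return [
        [y, x, 1.0],               # D1f, D2f, D3f, i.e. du/dx, du/dy, du/dz
        [z ** 2, 0.0, 2 * x * z],  # D1g, D2g, D3g, i.e. dv/dx, dv/dy, dv/dz
    ]

# Evaluate the matrix of functions at the point (x, y, z) = (1, 2, 3).
J = D_fg(1.0, 2.0, 3.0)
for row in J:
    print(row)
# prints:
# [2.0, 1.0, 1.0]
# [9.0, 0.0, 6.0]
```

Each row belongs to one function and each column to one variable, which is why 2 functions of 3 variables give a 2-by-3 matrix.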

If you have an ordinary function y = f(x), you can think of this as a group of only 1 function of only 1 variable each, so that d(y)/d(x) = D(f)(x) is a 1-by-1 matrix, consisting of a single entry, which is the usual derivative dy/dx = f′(x). That is, d(y)/d(x) = [dy/dx], and D(f) = [f′].

If you have a parametrized curve in 3 dimensions, say P = (x, y, z) = (f(t), g(t), h(t)), then this is a group of 3 functions of 1 variable each, so that d(x, y, z)/d(t) = D(f, g, h)(t) is a 3-by-1 matrix, consisting of a single column with 3 entries, which are the components of the velocity vector dP/dt = ⟨f′(t), g′(t), h′(t)⟩. (For this reason, ordinary vectors that represent change of a point are sometimes called column vectors.)

If you have a function of 3 variables, say u = F(x, y, z), you can think of this as a group of 1 function of 3 variables each, so that d(u)/d(x, y, z) = D(F)(x, y, z) is a 1-by-3 matrix, consisting of a single row with 3 entries, which are the components of the gradient vector ⟨∂u/∂x, ∂u/∂y, ∂u/∂z⟩ = ∇F(x, y, z). (For this reason, vectors such as gradients that represent change with respect to a point are sometimes called row vectors.)
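
The three special cases above differ only in their shapes. The following sketch approximates the matrix of partial derivatives numerically by central differences, which is a stand-in for differentiating by hand, and reports the shape in each case. The helper names `numeric_jacobian` and `shape`, the step size, and the example functions are all assumptions made for this sketch.

```python
import math

def numeric_jacobian(funcs, point, h=1e-6):
    """Approximate the m-by-n matrix of partial derivatives.

    funcs: list of m functions, each taking n arguments.
    point: list of n numbers at which to evaluate the derivatives.
    """
    matrix = []
    for func in funcs:                   # one row per function
        row = []
        for j in range(len(point)):      # one column per variable
            up = list(point)
            up[j] += h
            down = list(point)
            down[j] -= h
            row.append((func(*up) - func(*down)) / (2 * h))
        matrix.append(row)
    return matrix

def shape(matrix):
    """Return (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]))

# Hypothetical examples (not from the notes):
ordinary = numeric_jacobian([math.sin], [0.0])                      # y = sin x
curve = numeric_jacobian([math.cos, math.sin, lambda t: t], [0.0])  # a helix
field = numeric_jacobian([lambda x, y, z: x**2 + y*z], [1.0, 2.0, 3.0])

print(shape(ordinary), shape(curve), shape(field))
# prints: (1, 1) (3, 1) (1, 3)
```

The single column of `curve` approximates the velocity vector ⟨0, 1, 1⟩ of the helix at t = 0, and the single row of `field` approximates the gradient ⟨2, 3, 2⟩ at (1, 2, 3).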

If you have both (x, y, z) = (f(t), g(t), h(t)) and u = F(x, y, z), then composition makes u an ordinary function of t; specifically, u = F ∘ (f, g, h). Recall the defining property of the gradient from page 26 of my notes: (F ∘ (f, g, h))′(t) = ∇F(f(t), g(t), h(t)) · ⟨f′(t), g′(t), h′(t)⟩; or du/dt = ⟨∂u/∂x, ∂u/∂y, ∂u/∂z⟩ · ⟨dx/dt, dy/dt, dz/dt⟩. The same thing can be expressed using matrix multiplication as d(u)/d(t) = d(u)/d(x, y, z) d(x, y, z)/d(t), because a row matrix is multiplied by a column matrix by the same method as the dot product.
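
To see the row-by-column product at work, here is a small check in Python. The curve and the function F are hypothetical examples picked for this sketch, and `matmul` is a hand-written helper rather than a library routine; all derivatives were computed by hand.

```python
import math

# Hypothetical example (not from the notes):
#   (x, y, z) = (cos t, sin t, t)  and  u = F(x, y, z) = x**2 + y*z

def point(t):
    return (math.cos(t), math.sin(t), t)

def velocity_column(t):
    """d(x, y, z)/d(t): a 3-by-1 matrix (a single column)."""
    return [[-math.sin(t)], [math.cos(t)], [1.0]]

def gradient_row(x, y, z):
    """d(u)/d(x, y, z): a 1-by-3 matrix (a single row)."""
    return [[2 * x, z, y]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def du_dt_direct(t):
    """Derivative of u = cos(t)**2 + t*sin(t), differentiated directly."""
    return -2 * math.cos(t) * math.sin(t) + math.sin(t) + t * math.cos(t)

t = 0.7
product = matmul(gradient_row(*point(t)), velocity_column(t))  # 1-by-1 matrix
print(product[0][0], du_dt_direct(t))  # the two values should agree
```

The product of a 1-by-3 matrix and a 3-by-1 matrix is a 1-by-1 matrix, and its single entry is the dot product of the gradient with the velocity.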

More generally, all of the forms of the Chain Rule in Section 13.4 of the textbook can be expressed using matrix multiplication. If a point R (in any number of dimensions) is a function of a point Q (in any number of dimensions), which is a function of a point P (in any number of dimensions), then dR/dP = dR/dQ dQ/dP.
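
Here is a sketch of the general rule with P = (s, t) in 2 dimensions, Q = (x, y, z) in 3 dimensions, and R = (u, v) in 2 dimensions, so that dR/dP is the product of a 2-by-3 matrix and a 3-by-2 matrix. The particular functions are hypothetical, chosen so that the arithmetic is exact, and the "direct" formulas come from substituting and differentiating by hand.

```python
# Hypothetical example (not from the notes or the textbook):
#   Q = (x, y, z) = (s*t, s + t, s - t)
#   R = (u, v)    = (x*y + z, x*z**2)

def Q_of_P(s, t):
    return (s * t, s + t, s - t)

def dQ_dP(s, t):
    """3-by-2 matrix d(x, y, z)/d(s, t)."""
    return [[t, s], [1.0, 1.0], [1.0, -1.0]]

def dR_dQ(x, y, z):
    """2-by-3 matrix d(u, v)/d(x, y, z)."""
    return [[y, x, 1.0], [z ** 2, 0.0, 2 * x * z]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def dR_dP_direct(s, t):
    """2-by-2 matrix from differentiating u and v as functions of s and t."""
    return [
        [2*s*t + t**2 + 1, s**2 + 2*s*t - 1],
        [t*(s - t)**2 + 2*s*t*(s - t), s*(s - t)**2 - 2*s*t*(s - t)],
    ]

s, t = 2.0, 1.0
chain = matmul(dR_dQ(*Q_of_P(s, t)), dQ_dP(s, t))
print(chain)               # [[6.0, 7.0], [5.0, -2.0]]
print(dR_dP_direct(s, t))  # [[6.0, 7.0], [5.0, -2.0]]
```

The order of the factors matters: dR/dQ goes on the left and dQ/dP on the right, so that the number of columns of the first matrix matches the number of rows of the second.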


This web page was written between 2003 and 2018 by Toby Bartels, last edited on 2018 April 17. Toby reserves no legal rights to it.

The permanent URI of this web page is http://tobybartels.name/MATH-2080/2018SP/matrices/.
