Ordinary Tensor Differentiation

Graph with axes females and males

I am trying to get my head around tensors because I want to understand the General Theory of Relativity. I already get the basics, as described in the article What is a tensor?. This article is a continuation which focuses on differentiation. It continues the examples from the first article, including the vector-based Singles Clubs and the shoe/snack transformation. If you haven't read the other article, then this one will make very little sense.

Please note: you'll need a modern HTML5 browser to see the graphs on this page. They are done using SVG. And you'll need JavaScript enabled to see the equations, which are written in TeX and rendered using MathJax, and may take a few seconds to appear. This page also has some graphs created using Octave, an open-source mathematical programming package. The commands are listed in the Octave commands section at the end of this page.


Differentiation is one half of calculus, a beautiful and intricate branch of mathematics developed by Isaac Newton and Gottfried Leibniz in the late 1600s. To see how it works, take this equation relating shoes to females and males, as introduced in the first tensor article:

\(shoes = 2 * females + 2 * males\)

Differentiation describes how changes in one variable affect another variable. So when the number of females changes (or differs) by 1, the number of shoes changes by 2. One more female = 2 more shoes. We can use the partial derivative symbol \(\partial\), a curly letter d, and write:

\(\frac {change\ in\ shoes} {change\ in\ females} = \frac{\partial\ shoes} {\partial\ females} = \frac{\partial s}{\partial f} = 2\)

When you differentiate an equation like that, which has more than one variable (females and males), with respect to just one of those variables (females), it's called partial differentiation. The result of differentiation is a derivative, so "differentiate" means much the same as "find the derivative". The result of partial differentiation is a partial derivative.

Jacobian Matrix

There are three other partial derivatives for all the different relationships between the two coordinate systems. I'll use the letter s for shoes and k for snacks:

\(\frac {\partial\ shoes} {\partial\ males} = \frac {\partial s} {\partial m} = 2\)

\(\frac {\partial\ snacks} {\partial\ females} = \frac {\partial k} {\partial f} = 3\)

\(\frac {\partial\ snacks} {\partial\ males} = \frac {\partial k} {\partial m} = 4\)

We can put all these into a matrix together:

\(\frac {\partial\ snacks\ and\ shoes} {\partial\ females\ and\ males} = \begin{bmatrix} \frac {\partial s} {\partial f} & \frac {\partial s} {\partial m} \\ \frac {\partial k} {\partial f} & \frac {\partial k} {\partial m} \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} \)

This matrix is called the Jacobian matrix. It is a matrix of all the partial derivatives between two coordinate systems. It is the matrix we used for vector and tensor transformations in the first article. It is the matrix used for all tensor transformations from here on out.
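To make the Jacobian concrete, here is a minimal sketch in Python (my choice of language; the article itself only uses Octave for plotting). It estimates each partial derivative with a central difference, nudging females or males by a tiny step h. The names `shoes_snacks` and `jacobian`, the step size and the sample point (5, 7) are all hypothetical choices for illustration.

```python
def shoes_snacks(f, m):
    """Transformation from (females, males) to (shoes, snacks)."""
    return (2 * f + 2 * m, 3 * f + 4 * m)


def jacobian(func, f, m, h=1e-6):
    """Estimate the 2x2 Jacobian of func at (f, m) with central differences.

    Row i is output component i (shoes, then snacks);
    column 0 is the derivative with respect to f, column 1 with respect to m.
    """
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for col, (df, dm) in enumerate([(h, 0.0), (0.0, h)]):
        plus = func(f + df, m + dm)
        minus = func(f - df, m - dm)
        for row in range(2):
            jac[row][col] = (plus[row] - minus[row]) / (2 * h)
    return jac


J = jacobian(shoes_snacks, 5.0, 7.0)
print([[round(x, 6) for x in row] for row in J])  # [[2.0, 2.0], [3.0, 4.0]]
```

Because this transformation is linear, the estimate comes out the same at every point, which is why a single matrix describes it.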

Coordinate Transformation Matrix

Before continuing, I'd like to explain something which confused me a lot: there is another type of matrix which looks very similar to the Jacobian. It is the coordinate transformation matrix or linear transformation matrix used in linear algebra. I got very muddled for a couple of months over this distinction and finally resolved it with a question on Physics Stack Exchange, for which I am very grateful.

In summary, a coordinate transformation matrix comes from linear algebra, and these are basically linear algebra equations:

\(shoes = 2 * females + 2 * males\)

\(snacks = 3 * females + 4 * males\)

And we can rewrite these equations using vectors and a matrix:

\(\begin{bmatrix} shoes \\ snacks \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} females \\ males \end{bmatrix} = \begin{bmatrix} 2 * females + 2 * males \\ 3 * females + 4 * males \end{bmatrix} \)

The coordinate transformation matrix we end up with is exactly the same as the Jacobian above. And because of that we were able to graph the transformations. But it is not always the same.

Coordinate Transformation Matrix Versus Jacobian

Let's imagine a different set of equations:

\(shoes = 2 * females + 2 * males\)

\(couples = females * males\)

The couples variable represents the total number of possible heterosexual couple combinations. The coordinate transformation matrix T would look like:

\(T = \begin{bmatrix} 2 & 2 \\ 0 & f \end{bmatrix} \)

And these algebra equations can be written as:

\(\begin{bmatrix} shoes \\ couples \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 0 & females \end{bmatrix} \begin{bmatrix} females \\ males \end{bmatrix} = \begin{bmatrix} 2 * females + 2 * males \\ females * males \end{bmatrix} \)

But the Jacobian matrix would be:

\(\frac {\partial\ shoes\ and\ couples} {\partial\ females\ and\ males} = \begin{bmatrix} \frac {\partial s} {\partial f} & \frac {\partial s} {\partial m} \\ \frac {\partial c} {\partial f} & \frac {\partial c} {\partial m} \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ m & f \end{bmatrix} \)

The difference is in the partial derivative of couples with respect to females:

\( \frac {\partial\ couples}{\partial\ females} = \frac {\partial c} {\partial f} = \frac {\partial} {\partial f} (f^1 * m) = f^0 * m = m\)

This means that for every female that joins the group, the number of possible couples increases by m, the number of males. This makes sense. And the reverse is also true. For every male joining, the number of couples increases by f.

In this case though the Jacobian is clearly different from the coordinate transformation matrix. The reason is that the coordinate transformation is not linear. It multiplies females times males. The reason they were the same before was their simplicity. All the variables were by themselves, not multiplied with other variables.
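A quick numerical comparison, sketched in Python with a central difference (step h = 1e-6, an arbitrary choice). The helper names and the sample point f = 3, m = 4 are hypothetical. The Jacobian comes out as [[2, 2], [m, f]], while the matrix T from the text is [[2, 2], [0, f]], and only T reproduces (shoes, couples) when multiplied by (females, males).

```python
def shoes_couples(f, m):
    """shoes = 2f + 2m, couples = f * m."""
    return (2 * f + 2 * m, f * m)


def jacobian(func, f, m, h=1e-6):
    """Central-difference 2x2 Jacobian: rows are outputs, columns are d/df, d/dm."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for col, (df, dm) in enumerate([(h, 0.0), (0.0, h)]):
        plus, minus = func(f + df, m + dm), func(f - df, m - dm)
        for row in range(2):
            jac[row][col] = (plus[row] - minus[row]) / (2 * h)
    return jac


def matvec(matrix, vector):
    """Multiply a matrix (nested lists) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


f, m = 3.0, 4.0
T = [[2.0, 2.0], [0.0, f]]  # the coordinate transformation matrix from the text
J = [[round(x, 6) for x in row] for row in jacobian(shoes_couples, f, m)]

print(J)                  # [[2.0, 2.0], [4.0, 3.0]], i.e. [[2, 2], [m, f]]
print(matvec(T, [f, m]))  # [14.0, 12.0]: reproduces (shoes, couples)
print(matvec(J, [f, m]))  # [14.0, 24.0]: does not
```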

In these articles, the matrix T always refers to a Jacobian because that's what tensor transformations use.

Non-Linear Differentiation

The shoes/females partial differentiation above was pretty basic. Differentiation gets much more interesting for non-linear equations, equations with a \(square^2\) or \(\sqrt{root}\).

For example, if every one of our females and males gave a hug to every person in the group, including themselves, then we could write an equation:

\(hugs = people * people = people^2\)

Now how can we compute how many extra hugs need to be given for every extra person in the group? In other words, what is:

\(\frac {change\ in\ hugs} {change\ in\ people} = \frac{\partial\ hugs} {\partial\ people} = \frac{\partial h}{\partial p} = \ ?\)

We can try some values. If there are 10 people, then there will be 100 hugs. Adding 1 more person leads to 121 hugs, 21 more. A 12th person means 23 more hugs. So the number of extra hugs per extra person is not constant: it keeps growing as the group grows.

To differentiate we can write this out as an equation, where \(\Delta p\) represents the additional people:

\(\frac {change\ in\ hugs} {change\ in\ people} = \frac{\partial h}{\partial p} = \lim\limits_{\Delta p \to 0} \frac{(p\ + \Delta p)^2\ -\ p^2}{\Delta p} \)

This equation says that we will take the number of hugs for \(p\ +\ \Delta p\) people and subtract the number of hugs for just \(p\) people (this is the change in the number of hugs). Then we will divide by the \(\Delta p\) to get the hugs per person. And finally imagine that \(\Delta p\) is actually really really small. In fact it goes to zero (that's what the "limit" is for). We can reduce the equation and get a result:

\(\frac{\partial h}{\partial p} = \lim\limits_{\Delta p \to 0} \frac{(p\ + \Delta p)^2\ -\ p^2}{\Delta p} = \lim\limits_{\Delta p \to 0} \frac{p^2 + 2 p\Delta p + \Delta p^2 - p^2}{\Delta p} = \lim\limits_{\Delta p \to 0} \frac{2 p\Delta p + \Delta p^2}{\Delta p} = \lim\limits_{\Delta p \to 0} (2p + \Delta p) = 2p \)

So when there are 10 people, the number of hugs is changing at a rate of \(2p = 20\). And for 11 people, \(2p = 22\). This doesn't quite match the numbers above because differentiation accounts for infinitesimally small changes. If there were 10.001 people, there would be 100.020001 hugs, which is much closer to 20 hugs per person. As the change in the number of people \(\Delta p\) approaches zero, the change in hugs approaches exactly \(2p\).
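The shrinking-\(\Delta p\) idea can be tried directly. This Python sketch (the function names are made up for illustration) evaluates the difference quotient for p = 10 with smaller and smaller steps. Algebraically each value equals \(2p + \Delta p\), so the printed numbers should creep towards 20, apart from tiny floating-point rounding.

```python
def hugs(p):
    """Everyone hugs everyone, including themselves."""
    return p ** 2


def difference_quotient(p, dp):
    """(change in hugs) / (change in people) for a step of dp people."""
    return (hugs(p + dp) - hugs(p)) / dp


for dp in [1, 0.1, 0.001, 0.000001]:
    print(dp, difference_quotient(10, dp))
```

With dp = 1 this reproduces the whole-person counts from the text: 21 extra hugs for the 11th person, and `difference_quotient(11, 1)` gives 23 for the 12th.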

I imagine that originally this method was used to differentiate every equation, but quickly some tricks were noticed. For powers like \(p^2\) and \(p^3\), you can find the derivative by multiplying by the power, and reducing the power by one. So the derivatives of these equations are:

\(\frac{\partial}{\partial p} (p^2) = 2 p^{2-1} = 2p^1 = 2p\)

\(\frac{\partial}{\partial p} (p^3) = 3 p^{3-1} = 3p^2\)

A variable to the power of 1 turns into the number 1. This is because the power of 1 becomes a power of 0 after differentiation, and anything to the power of 0 is 1:

\(\frac{\partial}{\partial p} (p) = 1 p^{1-1} = p^0 = 1\)

It also works for fractional powers:

\(\frac{\partial}{\partial p} (\sqrt{p}) = \frac{\partial}{\partial p} (p^{\frac12}) = \frac12 p^{-\frac12} = \frac1{2\sqrt{p}} \)

It's also worth noting that the derivative of a constant like 4 is 0, because constants like 4 never change:

\(\frac{\partial}{\partial p} (4) = 0 \)
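These shortcut rules can be spot-checked numerically. A minimal Python sketch, assuming a central difference with step h = 1e-5 and the sample point p = 4 (both arbitrary choices): each numerical estimate should match the derivative predicted by the rules above to many decimal places.

```python
import math


def derivative(func, p, h=1e-5):
    """Central-difference estimate of d func / dp."""
    return (func(p + h) - func(p - h)) / (2 * h)


p = 4.0
# Each entry: (function, derivative predicted by the rules above)
rules = {
    "p^2": (lambda x: x ** 2, 2 * p),
    "p^3": (lambda x: x ** 3, 3 * p ** 2),
    "p": (lambda x: x, 1.0),
    "sqrt(p)": (math.sqrt, 1 / (2 * math.sqrt(p))),
    "4": (lambda x: 4.0, 0.0),
}

for name, (func, predicted) in rules.items():
    print(f"{name:8} numeric={derivative(func, p):.6f} rule={predicted:.6f}")
```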

Chain Rule

Those are the basics, but we will also need the chain rule. This applies when you have equations within equations: you find the derivative of the outer bit and multiply it by the derivative of the inner bit. For example, since the number of people is \(p = f + m\), we can differentiate hugs with respect to females:

\(h = p^2 = (f + m)^2\)

derivative of the outer bit = \(\frac{\partial}{\partial p}(p^2) = 2p \)

derivative of the inner bit = \(\frac{\partial}{\partial f}(f + m) = f^0 + 0 = 1\)

final derivative = \(\frac{\partial h}{\partial f} = 2p * 1 = 2 (f+m) = 2f + 2m \)

So for every extra female, there are \(2f + 2m\) extra hugs, which agrees with the \(2p\) result above. If there are 6 women and 4 men, there will be \(2f + 2m = 2p = 20\) extra hugs per infinitesimal female/person. The chain rule can be checked by expanding the original equation:

\(h = p^2 = (f + m)^2 = f^2 + 2fm + m^2\)

\(\frac{\partial h}{\partial f} = 2f^1 + 2f^0m = 2f + 2m\)

A close relative of this rule, the product rule, deals with the multiplication of two expressions: the derivative is the derivative of the first times the second, plus the first times the derivative of the second. For example:

\(h = p^2 = (f+m)^2 = (f + m)(f + m)\)

\(\frac{\partial h}{\partial f} = \frac{\partial}{\partial f}(f+m) (f+m) + (f+m) \frac{\partial}{\partial f}(f+m) = 1 * (f+m) + (f+m) * 1 = 2f + 2m\)
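Both routes can be checked against a numerical estimate, using the same 6 females and 4 males as above. This is a small Python sketch; `hugs` and `partial_f` are illustrative names, and the step h = 1e-6 is an arbitrary choice.

```python
def hugs(f, m):
    p = f + m       # inner bit: people from females and males
    return p ** 2   # outer bit: hugs from people


def partial_f(func, f, m, h=1e-6):
    """Central-difference estimate of the partial derivative with respect to f."""
    return (func(f + h, m) - func(f - h, m)) / (2 * h)


f, m = 6.0, 4.0
numeric = partial_f(hugs, f, m)
chain_rule = 2 * (f + m) * 1              # (d outer / dp) * (d inner / df)
product_rule = 1 * (f + m) + (f + m) * 1  # d(first)*second + first*d(second)
print(round(numeric, 4), chain_rule, product_rule)  # 20.0 20.0 20.0
```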

This will be very important two sections from now, when we look at tensor differentiation for the first time. That is a lot of information to take in at once if you have never seen calculus before, so I also recommend this graphical and fun treatment of derivatives.

Tensor Fields

Differentiation deals with equations, but the vectors and tensors we've dealt with so far are just numbers, or points in a coordinate system. How does one differentiate a point? The answer is that one doesn't. The term "tensor differentiation" actually refers to "tensor field differentiation".

Imagine a grassy field with a single cow. Now imagine a cow attached to every single blade of grass. It would be very crowded, but you could call it a cow field. A tensor field is similar, with a tensor attached to every single point in a coordinate system.

Since vectors are a type of tensor, it's easier to think about vector fields instead. Let's define a vector field in our female/male coordinate system. The 2 components of our vector field will be the total number of people \(f + m\) and the number 1.

\(V = \begin{bmatrix} f + m \\ 1 \end{bmatrix} \)

So from every point on the female/male coordinate system we can draw an arrow that goes \(f+m\) across and 1 up. It looks like this:

Vector field V

For example at point (1,1), there is an arrow going (2,1) and ending up at (3,2). Of course, I've only drawn vectors from the grid line intersections. A real vector field has vectors everywhere, at (1.1, 1.1) and (1.001, 1.001) and so on. Like infinite cows in a field each nibbling on infinitely small blades of grass.

And it is possible to differentiate this vector (a.k.a tensor) field. We have to differentiate it separately with respect to the women and the men, so we'll end up with twice as many components. \(V^1\) and \(V^2\) represent the two parts of the vector field:

\(\frac{\partial}{\partial f} V = \begin{bmatrix} \frac{\partial}{\partial f} V^1 \\ \frac{\partial}{\partial f} V^2 \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial f} (f + m) \\ \frac{\partial}{\partial f} (1) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \)

\(\frac{\partial}{\partial m} V = \begin{bmatrix} \frac{\partial}{\partial m} V^1 \\ \frac{\partial}{\partial m} V^2 \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial m} (f + m) \\ \frac{\partial}{\partial m} (1) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \)

These partial derivatives are vectors in their own right and we can add them to the graph. They are the same for every point, so I'll just show them once at point (1,1). They are both the same (1,0) so both point across:

Partial derivatives at (1,1)

What does this actually mean? The partial derivatives describe what happens to the vector field as we move through the coordinate system. For example, moving 1 space to the right means adding one female. As we do that, the top part of the vector field grows by 1 and the bottom part stays the same, just as \(\frac{\partial}{\partial f} V\) predicts.

We can join the two partials together by randomly introducing another letter which represents both the female and male components. Physics books often do this, throwing in a \(\partial x\) at every available opportunity. But I've also seen a further shorthand, with just a lone \(\partial\). I prefer that, but they all do the same combining trick. The \(\equiv\) just means that they are equivalent notation:

\(\frac{\partial}{\partial f\&m}V \equiv \frac{\partial}{\partial x}V \equiv \partial V = \begin{bmatrix} \frac{\partial}{\partial f} V & \frac{\partial}{\partial m} V \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial f} V^1 & \frac{\partial}{\partial m} V^1 \\ \frac{\partial}{\partial f} V^2 & \frac{\partial}{\partial m} V^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \)
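The combined matrix \(\partial V\) can be estimated numerically at any point. In this Python sketch (helper names and sample points are hypothetical), entry [i][j] is the derivative of component \(V^{i+1}\) with respect to coordinate j, with f first and m second, so each column is one of the partial derivative vectors drawn on the graph. Every point gives the same matrix, matching the text.

```python
def V(f, m):
    """The vector field V = (f + m, 1) on the female/male coordinate system."""
    return (f + m, 1.0)


def partial_matrix(field, x0, x1, h=1e-6):
    """Matrix of partial derivatives at (x0, x1), by central differences.

    Entry [i][j] is d field^i / d x^j, so column j is the partial derivative
    vector with respect to coordinate j (here x0 = f, x1 = m).
    """
    result = [[0.0, 0.0], [0.0, 0.0]]
    for j, (d0, d1) in enumerate([(h, 0.0), (0.0, h)]):
        plus, minus = field(x0 + d0, x1 + d1), field(x0 - d0, x1 - d1)
        for i in range(2):
            result[i][j] = (plus[i] - minus[i]) / (2 * h)
    return result


def tidy(matrix):
    """Round for display; adding 0.0 turns any -0.0 into 0.0."""
    return [[round(x, 6) + 0.0 for x in row] for row in matrix]


for point in [(1.0, 1.0), (3.0, -2.0), (10.0, 0.5)]:
    print(point, tidy(partial_matrix(V, *point)))
# each line ends with [[1.0, 1.0], [0.0, 0.0]]
```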

It is also possible, even recommended, to transform this vector field into the shoes/snacks coordinate system. A transformed \(V\) inherits a ~ and becomes \(\tilde{V}\):

\(\tilde{V} = TV = \begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} f + m \\ 1 \end{bmatrix} = \begin{bmatrix} 2 * (f + m) + 2 * 1 \\ 3 * (f + m) + 4 * 1 \end{bmatrix} = \begin{bmatrix} 2f + 2m + 2 \\ 3f + 3m + 4 \end{bmatrix} \)

This means that if we graphed the vector field \(V\) in the shoes/snacks coordinates, the arrows would be much bigger. The arrow attached to the female/male point (1,1), which is the shoes/snacks point (4,7), would go (6,10) and end up at (10,17).

And then we can differentiate the transformed vector:

\(\frac{\partial}{\partial f} \tilde{V} = \begin{bmatrix} \frac{\partial}{\partial f} (2f + 2m + 2) \\ \frac{\partial}{\partial f} (3f + 3m + 4) \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \)

\(\frac{\partial}{\partial m} \tilde{V} = \begin{bmatrix} \frac{\partial}{\partial m} (2f + 2m + 2) \\ \frac{\partial}{\partial m} (3f + 3m + 4) \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \)

And we can write it all together:

\(\partial \tilde{V} = \begin{bmatrix} \frac{\partial}{\partial f} \tilde V^1 & \frac{\partial}{\partial m} \tilde V^1 \\ \frac{\partial}{\partial f} \tilde V^2 & \frac{\partial}{\partial m} \tilde V^2 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 3 & 3 \end{bmatrix}\)
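The same numerical approach works for the transformed field. This Python sketch (hypothetical helper names) builds \(\tilde V = TV\) as a function of females and males, checks the arrow (6,10) at the point (1,1), and estimates \(\partial \tilde V\), which should be the constant matrix above.

```python
T = [[2.0, 2.0], [3.0, 4.0]]  # Jacobian from females/males to shoes/snacks


def matvec(matrix, vector):
    """Multiply a matrix (nested lists) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def V(f, m):
    return (f + m, 1.0)


def V_tilde(f, m):
    """Transformed field TV, still written as a function of f and m."""
    return matvec(T, V(f, m))


def partial_matrix(field, x0, x1, h=1e-6):
    """Entry [i][j] is d field^i / d x^j at (x0, x1), by central differences."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for j, (d0, d1) in enumerate([(h, 0.0), (0.0, h)]):
        plus, minus = field(x0 + d0, x1 + d1), field(x0 - d0, x1 - d1)
        for i in range(2):
            result[i][j] = (plus[i] - minus[i]) / (2 * h)
    return result


print(V_tilde(1.0, 1.0))  # [6.0, 10.0]
print([[round(x, 6) for x in row] for row in partial_matrix(V_tilde, 1.0, 1.0)])
# [[2.0, 2.0], [3.0, 3.0]]
```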

What does this actually mean in terms of snacks and shoes? Not much really. Our vector field with components \(f+m\) and 1 has long since lost touch with the actual number of females/males and shoes/snacks.

Also note that we could have done all of the above with respect to shoes/snacks instead of females/males. Inverting the equations from the beginning gives \(f = 2s - k\) and \(m = -1.5s + k\), so our vector field in the shoes/snacks system is:

\(V = \begin{bmatrix} f + m \\ 1 \end{bmatrix} = \begin{bmatrix} (2s-k) + (-1.5s+k) \\ 1 \end{bmatrix} = \begin{bmatrix} 0.5s \\ 1 \end{bmatrix} \)

This makes sense: the total number of females and males \(f + m\) is equal to half the number of shoes \(0.5s\), since everyone has two shoes. And its derivative with respect to shoes/snacks is:

\(\tilde \partial V = \begin{bmatrix} \frac{\partial}{\partial s} V & \frac{\partial}{\partial k} V \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial s} V^1 & \frac{\partial}{\partial k} V^1 \\ \frac{\partial}{\partial s} V^2 & \frac{\partial}{\partial k} V^2 \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0 \end{bmatrix} \)

Note the squiggly thing on top of the \(\tilde \partial\) indicates we are differentiating with respect to the transformed coordinates. To complete the picture, we can also write \(\tilde{V}\) in terms of the shoes/snacks:

\(\tilde{V} = TV = \begin{bmatrix} 2f + 2m + 2 \\ 3f + 3m + 4 \end{bmatrix} = \begin{bmatrix} 2(2s-k) + 2(-1.5s+k) + 2 \\ 3(2s-k) + 3(-1.5s+k) + 4 \end{bmatrix} = \begin{bmatrix} s + 2 \\ 1.5s + 4 \end{bmatrix} \)

And we can differentiate the transformed vector field with respect to the transformed shoes/snacks coordinates, so now both the \(\tilde{V}\) and \(\tilde \partial\) have squigglies.

\(\tilde \partial \tilde V = \begin{bmatrix} 1 & 0 \\ 1.5 & 0 \end{bmatrix} \)
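Both shoes/snacks derivatives can be checked by substituting \(f = 2s - k\) and \(m = -1.5s + k\) and differentiating with respect to s and k. In this Python sketch the inverse S is computed from T rather than typed in; the helper names are hypothetical, and the sample point (s, k) = (4, 7), where f = m = 1, is an arbitrary choice.

```python
T = [[2.0, 2.0], [3.0, 4.0]]  # Jacobian from females/males to shoes/snacks


def inverse_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


S = inverse_2x2(T)  # [[2.0, -1.0], [-1.5, 1.0]]


def to_female_male(s, k):
    """f = 2s - k and m = -1.5s + k, read off the rows of S."""
    return (S[0][0] * s + S[0][1] * k, S[1][0] * s + S[1][1] * k)


def V_sk(s, k):
    """Original field V = (f + m, 1) as a function of shoes/snacks."""
    f, m = to_female_male(s, k)
    return (f + m, 1.0)


def V_tilde_sk(s, k):
    """Transformed field TV = (2f + 2m + 2, 3f + 3m + 4) as a function of shoes/snacks."""
    f, m = to_female_male(s, k)
    return (2 * f + 2 * m + 2, 3 * f + 3 * m + 4)


def partial_matrix(field, x0, x1, h=1e-6):
    """Entry [i][j] is d field^i / d x^j at (x0, x1), by central differences."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for j, (d0, d1) in enumerate([(h, 0.0), (0.0, h)]):
        plus, minus = field(x0 + d0, x1 + d1), field(x0 - d0, x1 - d1)
        for i in range(2):
            result[i][j] = (plus[i] - minus[i]) / (2 * h)
    return result


def tidy(matrix):
    """Round for display; adding 0.0 turns any -0.0 into 0.0."""
    return [[round(x, 6) + 0.0 for x in row] for row in matrix]


s, k = 4.0, 7.0
print(to_female_male(s, k))                    # (1.0, 1.0)
print(tidy(partial_matrix(V_sk, s, k)))        # [[0.5, 0.0], [0.0, 0.0]]
print(tidy(partial_matrix(V_tilde_sk, s, k)))  # [[1.0, 0.0], [1.5, 0.0]]
```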

Differentiation Seems Covariant

There is one more friendly bit of notation we can use. In the previous article I described tensor multiplication in detail. We can use that notation instead of just \(\partial V\):

\(\partial_a V^b\)

This immediately tells the reader (you) that taking the partial derivatives of V forms a two dimensional tensor-like thing. It is like the direct (outer) product of two vectors, and results in a matrix with a separate entry for each of the a partial derivatives of the b components of V.

\(\partial_a V^b = \begin{bmatrix} \frac{\partial}{\partial f} & \frac{\partial}{\partial m} \end{bmatrix} \otimes \begin{bmatrix} V^1 \\ V^2 \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial f} V^1 & \frac{\partial}{\partial m} V^1 \\ \frac{\partial}{\partial f} V^2 & \frac{\partial}{\partial m} V^2 \end{bmatrix} \)

Also note that the index of \(\partial_a\) is a subscript and is written horizontally. This indicates that differentiation is trying to be covariant like a linear function. Covariant vectors transform using S instead of T. We can use S to transform the derivative with respect to females/males into the derivative with respect to shoes/snacks:

\(\tilde \partial_c V^b = (\partial_a V^b) {S^a}_c = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1.5 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0 \end{bmatrix} \)
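Summing over the shared index a is an ordinary matrix product, with \(\partial V\) on the left, because the columns of \(\partial V\) and the rows of S are both indexed by a. A minimal Python sketch with a hypothetical `matmul` helper reproduces the shoes/snacks derivative found earlier by substitution.

```python
def matmul(a, b):
    """Matrix product of nested lists: sums over the shared middle index."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


dV = [[1.0, 1.0], [0.0, 0.0]]   # partial_a V^b: rows are components b, columns are a
S = [[2.0, -1.0], [-1.5, 1.0]]  # S^a_c: rows a (females/males), columns c (shoes/snacks)
print(matmul(dV, S))            # [[0.5, 0.0], [0.0, 0.0]]
```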

The index a appears twice, once on \(\partial_a\) and once as the row index of S, so it is multiplied/added and contracted away, leaving the column index c of S as the new derivative index. We start and end with something which looks like a rank (1,1) tensor. But although it looks and smells like a tensor, it isn't actually one, as you'll see in the next section.

The Problem With Ordinary Tensor Differentiation

Now we have done a lot of nasty stuff to that vector field. It's been differentiated \(\partial V\), it's been transformed \(\tilde V\), and for the grand finale it was transformed and then differentiated \(\tilde \partial \tilde V\). But can we cut out the middle man? Can we go straight from \(\partial V\) to \(\tilde \partial \tilde V\)? Can we write an equation with \(\tilde \partial \tilde V\) on the left and \(\partial V\) somewhere on the right?

First of all, we know that we can transform and then differentiate:

\(\tilde \partial \tilde V = \tilde \partial (TV) = \partial (TV) S\)

We can see this in action as above:

\(\tilde \partial \tilde V = \partial (T V) S = \partial \begin{bmatrix} 2f + 2m + 2 \\ 3f + 3m + 4 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1.5 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1.5 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1.5 & 0 \end{bmatrix} \)
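As a plain matrix product, \(\partial (TV) S\) gives the same matrix as differentiating directly in shoes/snacks coordinates. This Python sketch (hypothetical `matmul` helper) also computes \(T \partial V S\), the quantity the next question asks about. For this particular T the two agree, but only because every entry of T is a constant.

```python
def matmul(a, b):
    """Matrix product of nested lists: sums over the shared middle index."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


T = [[2.0, 2.0], [3.0, 4.0]]    # Jacobian from females/males to shoes/snacks
S = [[2.0, -1.0], [-1.5, 1.0]]  # its inverse
dTV = [[2.0, 2.0], [3.0, 3.0]]  # partials of TV = (2f + 2m + 2, 3f + 3m + 4)
dV = [[1.0, 1.0], [0.0, 0.0]]   # partials of V = (f + m, 1)

print(matmul(dTV, S))            # [[1.0, 0.0], [1.5, 0.0]]
print(matmul(matmul(T, dV), S))  # [[1.0, 0.0], [1.5, 0.0]]
```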

But does that mean we can also differentiate and then transform? In other words, does \(\tilde \partial \tilde V = T \partial V S\) ?

The answer is a resounding "NO", and that is what gives ordinary tensor differentiation such a bad reputation:

\(\partial (TV) S \ne T \partial V S \)

The reason they are not the same is the product rule, the close relative of the chain rule for differentiating a product. If you transform and then differentiate you have to compute \(\partial (TV)\). By the product rule this will become:

\(\partial (TV) = \partial T V + T \partial V\)

And the whole equation becomes:

\(\tilde \partial \tilde V = \partial (TV) S = \partial T V S + T \partial V S\)

The term on the right \(T \partial V S\) is what we would expect from a well behaved tensor differentiation technique. If it was just that, then everything would be fine. But the term on the left \(\partial T V S\) throws the whole thing out of whack. It's an ugly spanner in the works. \(\partial T\) is a 3 dimensional box of numbers. In our example, it doesn't actually matter: every entry of \(T\) is a constant, so \(\partial T\) is a 2x2x2 box of zeros, and turns everything it is multiplied by into nothingness as well. But if \(T\) had been something a bit more spectacular like:

\( T = \begin{bmatrix} f & 0 \\ 0 & 1 \end{bmatrix} \)

Then differentiating it would lead to (imagine this as a 3 dimensional matrix with a front half and smaller lower back half):

\(\partial T = \begin{bmatrix} \frac{\partial}{\partial f} \frac{\partial s}{\partial f} & \frac{\partial}{\partial f} \frac{\partial s}{\partial m} \\ \frac{\partial}{\partial f} \frac{\partial k}{\partial f} & \frac{\partial}{\partial f} \frac{\partial k}{\partial m} \end{bmatrix} _\class{matrix3d}{\begin{bmatrix} \frac{\partial}{\partial m} \frac{\partial s}{\partial f} & \frac{\partial}{\partial m} \frac{\partial s}{\partial m} \\ \frac{\partial}{\partial m} \frac{\partial k}{\partial f} & \frac{\partial}{\partial m} \frac{\partial k}{\partial m} \end{bmatrix}} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} _\class{matrix3d}{\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}} \)

And the extra term \(\partial T V S\) would not be zero. Part of the definition of tensors is that tensors must transform by multiplying by \(S\) and \(T\). Because ordinary tensor differentiation throws in that extra gumph, this is no longer the case. What it ultimately means is that the ordinary derivative of a tensor field is not a tensor field: \(\tilde \partial \tilde V\) is not a tensor. And that's why ordinary tensor differentiation is so frowned upon in the tensor world.

I am not one to take things for granted so I will now prove this to myself using the T above and its corresponding inverse S:

\(\tilde \partial \tilde V = \partial (T V) S = \partial ( \begin{bmatrix} f & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} f + m \\ 1 \end{bmatrix} ) \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \) \( \partial \begin{bmatrix} f^2 + fm \\ 1 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2f + m & f \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 + \frac{m}{f} & f \\ 0 & 0 \end{bmatrix} \)

And now the errant foolhardy method:

\(\tilde \partial \tilde V \ne T \partial V S = \begin{bmatrix} f & 0 \\ 0 & 1 \end{bmatrix} \partial \begin{bmatrix} f + m \\ 1 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \) \( \begin{bmatrix} f & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} f & f \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & f \\ 0 & 0 \end{bmatrix} \)

See? They are different. The product rule was not respected in the bottom one, and so the result is missing something in the top left corner. This difference is precisely \(\partial T V S\):

\(\partial T V S = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} _\class{matrix3d}{\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}} \begin{bmatrix} f + m \\ 1 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} f+m & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac1f & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 + \frac{m}{f} & 0 \\ 0 & 0 \end{bmatrix} \)
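The same comparison can be run numerically at a sample point, here f = 2 and m = 3 (an arbitrary choice). This Python sketch, with hypothetical helper names, differentiates \(TV\) by central differences, multiplies by \(S = T^{-1}\) at that point, and subtracts the naive \(T \partial V S\). The leftover should be \(1 + \frac{m}{f} = 2.5\) in the top left corner and zero elsewhere.

```python
def matmul(a, b):
    """Matrix product of nested lists: sums over the shared middle index."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def partial_matrix(field, x0, x1, h=1e-6):
    """Entry [i][j] is d field^i / d x^j at (x0, x1), by central differences."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for j, (d0, d1) in enumerate([(h, 0.0), (0.0, h)]):
        plus, minus = field(x0 + d0, x1 + d1), field(x0 - d0, x1 - d1)
        for i in range(2):
            result[i][j] = (plus[i] - minus[i]) / (2 * h)
    return result


def tidy(matrix):
    """Round for display; adding 0.0 turns any -0.0 into 0.0."""
    return [[round(x, 6) + 0.0 for x in row] for row in matrix]


def TV(f, m):
    """T V with T = [[f, 0], [0, 1]] and V = (f + m, 1)."""
    return (f * (f + m), 1.0)


f, m = 2.0, 3.0
T = [[f, 0.0], [0.0, 1.0]]
S = [[1.0 / f, 0.0], [0.0, 1.0]]  # inverse of T at this point
dV = [[1.0, 1.0], [0.0, 0.0]]     # partial derivatives of V = (f + m, 1)

correct = tidy(matmul(partial_matrix(TV, f, m), S))  # d(TV) S
naive = tidy(matmul(matmul(T, dV), S))               # T dV S, dropping dT
leftover = [[c - n for c, n in zip(rc, rn)] for rc, rn in zip(correct, naive)]

print(correct)   # [[3.5, 2.0], [0.0, 0.0]], i.e. [[2 + m/f, f], [0, 0]]
print(naive)     # [[1.0, 2.0], [0.0, 0.0]], i.e. [[1, f], [0, 0]]
print(leftover)  # [[2.5, 0.0], [0.0, 0.0]], i.e. [[1 + m/f, 0], [0, 0]]
```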

Differential Geometry

And now for a realish example. I'm only now (13 September 2015) starting to get to grips with how all of the stuff above relates to general relativity.

Specifically, the field of differential geometry studies how geometric objects like spheres and vector fields behave when things are differentiated in and around them.

One of the main ideas seems to be that it is much easier to work with flat surfaces. For example, it is very convenient to pretend the Earth is flat when planning a day out. Why bother introducing the curvature of the Earth when calculating the route to your picnic spot? Simply imagine that the sphere we live on is actually a plane, and all the computations of angles and distances become much easier. Then if you do end up working for air traffic control one day, and the curvature of the Earth does become a factor, you can still pretend it's flat but just introduce some special rules into your equations (I have no idea if this is how air traffic is actually controlled).

I think the same thing happens in general relativity. The Sun bends space and time around it. Rather than factoring all the complicated bending into every equation, it is easier to pretend that space is flat and unbent over small distances, and then compensate for the bending later when needed.

Those "special rules" and "compensations" are the leftover bits in ordinary tensor differentiation, things like \(\partial T V S\).


For example, imagine that you live in outer space, on the surface of a giant floating cylinder called Cyland. It looks like this:

The habitable cylinder Cyland

This summer you are planning to leave your cosy home in Math Town, located at coordinates \((x,y) = (-\pi/2, -\pi/2) \approx (-1.57, -1.57)\) and take a long holiday to see the famous Cyland beauty point at \((x,y) = (\pi/2, \pi/2) \approx (1.57, 1.57)\). Here is a map of the journey you are planning, which nicely follows the equation \(y=x\):

Your journey on a plane

Although your map seems flat, in reality it is wrapped around a cylinder. Shown on Cyland, your journey looks like this:

Your journey on the cylinder

Now I'll rotate Cyland a bit to the right, so we can view it from the front:

Your journey from the front

Let's drop the third dimension. (You live in outer space. You're allowed to do that kind of thing now and then.) So now we are back to a two dimensional space. This is the path that somebody watching your holiday from a distance would see.

line in curved space

In summary, the first and last plots show the same exact journey from two different points of view. One from your point of view, where the world seems like a vast flat plane. The other from outer space where the curvature of your cylindrical habitat is clearly visible:

Left: y=x on a plane. Right: line in curved space.

You may recall from the article on special relativity that the laws of physics are the same in all inertial frames of reference. So it shouldn't matter which point of view we use: the equations of physics (things like speed and acceleration) should yield the same results.

Cyland Transformation

Let's check if this is true. First we'll label the coordinates. We're already using x and y for the flat map. For Cyland, we can use s and u for sideways and up/down. The equations for going between the two coordinate systems are:

\(s = x\)

\(u = \sin(y)\)

And for the other direction:

\(x = s\)

\(y = \arcsin(u)\)

We'll use a very simple vector field for this example. Imagine this is the Cyland nightly news. There is a strong south-westerly wind blowing (coming from the south-west, towards the north-east). Mathematically we can describe the wind as a vector field. I chose 0.2 as it's nicely graphable on this scale:

\(V = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} \)

Because it is constant over the whole of the map, the derivative of this wind is zero. It does not change direction or magnitude as you move around the map:

\(\partial V = \partial \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}\)

The weather map and its derivative look like this (yes the second one is meant to be blank because the derivative is 0):

Left: Vector field [0.2,0.2]. Right: Vector field with no derivative.

Now we need to compute the transformation matrix T for going between the flat map and Cyland coordinates. This uses the derivative of a sine:

\(T = \frac {\partial\ Cyland} {\partial\ flat\ map} = \begin{bmatrix} \frac {\partial s} {\partial x} & \frac {\partial s} {\partial y} \\ \frac {\partial u} {\partial x} & \frac {\partial u} {\partial y} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \cos (y) \end{bmatrix} \)

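To convert T into Cyland coordinates we use the trigonometric identity \(\cos(\arcsin u) = \sqrt{1 - u^2}\), which holds here because y stays between \(-\pi/2\) and \(\pi/2\), where \(\cos(y)\) is never negative: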

\(T = \frac {\partial\ Cyland} {\partial\ flat\ map} = \begin{bmatrix} 1 & 0 \\ 0 & \cos (y) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \cos (\arcsin u) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt {1 - u^2} \end{bmatrix} \)

And now we can transform the vector field:

\(\tilde V = TV = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt {1 - u^2} \end{bmatrix} \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \sqrt {1 - u^2} \end{bmatrix} \)

This is graphed below on the left. Notice that at the top and bottom edges the wind looks horizontal. This is because we are viewing the wind as it curves around the cylinder. The wind's vertical component is compressed. And its derivative is:

\(\tilde \partial \tilde V = \tilde \partial \begin{bmatrix} 0.2 \\ 0.2 \sqrt {1 - u^2} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & \frac {-0.2u} {\sqrt {1 - u^2}} \end{bmatrix} \)

By the chain rule, \(\frac{\partial}{\partial u} \sqrt{1 - u^2} = \frac{1}{2\sqrt{1 - u^2}} * (-2u) = \frac{-u}{\sqrt{1 - u^2}}\), which is where the bottom right entry comes from. This derivative has four parts, corresponding to how s and u change with respect to s and u. The only non-zero part is how u changes with respect to u. I have graphed this below on the right. You can see that at the top and bottom edges (where the view from outer space is most distorted), the change is huge. In the middle it is zero:

Left: Cyland vector field. Right: its derivative.

The whole point of this Cyland exercise is to show that ordinary tensor differentiation doesn't work because we can't transform the wind derivative from the flat map (where there is no derivative) to Cyland (where there definitely is one). Mathematically that's because:

\(T \partial V \ne \partial (TV)\)

The derivative of V is zero and the derivative of TV is not. The discrepancy in this calculation is the leftover bit which causes so many problems:

\(\partial T V = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} _\class{matrix3d}{\begin{bmatrix} 0 & 0 \\ 0 & \frac {-u}{\sqrt {1 - u^2}} \end{bmatrix}} \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & \frac {-0.2u} {\sqrt {1 - u^2}} \end{bmatrix}\)

Which is exactly what we were expecting!

This example has hopefully shown you some of the issues involved in differential geometry. By the laws of relativity, the wind and its derivative should behave the same no matter where we are viewing from. Whether we're standing on Cyland or observing from a distance in outer space, we should be able to calculate the same derivative (using the transformation rules). Using ordinary tensor differentiation, we can't. And that's a problem.
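A numerical check of the Cyland numbers, sketched in Python at the sample point u = 0.5 (an arbitrary choice; the helper names are hypothetical). It differentiates the u component \(0.2\sqrt{1 - u^2}\) of the transformed wind by central differences and compares it with the chain-rule result \(\frac{-0.2u}{\sqrt{1 - u^2}}\). It then rebuilds the same entry as the leftover term \(\partial T V S\) by a different route: differentiating T with respect to the flat-map y (which puts \(-\sin y\) in the bottom right) and multiplying by \(S = T^{-1}\), whose bottom right entry is \(\frac{1}{\cos y}\). The naive \(T \partial V S\) is zero because the flat-map wind is constant.

```python
import math


def V_tilde_u(u):
    """u component of the transformed wind, 0.2 * sqrt(1 - u^2)."""
    return 0.2 * math.sqrt(1 - u ** 2)


def d_du(func, u, h=1e-6):
    """Central-difference estimate of d func / du."""
    return (func(u + h) - func(u - h)) / (2 * h)


u = 0.5
y = math.asin(u)  # the flat-map coordinate that maps to this u

direct = d_du(V_tilde_u, u)  # u-u entry of the Cyland derivative
closed_form = -0.2 * u / math.sqrt(1 - u ** 2)
naive = 0.0  # T dV S: the flat-map wind has dV = 0
# leftover dT V S: d(cos y)/dy = -sin y, times V^2 = 0.2, times S entry 1 / cos y
leftover = -math.sin(y) * 0.2 / math.cos(y)

print(round(direct, 6), round(closed_form, 6), naive, round(leftover, 6))
```

The direct derivative and the leftover term agree, while the naive route gives zero, which is the mismatch described above.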


We have so far covered approximately 2 pages from the textbook I am reading: page 62 on tensor fields and 68 on ordinary tensor differentiation. Now it's going to get difficult. Next up is a fix for the woes above using the Lie Derivative.

Octave commands

Below are the Octave commands used to create the graphs shown in this page:

%Set up the figure for outputting to file
h = figure();
set (h,'PaperSize',[4,4]);
set (h,'PaperPosition',[0,0,4,4]);

hold off;
x = [-pi/2:0.01:pi/2];
plot (x, x, 'r-', 'LineWidth', 2);
grid on; xlabel ('x'); ylabel ('y');
title ("Your journey on a flat map");
print "plots/otd-plane-xy.gif";

%cylinder as mesh, as I can't plot a line on top of "surf (cylinder())"
hold off;
[mx my] = meshgrid ([-1:0.1:1], [-2:0.4:2]);
hold off; mesh (mx, my, sqrt (1 - mx.^2));
hold on;
title ("Cyland floating out in space");
mesh (mx, my, -sqrt (1 - mx.^2));
view (142, 90-71); %rotate for a nice view
print "plots/otd-cyland.gif";

%plot a path on the cylinder
t = [-pi/2:0.01:pi/2]; %journey parameter: on the flat map, x = y = t
plot3 (cos(t), t, sin(t), 'LineWidth', 1); %prints yellow first time for some reason
plot3 (cos(t), t, sin(t), 'LineWidth', 4);
view (142, 90-71); %rotate for a nice view (from numbers at the bottom)
title ("Cyland with your journey wrapped onto it");
print "plots/otd-cyland-with-path.gif";

view (91, 90-79);
title ("Your journey in 3D from the front");
print "plots/otd-cyland-rotated.gif";

%now just plot the sine
hold off;
plot (x, sin(x), 'r-', 'LineWidth', 2);
grid on; xlabel ('s'); ylabel ('u');
title ("Your journey in 2D from the front");
print "plots/otd-cyland-xsinx.gif";

%Vector field [0.2,0.2]
hold off;
[x, y] = meshgrid (-pi/2:pi/10:pi/2);
quiver (x, y, 0.2, 0.2, 'g-', 'AutoScale', 'off', 'MaxHeadSize', 0.1);
title ("Wind on the flat map");
grid on; xlabel ('x'); ylabel ('y');
print "plots/otd-plane-vector-field-11.gif";

%Blank map
hold off; 
[x, y] = meshgrid (-pi/2:pi/10:pi/2);
quiver (x, y, 0, 0, 'b-', 'AutoScale', 'off', 'MaxHeadSize', 0.1);
title ("How flat map wind is changing");
grid on; xlabel ('x'); ylabel ('y');
print "plots/otd-plane-vector-field-11-derivative.gif";

%Vector field in Cyland
hold off;
[s, u] = meshgrid (-1:0.2:1);
quiver (s, u, 0.2, 0.2 * sqrt (1-u.^2), 'b-', 'AutoScale', 'off', 'MaxHeadSize', 0.1);
title ("Same wind seen from outer space");
grid on; xlabel ('s'); ylabel ('u');
print "plots/otd-cyland-vector-field.gif";

%Vector field derivative in Cyland
hold off;
[s, u] = meshgrid (-1:0.2:1);
quiver (s, u, 0.2, -0.2 * u ./ sqrt (1-u.^2), 'b-', 'AutoScale', 'off', 'MaxHeadSize', 0.1);
title ("How the u coord of wind changes");
grid on; xlabel ('s'); ylabel ('u');
print "plots/otd-cyland-vector-field-derivative.gif";