
This question arises after reading through several Stack Exchange posts and after a long chat with another user in a previous question I asked about this topic. The following "contradiction" seems to occur with the following conventions and definitions:

1.


Let $V$ be an $n$-dimensional $\mathbb{R}$-vector space with a basis $(e_\mu)_{\mu=1, \dots , n},$ and let $V^*$ be the dual vector space with the dual basis $(e^{* \nu})_{\nu=1, \dots , n}.$

Let $$M = M^\mu_{\ \ \nu}\, e_\mu \otimes e^{* \nu} \in V \otimes V^{*} \cong {\cal L}(V;V)$$ be a linear map from $V$ to $V.$ We use the North-West South-East convention for the position of the indices on $M^\mu _{\ \ \nu}$.

Let $$M^T = (M^T)_\nu^{\ \ \mu}\, e^{*\nu} \otimes e_{\mu} \in V^{*} \otimes V \cong {\cal L}(V^*;V^*)$$ be the transposed linear map from $V^*$ to $V^*.$ We use the South-West North-East convention for the position of the indices on $(M^T)_\nu ^{\ \ \mu}$.

Let $$g = g_{\mu \nu}\, e^{*\mu} \odot e^{*\nu} \in \mathsf{Sym}^2 V^{*} = V^{*} \odot V^{*}$$ be an (indefinite) metric, i.e. an invertible element of the symmetrized tensor product.

Reference: Why is not ${(\Lambda^T)^\mu}_\nu = {\Lambda_\nu}^\mu$?

2.


$$(M^T)_{\mu}^{\ \ \nu} := M^{\nu}_{\ \ \mu}$$

References: Why is not ${(\Lambda^T)^\mu}_\nu = {\Lambda_\nu}^\mu$? , Transpose of (1,1) tensor

3.


$$M^\mu_{\ \ \nu} := g_{\mu \alpha} g^{\nu \beta} M^{\alpha}_{\ \ \beta} $$

Edit: The Stack Exchange question this step referenced no longer exists. This step is in error.

4.


$$ g_{\mu \alpha} g^{\nu \beta} M^{\alpha}_{\ \ \beta} = M_{\mu}^{\ \ \nu}$$

5. Combining 2, 3, and 4 yields

$$M^{\nu}_{\ \ \mu} = M_{\mu}^{\ \ \nu},$$ which doesn't seem like it should always be true.
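Indeed it is not. Here is a quick numerical sketch (assuming numpy, with the convention that the components $M^\mu_{\ \ \nu}$ are stored as a matrix whose rows are labelled by the upper index) using the 2D Minkowski metric:

```python
import numpy as np

# Minkowski metric in 2D, g = diag(-1, 1); its inverse happens to equal g here.
G = np.diag([-1.0, 1.0])
Ginv = np.linalg.inv(G)

# A generic linear map M: V -> V with components M^mu_nu (rows = upper index).
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Point 2 (transpose): (M^T)_mu^nu = M^nu_mu, i.e. the plain matrix transpose.
M_transposed = M.T

# Point 4 (raise/lower with the metric): M_mu^nu = g_{mu a} g^{nu b} M^a_b,
# which as a matrix product is G @ M @ Ginv (Ginv is symmetric).
M_lowered_raised = G @ M @ Ginv

print(M_transposed)      # [[1. 3.], [2. 4.]]
print(M_lowered_raised)  # [[1. -2.], [-3. 4.]]
# The two arrays differ, so M^nu_mu = M_mu^nu fails for this M and g.
print(np.allclose(M_transposed, M_lowered_raised))  # False
```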

Edit: After looking at the replies, I think I can summarize my confusion as follows (this may be helpful for others):

There are the metric and the inverse metric, viewed as maps $$g: V \to V^*$$ $$g^{-1}: V^* \to V$$

Suppose we have a linear transformation $M: V \to V.$ There exists a set of four naturally associated linear transformations: $$M: V \to V$$ $$(g \circ M): V \to V^*$$ $$(M \circ g^{-1}): V^* \to V$$ $$(g \circ M \circ g^{-1}): V^* \to V^*$$

These are the maps one obtains by applying the metric to lower/raise indices. However, given $M$, there also exists another naturally associated linear map, the transpose. As explained in J. Murray's answer, the proper way to view the transpose is as a map $M^T: V^* \to V^*.$ This gives us four new *distinct* maps: $$M^T: V^* \to V^*$$ $$(g^{-1} \circ M^T): V^* \to V$$ $$(M^T \circ g): V \to V^*$$ $$(g^{-1} \circ M^T \circ g): V \to V$$ It turns out that $(g^{-1} \circ M^T \circ g): V \to V$ is the adjoint $M^{\dagger}$. In general, though, these four new maps are different from the previous four, and this is where things got confusing for me. Things get particularly interesting if we think of $M$ as a type $(1,1)$ tensor. In this case, there are three other "naturally associated" type $(1,1)$ tensors - but all of them are different!
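These compositions are easy to write down concretely. A minimal numerical sketch (numpy assumed; matrices store components with the first index labelling rows, and $g=\mathrm{diag}(-1,1)$ is chosen as a concrete indefinite metric):

```python
import numpy as np

G = np.diag([-1.0, 1.0])          # metric g: V -> V*
Ginv = np.linalg.inv(G)           # inverse metric g^{-1}: V* -> V
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # M: V -> V, components M^mu_nu

# Three of the maps obtained from M by composing with g and g^{-1}:
M_low_low = G @ M                 # g o M : V -> V*,        components M_{mu nu}
M_up_up   = M @ Ginv              # M o g^{-1} : V* -> V,   components M^{mu nu}
M_pushed  = G @ M @ Ginv          # g o M o g^{-1} : V* -> V*

# The transpose M^T : V* -> V* has components (M^T)_nu^mu = M^mu_nu,
# i.e. the plain matrix transpose, and the adjoint is g^{-1} o M^T o g : V -> V.
M_adjoint = Ginv @ M.T @ G        # M^dagger

# M^dagger generally differs from M (they agree only when M is g-self-adjoint).
print(np.allclose(M_adjoint, M))  # False for this M
```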

  • $M^\mu_{\ \ \nu} := g_{\mu \alpha} g^{\nu \beta} M^{\alpha}_{\ \ \beta}$ You can’t have $\mu$ be upper on the left and lower on the right. Similarly for $\nu$. Commented Feb 2, 2021 at 21:37

2 Answers


There's a lot of confusing stuff around, which I have regrettably contributed to at various times. I'll try to set the record straight. Let $V$ be a finite-dimensional, real vector space, and $V^*$ its dual, consisting of all linear functionals $\omega : V \rightarrow \mathbb R$. This space will be endowed with a metric $g:V\times V \rightarrow \mathbb R$ which induces a dual metric $\tilde g : V^* \times V^* \rightarrow \mathbb R$. In the following, we consider a linear transformation $A : V \rightarrow V$.


The adjoint of $A$, denoted by $A^\dagger$, is also a linear map from $V\rightarrow V$ which is defined relative to the metric via $$g\bigg(X,A(Y)\bigg)= g\bigg(A^\dagger(X),Y\bigg)$$ for all $X,Y\in V$. In component form, one finds that

$$(A^\dagger)^\mu_{\ \ \nu} = g_{\nu \alpha} A^\alpha_{\ \ \beta}\tilde g^{\mu\beta} \qquad (1)$$
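One can sanity-check (1) numerically against the defining property. This sketch (numpy assumed) uses a random $3\times 3$ map and $g = \mathrm{diag}(-1,1,1)$, and stores $A^\mu_{\ \ \nu}$ with the upper index labelling rows:

```python
import numpy as np

rng = np.random.default_rng(0)
G = np.diag([-1.0, 1.0, 1.0])     # a 3D indefinite metric g
Ginv = np.linalg.inv(G)           # dual metric gtilde
A = rng.normal(size=(3, 3))       # components A^mu_nu

# Equation (1): (A^dagger)^mu_nu = g_{nu a} A^a_b gtilde^{mu b},
# which in matrix form is Ginv @ A.T @ G.
A_dagger = Ginv @ A.T @ G

# Check the defining property g(X, A(Y)) = g(A^dagger(X), Y) on random vectors.
X, Y = rng.normal(size=3), rng.normal(size=3)
lhs = X @ G @ (A @ Y)
rhs = (A_dagger @ X) @ G @ Y
print(np.allclose(lhs, rhs))  # True
```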


Along with the adjoint, we can define the transpose of $A$, denoted by $A^\mathrm T$, which is a map from $V^*\rightarrow V^*$. The defining property of the transpose is that for every $X\in V$ and $\omega \in V^*$, we have $$\omega\bigg(A(X)\bigg) = \bigg(A^\mathrm T \omega\bigg)(X)$$ In component form, one finds that $$(A^\mathrm T)_\nu^{\ \ \mu} = A^\mu_{\ \ \nu} \qquad (2)$$

Note that unlike the adjoint, the transpose does not require a metric to define. However, if we do have a metric, then we can take the transpose (which again, is a map from $V^*\rightarrow V^*$) and, via the musical isomorphism, define a corresponding map from $V\rightarrow V$ (explicitly, we would start with a vector, map it to a covector with the metric, apply $A^\mathrm T$, and then map the result to a vector with the dual metric). If you do this, you obtain the adjoint as defined above.
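The "lower the index, apply $A^\mathrm T$, raise the index" procedure can be traced step by step in a small sketch (numpy assumed; components stored with the first index labelling rows, covectors acted on from the left):

```python
import numpy as np

G = np.diag([-1.0, 1.0])
Ginv = np.linalg.inv(G)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X = np.array([5.0, -7.0])

# Lower the index: (X^flat)_mu = g_{mu nu} X^nu, turning the vector into a covector.
X_flat = G @ X
# Apply the transpose on covectors: (A^T omega)_nu = omega_mu A^mu_nu.
w = X_flat @ A
# Raise the index again with the dual metric.
result = Ginv @ w

# This agrees with the adjoint acting directly on vectors, per (1).
A_dagger = Ginv @ A.T @ G
print(np.allclose(result, A_dagger @ X))  # True
```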


Both $A$ and $A^\dagger$ are maps from $V\rightarrow V$. As a result, they have two indices - by convention, the first upstairs and the second downstairs. In contrast, $A^\mathrm T$ is a map from $V^*\rightarrow V^*$, and therefore its first index is downstairs and its second index is upstairs. All of this is nice and clear.

When we start raising and lowering indices with the metric, we start to muddy the waters. Application of the raising/lowering convention to (1) yields

$$(A^\dagger)^\mu_{\ \ \nu} = A_\nu^{\ \ \mu}\qquad (3)$$ Similarly, we can raise and lower (2) to obtain $$(A^\mathrm T)^\mu_{\ \ \nu} = A_\nu^{\ \ \mu}\qquad (4)$$

  • This suggests that $A^\dagger = A^\mathrm T$ - that the adjoint map is equal to the transpose map. This is wrong, because they are linear maps on different spaces.
  • It also suggests that they have the same components, which is also wrong.
    • The components of the transpose map, which are naturally written $(A^\mathrm T)_\nu^{\ \ \mu}$, are obtained by writing out $A^\mu_{\ \ \nu}$ as a matrix and then exchanging the rows and columns. In other words, $(A^\mathrm T)_2^{\ \ 1}$ (the $(2,1)$ component of $A^\mathrm T$) is simply equal to $A^1_{\ \ 2}$ (the $(1,2)$ component of $A$).
    • The components of the adjoint map, which are naturally written $(A^\dagger)^\mu_{\ \ \nu}$, are obtained from the components of $A$ by contraction with $g$ and $\tilde g$, as per (1).
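The difference between these two sets of natural components is easy to exhibit numerically (numpy assumed; $g = \mathrm{diag}(-1,1)$, components stored with the first index labelling rows):

```python
import numpy as np

G = np.diag([-1.0, 1.0])
Ginv = np.linalg.inv(G)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Natural components of the transpose: (A^T)_nu^mu = A^mu_nu (swap rows/columns).
AT = A.T
# Natural components of the adjoint, per (1): contraction with g and gtilde.
A_dagger = Ginv @ A.T @ G

# Equations (3) and (4) hide the fact that these two arrays differ:
print(AT)        # [[1. 3.], [2. 4.]]
print(A_dagger)  # [[1. -3.], [-2. 4.]]
print(np.allclose(AT, A_dagger))  # False
```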

The confusion ultimately arises because indices should be raised and lowered on tensors, not mere linear transformations. When we raise and lower indices on tensors, the space on which the resulting objects act is immediately obvious from the index placement. If we insist on using this convention on linear transformations, confusion between things like the adjoint and the transpose arises.


Let's now consider a linear transformation $\Lambda : V\rightarrow V$ which is orthogonal with respect to $g$. This means that

$$g\bigg(\Lambda(X),\Lambda(Y)\bigg) = g(X,Y)$$

Applying the definition of the adjoint, this means that

$$g\bigg(\Lambda^\dagger\big(\Lambda(X)\big),Y\bigg) = g(X,Y)$$

implying that $\Lambda^\dagger = \Lambda^{-1}$. It is not correct, however, to say that $\Lambda^\mathrm T = \Lambda^{-1}$; the fundamental reason is that $\Lambda^\mathrm T$ is a map from $V^*\rightarrow V^*$ while $\Lambda$ is a map from $V\rightarrow V$.
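A concrete illustration with a Lorentz boost, which is orthogonal with respect to the 2D Minkowski metric (numpy assumed; the rapidity value is arbitrary):

```python
import numpy as np

a = 0.7                            # rapidity of a Lorentz boost
G = np.diag([-1.0, 1.0])           # 2D Minkowski metric
Ginv = np.linalg.inv(G)
L = np.array([[np.cosh(a), np.sinh(a)],
              [np.sinh(a), np.cosh(a)]])   # Lambda: orthogonal w.r.t. g

# Orthogonality: g(Lambda X, Lambda Y) = g(X, Y), i.e. L.T @ G @ L == G.
print(np.allclose(L.T @ G @ L, G))               # True

# The adjoint equals the inverse ...
L_dagger = Ginv @ L.T @ G
print(np.allclose(L_dagger, np.linalg.inv(L)))   # True

# ... but the naive matrix transpose does not (the boost matrix is symmetric).
print(np.allclose(L.T, np.linalg.inv(L)))        # False
```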


As a final point, in elementary linear algebra we often say that the adjoint of a real matrix is equal to its transpose. The reason we get away with this is that the inner product in such situations is implicitly given by $g_{\mu\nu} = \delta_{\mu\nu} = \mathrm{diag}(1,1,1,\ldots)$. In such cases, it's easy to see from the definitions (1) and (2) that the components of $A^\dagger$ are equal to the components of $A^\mathrm T$.
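A quick check of this collapse in the Euclidean case (numpy assumed):

```python
import numpy as np

G = np.eye(3)                      # Euclidean metric g_{mu nu} = delta_{mu nu}
Ginv = np.linalg.inv(G)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [5.0, 0.0, 6.0]])

# With g the identity, (1) collapses to the ordinary matrix transpose:
A_dagger = Ginv @ A.T @ G
print(np.allclose(A_dagger, A.T))  # True
```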


The root of your precise problem was already given by G.Smith in the comments. However, I wrote this extensive answer because it might help to prevent future misunderstandings.

As you have stated correctly in the first part of your question, we can take a linear map $M$ as an element of the tensor product space $V\otimes V^*$. The transposed linear map $M^T$ is the map $$M^T: V^* \rightarrow V^*, f \mapsto f\circ M$$ which we can think of as an element of the space $V^*\otimes V$. Note that we have used neither a basis of the vector space $V$ nor a metric tensor so far. Thus, the transpose is perfectly well-defined without even knowing about those concepts.

However, most calculations are done in coordinates, so let $(e_\mu)_{\mu = 1,...,d}$ be a basis of $V$ and denote its dual basis by $(e^{*\mu})_{\mu = 1,...,d}$. We represent $M$ in this basis by $M={M^\mu}_\nu e_\mu\otimes e^{*\nu}$. Using this, and the definition of the transposed map $M^T$, we obtain the representation of the transpose $${(M^T)_\mu}^\nu := (M^T(e^{*\nu}))(e_\mu) = (e^{*\nu}\circ M)(e_\mu) = {M^\beta}_\alpha e^{*\nu}(e_\beta) \cdot e^{*\alpha}(e_\mu) = {M^\nu}_\mu,$$ where we used $e^{*\mu}(e_\nu) = \delta^\mu_\nu$. This is precisely what you have stated above in point 2.

Ok, so let's tackle the final part, the metric. For us, the most important property of the metric $g\in\operatorname{Sym}^2V^*$ is its non-degeneracy, i.e. that the only vector mapped to the zero covector is the zero vector itself. This property is so nice because it implies that the metric taken as a map $$g: V\rightarrow V^*, v\mapsto g(v,\bullet)$$ is a bijection. Thus, it allows us to identify the vector space $V$ with its dual $V^*$. Since every bijective map has a well-defined inverse, we introduce $\tilde{g} = g^{-1}:V^*\rightarrow V$. Getting back to coordinates, we will denote the representations of those maps by $$g = g_{\mu\nu} e^{*\mu}\otimes e^{*\nu}$$ for the metric tensor and $$\tilde{g} = g^{\mu\nu} e_\mu \otimes e_\nu$$ for its inverse, respectively. You can also use the symmetric tensor product $\odot$ here. Now that we have defined everything, let's get to the interesting question: what is it good for?

It seems to be a national sport to raise and lower indices as much as possible. Behind every single operation of this kind, there is one of the bijections $g:V\rightarrow V^*$ or $\tilde{g}:V^*\rightarrow V$. The identity (4) is precisely what you get when applying both to the map $M: V\rightarrow V$ to get the composite map $$V^*\rightarrow_g V \rightarrow_M V \rightarrow_{\tilde{g}} V^*$$ with its coordinate representation ${M_\mu}^\nu:={(g\circ M \circ \tilde{g})_\mu}^\nu = g_{\mu\alpha} g^{\nu\beta} {M^\alpha}_\beta$. An important thing to remember at this point is that $g$ and $\tilde{g}$ - both taken as maps rather than symmetric tensors - are bijections and thus allow for an identification of ${M^\mu}_\nu$ and ${M_\mu}^\nu$. However, they are not equal in general. The latter may be seen as a transported version of $M:V\rightarrow V$ to the vector space $V^*$ by $g:V\rightarrow V^*$. This is a well-known concept, the push-forward, which is also quite relevant in other forms, for example when dealing with spacetime symmetries on a manifold.
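This composite can be checked in coordinates. The sketch below (numpy assumed, first index labelling rows) uses einsum so the contraction mirrors the index expression ${M_\mu}^\nu = g_{\mu\alpha} g^{\nu\beta} {M^\alpha}_\beta$:

```python
import numpy as np

G = np.diag([-1.0, 1.0])
Ginv = np.linalg.inv(G)
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])         # components M^mu_nu

# M_mu^nu = g_{mu a} g^{nu b} M^a_b, spelled out index by index with einsum:
M_low_up = np.einsum('ma,nb,ab->mn', G, Ginv, M)

# The same thing as the composite map g o M o gtilde in matrix form:
print(np.allclose(M_low_up, G @ M @ Ginv))   # True
# The pushed-forward map is *identified* with M via g, but its components differ:
print(np.allclose(M_low_up, M))              # False
```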

Anyway, to get to your question: As mentioned by G.Smith, there is a mistake in equation (3). It should be identical to equation (4). An easy way to spot potential mistakes in Einstein's index notation is to check that each index appears the right number of times (at most twice) and in the right position. In the case at hand, $${M^\mu}_\nu = g_{\mu\alpha} g^{\nu\beta} {M^\alpha}_\beta, \qquad \text{(wrong)}$$ both sides are valid by themselves, but they cannot be equal since their indices do not match. Letting go of the coordinates for a second, the left-hand side represents a map $V\rightarrow V$ while the right-hand side is of the type $V^*\rightarrow V^*$. If it were not expressed in coordinates, we would immediately conclude that there's something fishy.

So what are we supposed to conclude from all this? In my opinion, an important lesson is that notation is not a natural thing that god brought us in her generosity, written down to be the one and only right thing. In other words, always be aware of the conventions and notations you are using and know their limitations. Also, having some knowledge of the background structure, for example the coordinate-free picture in the present case, can help a lot in understanding the issue. Moreover, it can be utterly beautiful to glance behind the facade of notation and structures such as coordinates. I hope these words can help you in some way and do not lead to additional confusion! Also, they might inspire you to dig deeper into the world of differential geometry. :)

