
I'm reading about pattern recognition, and in the appendix of my book I came across the following derivation:

$J(\theta)$ is a cost function with parameter vector $\theta = (\theta_1, \dots, \theta_d)$. If $J(\theta) = c$ then:

$$dc = 0 = \left(\frac{\partial J(\theta)}{\partial \theta}\right)^T d\theta \;\Rightarrow\; \frac{\partial J(\theta)}{\partial \theta} \perp d\theta$$

This may be a very easy question, but the derivation above confuses me. Could someone spell out more explicitly what the author did? :) I have attached a picture showing more context from the book, with the area I found confusing highlighted. What is happening in the red area?

[Image from the book showing the surrounding derivation, with the confusing step highlighted in red.]

Thanks for any guidance :)


2 Answers


Imagine the variable $\mathbf{\theta}$ depending on some parameter $t$. Then on a level curve,

$$J(\mathbf{\theta}(t))=c$$

we have

$$\frac{d}{dt} J(\mathbf{\theta}(t)) = \frac{dc}{dt}$$

The RHS is zero because $c$ is a constant. The LHS is, by the chain rule,

$$\left (\frac{dJ}{d\mathbf{\theta}}\right)^T \frac{d\mathbf{\theta}}{dt} = 0$$

The transpose arises because the chain rule dictates it. Recall that the $d/d\mathbf{\theta}$ operator is a gradient operator; work the derivatives out component by component. For any two vectors $\mathbf{a}$ and $\mathbf{b}$, the dot product may be written as $\mathbf{a}^T \mathbf{b}$. Therefore, when $\mathbf{a}^T \mathbf{b} = 0$, we can say that $\mathbf{a} \perp \mathbf{b}$.
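For instance, with $n = 2$ components (a restriction made here purely for illustration), the left-hand side expands componentwise to

$$\left(\frac{dJ}{d\mathbf{\theta}}\right)^T \frac{d\mathbf{\theta}}{dt} = \frac{\partial J}{\partial \theta_1}\frac{d\theta_1}{dt} + \frac{\partial J}{\partial \theta_2}\frac{d\theta_2}{dt},$$

which is exactly the dot product of the gradient with the velocity vector of the curve $\mathbf{\theta}(t)$.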

"Multiply" through by $dt$ to get the equation you wanted. (Why is it possible to do this? It is OK if you consider the derivative as the limit of the ratio of two very small quantities; in this case, a small quantity is $dt$.)


$J=J(\theta_1,\dots,\theta_n)$ is a function of $n$ variables $\theta:=(\theta_1,\dots,\theta_n)$. Its gradient $\nabla J(\theta)$ is the vector

$$\frac{\partial J(\theta)}{\partial \theta}:=\nabla J(\theta)=\left(\frac{\partial J}{\partial \theta_1},\dots, \frac{\partial J}{\partial \theta_n}\right),$$

while

$$dJ(\theta)=\frac{\partial J(\theta)}{\partial \theta}\,d\theta:=\left\langle \frac{\partial J(\theta)}{\partial \theta},\, d\theta\right\rangle$$

is the differential of $J(\theta)$ at $\theta$. Here $d\theta=(d\theta_1,\dots,d\theta_n)$ denotes the vector of infinitesimal increments of the point $\theta$.
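Written out componentwise (only to make the scalar product explicit), the differential reads

$$dJ(\theta) = \frac{\partial J}{\partial \theta_1}\,d\theta_1 + \dots + \frac{\partial J}{\partial \theta_n}\,d\theta_n.$$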

The equation

$$dc=0$$

is equivalent to the orthogonality of the vectors $\frac{\partial J(\theta)}{\partial \theta}$ and $d\theta$ with respect to the Euclidean scalar product $\langle\cdot,\cdot\rangle$.
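For instance (with numbers chosen here purely for illustration), if at some point $\nabla J(\theta) = (2, 1)$ and the increment along the level set is $d\theta = \varepsilon\,(1, -2)$ for a small $\varepsilon > 0$, then $\langle \nabla J(\theta), d\theta \rangle = \varepsilon\,(2 \cdot 1 + 1 \cdot (-2)) = 0$, so the two vectors are orthogonal.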

  • you are welcome! Commented Jul 8, 2013 at 11:40
  • Hi, I wanted to ask about your answer on this post :) You said that $\frac{\partial J(\theta)}{\partial \theta}d\theta$ is the differential of $J(\theta)$ at $\theta$. Isn't $\frac{\partial J(\theta)}{\partial \theta}d\theta$ the same here as $\frac{dJ(\theta)}{d\theta}d\theta$? I just get confused by the notation easily :) I want everything to be crystal clear in my mind x) Commented Jul 11, 2013 at 6:23
  • In my notation $\theta$ is the vector $\theta=(\theta_1,\dots,\theta_n)$ of the $n$ variables of $J$. You then compute partial derivatives w.r.t. the $\theta_i$'s, not ordinary derivatives, and the gradient of $J$ at $\theta=(\theta_1,\dots,\theta_n)$ is denoted by $\frac{\partial J}{\partial \theta}$. That vector is given in the first formula above. I hope it helps :) Commented Jul 11, 2013 at 12:16
  • If you formally cancel the $d\theta$ in the denominator against the $d\theta$ on the right, then in your notation $dJ(\theta)$ is the differential of $J$ at the vector $\theta$, provided that, when you need to define it, you write $dJ(\theta)=\frac{\partial J}{\partial \theta}d\theta$. This is correct :) Commented Jul 11, 2013 at 12:57
  • Note that $\frac{\partial J}{\partial \theta}d\theta$ is the scalar product $\frac{\partial J}{\partial \theta}d\theta:=\langle\nabla J(\theta),d\theta\rangle$, with $d\theta=(d\theta_1,\dots,d\theta_n)$. Commented Jul 11, 2013 at 12:58
