
I am trying to understand a neuroscience article by Karl Friston. In it he gives three equations that are, as I understand him, equivalent or inter-convertible and refer to both physical and Shannon entropy. They appear as equation (5) in the article at http://www.fil.ion.ucl.ac.uk/spm/doc/papers/Action_and_behavior_A_free-energy_formulation.pdf (DOI 10.1007/s00422-010-0364-z). Here they are:

  • Energy minus entropy
  • Divergence plus surprise
  • Complexity minus accuracy

$$\begin{align*}F &= -\left\langle \ln p(\tilde{s},\Psi|m)\right\rangle_q + \left\langle \ln q(\Psi|\mu)\right\rangle_q \\ &= D\left(q(\Psi|\mu)\ ||\ p(\Psi|\tilde{s},m)\right) - \ln p(\tilde{s}|m) \\ &= D\left(q(\Psi|\mu)\ ||\ p(\Psi|m)\right) - \left\langle \ln p(\tilde{s}|\Psi,m)\right\rangle_q \end{align*}$$

The things I am struggling with at this point are 1) the meaning of the $||$ in the second and third versions of the equation, and 2) the negative logs. Any help in understanding how each of these equations amounts to what Friston describes it to be (the labels in the list above) would be greatly appreciated. For example, in the first equation, in what sense are the terms energy and entropy? Is the entropy Shannon or thermodynamic, or both?

  • I have edited your post to match the exact presentation of this portion of the paper. I cannot answer your question, however. Commented Jul 12, 2014 at 16:34
  • @Arkamis, thanks for your edits and the references, which are helpful. Commented Jul 12, 2014 at 21:17
  • Cross-posted on Data Science, Cross Validated, and Cognitive Sciences. Commented Nov 17, 2015 at 19:05

2 Answers


Not an answer, but it needs formatting that a comment cannot hold.


If you continue reading, the author defines this notation:

Here, $\langle \cdot \rangle_q$ means the expectation or mean under the density $q$ and $D(\cdot\ ||\ \cdot)$ is the cross-entropy or Kullback-Leibler divergence between the two densities.

I have taken the liberty of embedding links to the relevant Wikipedia articles in the quote.
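
Since the question asks specifically what the $||$ means, a minimal numerical sketch (with made-up two-point distributions, assumed purely for illustration) may help: $D(q\,||\,p)$ reads "the divergence of $q$ from $p$", and the measure is not symmetric in its two arguments.

```python
import numpy as np

# Made-up two-point distributions, purely to illustrate the notation:
q = np.array([0.7, 0.3])
p = np.array([0.5, 0.5])

kl_qp = np.sum(q * np.log(q / p))  # D(q || p)
kl_pq = np.sum(p * np.log(p / q))  # D(p || q): generally a different number

print(kl_qp, kl_pq)  # ~0.0823 vs ~0.0872, both non-negative
```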


I have presented my own research on Karl Friston's Free Energy Principle, so I will outline what each term means and then explain how the terms work within the three equations.

The symbols in the three equations have the following meanings.

$\tilde{s}$ denotes the observed data.

$\Psi$ denotes hidden (latent) variables of the model.

$m$ denotes the generative model.

$\mu$ denotes parameters of the variational distribution.

$p(\tilde{s},\Psi|m)$ is the joint probability of the data and latent variables under the model.

$p(\Psi|\tilde{s},m)$ is the true Bayesian posterior distribution.

$p(\Psi|m)$ is the prior distribution over the latent variables.

$p(\tilde{s}|\Psi,m)$ is the likelihood of the data given the latent variables.

$q(\Psi|\mu)$ is the variational approximation to the posterior.

$\langle f(\Psi)\rangle_q$ denotes the expectation of a function with respect to $q(\Psi|\mu)$: $$ \langle f(\Psi)\rangle_q = \int q(\Psi|\mu)f(\Psi)d\Psi $$
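
As a concrete illustration, here is a minimal Python sketch (with an assumed Gaussian $q(\Psi|\mu)$, chosen purely for this demo) that approximates such an expectation by averaging over samples drawn from $q$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assume q(Psi|mu) is Gaussian with mean mu = 1.0 and unit variance.
mu = 1.0
samples = rng.normal(loc=mu, scale=1.0, size=100_000)

def f(psi):
    return psi**2  # an arbitrary test function f(Psi)

mc_estimate = f(samples).mean()  # Monte Carlo estimate of <f(Psi)>_q
exact = mu**2 + 1.0              # E[Psi^2] = mu^2 + sigma^2 for a Gaussian

print(mc_estimate, exact)        # both close to 2.0
```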

$D(q||p)$ denotes the Kullback–Leibler divergence $$ D(q||p) = \int q(\Psi|\mu)\ln\frac{q(\Psi|\mu)}{p(\Psi)}d\Psi, $$ where $p(\Psi)$ stands for whichever density appears as the second argument (the posterior in the second equation, the prior in the third); it is non-negative and zero only when the two densities coincide.
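
As a numerical check of this definition, here is a short sketch (with two assumed Gaussians standing in for $q$ and $p$) comparing the integral above, evaluated by quadrature, against the known closed form for univariate Gaussians:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Two assumed Gaussians standing in for q(Psi|mu) and p(Psi):
q = norm(loc=1.0, scale=1.0)
p = norm(loc=0.0, scale=2.0)

# D(q||p) = int q(x) ln(q(x)/p(x)) dx, evaluated numerically
kl_numeric, _ = quad(lambda x: q.pdf(x) * (q.logpdf(x) - p.logpdf(x)), -15, 15)

# Closed form for univariate Gaussians:
# D = ln(s_p/s_q) + (s_q^2 + (m_q - m_p)^2) / (2 s_p^2) - 1/2
kl_closed = np.log(2.0 / 1.0) + (1.0**2 + (1.0 - 0.0)**2) / (2 * 2.0**2) - 0.5

print(kl_numeric, kl_closed)  # both ~0.4431
```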

$p(\tilde{s}|m)$ is the model evidence (marginal likelihood) $$ p(\tilde{s}|m)=\int p(\tilde{s},\Psi|m)d\Psi $$

$F$ denotes the variational free energy.

The first equation $$ F=-\langle\ln p(\tilde{s},\Psi|m)\rangle_q+\langle\ln q(\Psi|\mu)\rangle_q $$ defines free energy as the expected negative log joint probability (the expected energy, in the Gibbs sense borrowed from statistical physics) plus the expected log of the variational density. Since $\langle\ln q(\Psi|\mu)\rangle_q=-H[q]$ is the negative Shannon entropy of $q$, this is the "energy minus entropy" form; the entropy is Shannon entropy, and the thermodynamic reading is an analogy carried over from the variational free energy of statistical physics.

The second equation $$ F=D(q(\Psi|\mu)\ ||\ p(\Psi|\tilde{s},m))-\ln p(\tilde{s}|m) $$ shows that free energy equals the KL divergence between the variational posterior $q(\Psi|\mu)$ and the true posterior $p(\Psi|\tilde{s},m)$ minus the log model evidence; the term $-\ln p(\tilde{s}|m)$ is the surprise (surprisal), which gives this form its name of "divergence plus surprise".
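
For completeness, this form follows from the first by factorising the joint density as $p(\tilde{s},\Psi|m)=p(\Psi|\tilde{s},m)\,p(\tilde{s}|m)$; the log evidence does not depend on $\Psi$, so it passes through the expectation unchanged: $$ \begin{align*} F &= \langle\ln q(\Psi|\mu)-\ln p(\Psi|\tilde{s},m)\rangle_q-\ln p(\tilde{s}|m) \\ &= D(q(\Psi|\mu)\ ||\ p(\Psi|\tilde{s},m))-\ln p(\tilde{s}|m) \end{align*} $$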

Since the KL divergence is non-negative and the log evidence does not depend on $q$, $F$ is an upper bound on the surprise $-\ln p(\tilde{s}|m)$, and minimizing $F$ with respect to $q$ minimizes the divergence between the approximate and true posterior.

The third equation $$ F=D(q(\Psi|\mu)\ ||\ p(\Psi|m))-\langle\ln p(\tilde{s}|\Psi,m)\rangle_q $$ decomposes free energy into a complexity term and an accuracy term.
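
The same factorisation trick, applied the other way round as $p(\tilde{s},\Psi|m)=p(\tilde{s}|\Psi,m)\,p(\Psi|m)$, yields this form from the first: $$ \begin{align*} F &= \langle\ln q(\Psi|\mu)-\ln p(\Psi|m)\rangle_q-\langle\ln p(\tilde{s}|\Psi,m)\rangle_q \\ &= D(q(\Psi|\mu)\ ||\ p(\Psi|m))-\langle\ln p(\tilde{s}|\Psi,m)\rangle_q \end{align*} $$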

The first term $$ D(q(\Psi|\mu)\ ||\ p(\Psi|m)) $$ measures the divergence between the variational posterior and the prior; this is the complexity, the degree to which beliefs must move away from the prior to explain the data.

The second term $$ \langle\ln p(\tilde{s}|\Psi,m)\rangle_q $$ is the expected log likelihood of the observations under the variational distribution; this is the accuracy.

Thus free energy can be interpreted as $$ F = \text{complexity} - \text{accuracy}. $$

All three expressions represent the same quantity but highlight different interpretations of variational free energy.
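
To see the equivalence concretely, here is a minimal Python sketch on a made-up two-state model (all numbers are assumed purely for illustration) that evaluates the three expressions, including the marginal likelihood $p(\tilde{s}|m)$ obtained by summing the joint over $\Psi$:

```python
import numpy as np

# Hypothetical two-state model: Psi takes two values, s_tilde is one
# fixed observation. All probabilities below are made up for the demo.
prior = np.array([0.5, 0.5])        # p(Psi|m)
lik   = np.array([0.8, 0.2])        # p(s_tilde|Psi,m) at the observed s_tilde
joint = prior * lik                 # p(s_tilde,Psi|m)
evidence = joint.sum()              # p(s_tilde|m): marginalise over Psi
posterior = joint / evidence        # p(Psi|s_tilde,m): Bayes' rule

q = np.array([0.6, 0.4])            # an arbitrary variational density q(Psi|mu)

def kl(a, b):
    """Kullback-Leibler divergence D(a||b) for discrete densities."""
    return np.sum(a * np.log(a / b))

F1 = -np.sum(q * np.log(joint)) + np.sum(q * np.log(q))  # energy minus entropy
F2 = kl(q, posterior) - np.log(evidence)                  # divergence plus surprise
F3 = kl(q, prior) - np.sum(q * np.log(lik))               # complexity minus accuracy

print(F1, F2, F3)  # all three print ~0.7978
```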

