$\begingroup$

From what I understand, the monogenic signal is the equivalent of the analytic signal for 2D signals such as images.

If so, I am having trouble visualizing what the envelope of an image and its instantaneous amplitude/phase mean.

Can anyone kindly explain this?

What information does it have that a regular Fourier transform cannot convey?

$\endgroup$
1
  • 1
    $\begingroup$ I don't have time to grok this paper, but it seems to do something like what you're asking. See particularly (28) and (29) and the definitions used therein for instantaneous amplitude. $\endgroup$ Commented 22 hours ago

1 Answer

$\begingroup$

You are correct that the monogenic signal is the 2D analog of the analytic signal, but with one extra quantity, because in 2D you also need orientation.

Definition

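Replace the Hilbert transform (in 1D, multiplication by $i\,\omega/|\omega|$ in the frequency domain) with the two Riesz transforms (frequency-domain multipliers $R_1 = i\,\omega_1/\|\omega\|$ and $R_2 = i\,\omega_2/\|\omega\|$). Sign conventions vary (the standard Hilbert transform uses $-i\,\operatorname{sgn}\omega$), but it doesn't matter as long as you use the same sign for both components and stay consistent. For an image $f$: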

$$f_M(u) = \big(f(u),\ R_1 f(u),\ R_2 f(u)\big).$$

In spherical-polar form this gives three local quantities at every pixel:

  • Local amplitude $\;A=\sqrt{f^2+(R_1 f)^2+(R_2 f)^2}$
  • Local phase $\;\varphi=\operatorname{atan2}\!\left(\sqrt{(R_1 f)^2+(R_2 f)^2},\ f\right)\in[0,\pi]$
  • Local orientation $\;\theta=\operatorname{atan2}(R_2 f,\ R_1 f)$
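
As a rough illustration of the definition above, the three quantities can be computed with NumPy's FFT. This is a minimal sketch, not a reference implementation: the function name `monogenic` is made up here, both Riesz multipliers use the $+i$ sign (an assumed convention), and the phase uses the two-argument arctangent so that it lands in $[0,\pi]$.

```python
import numpy as np

def monogenic(f):
    """Return (A, phi, theta) for a real 2D array f (hypothetical helper)."""
    rows, cols = f.shape
    # Frequency grids in cycles/pixel: w1 along x (columns), w2 along y (rows).
    w1 = np.fft.fftfreq(cols)[np.newaxis, :]
    w2 = np.fft.fftfreq(rows)[:, np.newaxis]
    norm = np.hypot(w1, w2)
    norm[0, 0] = 1.0  # avoid 0/0 at DC; both multipliers are zero there anyway

    F = np.fft.fft2(f)
    # Riesz components; the +i sign for both is an assumed convention.
    r1 = np.real(np.fft.ifft2(1j * w1 / norm * F))
    r2 = np.real(np.fft.ifft2(1j * w2 / norm * F))

    A = np.sqrt(f**2 + r1**2 + r2**2)      # local amplitude
    phi = np.arctan2(np.hypot(r1, r2), f)  # local phase in [0, pi]
    theta = np.arctan2(r2, r1)             # local orientation in (-pi, pi]
    return A, phi, theta
```

For a real image you would normally pass in a zero-mean, bandpassed version of it; a DC offset stays entirely in the $f$ component and biases $\varphi$.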

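The underlying local model is that near every pixel $u_o$, the image is approximated by a 1D cosine wave passing through that neighborhood:

$$f(u) \approx A(u_o)\cos\!\big(k(u_o)\,(u-u_o)\cdot n(u_o) - \varphi(u_o)\big),$$

where $n(u_o) = (\cos\theta,\ \sin\theta)$ is a unit vector giving the wave direction and $k(u_o)$ is the local spatial frequency in radians per pixel. The minus sign goes with $n$ pointing along the gradient and $\varphi\in[0,\pi]$; if you flip $n$, the phase enters with a plus.

Reading the cosine's argument: "how far am I from $u_o$, projected onto the wave direction and scaled by the local frequency, offset by what part of the cycle $u_o$ itself sits at." So the model says: zoom in anywhere, and the image looks like alternating bright/dark stripes running in some direction. $(A, \varphi, \theta)$ are the amplitude, phase, and direction of that local cosine; the frequency $k$ follows from how fast $\varphi$ changes along $n$.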

Caveat: The model assumes a single dominant local orientation (it can describe edges and lines but not corners) and slowly varying $A$, which only holds inside a narrow frequency band. In practice you bandpass the image first (log-Gabor is the standard choice) and compute the monogenic signal of that. This gives you local structure at a chosen scale.
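
The bandpass step can be sketched as follows, assuming an isotropic (radial-only) log-Gabor transfer function $G(\omega)=\exp\!\big(-\ln^2(\|\omega\|/\omega_0)\,/\,(2\ln^2\sigma)\big)$ with $G(0)=0$. The function name, the parameter names `wavelength` and `sigma_on_f`, and the value 0.55 are illustrative assumptions, not values taken from this answer.

```python
import numpy as np

def log_gabor_bandpass(f, wavelength=10.0, sigma_on_f=0.55):
    """Bandpass a real 2D array with a radial log-Gabor filter (hypothetical helper).

    wavelength : centre wavelength in pixels (assumed parameter name)
    sigma_on_f : bandwidth ratio; smaller values give a wider band (assumed default)
    """
    rows, cols = f.shape
    w1 = np.fft.fftfreq(cols)[np.newaxis, :]  # cycles/pixel along x
    w2 = np.fft.fftfreq(rows)[:, np.newaxis]  # cycles/pixel along y
    radius = np.hypot(w1, w2)
    radius[0, 0] = 1.0                        # placeholder so log() is defined at DC

    f0 = 1.0 / wavelength                     # centre frequency
    G = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2))
    G[0, 0] = 0.0                             # a log-Gabor filter has no DC response

    return np.real(np.fft.ifft2(np.fft.fft2(f) * G))
```

Running this at several wavelengths and computing the monogenic signal of each output gives the local structure scale by scale.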

Intuition: How to visualize on a real image

  • $A$ — the "envelope": bright on edges, lines, and textures; dark on flat regions. Direct 2D analog of the 1D envelope.
  • $\varphi$ — what kind of feature you're sitting on: $\varphi\!\approx\!0$ is a bright ridge, $\varphi\!\approx\!\pi$ a dark valley, $\varphi\!\approx\!\pi/2$ a step edge.
  • $\theta$ — the wave-propagation / gradient direction. The visible edge or line runs perpendicular to $\theta$.

So instead of (envelope, phase) like in 1D, you get (envelope, phase, orientation).

Concrete example. Take vertical stripes: $f(x,y) = \cos(2\pi x/10)$, a cosine grating with period 10 pixels. Direct computation gives $R_1 f = -\sin(2\pi x/10)$ and $R_2 f = 0$. Plug into the formulas:

  • $A = 1$ — uniform amplitude, exactly what a pure cosine should give.
  • $\theta$ aligns with the x-axis — perpendicular to the stripes, along the gradient. ($\theta=0$ on rising half-cycles, $\pi$ on falling ones)
  • $\varphi$ sweeps back and forth over $[0,\pi]$ as you move along x: $\varphi=0$ at bright peaks, $\varphi=\pi$ at dark troughs, $\varphi=\pi/2$ at zero-crossings (the steepest edge between stripes).

So $A$ says "there's structure with magnitude 1 here," $\theta$ says "the stripes run vertically," and $\varphi$ says "this pixel is on a peak / edge / trough." Three numbers per pixel that together reconstruct the local cosine.
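
The grating example can be checked numerically. The sketch below makes a few assumptions: a 100×100 image (ten whole periods, so the periodic FFT sees no seam), the $+i$ sign for both Riesz multipliers, and a hypothetical helper named `riesz`. To rebuild the local cosine it also uses the known wavenumber $k=2\pi/10$, takes $n=(\cos\theta,\sin\theta)$ along the gradient, and subtracts the phase, which is the sign that goes with that choice of $n$.

```python
import numpy as np

def riesz(f):
    """Riesz pair (R1 f, R2 f) via the FFT; hypothetical helper, +i convention."""
    rows, cols = f.shape
    w1 = np.fft.fftfreq(cols)[np.newaxis, :]  # frequency along x (columns)
    w2 = np.fft.fftfreq(rows)[:, np.newaxis]  # frequency along y (rows)
    norm = np.hypot(w1, w2)
    norm[0, 0] = 1.0                          # avoid 0/0 at DC
    F = np.fft.fft2(f)
    r1 = np.real(np.fft.ifft2(1j * w1 / norm * F))
    r2 = np.real(np.fft.ifft2(1j * w2 / norm * F))
    return r1, r2

period = 10.0
k = 2 * np.pi / period
x = np.arange(100)
f = np.tile(np.cos(k * x), (100, 1))          # vertical stripes

r1, r2 = riesz(f)
A = np.sqrt(f**2 + r1**2 + r2**2)
phi = np.arctan2(np.hypot(r1, r2), f)
theta = np.arctan2(r2, r1)

# Rebuild the image row through one pixel from that pixel's three numbers.
# Only the x-component of n matters here because we stay on the same row.
row, col = 50, 23
n_x = np.cos(theta[row, col])
model = A[row, col] * np.cos(k * (x - col) * n_x - phi[row, col])

print("max |R1 f + sin(kx)|:", np.abs(r1 + np.sin(k * x)).max())
print("max |R2 f|          :", np.abs(r2).max())
print("max |A - 1|         :", np.abs(A - 1).max())
print("max |model - f|     :", np.abs(model - f[row]).max())
```

All four printed deviations should sit at floating-point rounding level. Note that $\theta$ is numerically unstable exactly on peaks and troughs, where both Riesz components vanish.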

What does this give you that the Fourier Transform doesn't?

The Fourier Transform loses none of the image information (i.e., it's invertible). The monogenic signal is just another transform of the same information into another form better suited to certain tasks. Specifically, the Fourier Transform is global. It says, "what frequencies at what orientation exist in the entire image?" In contrast, the monogenic signal gives you amplitude, phase, and orientation locally in the image.
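
A small experiment makes the global-versus-local point concrete. The sketch below (image size, period, and the helper names `riesz` and `x_fraction` are all assumptions chosen for illustration) builds two images from the same two striped patches in swapped positions. Their Fourier magnitudes coincide because one image is a circular shift of the other, so the difference lives only in the Fourier phase, while the Riesz components show directly which orientation sits where.

```python
import numpy as np

def riesz(f):
    """Riesz pair (R1 f, R2 f) via the FFT; hypothetical helper, +i convention."""
    rows, cols = f.shape
    w1 = np.fft.fftfreq(cols)[np.newaxis, :]
    w2 = np.fft.fftfreq(rows)[:, np.newaxis]
    norm = np.hypot(w1, w2)
    norm[0, 0] = 1.0                          # avoid 0/0 at DC
    F = np.fft.fft2(f)
    r1 = np.real(np.fft.ifft2(1j * w1 / norm * F))
    r2 = np.real(np.fft.ifft2(1j * w2 / norm * F))
    return r1, r2

rows, cols, period = 64, 128, 8.0
k = 2 * np.pi / period
y, x = np.mgrid[0:rows, 0:cols]

# Image a: vertical stripes on the left half, horizontal stripes on the right.
a = np.where(x < cols // 2, np.cos(k * x), np.cos(k * y))
# Image b: the same two patches with their positions swapped (circular shift).
b = np.roll(a, cols // 2, axis=1)

same_spectrum = np.allclose(np.abs(np.fft.fft2(a)), np.abs(np.fft.fft2(b)))

def x_fraction(img, col_slice):
    """Share of Riesz energy along x inside a column band (1 = vertical stripes)."""
    r1, r2 = riesz(img)
    e1 = np.sum(r1[:, col_slice] ** 2)
    e2 = np.sum(r2[:, col_slice] ** 2)
    return e1 / (e1 + e2)

inner_left, inner_right = slice(16, 48), slice(80, 112)  # stay clear of the seams
print("identical Fourier magnitudes:", same_spectrum)
print("a: left / right x-share:", x_fraction(a, inner_left), x_fraction(a, inner_right))
print("b: left / right x-share:", x_fraction(b, inner_left), x_fraction(b, inner_right))
```

The x-share should be close to 1 wherever the stripes are vertical and close to 0 where they are horizontal, and it flips between the two images even though their Fourier magnitudes match.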

For a clear introduction with worked examples and figures, see Bridge's tutorial highlighted by Peter in the comments. It's a bit easier to follow than the original paper you cited by Felsberg & Sommer (2001).

$\endgroup$
