while(!Myself->is_full_knowledge()) Myself->learn(Myself->enjoy);

Tuesday, October 10, 2006

An Intuitive Explanation of Fourier Theory

Fourier theory is pretty complicated mathematically. But there are some beautifully simple holistic concepts behind Fourier theory which are relatively easy to explain intuitively. There are other sites on the web that can give you the mathematical formulation of the Fourier transform. I will present only the basic intuitive insights here, as applied to spatial imagery.

Basic Principles: How space is represented by frequency

Higher Harmonics: "Ringing" effects

An Analog Analogy: The Optical Fourier Transform

Fourier Filtering: Image Processing using Fourier Transforms


Basic Principles

Fourier theory states that any signal, in our case visual images, can be expressed as a sum of a series of sinusoids. In the case of imagery, these are sinusoidal variations in brightness across the image. For example the sinusoidal pattern shown below can be captured in a single Fouier term that encodes 1: the spatial frequency, 2: the magnitude (positive or negative), and 3: the phase.

These three values capture all of the information in the sinusoidal image. The spatial frequency is the frequency across space (the x-axis in this case) with which the brightness modulates. For example the image below shows another sinusoid with a higher spatial frequency.

The magnitude of the sinusoid corresponds to its contrast, or the difference between the darkest and brightest peaks of the image. A negative magnitude represents a contrast-reversal, i.e. the brights become dark, and vice-versa. The phase represents how the wave is shifted relative to the origin, in this case it represents how much the sinusoid is shifted left or right.

A Fourier transform encodes not just a single sinusoid, but a whole series of sinusoids through a range of spatial frequencies from zero (i.e. no modulation, i.e. the average brightness of the whole image) all the way up to the "nyquist frequency", i.e. the highest spatial frequency that can be encoded in the digital image, which is related to the resolution, or size of the pixels. The Fourier transform encodes all of the spatial frequencies present in an image simultaneously as follows. A signal containing only a single spatial frequency of frequency f is plotted as a single peak at point f along the spatial frequency axis, the height of that peak corresponding to the amplitude, or contrast of that sinusoidal signal.

There is also a "DC term" corresponding to zero frequency, that represents the average brightness across the whole image. A zero DC term would mean an image with average brightness of zero, which would mean the sinusoid alternated between positive and negative values in the brightness image. But since there is no such thing as a negative brightness, all real images have a positive DC term, as shown here too.

Actually, for mathematical reasons beyond the scope of this tutorial, the Fourier transform also plots a mirror-image of the spatial frequency plot reflected across the origin, with spatial frequency increasing in both directions from the origin. For mathematical reasons beyond the scope of this explanation, these two plots are always mirror-image reflections of each other, with identical peaks at f and -f as shown below.

What I have shown is actually the Fourier transform of a single scan line of the sinusoidal image, which is a one-dimensional signal. A full two-dimensional Fourier transform performs a 1-D transform on every scan-line or row of the image, and another 1-D transform on every column of the image, producing a 2-D Fourier transform of the same size as the original image.

The image below shows a sinusoidal brightness image, and its two-dimensional Fourier transform, presented here also as a brightness image. Every pixel of the Fourier image is a spatial frequency value, the magnitude of that value is encoded by the brightness of the pixel. In this case there is a bright pixel at the very center - this is the DC term, flanked by two bright pixels either side of the center, that encode the sinusoidal pattern. The brighter the peaks in the Fourier image, the higher the contrast in the brightness image. Since there is only one Fourier component in this simple image, all other values in the Fourier image are zero, depicted as black.

Brightness Image
Fourier transform

Here is another sinusoidal brightness image, this time with a lower spatial frequency, together with it's two-dimensional Fourier transform showing three peaks as before, except this time the peaks representing the sinusoid are closer to the central DC term, indicating a lower spatial frequency.

Brightness Image
Fourier transform

The significant point is that the Fourier image encodes exactly the same information as the brightness image, except expressed in terms of amplitude as a function of spatial frequency, rather than brightness as a function of spatial displacement. An inverse Fourier transform of the Fourier image produces an exact pixel-for-pixel replica of the original brightness image.

The orientation of the sinusoid correlates with the orientation of the peaks in the Fourier image relative to the central DC point. In this case a tilted sinusoidal pattern creates a tilted pair of peaks in the Fourier image.

Brightness Image
Fourier transform

Different Fourier coefficients combine additively to produce combination patterns. For example the sinusoidal image shown below is computed as the sum of the tilted sinusoid shown above, and the vertical sinusoid of lower spatial frequency shown above that.

Brightness Image
Fourier transform

The brightness and the Fourier images are completely interchangable, because they contain exactly the same information. The combined brightness image shown above could have been produced by a pixel-for-pixel adding of the two brightness images, or by a pixel-for-pixel addition of the corresponding Fourier transforms, followed by an inverse transform to go back to the brightness domain. Either way the result would be exactly identical.

Back to top


Higher Harmonics and "Ringing" effects

The basis set for the Fourier transform is the smooth sinusoidal function, which is optimized for expressing smooth rounded shapes. But the Fourier transform can actually represent any shape, even harsh rectilinear shapes with sharp boundaries, which are the most difficult to express in the Fourier code, because they need so many higher order terms, or higher harmonics. How these "square wave" functions are expressed as smooth sinusoids will be demonstrated by example.

The figure below shows four sinusoidal brightness images of spatial frequency 1, 3, 5, and 7. The first one, of frequency 1, is the fundamental, and the others are higher harmonics on that fundamental, because they are integer multiples of the fundamental frequency. These are in fact the "odd harmonics" on the fundamental, and each one exhibits a bright vertical band through the center of the image. The Fourier transform for each of these patterns is shown below.

1
3
5
7

The next table shows the result of progressively adding higher harmonics to the fundamental. Note how the central vertical band gets sharper and stronger with each additional higher harmonic, while the background drops down towards a uniform dark field. Note also how the higher harmonics produce peaks in the Fourier images that spread outward from the fundamental, defining a periodic pattern in frequency space.

1
1+3
1+3+5
1+3+5+7

The images below show what would happen if this process were continued all the way out to the Nyquist frequency - it would produce a thin vertical stripe in the brightness image, with sharp boundaries, i.e. a "square wave" in brightness along the x dimension. The Fourier transform of this image exhibits an "infinite" series of harmonics or higher order terms, although these do not actually go out to infinity due to the finite resolution of the original image. This is how the Fourier transform encodes sharp square-wave type features as the sum of a series of smooth sinusoids.

Brightness Image
Fourier transform

Back to top


The Optical Fourier Transform

A great intuitive advance can be made in understanding the principles of the Fourier transform once you learn that a simple lens can perform a Fourier transform in real-time as follows. Place an image, for example a slide transparency, at the focal length of the lens, and illuminate that slide with coherent light, like a colimated laser beam. At the other focus of the lens place a frosted glass screen. Thats it! The lens will automatically perform a Fourier transform on the input image, and project it onto the frosted glass screen. For example if the input image is a sinusoidal grating, as shown below, the resultant Fourier image will have a bright spot at the center, the DC term, with two flanking peaks on either side, whose distance from the center will vary with the spatial frequency of the sinusoid.

We can now see the holistic principle behind the Fourier transform. Every point on the input image radiates an expanding cone of rays towards the lens, but since the image is at the focus of the lens, those rays will be refracted into a parallel beam that illuminates the entire image at the ground-glass screen. In other words, every point of the input image is spread uniformly over the Fourier image, where constructive and destructive interference will automatically produce the proper Fourier representation.

Conversely, parallel rays from the entire input image are focused onto the single central point of the Fourier image, where it defines the central DC term by the average brightness of the input image.

Note that the optical Fourier transformer automatically operates in the reverse direction also, where it performs an inverse Fourier transform, converting the Fourier representation back into a spatial brightness image. Mathematically the forward and inverse transforms are identical except for a minus sign that reverses the direction of the computation.

Back to top


Fourier Filtering

I will now show how the Fourier transform can be used to perform filtering operations to adjust the spatial frequency content of an image. We begin with an input image shown below, and perform a Fourier transform on it, then we do an inverse transform to reconstruct the original image. This reconstructed image is identical, pixel-for-pixel, with the original brightness image.

Brightness Image
Fourier Transform
Inverse Transformed

I will now demonstrate how we can manipulate the transformed image to adjust its spatial frequency content, and then perform an inverse transform to produce the Fourier filtered image. We begin with a low-pass filter, i.e. a filter that allows the low spatial-frequency components to pass through, but cuts off the high spatial frequencies. Since the low frequency components are found near the central DC point, all we have to do is define a radius around the DC point, and zero-out every point in the Fourier image that is beyond that radius. In other words the low-pass filtered transform is identical to the central portion of the Fourier transform, with the rest of the Fourier image set to zero. An inverse Fourier transform applied to this low-pass filtered image produces the inverse transformed image shown below.

Low-Pass Filtered
Inverse Transformed

We see that the low-pass filtered image is blurred, preserving the low frequency broad smooth regions of dark and bright, but losing the sharp contours and crisp edges. Mathematically, low-pass filtering is equivalent to an optical blurring function.

Next we try the converse, high-pass filtering, where we use the same spatial frequency threshold to define a radius in the Fourier image. All spatial frequency components that fall within that radius are eliminated, preserving only the higher spatial frequency components. After performing the inverse transform on this image we see the effect of high-pass filtering, which is to preserve all of the sharp crisp edges from the original, but it loses the larger regions of dark and bright.

High-Pass Filtered
Inverse Transformed

If the low-pass filtered inverse-transformed image is added pixel-for-pixel to the high-pass inverse-transformed image, this would exactly restore the original unfiltered image. These images are complementary therefore, each one encodes the information which is missing from the other.

Next we will demonstrate a band-pass filtering that preserves only those spatial frequencies that fall within a band, greater than a low cut-off, but less than a higher cut-off.

Band-Pass Filtered
Inverse Transformed

The next simulation is the same as above, except with a narrower band of spatial frequencies.

Band-Pass Filtered
Inverse Transformed

The next simulation shows band-pass filtering about a higher spatial-frequency band,

Band-Pass Filtered
Inverse Transformed

and finally the same as above except again using a narrower spatial-frequency band.

Band-Pass Filtered
Inverse Transformed

These computer simulations demonstrate that the Fourier representation encodes image information in a holistic distributed manner that allows manipulation of the global information content of the image by spatial manipulations of the transformed image.



INTRODUCTION TO FOURIER TRANSFORMS FOR IMAGE PROCESSING


BASIS FUNCTIONS:

The Fourier Transform ( in this case, the 2D Fourier Transform ) is the series expansion of an image function ( over the 2D space domain ) in terms of "cosine" image (orthonormal) basis functions.

The definitons of the transform (to expansion coefficients) and the inverse transform are given below:

 F(u,v) = SUM{ f(x,y)*exp(-j*2*pi*(u*x+v*y)/N) }
and
f(x,y) = SUM{ F(u,v)*exp(+j*2*pi*(u*x+v*y)/N) }

where u = 0,1,2,...,N-1 and v = 0,1,2,...,N-1
x = 0,1,2,...,N-1 and y = 0,1,2,...,N-1
j = SQRT( -1 )
and SUM means double summation over proper
x,y or u,v ranges
First we will investigate the "basis" functions for the Fourier Transform (FT). The FT tries to represent all images as a summation of cosine-like images. Therefore images that are pure cosines have particularly simple FTs.

This shows 2 images with their Fourier Transforms directly underneath. The images are a pure horizontal cosine of 8 cycles and a pure vertical cosine of 32 cycles. Notice that the FT for each just has a single component, represented by 2 bright spots symmetrically placed about the center of the FT image. The center of the image is the origin of the frequency coordinate system. The u-axis runs left to right through the center and represents the horizontal component of frequency. The v-axis runs bottom to top through the center and represents the vertical component of frequency. In both cases there is a dot at the center that represents the (0,0) frequency term or average value of the image. Images usually have a large average value (like 128) and lots of low frequency information so FT images usually have a bright blob of components near the center. Notice that high frequencies in the vertical direction will cause bright dots away from the center in the vertical direction. And that high frequencies in the horizontal direction will cause bright dots away from the center in the horizontal direction.

Here are 2 images of more general Fourier components. They are images of 2D cosines with both horizontal and vertical components. The one on the left has 4 cycles horizontally and 16 cycles vertically. The one on the right has 32 cycles horizontally and 2 cycles vertically. (Note: You see a gray band when the function goes through gray = 128 which happens twice/cycle.) You may begin to notice there is a lot of symmetry. For all REAL (as opposed to IMAGINARY or COMPLEX) images, the FT is symmetrical about the origin so the 1st and 3rd quadrants are the same and the 2nd and 4th quadrants are the same. If the image is symmetrical about the x-axis (as the cosine images are) 4-fold symmetry results.

MAGNITUDE VS. PHASE:

Recall that the definition of the Fourier Transform is:
 F(u,v) = SUM{ f(x,y)*exp(-j*2*pi*(u*x+v*y)/N) }
and
f(x,y) = SUM{ F(u,v)*exp(+j*2*pi*(u*x+v*y)/N) }

where u = 0,1,2,...,N-1 and v = 0,1,2,...,N-1
x = 0,1,2,...,N-1 and y = 0,1,2,...,N-1
and SUM means double summation over proper
x,y or u,v ranges
Note that f(x,y) is the image and is REAL, but F(u,v) (abbreviate as F) is the FT and is, in general, COMPLEX. Generally, F is represented by its MAGNITUDE and PHASE rather that its REAL and IMAGINARY parts, where:
 MAGNITUDE(F) = SQRT( REAL(F)^2+IMAGINARY(F)^2 )
PHASE(F) = ATAN( IMAGINARY(F)/REAL(F) )
Briefly, the MAGNITUDE tells "how much" of a certain frequency component is present and the PHASE tells "where" the frequency component is in the image. To illustrate this consider the following.

Note that the FT images we look at are just the MAGNITUDE images. The images displayed are horizontal cosines of 8 cycles, differing only by the fact that one is shifted laterally from the other by 1/2 cycle (or by PI in phase). Note that both have the same FT MAGNITUDE image. The PHASE images would be different, of course. We generally do not display PHASE images because most people who see them shortly thereafter succomb to hallucinogenics or end up in a Tibetan monastery. Nevertheless, it is wise to remember that when one looks at a common FT image and thinks about "high" frequency power and "low" frequency power, this is only the MAGNITUDE part of the FT.

By the way, you may have heard of the FFT and wondered if was different from the FT. FFT stands for "Fast" Fourier Transform and is simply a fast algorithm for computing the Fourier Transform.

ROTATION AND EDGE EFFECTS:

In general, rotation of the image results in equivalent rotation of its FT. To see that this is true, we will take the FT of a simple cosine and also the FT of a rotated version of the same function. The results can be seen by:

At first, the results seem rather surprising. The horizontal cosine has its normal, very simple FT. But the rotated cosine seems to have an FT that is much more complicated, with strong diagonal components, and also strong "plus sign" shaped horizontal and vertical components. The question is, where did these horizontal and vertical components come from? The answer is that the FT always treats an image as if it were part of a periodically replicated array of identical images extending horizontally and vertically to infinity. And there are strong edge effects between the neighbors of such a periodic array as can be seen by:

Thus, what we see as the FT in the "slant" image (lower right of the image before last) is actually the combination of the actual FT of the cosine function and that caused by the edge effects of looking at a finite part of the image. These edge effects can be significantly reduced by "windowing" the image with a function that slowly tapers off to a medium gray at the edge. The result can be seen by:

The windowed image is shown in the upper left. Its FT is shown in the lower left. The non-windowed FT is shown in the upper right and the actual, true FT of a cosine is shown in the lower right. These images are all scaled differently and the comparison is only qualitative, but it can be seen that the windowed image FT is much closer to the true FT and eliminates many of the edge effects.

SOME IMAGE TRANSFORMS:

Now, with the above introduction, the best way to become familiar with Fourier Transforms is to see lots of images and lots of their FTs. First, an interesting pair of images, one sharp and clear, and the other blurred and noisy.

There are 2 images, goofy and the degraded goofy, with FTs below each. Notice that both suffer from edge effects as evidenced by the strong vertical line through the center. The major effect to notice is that in the transform of the degraded goofy the high frequencies in the horizontal direction have been significantly attenuated. This is due to the fact that the degraded image was formed by smoothing only in the horizontal direction. Also, if you look carefully you can see that the degraded goofy has a slightly larger background noise level at high frequencies. This is difficult to see and perhaps not even meaningful because the images are scaled differently, but if really there, it is due to the random noise added to the degraded goofy. Notice also that it is difficult to make much sense out of the low frequency information. This is typical of real life images.

The next images show the effects of edges in images:

Notice the strong periodic component, especially in the vertical direction for the bricks image. Horizontal components appear closer together in the FT. In the blocks image, notice a bright line going to high frequencies perpendicular to the strong edges in the image. Anytime an image has a strong-contrast, sharp edge the gray values must change very rapidly. It takes lots of high frequency power to follow such an edge so there is usually such a line in its magnitude spectrum.

Now lets look at a bunch of different shapes and their FTs.

Notice that the letters have quite different FTs, especially at the lower frequencies. The FTs also tend to have bright lines that are perpendicular to lines in the original letter. If the letter has circular segments, then so does the FT.

Now lets look at some collections of similar objects:

Notice the concentric ring structure in the FT of the white pellets image. It is due to each individual pellet. That is, if we took the FT of just one pellet, we would still get this pattern. Remember, we are looking only at the magnitude spectrum. The fact that there are many pellets and information about exactly where each one is is contained mostly in the phase. The coffee beans have less symmetry and are more variably colored so they do not show the same ring structure. You may be able to detect a faint "halo" in the coffee FT. What do you think this is from?

Here are our first truly general images. Notice there is very little structure. You can see a top left to bottom right slanting line in the girl image FT. It is probably due to the edge between her hat and her hair. There are also some small edge effects in both images. The mandril image appears to have more high frequency power, probably due to the hair.

The seafan image has a lot of little holes that are about the same size and somewhat randomly oriented. The size of the holes is about 2 pixels wide so that corresponds to frequency components about 1/2 way out to the maximum. The strong horizontal components in the lake image is probably due to the tree trunk edges.

Now, here is your first quiz. Consider an image that is all black except for a single pixel wide stripe from the top left to the bottom right. What is its FT? Also, consider an image that is totally random. That is, every pixel is some random value, independent of all other pixels. What is its FT?

Do you believe it? If not, you can check it yourself. By the way, notice the single bright dot in the middle of the noise FT image. Why is it there? Why does the noise FT look dark gray?

SOME FILTERS:

Now we start to illustrate the use of some filters on the girl image. The first is a lowpass filter. The upper left is the original image. The lower left is produced by:
 fft2d 128 <> girlfft
mag2d 128 <> girlmag
The lower right is then produced by:
 fftfilt 128 low ideal 50 <> lpgirlfft
mag2d 128 <> lpgirlmag
Finally, the upper right is produced by:
 ifft2d 128 <> lpgirl
To see the results:

The left side of the image we have seen before. In the lower right, notice how sharply the high frequencies are cut off by the "ideal" lowpass filter. Notice also that not very much power is being thrown away beyond the circle that is cut off. In the upper right, the reconstructed image is obviously blurrier due to the loss of high frequencies. Overall contrast is still pretty good due to that fact that not too much power was thrown away. Notice also that there are obvious "ringing" artifacts in the reconstructed image. This is due to the very sharp cutoff of the "ideal" filter. A Butterworth or Exponential filter with reasonably low order would not cause these.

Now we will do a highpass filter. The following image is produced in the same way as the previous one except:

 fftfilt 128 high butter 50 <> hpgirlfft
In other words, a butterworth filter of 1st order is used.

Notice in the lower right that this filter does not cut off sharply at the 50% point as the lowpass did. However, the center bright spot, which accounts for most of the power in the image, is clearly gone. The image in the upper right, which looks totally black, in fact is not totally black. If you use the colormap capability of "dym" to stretch the gray values from 0-20 out over the entire range, you can see that this highpass filter has preserved the image information where there are very rapid changes in gray level. Such a process is frequently what is desired in an edge detector. However, it is not an improvement in the image. There are 2 problems. First, it is too dark. This can be fixed by rescaling or re-contrast- stretching the image after filtering. This is commonly done and is easy. Second, and harder, is the fact that too much of the low frequency tonal information is gone.

Image sharpening requires a "sharpening" filter or high frequency emphasis filter. This kind of filter preserves some of the low frequency information but relatively boosts the higher frequencies. To do such a thing, we will construct our own filter which will be piecewise-linear. The filter will be circularly symmetrical and will have coefficients as follows:

   0  0.5
96 4.0
127 4.0
In other words, Fourier coefficients of frequency-distance 0 from the origin will be multiplied by 0.5. As you go away from the origin or zero frequency, out to frequency-distance 96, the multiplier will be interploated between 0.5 and 4.0. From then outward, the multiplier will be 4.0. So higher frequency coefficients are multiplied by values greater than 1.0 and lower frequency coefficients are multiplied by values less thatn 1.0. The overall net effect on the image power is that it is unchanged. The above values are in a file called "filter_coeffs". To apply the filter, the following steps are carried out:
 filttabler <> filter_file
fftfilt 128 file filterfile <> mfgirlfft
The rest of the image is constructed as before. To see the result:

Notice the relative brightness at high frequencies in the lower right image. Which upper image is sharper? Which upper image looks better? Portraits are one of the few contradictions to the general principal that sharper is better.

Filtering can also be used to reduce noise. It is particularly effective when the noise is confined to just a few frequencies:

The image on the upper left is goofy with a superimposed cosine added to it, representing noise. In the lower left, notice the strong cosine "dots" just to the left and right of the origin. In the lower right, these "dots" have been removed ( I actually did it with the "trace" capability in dym ). The resulting magnitude file is then used with the "filter" command to filter the Fourier coefficients. The file of coefficients is then inverse FT'd to get the upper right image. The cosine "noise" is gone.

Life is not always this easy as is shown in the next example:

In this case, a grid has been placed over goofy. The lower left shows the resulting FT. Notice that the grid is quite sharp so it has lots of high frequencies so its impact on the frequency domain is very spread out. Dym was again used to "paint" out the grid frequencies as much as possible. The right half of the lower right image is not painted because it is the symmetric reflection of the left half and is not used by the filter.

YOUR ASSIGNMENT: (SHOULD YOU CHOOSE TO ACCEPT IT)

(1) Pick an image.
(2) FFT it and find the magnitude spectrum.
see man for fft2d and mag2d
(3) Do something to the spectrum or the fft.
ex: filter
fftfilt
something like:
cm
double
multiply by alternating +1,-1
take phase only
take magnitude only
(4) Reconstruct an image by inverse fft.
see man for ifft2d
(5) Put the results together like the above images using "group"
see man for group
(6) Explain your results (1-2 pages).
More credit will be given to the imagination of what you do than
to the correctness of your explanation.