Skip to content Skip to navigation

Connexions

You are here: Home » Content » Signal Processing in Processing: Miscellanea

Navigation

Content Actions

  • Download module PDF
  • Add to ...
    Add the module to:
    • My Favorites
    • A lens
    • An external social bookmarking service
    • My Favorites (What is 'My Favorites'?)
      'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.
    • A lens (What is a lens?)

      Definition of a lens

      Lenses

      A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

      What is in a lens?

      Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

      Who can create a lens?

      Any individual Connexions member, a community, or a respected organization.

    • External bookmarks
  • E-mail the authors

Recently Viewed

Signal Processing in Processing: Miscellanea

Module by: Pietro Polotti, Davide Rocchesso Based on: Signal Processing in Processing: Miscellanea by Davide Rocchesso

Summary: Miscellaneous topics and effects in audio and image processing.

Economic Color Representations

In Media Representation in Processing we saw how one devotes 8 8 bits to each channel corresponding to a primary color. If we add to these the alpha channel, the total number of bits per pixel becomes 32 32. We do not always have the possibility to use such a big amount of memory for colors. Therefore, one has to adopt various strategies in order to reduce the number of bits per pixel.

Palette

A first solution comes from the observation that usually in an image, not all of the 224 2 24 representable colors are present at the same time. Supposing that the number of colors necessary for an ordinary image is not greater than 256 256, one can think about memorizing the codes of the colors in a table (palette), whose elements are accessible by means of an index of only 8 8 bits. Thus, the image will require a memory space of 8 8 bits per pixel plus the space necessary for the palette. For examples and further explanations see color depth in Wikipedia.

Dithering

Alternatively, in order to have a low number of bits per pixel, one can apply a digital processing technique called dithering. The idea is that of obtaining a mixture of colors in a perceptual way, exploiting the proximity of small points of different color. An exhaustive presentation of the phenomenon can be found at the voice dithering of Wikipedia.

Floyd-Steinberg's Dithering

The Floyd-Steinberg's algorithm is one of the most popular techniques for the distribution of colored pixels in order to obtain a dithering effect. The idea is to minimize the visual artifacts by processing the error-diffusion. The algorithm can be resumed as follows:

  • While proceeding top down and from left to right for each considered pixel,
    • calculate the difference between the goal color and the closest representable color (error)
    • spread the error on the contiguous pixels according to the mask 1160000X7351 1 16 0 0 0 0 X 7 3 5 1 . That is, add 716 7 16 of the error to the pixel on the right of the considered one, add 316 3 16 of the error to the pixel bottom left with respect to the considered one, and so on.
By means of this algorithm it is possible to reproduce an image with different gray levels by means of a device able to define only white and black points. The mask of the Floyd-Steinberg's algorithm was chosen in a way that a uniform distribution of gray intensities 12 1 2 produces a chessboard layout pattern.

Exercise 1

By means of Processing, implement a program to process the file lena, a "dithered" black and white version of the famous Lena image. The image, treated only in the left half, should result similar to that of Figure 1

Figure 1
Figure 1 (lena_dithered.png)

Solution 1



size(300, 420); 
PImage a;  // Declare variable "a" of type PImage 
a = loadImage("lena.jpg"); // Load the images into the program 
image(a, 0, 0); // Displays the image from point (0,0) 
 
int[][] output = new int[width][height]; 
for (int i=0; i<width; i++)
  for (int j=0; j<height; j++) {
    output[i][j] = (int)red(get(i, j)); 
    }
int grayVal;
float errore;
float val = 1.0/16.0;
float[][] kernel = { {0, 0, 0}, 
                     {0, -1, 7*val}, 
                     {3*val, 5*val, val }}; 

for(int y=0; y<height; y++) { 
  for(int x=0; x<width; x++) { 
    grayVal = output[x][y];// (int)red(get(x, y));
    if (grayVal<128) errore=grayVal;
      else errore=grayVal-256;
    for(int k=-1; k<=1; k++) { 
      for(int j=-1; j<=0 /*1*/; j++) { 
        // Reflect x-j to not exceed array boundary 
        int xp = x-j; 
        int yp = y-k; 
        if (xp < 0) { 
          xp = xp + width; 
        } else if (x-j >= width) { 
          xp = xp - width;  
        } 
        // Reflect y-k to not exceed array boundary 
        if (yp < 0) { 
          yp = yp + height; 
        } else if (yp >= height) { 
          yp = yp - height; 
      }
      output[xp][yp] = (int)(output[xp][yp] + errore * kernel[-j+1][-k+1]);
      } 
    } 
  } 
} 
 
for(int i=0; i<height; i++)  
  for(int j=0; j<width; j++) 
    if (output[j][i] < 128) output[j][i] = 0;
    else output[j][i] = 255; 
    
// Display the result of dithering on half image
loadPixels(); 
for(int i=0; i<height; i++) { 
  for(int j=0; j<width/2; j++) { 
    pixels[i*width + j] = 
      color(output[j][i], output[j][i], output[j][i]);
  } 
} 
updatePixels(); 

Economic Sound Representations

In case of audio signals, the use of dithering aims at reducing the perceptual effect of the error produced by the changes in the quantization resolution, that one typically performs when recording and processing audio signals. For example, when recoding music, more than 16-bits of quantization are usually employed. Furthermore, mathematical operations applied to the signal (as, for instance, simple dynamical variations) require an increasing of the bit depth that is of the number of bits. As soon as one reaches the final product, the audio CD, the number of quantization bits has to be reduced to 16. In each of these consecutives processes of re-quantization one introduces an error, that adds up. In case of a reduction of the number of bits, it is possible to truncate the values (that is the digits after the decimal point are neglected and set to zero) or round them (that is the decimal number is approximated to the closest integer). In both cases one introduces an error. In particular, when one considers signals with a well defined pitch (as in the case of musical signals), the error becomes periodic-like. From the example of the voice dithering of Wikipedia, the reason of this additional pseudo-periodic noise (i.e. harmonic-like) becomes clear. This distortion corresponds to a buzz-like sound that "follows" the pitch of the quantized sound. The whole result is quite annoying from a perceptual point of view.

In case of audio signals, thus, dithering has the function of transforming this buzz-like sound in a background noise similar to a rustling, less annoying from a listening point of view. In Figure 2 an example of some periods of the waveform of a clarinet quantized with 16 bits is reported. The result of a reduction of the number of bits to 8 is represented in Figure 3. It is clearly visible how the reduction of the quantization levels produces series of lines with constant amplitude. The application of dithering generates a further transformation that, how one can see in Figure 4 "breaks" the constant lines by means of the introduction of white noise. The Figure 5, Figure 6 and Figure 7 represent the Fourier transforms of the sounds of Figure 2, Figure 3 and Figure 4, respectively. In the frequecy representation, it is also visible how the change to a quantization at 8 bits introduces improper harmonics (Figure 6) not present in the sound at 16 bit (Figure 5). These harmonics are canceled by the effect of dithering (Figure 7).

Figure 2
Figure 2 (Clarinetto.png)
Figure 3
Figure 3 (Clarinetto8.png)
Figure 4
Figure 4 (Clarinetto8dith.png)
Figure 5
Figure 5 (FT-Clarinetto.png)
Figure 6
Figure 6 (FT-Clarinetto8.png)
Figure 7
Figure 7 (FT-Clarinetto8dith.png)

There exist also methods that exploit perceptual factors as the fact that our ear is more sensitive in the central region of the audio band and less sensitive in the higher region. This allows one to make the effect of a re-quantization less audible. This is the case of the noise shaping techniques. This method consists in "modeling" the quantization noise. Figure 8 represents the sound of a clarinet re-quantized with 8 bits. A dithering and a noise-shaping were applied to the sound. The result, apparently destructive from the point of view of the waveform, corresponds to a sound, whose spectrum is closer to that of the original sound at 16 bits, excepted for a considerable increasing of energy in the very high frequency region (Figure 9). This high frequency noise produces this "ruffled" waveform, i.e. high energy fast amplitude variations. This noise is anyway not audible. Thus, at a listening test, the final result is better than the previous one.

Figure 8
Figure 8 (Clarinetto8dithNS.png)
Figure 9
Figure 9 (FT-Clarinetto8dithNS.png)
It is possible to think about the noise shaping as the audio counterpart of the Floyd-Steinberg's algorithm for graphics. In the audio case the error propagation occurs in the time-domain instead of the space-domain. The most simple version of noise shaping can be obtained by means of the definition of the quantization error
en=yn-Qyn e n y n Q y n (1)
where
yn=xn+en-1 y n x n e n 1 (2)
and x x is the non-quantized signal. Further details about noise shaping can be found at Wikipedia, noise shaping. What presented above can be tested in the sound examples clarinet , clarinet at 8 bits, clarinet at 8 bits with dithering and clarinet at 8 bit with noise shaping that contain the clarinet sounds represented in Figure 2, Figure 3, Figure 4, e Figure 8, respectively.

Histogram-Based Processing.

Definition 1: Image Histograms
Graphic representation by means of vertical bars, where each bar represents the number of pixels present in the image for a given intensity of the gray scale (or color channel). Wikipedia definition.

Among the examples in Processing, one finds the code Histogram that overlap an image to its own histogram.

The histogram offers a synthetic representation of images, in which one looses the information concerning the pixel positions and considers only the chromatic aspects. This provides information about the Tonal Gamma of an image (gray intensity that are present) and about the Dynamics (extension of the Tonal Gamma). The image of a chess board, for example, has a Tonal Gamma that includes only two gray levels (black and white) but it has a maximal dynamics (since white and black are the two extremity of the representable gray levels).

The histogram is the starting point for various processing effects aiming at balancing or altering the chromatic contents of an image. In general, the question is building a map g o =f g i g o f g i for the gray levels (or color-channel levels) that can be applied to each pixel. The histogram can drive the definition of this map.

Translation and Expansion of an Histogram

If the map is of the kind g o = g i +k g o g i k the histogram is translated in the sense of a higher or lower brightness, according to the sign of k k. On the other side, if the map is of the kind g o =k g i g o k g i the histogram will be expanded or compressed, for values of k k smaller or greater than 1 1, respectively.

The contrast stretching is one of the operations of this kind of linear scaling that tries to extend the dynamic range of an image. The interval by means of which one performs the scaling is set on the basis of the histogram, neglecting, for example, the tails of the distribution corresponding to 10 % 10% of the darkest and brightest pixels. In order to find more information and practical suggestions see : A Guide to Contrast Stretching.

Non Linear Scaling

More in general, the map g o =f g i g o f g i can be non linear, and this allows a greater flexibility in the manipulation of the histogram. A useful instrument is the one that allows to manipulate interactively the scaling map and to see the results on the image and/or on the histogram in real time. The instrument Color Tools/Curves of the image processing software Gimp does this, using an interpolating spline. In Processing it is possible to build a similar instrument, as reported in Example 1.

Equalization of an Histogram

The non linear scaling is the tool to equalize the histogram, that is to shape it in a desirable way. An image has a balanced tonal gamma, if all of the gray levels are represented and if the distribution is approximately uniform. In other words, one aims at a flat histogram. Without entering too much into the mathematical details (more information is available in the Guide to the Equalization of the Histogram), one can say that the non linear map to be used for the equalization is obtained from the cumulated distribution of the histogram of the image f g i =k=0 g i hk f g i k 0 g i h k , where hk h k is the frequency, properly scaled by means of a normalization constant, with which the k k-th gray level appears .

Exercise 2

Modify the Processing code of Example 1 in order to add the operation of equalization of the histogram.

Solution 2



int grayValues = 256;
int[] hist = new int[grayValues]; 
int[] histc = new int[grayValues];
PImage a; 

void setup() {
  background(255);
  stroke(0,0,0);
  size(300, 420); 
  colorMode(RGB, width); 
  framerate(5);
  a = loadImage("lena.jpg"); 
  image(a, 0, 0);
}

void draw() {
    // calculate the histogram 
  for (int i=0; i<width; i++) { 
    for (int j=0; j<height; j++) { 
      int c = constrain(int(red(get(i,j))), 0, grayValues-1);
      hist[c]++;
    } 
  } 

  // Find the largest value in the histogram 
  float maxval = 0; 
  for (int i=0; i<grayValues; i++) { 
    if(hist[i] > maxval) { 
      maxval = hist[i]; 
    }  
  } 

  // Accumulate the histogram
  histc[0] = hist[0];
  for (int i=1; i<grayValues; i++) { 
    histc[i] = histc[i-1] + hist[i];    
  } 
  
  // Normalize the histogram to values between 0 and "height" 
  for (int i=0; i<grayValues; i++) { 
    hist[i] = int(hist[i]/maxval * height); 
  } 

  if (mousePressed == true) { //equalization
    for (int i=1; i<grayValues; i++) { 
    println(float(histc[i])/histc[grayValues-1]*256);
    } 
    loadPixels();
    println("click");
    for (int i=0; i<width; i++)
      for (int j=0; j<height; j++) {
        //normalized cumulated histogram mapping
        pixels[i+j*width] = color(
          int(
          float(histc[constrain(int(red(a.get(i,j))), 0, grayValues-1)])/
            histc[grayValues-1]*256)); 
      }
    updatePixels();
     }
 
  // Draw half of the histogram 
  stroke(50, 250, 0); 
  strokeWeight(2);
  for (int i=0; i<grayValues; i++) { 
    line(i, height, i, height-hist[i]); 
  } 
}

Segmentation and Contour Extraction

Contours

The objects that populate a scene represented in an image are usually detectable by means of their contours: Thread-like profiles that correspond to a fast variation of color or of gray intensity. The contour extraction is one of the typical operation of the image processing. From the point of view of the Mathematical Analysis, a fast variation of a function corresponds to a high value of the derivative function. In the frame of Digital Signal Processing (DSP), the derivative can be approximated by means of a difference operation, i.e. by means of a filter. The filters that let fast variations through and eliminate slow variations are of a high-pass type. It is not surprising, thus, that for contour extraction, one uses convolution masks similar to that seen in Elementary Filters employed for edge crispening. More in detail, one can say that in 2D one looks for points with maximum amplitude of the gradient, i.e. points, where the Laplacian of the image that is its second spatial derivative is necessarily zero. The Laplacian can be approximated (in the discrete space) by means of a convolution mask 0101-41010 0 1 0 1 -4 1 0 1 0 . The direct application of the Laplacian is often not satisfying, since the result is too sensitive to noise and to small details. It is, thus, useful to combine the Laplacian mask with that of a low-pass filter. The combination of a Laplacian and a Gaussian (LoG) filter produces, in the case 5 5 by 5 5, the mask 00-1000-1-2-10-1-216-2-10-1-2-1000-100 0 0 -1 0 0 0 -1 -2 -1 0 -1 -2 16 -2 -1 0 -1 -2 -1 0 0 0 -1 0 0 In the software Gimp, a contour enhancer is available, based on the difference between two Gaussian filters, corresponding to two Gaussian curves with different amplitude. This produces an inhibition effect outside the principal contours, in a way similar to what happens in our perceptual system.

Regions

In many applications it is necessary to isolate the various objects that populate a scene, starting from their representation, as pixel collections of an image. For example, it could be interesting to isolate an object in foreground from the background. In this case, one talks about segmentation or extraction of regions. The most simple way for isolating different regions is that of doing it on a color basis, or on a gray-intensity basis. Also in this case, the operation can be driven by the histogram that can be helpful in order to establish a gray threshold. All the darker pixels will be mapped to black, while all the lighter ones will be mapped to white. For example, if the histogram presents two evident maxima, it is possible to assign the region of one maximum to the foreground (since it is, for example, lighter) and the region corresponding to the other maximum to the background (since it is, for example, darker). The threshold will be chosen in between the two maxima. Sometimes it is necessary to establish a multiplicity of thresholds, in order to isolate different regions by means of different gray levels. For color images, the thresholds can be different for different RGB channels. For more details and practical suggestions see the guide to thresholding.

Exercise 3

Apply the Laplacian filter and the LoG filter to the image of Lena.

Exercise 4

Show that the extraction of a background figure by means of a threshold can be implemented by means of non linear map g o =f g i g o f g i . What should be the form of this map?

Solution 4

It is a step map, with transition from 0 0 to 255 255 set in correspondence of the chosen threshold.

Exercise 5

Employ a program for image processing (ex. Gimp) in order to isolate (by means of thresholding) the breaks in the image of the broken glass.

Audio Dynamic Compression

For audio signals, similarly to what seen for images, one considers the problem of reducing the data necessary to represent a sound, while preserving an acceptable quality of the signal from a perceptual point of view. It is not easy to define, what is meant by "acceptable quality", when one perform a data reduction or, better, a data compression. In general the qualitative evaluation parameters of the audio compression standards are statistical, based on the results of listening tests made on groups of listeners, representing a wide gamma of users. The audio compression standards are usually founded on the optimization of the dynamics of the signal, that is on the optimization of the number of bits employed for the quantization. A well known example of compression standard is that of mp3, in which one exploits psychoacoustic phenomena, as the fact that louder sounds mask (make inaudible) softer sounds. In the reproduction of digitalized sounds, the thing that one wants to mask is the quantization noise. In other words, if the sound has a wide dynamics (it is loud) one can adopt a greater quantization step, since the louder quantization noise produced by the rougher subdivision of the quantization levels is anyway masked by the reproduced sound. Still simplifying things in a drastic way, one could say that mp3 varies the step of quantization according to the dynamics of the sound and in a different way in different bands of frequency. In other words, the signal is divided into many frequency bands (in a similar way as the Equalizer of a Hi-fi system does) and each band is quantized separately. This allows a reduction of even 20 times of the number of bits with respect to a fixed 16 bit dynamics. Another compression technique is provided by the mu-law (µ-law). This standard is used mainly in audio systems for digital communication in North America and Japan. In this case, the main idea is to modify the dynamic range of the analogical audio signal before the quantization. Behind these compression techniques, there is once more a psychoacoustic phenomenon that is the fact that our perception of the intensity is not linear but logarithmic-like. In other words, our perception behaves approximately according to what shown in Figure 10.

Figure 10
Figure 10 (Loudness.png)

What the mu-law actually performs is a reduction of the dynamic range of the signal by means of an amplitude re-scaling according to the map described in Figure 11. It is visible how the effect is that of amplifying the small amplitudes, reducing the range of amplitude values of the signal (in the sense of big amplitudes) and, as a consequence, increasing the relationship (the amplitude difference) between the sound and the quantization noise. Afterwards, a linear quantization of the non linearly distorted signal is performed. As one wants to play back the digital signal, this is first converted into an analogical signal and then transformed by means of a curve performing an inverse amplitude distortion with respect to that of Figure 11. The global result is equivalent to a non linear quantization of sound, where the quantization step is bigger (rougher) for bigger amplitudes and smaller (more detailed) for smaller amplitudes. This corresponds, at least from a qualitative point of view, to the way of functioning of our perceptual system. We are more sensitive to the intensity differences in case of soft sounds and less sensitive in case of loud and very loud sounds. The A-law, adopted in the digital systems in Europe, is very similar to the mu-law.

Figure 11
Figure 11 (mu-law.png)

Comments, questions, feedback, criticisms?

Send feedback