Statistical Greyscale Images

article by Ben Lincoln

Statistical greyscale images are really an extension of the relatively simple calculated images discussed in the previous article (Calculated Greyscale Images). However, because the "abstraction distance" between them and the source material is so much greater, I thought it would be more clear to treat them as a separate topic.

An Introduction To Statistical Functions

The field of statistics has a handful of mathematical functions which are used to analyze sets of data. These functions are sometimes referred to as "moments". Most explanations I've seen of them are poorly-written, do little to illustrate to the non-specialist what their value is, and get mired in the language of statistics without explaining what unfamiliar terms mean. This article will not make the reader an expert in statistics, but hopefully it will provide a basic understanding of some of the concepts involved, while avoiding those common pitfalls.

Three of the standard moments are familiar to just about everyone: the minimum (the lowest value in a set of data), the maximum (the highest value in a set of data), and the average (the sum of all values in the set, divided by the number of values in the set). Most people have also heard of the median (the middle value in a set of data), which is distinct from the average in several important ways. The median is not usually considered a "standard" moment, but I've included it here (and in The Mirror's Surface Breaks), because I find it very useful. These four functions yield straightforward, basic information about numeric data: how big does it get, how small does it get, and roughly where is the middle point?[1]

Three additional standard moments transform the raw data into a figure which represents how much variation/spread there is across the set of data: the standard deviation, the average deviation, and the variance. Understanding the specific differences between these three functions requires delving into the sort of "TLDR"[2] language that I want to avoid here, but the underlying concept is simple: if a random single data point from the data set is examined, how much is it likely to differ from the rest of the data points (as calculated using the average, median, mode, etc.)? Here again, I've added a function that is not usually included in the standard moments: the range (the maximum value minus the minimum value). Range is certainly not as elegant as the other three of this type, but it can also provide an interesting perspective on certain types of data.

Two final functions round out the standard moments: the skewness, and the kurtosis. Skewness (as its name implies) represents (more or less) how evenly-distributed (or not) the data points are across their range. For example, if twenty random adults in the US and Paul Allen have their net monetary worth analyzed using statistical functions, Paul Allen's will almost certainly "throw off" the results, because his wealth is so much greater than that of most people in the US. Skewness provides a method of showing the presence of an "odd factor" like this.

Kurtosis is... different. Depending on who you ask, you may be told that it is supposed to represent the "peakedness" of a data set, "the heaviness of [its] tails"[3], its "bimodality"[8], et cetera. If you can be bothered to attempt to figure out what those terms mean in a statistical context, you will probably still be confused[9]. Fortunately, there will not be a quiz at the end of this article. As will be illustrated shortly, the easiest way to think of kurtosis is as being kind of like skewness, but different enough to be interesting in its own way.

Simple Data Set Examples

As with many mathematical topics, it can be very helpful to see the concepts illustrated. Here's a simple set of seven series of data, with annotations explaining why I set them to certain values at certain points along the X-axis:

Example Data Series
 Annotated Example Data

The set of seven data series that will be used for the statistical function examples.

As you can see, moving from left to right provides a wide variety of groupings and spreads for the data points. This should provide a good basis for showing how each of the ten functions described above responds differently to those types of values. The following comparisons are provided in two different ways: with the input data and function output in separate (aligned) graphs, and also with both graphs overlaid on top of each other. Some readers may prefer one over the other, and some aspects may be easier to see in one representation versus the other.

Maximum, Minimum, Average, and Median
 Separate
 Overlaid

In this first graph, note the difference between the behaviour of the average and the median. The average is a smoothly-changing value that lies exactly where one would expect it to be - its value is interpolated from the actual data. By contrast, the median lies as close as possible to the middle as it can while still being equal to one of the real data points (as long as there are an odd number of data points[4]). This is the key factor that separates these two functions: the average may represent a value that is nothing like any one of the real data points, whereas the median preserves the distinction between drastically different values.

Going back to the previous example involving Paul Allen, the average wealth of the set of the 21 people will be dramatically increased by his inclusion, but the median will not be biased to anything like that degree. On the other hand, if the person analyzing the data only looks at the median, they will be completely oblivious to the presence of Paul Allen in the sample. This is one of the reasons to use multiple, complementary statistical functions.

This is also one of the main reasons I included the median: aside from the minimum and maximum, it is the only common statistical function which returns data that was actually present in the sample, instead of introducing synthetic/derived values[5].

Range, Average Deviation, Standard Deviation, and Variance
 Separate
 Overlaid

Moving on to the next four functions, it becomes immediately obvious why range is often described by statisticians as "clumsy [and] random" compared to the "elegant... more civilized" average deviation, standard deviation, and variance[6]. At the first major transition, when a single value jumps to the top, the range makes the same jump, and stays "pegged" at that value until all of the values drop back down. Meanwhile, the three more-refined functions reveal more detail about how the values are changing.

Skewness and Kurtosis
 Separate
 Overlaid

In this final graph, it becomes obvious that I wasn't just trying to be clever about kurtosis being hard to describe. Again, it's sort of like the skewness, just... different. Of particular note are the last two (rightmost) data points. In the original data set, I deliberately had one of the values go from being biased towards the positive values to being biased towards the negative values. You can see this reflected in the skewness (which goes from negative to positive as a result), but the kurtosis is unchanged because the amount of bias has not changed, only whether that bias was positive or negative. In addition, note the peaks and valleys of the kurtosis from points 39 to 46 on the X-axis, where skewness remains flat.

One last interesting fact about these two functions: they are the only two standard moments which can return negative values when all of the input data is positive.

Statistical Interpretation of Image Data

Now that I've loosely described and glossed over important aspects of the vast and complex field of statistics, how does all of this apply to the original subject: multispectral imaging?

Digital image data is, of course, a set of numeric values, and therefore can be processed using these same functions. It's not particularly useful to calculate a single number for an entire image, so the data needs to be divided up into subsets for analysis. In terms of an image cube, the sections can logically be one-dimensional (across multiple images), two-dimensional (within a single image), or three-dimensional (again, across multiple images). This image attempts to illustrate the last of these, in which 3-dimensional boxes of data are sampled from the cube as a whole. Each box will be used to calculate one pixel in a derived greyscale image. Other geometric shapes besides boxes (cylinders, etc.) could be used, but the relatively simple software that I wrote for this purpose doesn't support them.

Image Cube Statistical Sampling
 Statistical Sampling

An illustration of sampling data along the Z-axis of the image cube.

Note that this image is a simplification, in that in actuality, I've found that the one-dimensional method is the most useful overall, and when using the three-dimensional method, the "window" for each box moves only one pixel for each sub-sample, instead of moving by the entire width/height of the box for each sub-sample.

Like the statistical graphs above, the resulting images each represent a different facet of the image cube.[7] You can think of the "stack" of derived statistical images as a second "cube" of sorts, and this is the theoretical model used in The Mirror's Surface Breaks.

Statistical Images of Flowers
 Maximum
 Minimum
 Average
 Median
 Range
 Average Deviation
 Standard Deviation
 Variance
 Skewness
 Kurtosis

Date Shot: 2010-10-16
Camera Body: Nikon D70 (Modified)
Lens: Nikon Micro-Nikkor 105mm f/4
Filters: Standard Set
Date Processed: 2010-12-19
Version: 1.0

This type of processing can be very useful with satellite imagery in particular. Note how these example images (generated from OnEarth imagery of Big Bend National Park) each emphasize different aspects of the geological features.

Statistical Images of Big Bend From Space
 Maximum
 Minimum
 Average
 Median
 Range
 Average Deviation
 Standard Deviation
 Variance
 Skewness
 Kurtosis

The Effect Of Sample/Window Size

The statistical functions are generally at their strongest when used with as much data as possible. In order to further illustrate the differences and similarities between them, here are some sample images processed using an increasing "window size". In this context, a window size of 81 represents a square 81 pixels wide and tall, or having a total area of 6561 pixels. As previously mentioned, the use of odd numbers is so that the median is truly the middle value, and not the average of the two closest to the middle. Unlike the previous images, these were sampled using the "2-dimensional" method, in which all of the data comes from a single spectral band. Because the results are more obvious in colour, the same processing was applied to the red, green, and blue bands of the image, so the colours in these images are based on the human-visible version of the original shot (taken from Kerry Park in Seattle).

As you look at the various results, note how certain functions highlight various aspects of the image, such as the speck of dust on the sensor above and to the left of Mount Rainier. Normally I would edit the flaw out of the original image and re-run the processing, but in this case I left it in for example purposes.

Note also how most of the statistical functions are affected to some degree by the shape of the sampling window, similar to the way in which the shape of the aperture in a camera lens determines how "bokeh" appears in images taken with that lens.

Window Sizes: Maximum
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Window Sizes: Minimum
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Window Sizes: Average
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Taking the average is one method of blurring an image.

Window Sizes: Median
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81
 Quasi-Depth-of-Field Composite

Because I thought this was such an interesting effect, I included a manually-composited "quasi-depth-of-field" variation, in which the window size increases with the distance from the camera.

The largest window size provides a dramatic illustration of how range lacks the precision of the average deviation, standard deviation, and variance functions. All four of these functions provide results reminiscent of edge detection algorithms, because they operate using a similar principle - basing the brightness of a given pixel not on its absolute value, but on how much variation there is in a set of adjacent pixels. One important difference is that edge detection (at least as I'm familiar with it) gives the center pixel a special place in the analysis, whereas using this statistical method treats all pixels in the window equally.

Window Sizes: Range
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Window Sizes: Average Deviation
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Window Sizes: Standard Deviation
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Window Sizes: Variance
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Once again we see that skewness and kurtosis stand apart from the other functions. They are also unique among those presented here in that the amount of noise they introduce into the image increases dramatically with smaller window sizes.

Window Sizes: Skewness
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Window Sizes: Kurtosis
 WS03
 WS05
 WS07
 WS09
 WS11
 WS13
 WS17
 WS27
 WS81

Image Artifacts

Every once in awhile, statistical prcessing of this type will result in artifacts caused by the physical sensor design, image compression algorithms, or other factors. At least, I assume that's what's going on here. The only other possibility I can think of is that I'm living in some sort of Thirteenth Floor-esque simulation, and just happened to take a photograph in which the invisible grid lines were in nearly perfect alignment with not only my camera, but also the old gasworks processing tower I was aiming at.

Processing Artifacts
 Gasworks - Skewness [Grey]

Date Shot: 2010-06-29
Camera Body: Nikon D70 (Modified)
Lens: Nikkor-H 85mm f/1.8 @ f/8
Filters: Standard Set
Date Processed: 2010-07-01
Version: 1.0

This particular case actually resulted in an image that I liked the appearance of. Usually the effect just highlights things such as dead lines of photosites on my camera's CCD.

You've come to the end of the articles on greyscale imagery. From here on out, it's all colour. If you still haven't had enough of statistics, give the audio samples at the end of the Statistical Audio Synthesis article a listen.

Footnotes
 1 There is a fifth function in this class called the "mode". The mode is the most common value out of a set of data, making it distinct from - but similar to - both the average and the median. I have left out the mode from this discussion because it is really only useful when analyzing very large sets of integer (whole number) data (which is not usually the case with images), and because calculating it is even more computationally-intensive than calculating the median. It effectively requires generating a histogram for each set being analyzed. 2 "Too long, didn't read". And you know that when someone who writes as verbosely as I tend to uses a phrase like that, it means something. 3 Statistical Background, SAS Elementary Statistics Procedures. 4 If there are an even number of data points, the median is calculated as the average of the two closest to the middle. This is why in the example data (and in The Mirror's Surface Breaks), I've tried to set the data up so that it has an odd number of values. 5 Again, as long as the number of data points is odd. The mode (discussed in a previous footnote) is also "special" in the same sense of not introducing artificial values. 6 Note: statisticians may not actually describe the functions in this manner. 7 Note that with the exception of the maximum, minimum, average, and median versions, these images have been "normalized". That is, black represents whichever value happened to be lowest, and white represents whichever value happened to be highest, with the intermediate values scaled accordingly. This was done to improve contrast, but also because as previously discussed, there is no good way to represent positive versus negative values in a greyscale image, which would be necessary for non-normalized skewness and kurtosis. 8 Is Kurtosis Really "Peakedness?", Richard B. Darlington, The American Statistician, Vol. 24, No. 2 (Apr., 1970), pp. 19-22. 9 For those who are curious, the general idea is that "peakedness" refers to whether or not the data in a set tends to be distributed evenly across its range. A data set with "heavy tails" is one in which the number of values that approach the minimum and maximum (as opposed to the middle) are greater than expected based on some other criteria. A "bimodal" data set is one in which there are two primary "clusters" of values, rather than one. As you can see, one of the few things about kurtosis that isn't confusing is why it's not part of the popular vocabulary.