Posts Tagged ‘wavelets’

Starting wavelet difference reduction

Friday, February 26th, 2010

Its been slightly under a month since my last post. In the interim I have done much battling with a teething baby that has resulted in a lack of motivation to attack a complicated problem. The past couple of nights, though, I have set back to the WDR compressor with the intention of getting it working.

I now, fortunately, have a limited compressor/decompressor going.

In the previous article I described the basics behind a Daub 5/3 forward wavelet transform. In this article I will be describing how to do the significance pass of the WDR algorithm.

The basic idea is to run through the high pass sections of the image (LH, HL and HH) with a specific threshold. At each threshold level the algorithm needs to find the number of pixels to the next pixel that is greater than (or equal to) the threshold.

For example, take an image that has these values stored in its high pass:

0     0   -2    0     0    8    0   0

0     9     0  -3     0    4    0   0

0     0     1    7     0    0   6    0

0     5     0    0    0   -8   0    1

0     0     0    0    0    0   2    0

0     0   -4     0    6    0   0    0

0     3     0    0    0    7   0    0

0   -8     0    0    0    0    5    0

0    0     0    0   -2    0    0   0

First start with an arbitrary threshold. In this example I will choose 8. We now start from the beginning and scan through for the first value greater-than-or-equal-to 8. Pixel 6 is the first, pixel 10 is the next, pixel 30 the 3rd and so on. In WDR we want the difference between those pixels. So We would store 6, 4 and 20 for the 3 pixels above. The other obvious part is the sign. In fact we’d want to store those first 3 pixels as +6, +4 and -20. When you have found all the pixels in the image that are above your threshold then you write a + to indicate the end of a run (Though I’m pretty sure it would be quite simple to eliminate the trailing +). You then halve the threshold and process again (Though ignoring the pixels you have already processed).

Thats the basics. Of course its far more complicated than that. For one each high pass section of the image is read in a different way and not as simply as I explained above. This is explained in many papers on WDR (An example). The scan orders are as follows:

For LH:

1   2    3    4

8   7   6    5

9   10 11 12

16 15 14 13

(ie You go form left to right on odd lines and right to left on even forming a weaving pattern)

For HL:

1 8   9   16

2 7  10 15

3 6  11  14

4 5  12  13

(ie Similar to LH but vertical scans instead of horizontal)

For HH:

1    2   6    7

3    5   8   13

4    9  12  14

10 11 15  16

(ie This one is performed by zig-zagging across the texture).

The reason the image is processed in this way is to try and maximise the runs of pixels that you will get. Again this is all explained well in various papers (Though it may well be interesting to play with these to see if you get better compressions by running across the pixels in different ways).

So now you have your list of differences and signs for a set of thresholds. The final part of the WDR algorithm is to bit encode your data. This is something that is poorly explained in many of the articles that I have read, I would truly have been lost if it was not this paper. If we look at the bit patterns of each of the numbers we retrieved earlier (ie 6 = 110, 4 = 100 and 20 = 10100) we can see that the highest bit is always 1 this is the case for all our numbers (If this doesn’t seem obvious now scribble out some binary numbers and you will see why its quite obvious!). Thus when bit encoding we can ignore the leading 1 and store the subsequent bits (ie 10, 00, 0100). Brilliant. We do, however, need to store the sign along side this number. For this reason we use a 2 bit per digit scheme. – = 11, + = 10, 1 = 01, 0 = 00. Armed with this information we can store the numbers in a bit stream as 100100 — 100000 — 11000100. When we build this up over each pass for each threshold we end up saving quite a few bits. This is the main meat of the compression.

Of course writing the bits is not the most fun of things. For this reason I wrote my own templated bitstream class for handling easy addition and lookup of the bit patterns I was after.

My code looked something like this:

 int count	= 0;
int max	 = (width >> 1) * (height >> 1);
while( count < max ) { const int pixelStart	= buffStart	+ iter->GetX() + (iter->GetY() * pitch);
const int16_t val	 = plane.pixels[pixelStart];
const int16_t absVal	= abs( val );
if ( absVal > threshold && absVal < (threshold << 1))
{
	// Push sign.
	bitStreamOut.wdrBitStream += (val < 0) ? BitEncoding::Bit_Minus : BitEncoding::Bit_Plus;
	// Push numbers into bit stream.
	uint8_t bitNum	 = 0;//HighestBitSet( pixelPos );;
	uint8_t bitMax	 = HighestBitSet( pixelPos );
	while( bitNum < bitMax )
	{
		bitStreamOut.wdrBitStream	+= (((pixelPos & (1 << bitNum)) >> bitNum) == 0) ? BitEncoding::Bit_Zero : BitEncoding::Bit_One;
		bitNum++;
	}
	pixelPos	= 1;
}
pixelPos++;
iter->Next();
count++;
}
//Emit +remainingPixels+ to indicate end …
bitStreamOut.wdrBitStream += BitEncoding::Bit_Plus;
uint8_t bitNum	 = 0;//HighestBitSet( pixelPos );;
uint8_t bitMax	 = HighestBitSet( pixelPos );
while( bitNum < bitMax )
{
	bitStreamOut.wdrBitStream	+= (((pixelPos & (1 << bitNum)) >> bitNum) == 0) ? BitEncoding::Bit_Zero : BitEncoding::Bit_One;
	bitNum++;
}
bitStreamOut.wdrBitStream += BitEncoding::Bit_Plus;

Something to note in the code ismy iter->Next() which is one of my 3 iterators that define the way we wind through each highpass section.
We now have a compressed bit stream. I used the now ubiquitous lena image for testing this out.
This is the result of the Wavelet transform:

The arrangement of this image is slightly backward. This is due to the way windows reverses images. You can however see the HL and HH passes on the top and the LL and LH passes on the bottom (instead of the other way round).
From there we need to decompress the bit stream and end up with something like the original image:

This decompression took me a while to get right. Basically, though, you need to go back over the bit stream dropping the appropriate bits in the correct places until you get back an image like the former. Running the inverse wavelet transform should result in the above image.
Easy for me to say. Firstly we need to drop the correct values in the correct places. To do this we need the original threshold value we started with. The bit stream will always start with a plus or minus. Following that we have the bits of the difference value. When we find another sign we are on to the next pixel. Don’t forget that the 1 needs to be added to the left of the number! When we get a number that takes us to the end of the image we will find an extra plus and we can go back to the beginning at half the threshold.
So the first value we get out is a 6 and we have a sign of +. This means we are writing a +8 at the 6th pixel position. We then read a +4 so we are writing a +8 at the 10th position. We next get that -20 which means we write a -8 at the 30th position and so on.
The quick (er than me) will noticed that this means that the value of 9 in our earlier example will get written back as an 8 (or a value of 7 written as a 4). This is where the refinement pass comes in. I will explain this in a further article (ie when I get it working ).
My code for doing this looked something like the following:

iter->SetToStart();

int pixelPos                = 0;
while( pixelPos < size )
{
	int sign                    = bitStream.wdrBitStream[bit];
	bit++;

	int pixelDiff               = 0;
	int bitPos                  = 0;
	BitEncoding::Bit readBit    = BitEncoding::Bit_Zero;
	while( readBit < BitEncoding::Bit_Plus )
	{
		readBit = bitStream.wdrBitStream[bit];
		switch( readBit )
		{
		case BitEncoding::Bit_One:
			pixelDiff   |= readBit << bitPos;
		case BitEncoding::Bit_Zero:
			bitPos++;
			break;

		case BitEncoding::Bit_Minus:
		case BitEncoding::Bit_Plus:
			break;

		}
		bit++;
	}
	pixelDiff   |= 1 << bitPos;   // Always a 1 at the top end.

	pixelPos    += (pixelDiff - 1);

	if ( pixelPos < size )
	{
		// Step forward through the iterator pixelPos times.
		uint32_t count  = 0;
		while( count < pixelDiff )           {               count++;               iter->Next();
		}

		const uint32_t writePos   = buffStart   + (iter->GetX() + iter->GetY() * pitch));
		planeOut.pixels[writePos] = (sign == BitEncoding::Bit_Plus) ? threshold : -threshold;

		bit--;
	}
}

Of course you can run the wavelet transform on the low-pass sub-image created from previous steps. This will give you something like the following:

Decompressing this image returns you to the following:

In this you can see some colour errors. I think this is caused by my lack of a refinement pass however its well worth noting the details on her hat brim and her boa. Both of these have returned and look good in the re-sized image.

Anyway … should anyone actually be reading my ramblings or have any questions then say hi in the comments!

Update:   I identified a bug whereby the software would get confused by a 0 and a 1 appearing in the WDR stream as both end up as the same value.  Zeros really shouldn’t be turning up so this has now been fixed.

The difficulties of Wavelets

Sunday, January 31st, 2010

I’ve now spent a few weekends working on my WDR system. The one overriding thing that gets to me is just how bad the information is out there. There are lots of things that aren’t quite spelled out well enough for non-maths gods like me.

Anyway. On to the basics of the transform. Anyone who has dealt with wavelets knows that you decompose the image into 4 sections.

LL LH
HL HH

You then go on to to the same process to the LL portion of the image. Rinse and repeat this method for as long or as short as you like (You don’t have to even do more than the first decomposition). Effectively what you now have is high pass information and low pass information. The low pass is the LL section. The highpass sections give you the horizontal high pass, the vertical high pass and the diagonal high pass (Effectively edge detect in 3 different directions).   These are generated by first applying the wavelet transform horizontally and then applying it on the resulting image vertically.  The joy of the wavelet transform is that it is reversible. Exactly so. This means you can re-construct the original image from the various high and low passes losslessly.   Its worth bearing in mind that this can be trivially extended into 3 dimensions to compress video.   I keep wondering whether you’d be able to re-sample frame rates nicely using this sort of video transform but thats an aside.

Ok so lets get on to a wavelet transform.    I’m going to use the Daub5/3 wavelet because it can be done entirely with integers which means there are no rounding issues.  It is truly lossless.

The equation is as follows (Taken from: A Low-power, Low-memory System for Wavelet-based Image Compression by Walker, Nguyen & Chen):
Daub5/3 Equation

My first thought on seeing that was WTF!?  However I carried on reading through the excellent paper linked above.

It turns out that there is a more efficient (And simpler) form of this wavelet transform known as a lifting wavelet transform.  The transform looks like the following:

Daub 5/3 Lifting Equation

Brilliant.  This looks like a far more friendly bit of maths.  So lets take a look at what these things mean.
  • s0 is the Image we are performing the transform on.  It is a very standard image in the 8-bit 0->255 range.
  • s1 is the low passed output of the transform.
  • d1 is the high passed output of the transform
  • The whole n thing is a bit confusing.  2n is every other pixel in the source image.  It is not, however, every other pixel in the destination image.  The indication of 2 indicates a step of 1 pixel when dealing with s1 and d1.  This is massively confusing for the likes of me.

First thing to notice is that the calculation for s1 contains usage of d1.  Not just d1 but the  previous pixel’s d1.

(At this juncture I’d like to point out that the aforementioned paper gives another equation as an “integer-to-integer” equation.  Try as I might I could never get this working.  The equation I’ve just posted works perfectly).

I’m going to move on to some code now.

const int n = (x) + (y * pixelPitch);

const int n2 = (x / 2) + (y * pixelPitch);

const int s = n2;

const int d = n2 + widthDiv2;

const int16_t d1 = pixelsIn[n + 1] –

(((pixelsIn[n] + pixelsIn[n + 2]) >> 1));

const int16_t s1 = pixelsIn[n] +

(((pixelsOut[d – 1] + d1) >> 2));

pixelsOut[d] = d1;

pixelsOut[s] = s1;

The above is a simple code version of the wavelet transform.    There is a problem with it, however.  You’ll note that due to the fact it uses the previous high pass pixel outputed in the s1 calculation and the next pixel from the input in the d1 calculation.

This means that we get a break down at either end of a line.  Fortunately this is fairly easy to rectify (Though I can find no mention of it anywhere).  For the first pixel on a line you can just use d1 twice.  For the last pixel you use the current pixel twice.  Its an easy solution and works well.

When doing a vertical decomposition the equation is much the same but instead of doing -1, +1 and +2 by adding/subtracing the pixel pitch (The width of the entire original texture) instead of 1.  Doing this twice over performs the appropriate 2D forward wavelet transform.

Inverting the wavelet transform is pretty simple as well.  You can easily get the reverse of the transform by re-arranging the equations.  Bear in mind, however, that you need to step from the right hand side of the image towards the left (ie backwards) to re-construct the image.  The trick above for handling the special cases at either end of the line works just as well for the inverse transform.

The code for the central part of the inverse transform will look something like this:

const int n = (x) + (y * pixelPitch);

const int n2 = (x / 2) + (y * pixelPitch);

const int s = n2;

const int d = n2 + widthDiv2;

const int16_t s0 = pixelsIn[s] –

(((pixelsIn[d – 1] + pixelsIn[d]) >> 2));

const int16_t s1 = pixelsIn[d] +

(((s0 + pixelsOut[n + 2]) >> 1));

pixelsOut[n] = s0;

pixelsOut[n + 1] = s1;

Anyway I hope thats slightly useful.  It certainly will be to me when I inevitably forget what I’ve done 🙂