25.07.2013 Views

Lempel-Ziv Coding p g — Adaptive Dictionary Compression Algorithm

Lempel-Ziv Coding p g — Adaptive Dictionary Compression Algorithm

Lempel-Ziv Coding p g — Adaptive Dictionary Compression Algorithm

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Lempel</strong>-<strong>Ziv</strong> p <strong>Coding</strong>g<br />

<strong>—</strong> <strong>Adaptive</strong> <strong>Dictionary</strong> <strong>Compression</strong> <strong>Algorithm</strong><br />

1. LZ77:Sliding Window <strong>Lempel</strong>-<strong>Ziv</strong> <strong>Algorithm</strong> [gzip, pkzip]<br />

Encode a string by finding the longest match anywhere<br />

within a window of past symbols and represents the string by<br />

a pointer to location of the match within the window and the<br />

length of the match.<br />

J. A. Storer and T. G. Szymanski, “Data <strong>Compression</strong> Via<br />

Textual Substitution, Substitution ” JJ. ACM ACM, 29(4) 29(4), pp.928-951, pp 928 951 1982<br />

1


AAssume we hhave a string i to bbe compressed df from a<br />

finite alphabet. A parsing S of a string is a division<br />

of the string into phrases separated by commas n x x x , , 1 2 L<br />

x1,<br />

x2,<br />

L,<br />

of the string into phrases, separated by commas.<br />

Let W be the length of the window. window<br />

Then the algorithm can be described as follows:<br />

Assume that we have compressed the string until time i −1.<br />

To find the next phrase, find the largest k such that for some j,<br />

i − 1 1− W ≤ j ≤ i − 1 1,<br />

th the string t i of f length l th k starting t ti at t j iis equal l<br />

to the string (of length k) starting at xi (i.e., x j+<br />

l xi+<br />

l for all =<br />

0 ≤ l < k k)<br />

).<br />

2


Th The next phrase h is i then h of f length l h k (i (i.e., xi, xi+<br />

1, L,<br />

xi+<br />

k−1<br />

) and d<br />

is represented by the pair (P, L), where P is the location of<br />

the beginning of the match and L is the length of the match match.<br />

If a match is not found in the window, window the next character is<br />

sent uncompressed.<br />

To distinguish between these two cases, a flag bit is needed,<br />

and hence the phrases p are of two types:(F, yp ( P, L) ) of ( (F, C), )<br />

where C represents an uncompressed character.<br />

3


NNote that h the h target of f a ( (pointer, i length) l h) pain i could ldextend d<br />

beyond the window, so that it overlaps with the new phrase.<br />

In theory, this match could be arbitrarily long; in practice,<br />

though though, the maximum phrase length is restricted to be less<br />

than some parameter.<br />

4


FFor example, l if W = 4 and d the h string i is i<br />

ABBABBABBBAABABA and the initial window is empty,<br />

the string will be parsed as follows:<br />

A, B, B, ABBABB, BA, A, BA, BA,<br />

which is represented by the sequence of “pointers”:<br />

(0,A), (0,B), (1,1,1), (1,3,6), (1,4,2), (1,1,1), (1,3,2), (1,2,2),<br />

where h the th flag fl bit is i 0 if there th is i no match t h and d 1 if there th is i a<br />

match, and the location of the match is measured backward<br />

from the end of the window. window<br />

5


4 3 2 1<br />

A B<br />

(0,A);(0,B) (1,1,1) B A B A ···<br />

(1,3,6)<br />

A B B ABBABBB···<br />

A B B A B B B ···<br />

longest match<br />

(1,3,2) ·· · B A B B B A A B A B A<br />

B B B A<br />

(1,1,1) A B A B A<br />

B B A A<br />

(132) (1,3,2) B BABA A B A<br />

A A B A<br />

(122) (1,2,2) B BA A<br />

6


2LZ78 2. LZ-78 Tree-structured T dL <strong>Lempel</strong>-<strong>Ziv</strong> lZi <strong>Algorithm</strong>s Al i h [GIF; [GIF<br />

compress on Unix]<br />

This algorithm parsed a string into phrases, where each<br />

phrase is the shortest phrase not seen so far far.<br />

This algorithm can be viewed as building a dictionary in<br />

the form of a tree, where the nodes correspond to phrases<br />

seen so far.<br />

This algorithm is simple to implement and has become<br />

popular p p as one of the early ystandard algorithms g for file<br />

compression on computers because of its speed and<br />

efficiency. It is also used for data compression in highspeed<br />

modems.<br />

7


FFor the h string: i : ABBABBABBBAABABAA ···,<br />

we parse it as<br />

A, B, BA, BB, AB, BBA, ABA, BAA,···<br />

After every comma, comma we look along the input sequence until<br />

we come to the shortest string that has not been marked off<br />

before before.<br />

Since this is the shortest string, all its prefixes must have<br />

occurred earlier. (Thus, we can build up a tree of these<br />

phrases.)<br />

8


IIn particular, i l the h string i consisting i i of fall ll but b the h last l bit bi of fthis hi<br />

string must have occurred earlier. We code this phrase by<br />

giving the location of the prefix and the value of the last<br />

symbol.<br />

Thus, the string above would be represented as:<br />

(0 (0,A), A) (0 (0,B), B) (2,A), (2 A) (2 (2,B), B) (1 (1,B), B) (4,A), (4 A) (5 (5,A), A) (3 (3,A), A) ···<br />

9


1 A 0A<br />

2 B 0B<br />

3 BA 2A<br />

4 BB 2B<br />

5 AB 1B<br />

6 BBA 4A<br />

7 ABA 5A<br />

8 BAA 3A<br />

0<br />

A B<br />

1<br />

2<br />

B A B<br />

7 ABA 5A 5 3 4<br />

A<br />

7<br />

A<br />

A<br />

8 6<br />

10


SSending di an uncompressed dcharacter h in i each hphrase h results l in i<br />

a loss of efficiency. It is possible to get around this by<br />

considering the extension character (the last character of the<br />

current phrase) as part of the next phrase.<br />

T. A. Welch, A technique for high-performance data<br />

compression, computer, 17(1) 17(1):pp.8-19, pp.8 19, 1984. LZW<br />

algorithm.<br />

11


Di <strong>Dictionary</strong> i<br />

current<br />

input look-ahead<br />

buffer<br />

c0 c1 L c j c j+<br />

1 L cM<br />

−1<br />

i ai+<br />

1<br />

a L<br />

ai+ 2 ai+N<br />

−1<br />

(1) Fi Find dth the longest l tmatch t hbbetween t the th strings ti stored t di in the th<br />

dictionary and the string, started at a , in the look-ahead<br />

i<br />

buffer buffer.<br />

Assume the starting address of the matched string in the<br />

dictionary is J and the longest match length is K (i.e.,<br />

A a , a , L,<br />

a })<br />

I<br />

= { i i+<br />

1 i+<br />

K−1<br />

12


(2) SSend d ( J , K K,<br />

ai+<br />

k ) tto th the ddecoder d<br />

update p the dictionary y by y ppushing g A I (the ( longest g matched<br />

string) + a i+<br />

k into the <strong>Dictionary</strong>.<br />

(3) fill the look-ahead buffer, starting at a i+<br />

k , from the<br />

following input string.<br />

(4) repeat step 1 until the end of the input.<br />

13

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!