Dictionary models: The LZ77 family of adaptive dictionary coders, The GZip variant of LZ77, The LZ78 family of adaptive dictionary coders, The LZW variant of LZ78.
Performance comparisons: Compression performance, Compression speed, Other performance considerations.
Further reading.
Chapter 3: Indexing
Sample document collections.
Inverted file indexing.
Inverted file compression: Nonparameterized models, Global Bernoulli model, Global observed frequency model, Local Bernoulli model, Skewed Bernoulli model, Local hyperbolic model, Local observed frequency model, Context-sensitive compression.
Performance of index compression methods.
Signature files and bitmaps: Signature files, Bitsliced signature files, Analysis of signature files, Bitmaps, Compression of signature files and bitmaps.
Comparison of indexing methods.
Case folding, stemming, and stop words: Case folding, Stemming, Effect on index size, Stop words.
Further reading.
Chapter 4: Querying
Accessing the lexicon: Access structures, Front coding, Minimal perfect hashing, Design of a minimal perfect hash function, Disk-based lexicon storage.
Partially specified query terms: Brute force string matching, Indexing using n-grams, Rotated lexicons.
Boolean query processing: Conjunctive queries, Term processing order, Random access and fast lookup, Blocked inverted files, Nonconjunctive queries.
Ranking and information retrieval: Coordinate matching, Inner product similarity, Vector space models.
Evaluating retrieval effectiveness: Recall and precision, Recall-precision curves, The TREC project, World Wide Web searching, Other effectiveness measures.
Implementation of the cosine measure: Within-document frequencies, Calculating the cosine value, Memory for document weights, Memory for accumulators, Fast query processing, Frequency-sorted indexes, Sorting.
JBIG: A standard for bilevel images: Resolution reduction, Templates and adaptive templates, Coding and probability estimation.
Lossless compression of continuous-tone images: The GIF and PNG lossless image formats, FELICS: Fast, efficient, lossless image compression system, CALIC: Context-based adaptive lossless image codec, JPEG-LS: A new standard for lossless image compression.
JPEG: A standard for continuous-tone images.
Progressive transmission of images: Pyramid coding, Compression for pyramid coding, Median aggregation, Error modeling.
Summary of image compression techniques.
Further reading.
Chapter 7: Textual Images
The idea of textual image compression.
Lossy and lossless compression.
Extracting marks: Tracing the boundary of a mark, Removing the mark from the image, Sorting marks into natural reading order.
Template matching: Global template-matching, Local template-matching, Compression-based template-matching, Screening library templates, Evaluation of template-matching methods.
From marks to symbols: Library construction, Symbols and their offsets.
Coding the components of a textual image: Library, Symbol numbers, Symbol offsets, Original image.
Performance: lossy and lossless modes.
System considerations.
JBIG2: A standard for textual image compression.
Further reading.
Chapter 8: Mixed Text and Images
Orientation: Detecting straight lines using the Hough transform, Left-margin search, The projection profile, From slope histogram to docstrum.
Segmentation: Bottom-up segmentation methods, Top-down and combined segmentation methods, Mark-based segmentation, Segmenting short text strings, Segmentation using a document grammar.
Classification.
Further reading.
Chapter 9: Implementation
Text compression: Choice of compression model, Choice of coder, Limitations on Huffman codes, Length-limited coding.