String Search Experimentation Using Massive Data


Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Simon Gog
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.


Status

Philosophical Transactions of the Royal Society A, 372(8), March 2014.

Abstract

Descriptions of new string search or indexing algorithms are often accompanied by an experimental evaluation. In this article, we provide guidance as to how such investigations can be carried out, drawing on our experience of measurement in this field. In particular, we describe methodologies for stratifying patterns according to their length and frequency, so that precise response-time measurements can be made; and we describe a metric for categorizing the extent of "repetitiveness" in a text, so that dataset type can also be factored into evaluations. We show that separating these concepts allows a greater understanding of the behaviour of string search algorithms.

Full text

http://dx.doi.org/10.1098/rsta.2013.0135