A Cost Model for Long-Term Compressed Data Retention
Kewen Liao
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Matthias Petri
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Anthony Wirth
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 10th Int. Conf. Web Search and Data Mining,
Cambridge, England, February 2017,
pages 241-249.
Abstract
Vast amounts of data are collected and stored every day, as part of
corporate knowledge bases and as a response to legislative compliance
requirements.
To reduce the cost of retaining such data, compression tools are
often applied.
But simply seeking the best compression ratio is not necessarily the
most economical choice, and other factors also come into play,
including compression and decompression throughput, the main memory
required to support a given level of on-going access to the stored
data, and the types of storage available.
Here we develop a model for the total retention cost (TRC) of a data
archiving regime, and by applying the charging rates associated with
a cloud computing provider, are able to derive dollar amounts for a
range of compression options, and hence guide the development of new
approaches that are more cost-effective than current mechanisms.
In particular, we describe an enhancement to the Relative Lempel Ziv
(RLZ) compression scheme, and show that, when measured by TRC, it
outperforms previous approaches at providing economical long-term
data retention.
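To make the trade-off described in the abstract concrete, the following is a minimal sketch of a retention-cost calculation. It is not the TRC model from the paper: the formula, the `Scheme` fields, the function name `retention_cost`, and every rate and throughput figure are hypothetical values chosen for illustration, and main-memory cost is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    """A hypothetical compression option (all figures are illustrative)."""
    name: str
    ratio: float        # compressed size divided by original size
    encode_mbps: float  # compression throughput, MB per second
    decode_mbps: float  # decompression throughput, MB per second


def retention_cost(scheme, data_gb, months, reads_gb_per_month,
                   storage_rate=0.02, compute_rate=0.10):
    """Illustrative dollar cost: storage plus compute time.

    storage_rate is an assumed price in dollars per GB per month, and
    compute_rate an assumed price in dollars per machine hour.
    """
    # Storage is charged on the compressed size for the whole period.
    storage = data_gb * scheme.ratio * storage_rate * months
    # Compression happens once, over the full uncompressed collection.
    encode_hours = data_gb * 1024 / scheme.encode_mbps / 3600
    # Decompression is charged on the volume of data read back.
    read_mb = reads_gb_per_month * months * 1024
    decode_hours = read_mb / scheme.decode_mbps / 3600
    return storage + (encode_hours + decode_hours) * compute_rate


# Two made-up options: a slow scheme with a strong ratio, and a fast
# scheme with a weaker ratio.
tight = Scheme("tight", ratio=0.25, encode_mbps=10, decode_mbps=100)
fast = Scheme("fast", ratio=0.40, encode_mbps=200, decode_mbps=1000)

for label, months, reads in [("archival", 12, 100), ("hot", 1, 5000)]:
    for s in (tight, fast):
        cost = retention_cost(s, 1000, months, reads)
        print(f"{label:9s} {s.name:6s} ${cost:.2f}")
```

Under these assumed figures, the scheme with the better compression ratio is cheaper for a year of lightly accessed storage, while the faster scheme is cheaper for a single month of heavy access, which is the kind of outcome that ranking by compression ratio alone would miss.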
Full text
http://doi.acm.org/10.1145/3018661.3018738.
(Author version (PDF)).
Errata
We were careless: two of us used different BibTeX tags for the same
paper, and we did not notice the duplication when we should have.
References [4] and [6] are the same.