Practical Compression for Multi-Alignment Genomic Files
Rodrigo Cánovas
Department of Computing and Information Systems
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computing and Information Systems
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 36th Australasian Computer Science Conference,
Adelaide, Australia, January 2013, pages 51-60.
Abstract
Genomic sequence data is being generated in massive quantities, and
must be stored in compressed form.
Here we examine the combined challenge of storing such data
compactly, yet providing bioinformatics researchers with the ability
to extract particular regions of interest without needing to fully
decompress multi-gigabyte data collections.
We focus on data produced in SAM format, which is particularly
voluminous in nature, and describe storage techniques that have the
desired blend of attributes.
Full text
http://crpit.com/confpapers/CRPITV135Canovas.pdf