Practical Compression for Multi-Alignment Genomic Files


Rodrigo Cánovas
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 36th Australasian Computer Science Conference, Adelaide, Australia, January 2013, pages 51-60.

Abstract

Genomic sequence data is being generated in massive quantities, and must be stored in compressed form. Here we examine the combined challenge of storing such data compactly, yet providing bioinformatics researchers with the ability to extract particular regions of interest without needing to fully decompress multi-gigabyte data collections. We focus on data produced in SAM format, which is particularly voluminous in nature, and describe storage techniques that have the desired blend of attributes.

Full text

http://crpit.com/confpapers/CRPITV135Canovas.pdf