Query Execution in the Presence of Data Skew in Parallel Databases

Kevin H. Liu, Yi Jiang and Clement H. C. Leung
Department of Computer and Mathematical Sciences, Victoria University of Technology, Footscray Campus, Melbourne, Vic 3000, Australia.
{kevin, jiang, clement}@matilda.vut.edu.au


The problem of data skew has been recognised as one of the obstacles in preventing maximum speedup in parallel query execution. Although techniques of skew handling have received considerable attention in recent years, most of them are devoted to solving or avoiding data skew for single binary relational operation such as join (intra-operation level). In this paper, we investigate the effect of data skew on the parallel execution of queries (intra-query level) that may involve a number of binary operations. Recognising that allocating a large number of processors does not improve the overall execution time significantly in the presence of data skew, our approach is to identify the skewed operations and group them with other operations in each execution phase, so that the performance degradation caused by the skewness is kept to a minimal level. Both unary and binary operations cost models are provided with the data skew factor modelled by the Zipf distribution. Three processor allocation strategies are presented and their performances are evaluated and compared in the presence of data skew.


Note that the first author name above links to a home page and if you have a personal home page please do this. (In the example the conference home page has been linked). The second author is assumed to not have a home page; please use this style for all authors that are not web-visible. Please do NOT link in any departmental or university pages, no matter how pretty they are.

Finally, please refrain from modifying the skeleton. The idea is that if everyone does as asked, we will have a consistent style for the whole on-line conference guide, and that is worth striving for, even at the expense of supressing individual desires for coloured hrules or whatever.

Conference Home Page