NICTA I2D2 Group at GeoCLEF 2006


Yi Li
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Nicola Stokes
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Lawrence Cavedon
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. GeoCLEF Workshop on Geo-Spatial IR, Alicante, Spain, September 2006.

Abstract

We report on the experiments undertaken by the NICTA I2D2 Group as part of GeoCLEF 2006. We experimented with geographic query expansion, using a gazetteer to extend geospatial terms to "nearby" locations and to their sublocations. The processing pipeline of the geographic information retrieval (GIR) system comprised a named entity recognition system for identifying location names; a toponym resolution component for assigning probabilistic likelihoods to geographic candidates obtained from a gazetteer (the Getty Thesaurus); and a probabilistic approach to GIR. We experimented with expanding location names in both documents and queries. We used a normalization process to adjust term weights so that geographic terms added to a query do not overwhelm the contribution of non-geographic query terms. We submitted five runs to the English-only GeoCLEF monolingual task, ranging from a baseline run of text-only retrieval based on topic title and description, to runs with queries expanded using gazetteer-based toponym resolution. Our submitted runs showed little improvement for GIR runs over the baseline run. A post-submission refinement to the normalization process resulted in GIR runs showing 6.57% and 5.84% improvement over the baseline in overall MAP.
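
The idea of normalizing expanded geographic terms can be illustrated with a minimal sketch. It is not the weighting scheme used in the submitted runs, whose details are not given in this abstract. The toy gazetteer, the function name `expand_query`, and the `geo_budget` parameter are all hypothetical; the sketch simply assumes that the terms added for one location should together carry a fixed total weight, split according to their resolution probabilities.

```python
from collections import defaultdict

# Hypothetical toy gazetteer: location -> [(related place, resolution probability)].
# A real system would draw these candidates from a resource such as the Getty Thesaurus.
TOY_GAZETTEER = {
    "melbourne": [("carlton", 0.5), ("richmond", 0.3), ("geelong", 0.2)],
}


def expand_query(terms, gazetteer, geo_budget=1.0):
    """Return {term: weight}; the expansions of one location share geo_budget."""
    weights = defaultdict(float)
    for term in terms:
        weights[term] += 1.0  # every original query term keeps unit weight
        candidates = gazetteer.get(term, [])
        total = sum(prob for _, prob in candidates)
        if total <= 0:
            continue  # not a known location, so nothing to expand
        for place, prob in candidates:
            # Normalize so the added terms for this location sum to geo_budget
            # (an assumed rule, chosen only to illustrate the principle).
            weights[place] += geo_budget * prob / total
    return dict(weights)


query = expand_query(["wine", "melbourne"], TOY_GAZETTEER)
```

With this assumed rule, adding many sublocations does not increase the total geographic weight, so the non-geographic term `wine` is not outweighed however long the expansion list grows.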
