C. Paul Cook

This page is no longer being maintained (as of 11 July 2014). Please see my University of New Brunswick webpage instead.

Honorary Fellow, Language Technology Group, Department of Computing and Information Systems, The University of Melbourne. From June 2011 to May 2014 I was a McKenzie Postdoctoral Fellow.

Contact info

Paul Cook
Department of Computing and Information Systems
The University of Melbourne
Victoria 3010, Australia
email: paulcook at unimelb dot edu dot au
Twitter: @cpaulcook

News

I've accepted a tenure-track position in the Faculty of Computer Science at the University of New Brunswick starting 1 July.

Publications

My Google Scholar profile

Forthcoming

Paul Cook, Jey Han Lau, Diana McCarthy and Timothy Baldwin. To appear. Novel word-sense identification. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014). Dublin, Ireland.

Paul Cook, Michael Rundell, Jey Han Lau and Timothy Baldwin. To appear. Applying a Word-sense Induction System to the Automatic Extraction of Diverse Dictionary Examples. In Proceedings of the XVI EURALEX International Congress (EURALEX 2014). Bolzano, Italy.

Jey Han Lau, Paul Cook, Diana McCarthy, Spandana Gella and Timothy Baldwin. To appear. Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014). Baltimore, Maryland. Preprint

2014

Spandana Gella, Paul Cook and Timothy Baldwin. 2014. One Sense per Tweeter ... and Other Lexical Semantic Tales of Twitter. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), volume 2: Short Papers, pages 215–220. Gothenburg, Sweden.

Bahar Salehi, Paul Cook and Timothy Baldwin. 2014. Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pages 472–481. Gothenburg, Sweden.

Bo Han, Paul Cook and Timothy Baldwin. 2014. Text-based Twitter User Geolocation Prediction. Journal of Artificial Intelligence Research, 49:451–500.

2013

Marco Lui and Paul Cook. 2013. Classifying English Documents by National Dialect. In Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013), pages 5–15. Brisbane, Australia.

Paul Cook, Jey Han Lau, Michael Rundell, Diana McCarthy and Timothy Baldwin. 2013. A lexicographic appraisal of an automatic approach for detecting new word-senses. In Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, pages 49–65. Tallinn, Estonia.

Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay and Li Wang. 2013. How Noisy Social Media Text, How Diffrnt Social Media Sources? In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 356–364. Nagoya, Japan.

Long Duong, Paul Cook, Steven Bird and Pavel Pecina. 2013. Increasing the quality and quantity of source language data for unsupervised cross-lingual POS tagging. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 1243–1249. Nagoya, Japan.

Bo Han, Paul Cook and Timothy Baldwin. 2013. unimelb: Spanish Text Normalisation. In Proceedings of the Tweet Normalization Workshop at SEPLN 2013 (Tweet-norm), pages 67–71. Madrid, Spain.

Long Duong, Paul Cook, Steven Bird and Pavel Pecina. 2013. Simpler unsupervised POS tagging with bilingual projections. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 634–639. Sofia, Bulgaria.

Bo Han, Paul Cook and Timothy Baldwin. 2013. A Stacking-based Approach to Twitter User Geolocation Prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 7–12. Sofia, Bulgaria.

Bahar Salehi and Paul Cook. 2013. Predicting the Compositionality of Multiword Expressions Using Translations in Multiple Languages. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 266–275. Atlanta, Georgia.

Spandana Gella, Paul Cook and Bo Han. 2013. Unsupervised Word Usage Similarity in Social Media Texts. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 248–253. Atlanta, Georgia. Best short paper

Spandana Gella, Bahar Salehi, Marco Lui, Karl Grieser, Paul Cook and Timothy Baldwin. 2013. UniMelb_NLP-CORE: Integrating predictions from multiple domains and feature sets for estimating semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 207–215. Atlanta, Georgia.

Jey Han Lau, Paul Cook and Timothy Baldwin. 2013. unimelb: Topic Modelling-based Word Sense Induction for Web Snippet Clustering. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 217–221. Atlanta, Georgia.

Jey Han Lau, Paul Cook and Timothy Baldwin. 2013. unimelb: Topic Modelling-based Word Sense Induction. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 307–311. Atlanta, Georgia.

Paul Cook and Graeme Hirst. 2013. Automatically Assessing Whether a Text is Clichéd, with Applications to Literary Analysis. In Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013), pages 52–57. Atlanta, Georgia.

Bo Han, Paul Cook and Timothy Baldwin. 2013. Lexical Normalisation of Short Text Messages. ACM Transactions on Intelligent Systems and Technology 4(1), pages 5:1–5:27.

2012

Bo Han, Paul Cook and Timothy Baldwin. 2012. Geolocation Prediction in Social Media Data by Finding Location Indicative Words. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1045–1062. Mumbai, India.

Paul Cook and Marco Lui. 2012. langid.py for better language modelling. In Proceedings of the Australasian Language Technology Association Workshop 2012 (ALTA 2012), pages 107–112. Dunedin, New Zealand. .pdf code

Paul Cook and Scott Nowson, editors. 2012. Proceedings of the Australasian Language Technology Association Workshop 2012 (ALTA 2012). Dunedin, New Zealand. .pdf

Paul Cook. 2012. Using social media to find English lexical blends. In Proceedings of the 15th EURALEX International Congress (EURALEX 2012), pages 846–854. Oslo, Norway. .pdf code Sample output (Tweets show us how people write, so there's lots of obscenity in the data; you've been warned.)
I wrote a post for the Macmillan Dictionary Blog on this work.

Bo Han, Paul Cook and Timothy Baldwin. 2012. Automatically Constructing a Normalisation Dictionary for Microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL 2012), pages 421–432. Jeju, Korea. .pdf
The normalisation dictionary we produced in this paper is available.

Paul Cook and Graeme Hirst. 2012. Do Web corpora from top-level domains represent national varieties of English? In Actes des 11es Journées Internationales d'Analyse Statistique des Données Textuelles / Proceedings of the 11th International Conference on Textual Data Statistical Analysis, pages 281–293. Liège, Belgium. .pdf

Jey Han Lau, Paul Cook, Diana McCarthy, David Newman and Timothy Baldwin. 2012. Word Sense Induction for Novel Sense Detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 591–601. Avignon, France. .pdf

Timothy Baldwin, Paul Cook, Bo Han, Aaron Harwood, Shanika Karunasekera and Masud Moshtaghi. 2012. A Support Platform for Event Detection using Social Intelligence. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 69–72. Avignon, France. .pdf

2011

Paul Cook and Graeme Hirst. 2011. Automatic identification of words with novel but infrequent senses. In Proceedings of the 25th Pacific Asia Conference on Language Information and Computation (PACLIC 25), pages 265–274. Singapore. .pdf

Paul Cook. 2011. Book review of “A Way with Words: Recent advances in lexical theory and analysis: A Festschrift for Patrick Hanks.” Gilles-Maurice de Schryver (editor). Computational Linguistics 37(2):403–406. .pdf

2010

Paul Cook. 2010. Exploiting linguistic knowledge to infer properties of neologisms. Ph.D. thesis, University of Toronto, November. .pdf

Paul Cook and Suzanne Stevenson. 2010. No sentence is too confusing to ignore. In Proceedings of the ACL 2010 Workshop on NLP and Linguistics: Finding the Common Ground, pages 61–69. Uppsala, Sweden. .pdf

Paul Cook and Anna Feldman, editors. 2010. Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity (CALC-10). Los Angeles, California. .pdf

Paul Cook and Suzanne Stevenson. 2010. Automatically identifying changes in the semantic orientation of words. In Proceedings of the 7th International Conference on Language Resources and Evaluation, pages 28–34. Valletta, Malta. .pdf

Paul Cook and Suzanne Stevenson. 2010. Automatically identifying the source words of lexical blends in English. Computational Linguistics 36(1):129–149. .pdf
The dataset of blends used in this study is available. Please contact me if you're interested.

2009

Paul Cook and Suzanne Stevenson. 2009. An unsupervised model for text message normalization. In Proceedings of the NAACL HLT 2009 Workshop on Computational Approaches to Linguistic Creativity, pages 71–78. Boulder, Colorado. .pdf

Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. Unsupervised type and token identification of idiomatic expressions. Computational Linguistics 35(1):61–103. .pdf

2008

Paul Cook, Afsaneh Fazly and Suzanne Stevenson. 2008. The VNC-Tokens Dataset. In Proceedings of the LREC Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), pages 19–22. Marrakech, Morocco. .pdf
The VNC-Tokens dataset (also available from the Multiword Expressions Web)

2007

Paul Cook and Suzanne Stevenson. 2007. Automagically inferring the source words of lexical blends. In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007), pages 289–297. Melbourne, Australia. .pdf

Paul Cook, Afsaneh Fazly and Suzanne Stevenson. 2007. Pulling their weight: Exploiting syntactic forms for the automatic identification of idiomatic expressions in context. In Proceedings of the ACL Workshop on A Broader Perspective on Multiword Expressions (MWE 2007), pages 41–48. Prague, Czech Republic. .pdf

2006

Paul Cook. 2006. Automatically Classifying English Verb-Particle Constructions by Particle Semantics. M.Sc. thesis, University of Toronto, August. .pdf

Paul Cook and Suzanne Stevenson. 2006. Classifying particle semantics in English verb-particle constructions. In Proceedings of the ACL/COLING Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties (MWE 2006), pages 45–53. Sydney, Australia. .pdf

Places to spot Paul when he's not in his office

At home in Brunswick

Hanging out with Hannah
