Text Processing of Social Media
This series of guest lectures was presented by Tim Baldwin at Saarland University in the
summer of 2014, as part of Language Technology II. It revolves largely (and
selfishly) around his personal research (with various collaborators) on social
media, but includes pointers to prominent research in each topic area. The
slides, along with pointers to the papers each lecture builds on, are
provided below:
- Lecture 1: Overview, Language Content Analysis, Language Identification
  - Lecture slides
  - Papers:
    - Baldwin, Timothy, Paul Cook, Marco Lui, Andrew MacKinlay and Li Wang
      (2013) How Noisy Social Media Text, How Diffrnt Social Media Sources?,
      In Proceedings of the 6th International Joint Conference on Natural
      Language Processing (IJCNLP 2013), Nagoya, Japan, pp. 356–364.
    - Lui, Marco and Timothy Baldwin (2014) Accurate Language Identification
      of Twitter Messages, In Proceedings of the EACL 2014 Workshop on
      Language Analysis in Social Media (LASM 2014), Gothenburg, Sweden,
      pp. 17–25.
    - Lui, Marco and Timothy Baldwin (2012) langid.py: An Off-the-shelf
      Language Identification Tool, In Proceedings of the 50th Annual Meeting
      of the Association for Computational Linguistics (ACL 2012), Demo
      Session, Jeju, Republic of Korea, pp. 25–30.
    - Lui, Marco and Timothy Baldwin (2011) Cross-domain Feature Selection for
      Language Identification, In Proceedings of the Fifth International
      Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang
      Mai, Thailand, pp. 553–561.
- Lecture 2: Lexical Normalisation, User Geolocation
  - Lecture slides
  - Papers:
    - Han, Bo and Timothy Baldwin (2011) Lexical Normalisation of Short Text
      Messages: Makn Sens a #twitter, In Proceedings of the 49th Annual
      Meeting of the Association for Computational Linguistics: Human
      Language Technologies (ACL HLT 2011), Portland, USA, pp. 368–378.
    - Han, Bo, Paul Cook and Timothy Baldwin (2012) Automatically
      Constructing a Normalisation Dictionary for Microblogs, In Proceedings
      of EMNLP-CoNLL 2012, Jeju, Republic of Korea, pp. 421–432.
    - Han, Bo, Paul Cook and Timothy Baldwin (2013) Lexical Normalisation of
      Short Text Messages, ACM Transactions on Intelligent Systems and
      Technology 4(1), pp. 5:1–5:27.
    - Han, Bo, Paul Cook and Timothy Baldwin (2013) unimelb: Spanish Text
      Normalisation, In Proceedings of the Tweet Normalization Workshop at
      SEPLN 2013 (Tweet-norm), Madrid, Spain, pp. 67–71.
    - Han, Bo, Paul Cook and Timothy Baldwin (2013) A Stacking-based Approach
      to Twitter User Geolocation Prediction, In Proceedings of the 51st
      Annual Meeting of the Association for Computational Linguistics (ACL
      2013), Demo Session, Sofia, Bulgaria, pp. 7–12.
    - Han, Bo, Paul Cook and Timothy Baldwin (2014) Text-based Twitter User
      Geolocation Prediction, Journal of Artificial Intelligence Research 49,
      pp. 451–500.
- Lecture 3: User Geolocation (cont.), Twitter POS Tagging, Semantic and
  Discourse Analysis of Social Media, Restrictions and Ethics of Social Media
  Usage
  - Lecture slides
  - Papers:
    - Wang, Li, Marco Lui, Su Nam Kim, Joakim Nivre and Timothy Baldwin
      (2011) Predicting Thread Discourse Structure over Technical Web Forums,
      In Proceedings of the 2011 Conference on Empirical Methods in Natural
      Language Processing (EMNLP 2011), Edinburgh, UK, pp. 13–25.
    - Gella, Spandana, Paul Cook and Timothy Baldwin (2014) One Sense per
      Tweeter ... and Other Lexical Semantic Tales of Twitter, In Proceedings
      of the 14th Conference of the European Chapter of the Association for
      Computational Linguistics (EACL 2014), Volume 2: Short Papers,
      Gothenburg, Sweden, pp. 215–220.
    - Wang, Li, Su Nam Kim and Timothy Baldwin (2013) The Utility of Discourse
      Structure in Forum Thread Retrieval, In Proceedings of the Ninth Asia
      Information Retrieval Societies Conference (AIRS 2013), Singapore,
      pp. 284–295.
Tim Baldwin
Last modified: Thu Jul 17 14:18:31 CEST 2014