Data61-UoM Summer Scholarships 2016

Overview

As part of Data61's Summer Scholarships program, Data61 is offering five summer scholarships to University of Melbourne Department of Computing and Information Systems (CIS) students for the 2016/17 summer, valued at $5,000 each.

Eligibility

Scholarships are open to:

Duration

Scholarships will run for three months between November 2016 and February 2017.

Applying

To apply, please consult the list of topics below. Then email toby.murray@unimelb.edu.au by Tuesday November 1st 2016, 11:59pm and include the following:

  1. CV
  2. Statement of results or academic record
  3. Brief statement outlining why you have chosen a particular project, what skills or knowledge you will bring to it, and what benefits it will provide.

Closing Date

Applications close: Tuesday November 1st 2016, 11:59pm

Projects

Your application must nominate a project you wish to work on from the following.


Topic 1: Hybridising count-based and deep learning for accurate and robust language modelling

UoM Supervisor: Trevor Cohn

Description: Many of the most commonly used machine learning models in NLP are based on simple count-based frequency estimates, for instance language models that store n-grams and their counts. This simplicity means that they are easy to apply, fast to run, and, with some care, can be scaled to massive datasets. However, they treat words (or characters) as atomic units, so they do not generalise well between different words that have similar meanings.

In contrast, deep learning (neural) models learn to generalise between words in the lexicon, have a less rigid notion of context, and have been shown to be more accurate than count-based techniques. On the flip side, deep learning models are very slow to train and don’t scale up to massive datasets without resorting to coarse approximations and considerable engineering effort. This project aims to develop a hybrid method that gets the best from both techniques, using a count-based approach for its speed and coverage, and then developing a deep learning method to further refine this solution and account for its shortcomings. This has the potential to improve accuracy beyond either model alone, and to greatly decrease training times.

Specifically, we plan to build on an infinite-order count-based language model using compressed suffix trees, developed by Cohn & colleagues in 2015 & 2016, that provides near-optimal compression while supporting efficient search. This project will couple this model into a neural architecture, and empirically compare its performance with commonly used count-based and neural models.
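
To make the count-based side of this topic concrete, the following is a minimal sketch of a bigram language model with maximum-likelihood estimates. It is not the compressed suffix tree model mentioned above; the function names and the toy corpus are assumptions made purely for illustration. It shows the limitation described earlier: a context word that was never observed receives no probability mass, even when it is close in meaning to a word that was observed.

```python
from collections import Counter, defaultdict


def train_bigram_counts(tokens):
    """Count how often each word follows each context word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 for unseen contexts."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0


# Toy corpus (hypothetical), used only to exercise the functions above.
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = train_bigram_counts(corpus)

p_seen = bigram_prob(counts, "cat", "sat")       # "cat sat" was observed
p_unseen = bigram_prob(counts, "kitten", "sat")  # "kitten" never observed
```

Here `p_seen` is 1.0 while `p_unseen` is 0.0: the model has no way to transfer what it learned about "cat" to "kitten", which is the kind of gap a neural component could be used to fill.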


Topic 2: Exploring UBER's Market Entry in China: A Longitudinal Computational Analysis of Social Media Data

UoM Supervisor: Christoph Breidbach

Description: This study will examine how Uber, a US-based ride-sharing company, entered the Chinese transportation market. Exploring service in global environments is crucial, because service interactions need to be customized and adapted to local cultural backgrounds (Ostrom, Parasuraman, Bowen, Patrício, & Voss, 2015). However, empirical insights into this process are, especially in the Chinese context, unavailable to date. This lack of knowledge represents a significant challenge to service firms attempting to enter the Chinese market, and this project therefore aims to provide a customer-centric perspective on how service experiences transform over time when a foreign provider (Uber) enters a market that is dominated by a local incumbent (Didi Taxi).

The study will be based on a longitudinal analysis of Chinese social media data, which will be gathered from Sina Weibo (China’s most popular local social media site). The netnography research method will be used to conduct this empirical study. The key advantages of netnography are that it is less time-consuming and less costly, and allows fast data collection (Bertilsson, 2014, p. 137; V. Kozinets, 2009, p. 104). Additionally, this method allows unobtrusive observation and the identification of true and naturalistic inputs from customers (Bertilsson, 2014, p. 142).

We plan to collect Sina Weibo posts for the period from July 2013 (one month before Uber entered the Chinese market) to July 2016 (when Uber announced its exit) for several major cities (considering Uber currently operates in 16 cities). Data collection will involve using a Python script to gather post data from Sina Weibo. Manual data processing will then be performed to ensure data quality.


Topic 3: Exploiting the User in High-Security Systems

UoM Supervisor: Toby Murray

Description: High-assurance, cross-domain systems protect highly sensitive information from well-resourced attackers, while processing it alongside public information. They inherently trust their users to make correct security decisions: a user who is tricked into typing their classified password into an unclassified window can catastrophically compromise security.

In this project, you will develop a suite of attack scenarios, to be implemented as custom Windows applications, that attempt to fool users into unintentionally leaking sensitive information. These will serve as a foundation for future empirical evaluations or for building formal (i.e. mathematical) models of user behaviour under different threat models.


Topic 4: Real-World Concurrent Information Flow Security

UoM Supervisor: Toby Murray

Description: Recent work has produced the world's first theory for proving that concurrent programs do not leak information, i.e. are information flow secure. However, so far it is restricted to systems with very simple security policies. In this project, you will extend that theory to support more diverse security policies. A simple first step will be to accommodate multiple security classifications in the theory, which will serve as an ideal introduction to the Isabelle/HOL proof assistant, in which the theory has been formalised. Later steps will include supporting declassification, a necessary ingredient for verifying many real-world systems.


Topic 5: Side-Channel Testing

UoM Supervisor: Toby Murray

Description: Side-channels are a common form of vulnerability in cryptographic code that can allow attackers to infer secret keys and other sensitive material to compromise system security. So far, side-channels have been diagnosed manually or by applying formal verification techniques, which have limited scalability and require specialist expertise. This project will apply state-of-the-art testing techniques (fuzz testing and dynamic symbolic execution) to attempt to systematically identify and characterise side-channels in cryptographic code. This work will form a foundation that would be ideal to pursue via further research as part of a Masters or PhD.
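
As a small illustration of the kind of leak this topic targets, the sketch below shows an early-exit byte comparison. It is not taken from any particular cryptographic library; the function name and inputs are hypothetical, and the number of byte comparisons is used as a deterministic stand-in for running time, which is an assumption made so the example is reproducible.

```python
def leaky_equals(secret: bytes, guess: bytes):
    """Early-exit comparison.

    Returns (result, steps), where steps is the number of byte
    comparisons performed. The step count models execution time.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            # Returning at the first mismatch makes the amount of work
            # depend on how many leading bytes of the guess are correct.
            return False, steps
    return True, steps


# Hypothetical secret and guesses, for illustration only.
secret = b"k3y!"
_, steps_wrong_first = leaky_equals(secret, b"zzzz")  # mismatch at byte 1
_, steps_wrong_third = leaky_equals(secret, b"k3zz")  # mismatch at byte 3
```

Both guesses are rejected, but the second takes more steps than the first, so an attacker who can observe the amount of work learns how many leading bytes were correct. A testing tool of the kind proposed here would, roughly speaking, search for inputs on which such an observable varies with the secret. For ordinary application code, Python's standard library provides `hmac.compare_digest`, which is designed to avoid this content-dependent behaviour.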