ALTA 2018

16th Annual Workshop of
The Australasian Language Technology Association

The University of Otago, Dunedin, New Zealand

10th - 12th December 2018

ALTA 2018 Programme

Except where otherwise noted, the workshop venue is the Otago Business School building, on the corner of Clyde St and Union St. The registration desk is outside Room 1.19. Long papers and presentations are 15+5 minutes and short papers are 5+5 minutes.

10th December 2018 (Monday) Tutorial
13:00	Registration
Tutorial Session (Presenter: Professor Phil Cohen, Monash University) *Room 1.19*
14:00	Tutorial: Towards Collaborative Dialogue
17:00	End of Tutorial
18:00	Emerson's brewery tour: Meet in the restaurant (the Tap room) at 70 Anzac Ave.
11th December 2018 (Tuesday) Day 1
8:30	Registration
9:00	Opening *Room 1.17*
Keynote 1 (Chair: Andrew Trotman) *Room 1.17*
9:15	Jon Degenhardt, eBay	An Industry Perspective on Search and Search Applications
10:15	Morning Tea
Session A: text mining and applications (Chair: Shunichi Ishihara) *Room 1.19*
10:45	Rolando Coto Solano, Sally Akevai Nicholas and Samantha Wray	Development of Natural Language Processing Tools for Cook Islands Māori
11:05	Bayzid Ashik Hossain and Rolf Schwitter	Specifying Conceptual Models Using Restricted Natural Language
11:25	Jenny McDonald and Adon Moskal	Quantext: a text analysis tool for teachers
11:45	Xuanli He, Quan Tran, William Havard, Laurent Besacier, Ingrid Zukerman and Gholamreza Haffari	Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation
12:05	Lunch
Keynote 2 (Chair: Xiuzhen Jenny Zhang) *Room 1.17*
13:00	Associate Professor Alistair Knott, University of Otago	Learning to talk like a baby
14:15	Poster Session 1 *Room G34 of the Owheo Building, 133 Union St*
15:15	Afternoon Tea
15:45	Poster Session 2 *Room G34 of the Owheo Building, 133 Union St*
16:50	End of Day 1
19:00	Hāngi (Māori) Dinner: The Otago University Staff Club
12th December 2018 (Wednesday) Day 2
Keynote 3 (Chair: Bevan Koopman) *Room 1.17*
9:00	Professor David Bainbridge, University of Waikato	Can You Really Do That? Exploring new ways to interact with Web content and the desktop
10:15	Morning Tea
Session B: machine translation and speech (Chair: Stephen Wan) *Room 1.19*
10:45	Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn	Improved Neural Machine Translation using Side Information
11:05	Qiongkai Xu, Lizhen Qu and Jiawei Wang	Decoupling Stylistic Language Generation
11:25	Satoru Tsuge and Shunichi Ishihara	Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model – Universal Background Model (GMM-UBM) Approaches
11:45	Nitika Mathur, Timothy Baldwin and Trevor Cohn	Towards Efficient Machine Translation Evaluation by Modelling Annotators
11:55	Lunch
Keynote 4 (Chair: Tim Baldwin) *Room 1.17*
12:55	Dr. Kristin Stock, Massey University	"Where am I, and what am I doing here?" Extracting geographic information from natural language text
13:55	Break 10 minutes
Session C: shared with ADCS (Chair: Christopher Jones) *Room 1.17*
14:05	Alfan Farizki Wicaksono and Alistair Moffat	Exploring Interaction Patterns in Job Search
14:30	Xavier Holt and Andrew Chisholm	Extracting structured data from invoices
14:50	Bevan Koopman, Anthony Nguyen, Danica Cossio, Mary-Jane Courage and Gary Francois	Extracting Cancer Mortality Statistics from Free-text Death Certificates: A View from the Trenches
15:05	Hanieh Poostchi and Massimo Piccardi	Cluster Labeling by Word Embeddings and WordNet's Hypernymy
15:15	Afternoon Tea
Session D: word semantics (Chair: Trevor Cohn) *Room 1.19*
15:35	Lance De Vine, Shlomo Geva and Peter Bruza	Unsupervised Mining of Analogical Frames by Constraint Satisfaction
15:55	Navnita Nandakumar, Bahar Salehi and Timothy Baldwin	A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions
Shared Task Session (Chair: Diego Mollá-Aliod) *Room 1.19*
16:05	Diego Mollá-Aliod and Dilesha Seneviratne	Overview of the 2018 ALTA Shared Task: Classifying Patent Applications
	Fernando Benites, Shervin Malmasi, Marcos Zampieri	Classifying Patent Applications with Ensemble Methods
	Jason Hepburn	Universal Language Model Fine-tuning for Patent Classification
16:35	Break 10 minutes
16:45	Best Paper Awards
16:55	Business Meeting
	Closing
17:20	End of Day 2
19:30	Boat Trip: Boat trip on the Monarch. Meet at 20 Fryatt Street at 7:30pm.

Invited Keynotes

Title: Learning to talk like a baby

Speaker: Associate Professor Alistair Knott, University of Otago / Soul Machines

Alistair Knott

Alistair Knott is an Associate Professor in the Computer Science department at the University of Otago. He studied Psychology and Philosophy at Oxford University, then took an MSc and PhD in Artificial Intelligence at the University of Edinburgh. His PhD research was on theories of discourse structure, focussing on how coherence relations are signalled by sentence and clause connectives. His postdoc work was in text generation, on Edinburgh University’s ILEX project, which developed one of the first text generators to be deployed on the web. After moving to New Zealand, Ali developed an interest in dialogue models, building a mixed-initiative multi-speaker dialogue system that combined HPSG and Discourse Representation Theory. Aside from these topics, Ali's main research interest for the last 20 years has been in models of how language is implemented in the brain. His focus is on models of the interface between language and the sensorimotor system that address how we can talk about the things we see and do. In 2012 he published a programmatic model of this interface (‘Sensorimotor Cognition and Natural Language Syntax’, MIT Press). This model proposes that certain elements of syntactic structure have their origin in the structure of the sensorimotor routines involved in perceiving events in the world, and in the structure of the circuits which store these events in working memory. In 2017, Ali began working on a commercial contract with the Auckland-based AI startup Soul Machines. This company creates biologically realistic avatars that can engage in dialogues with human users. There is an emphasis on modelling dialogue agents’ physical bodies and sensory systems, and how these interface with actual brain mechanisms, which makes it an ideal environment for Ali. Ali also works on the ethical and social implications of AI. In January 2016 he co-founded the AI and Society discussion group at Otago University.
This year he co-founded Otago’s Centre for AI and Public Policy, which is actively engaging with the New Zealand government to provide oversight of the predictive analytics tools used by government departments.

Abstract:

In recent years, computational linguists have embraced neural network models, and the vector-based representations of words and meanings they use. But while computational linguists have readily adopted the machinery of neural network models, they have been slower to embrace the original aim of neural network research, which was to understand how brains work. A large community of neural network researchers continues to pursue this ‘cognitive modelling’ aim, with very interesting results. But the work of these more cognitively minded modellers has not yet percolated deeply into computational linguistics. In my talk, I will argue that the cognitive modelling tradition of neural networks has much to offer computational linguistics. I will outline a research programme that situates language modelling in a broader cognitive context. The programme is distinctive in two ways. Firstly, the initial object of study is a baby, rather than an adult. Computational linguistics models typically aim to reproduce adult linguistic competence in a single training process that presents an ‘empty’ network with a corpus of mature language. I’ll argue that this training process doesn’t correspond to anything in human experience, and that we should instead aim to model a more gradual developmental process that first achieves babylike language, then childlike language, and so on. Secondly, the new programme studies the baby's language system as it interfaces with her other cognitive systems, rather than by itself. It pays particular attention to the sensory and motor systems through which a baby engages with the physical world, which are the primary means by which she activates semantic representations. I’ll argue that the structure of these sensorimotor systems, as expressed in neural network models, offers interesting insights about certain aspects of linguistic structure.
I will conclude by demoing a model of the interface between language and the sensorimotor system, as it operates in a baby at an early stage of language learning.

Title: "Where am I, and what am I doing here?" Extracting geographic information from natural language text

Speaker: Dr Kristin Stock, Massey University

Kristin Stock

Dr Kristin Stock is Director of the Massey Geoinformatics Collaboratory, and a Senior Lecturer in Information Technology. She has 25 years’ experience in geospatial information management in the private, public and university sectors, has led a number of large international geospatial projects in Europe, Australia and New Zealand and played a key role in Europe-wide data sharing projects such as INSPIRE and EuroGEOSS. Her research focuses on geospatial natural language in collaboration with researchers in Europe and Australia, most specifically on the development of methods for extracting location information from text in order to map objects and events that cannot otherwise be located. Dr Stock was recently AI on a $2.74m MBIE Research Programme grant to develop a Maori land classification system, as well as receiving grants from MBIE (Our Land and Water National Science Challenge), the European Union FP7 programme and numerous industry funders.

Abstract:

The extraction of place names (toponyms) from natural language text has received a lot of attention in recent years, but location is frequently described in more complex ways, often using other objects as reference points. Examples include: ‘The accident occurred opposite the Orewa Post Office, near the pedestrian crossing’ or ‘the sample was collected on the west bank of the Waikato River, about 3km upstream from Huntly’. These expressions can be vague, imprecise and underspecified, and can rely on access to information about other objects in the environment; the semantics of spatial relations like ‘opposite’ and ‘on’ are also still far from clear. Furthermore, many of these kinds of expressions are context sensitive, and aspects such as scale, geometry and type of geographic feature may influence the way the expression is understood. Both machine learning and rule-based approaches have been developed, firstly to parse expressions of this kind, and secondly to determine the geographic location that the expression refers to. Several relevant projects will be discussed, including the development of a semantic rather than syntactic approach to parsing geographic location descriptions; the creation of a manually annotated training set of geographic language; the challenges highlighted from human descriptions of location in the emergency services context; the interpretation and geocoding of descriptions of flora and fauna specimen collections; the development of models of spatial relations using social media data; and the use of instance-based learning to interpret complex location descriptions.

ALTA 2018 Accepted Papers

Long Papers

Improved Neural Machine Translation using Side Information. Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn
Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model – Universal Background Model (GMM-UBM) Approaches. Satoru Tsuge and Shunichi Ishihara
Development of Natural Language Processing Tools for Cook Islands Māori. Rolando Coto Solano, Sally Akevai Nicholas and Samantha Wray
Unsupervised Mining of Analogical Frames by Constraint Satisfaction. Lance De Vine, Shlomo Geva and Peter Bruza
Specifying Conceptual Models Using Restricted Natural Language. Bayzid Ashik Hossain and Rolf Schwitter
Extracting structured data from invoices. Xavier Holt and Andrew Chisholm

Short Papers

Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation. Xuanli He, Quan Tran, William Havard, Laurent Besacier, Ingrid Zukerman and Gholamreza Haffari
Cluster Labeling by Word Embeddings and WordNet's Hypernymy. Hanieh Poostchi and Massimo Piccardi
A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions. Navnita Nandakumar, Bahar Salehi and Timothy Baldwin
Towards Efficient Machine Translation Evaluation by Modelling Annotators. Nitika Mathur, Timothy Baldwin and Trevor Cohn
